I am trying to extract the content on the right-hand side of this page:

Looking at the HTML, the information is stored in this table:
With my code snippet, I can't reach the text I want.

import requests
from bs4 import BeautifulSoup

def getDescriptionDNB():
    description = f''
    response = requests.get(description)
    soupedDescription = BeautifulSoup(response.content, "html.parser")
    # Only continue if the search returned exactly one hit
    text = soupedDescription.find(class_="amount").text
    if text == "Treffer 1 von 1":
        autor = soupedDescription.find_all("tr")
        for i in autor:
            test = i.findNext("td").text

The problem is that I don't know how to get down to the inner <td> tag that holds the information I want.

Do you know how I can solve this problem?



  1. The main issue is that the page's HTML is broken: some tr elements have no td and no closing tag.

    Try to select your elements more specifically, or store the information in a dict and pick values by key.

    Create a dict with CSS selectors:

        for row in soupedDescription.select('tr:has(td:not([colspan]))')

    Create a dict with pandas.read_html():

    import pandas as pd
    url = f''

    Result, based on the URL from your snippet:

    {'Link zu diesem Datensatz': '',
     'Titel': 'Learning English - Password red:Teil: Reformierte Rechtschreibung / 3. / [Hauptw.].',
     'Ausgabe': '1. Aufl., 1. Dr.',
     'Verlag': 'Stuttgart ; Düsseldorf ; Leipzig : Klett',
     'Zeitliche Einordnung': 'Erscheinungsdatum: 1997',
     'Umfang/Format': '172 S. ; 25 cm',
     'ISBN/Einband/Preis': '978-3-12-546630-2 Pp. : DM 29.60:3-12-546630-X Pp. : DM 29.60:3-12-54663-0 (falsch) Pp. : DM 29.60',
     'Sprache(n)': 'Englisch (eng), Deutsch (ger)',
     'Frankfurt': 'Signatur: 1997 A 10551:Bereitstellung  in Frankfurt',
     'Leipzig': 'Signatur: 1997 A 10551:Bereitstellung  in Leipzig'}
  2. As pointed out, you need to split each row into its key/value pair. Sticking with BeautifulSoup (your tool of choice):

            teilen = i.find_all('td')
            if len(teilen) == 2:
                print(teilen[0].text.strip(), ' : ', teilen[1].text.strip())

    There are some other things you can improve on yourself. For example, instead of selecting all the ‘tr’s in the document, select the table first:

    table id="fullRecordTable"

    and then select the rows (‘tr’) inside it.
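
A minimal sketch of the two BeautifulSoup approaches described above, run against a small inline HTML sample instead of the real page. The sample markup is only an assumption about how the table is structured, and `SAMPLE_HTML`, `record_via_css` and `record_via_table` are hypothetical names chosen for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample imitating the described structure: a table with
# id "fullRecordTable", key/value rows, one colspan row and one empty row.
SAMPLE_HTML = """
<table id="fullRecordTable">
  <tr><td colspan="2">Treffer 1 von 1</td></tr>
  <tr><td><strong>Titel</strong></td><td>Learning English</td></tr>
  <tr><td><strong>Ausgabe</strong></td><td>1. Aufl., 1. Dr.</td></tr>
  <tr></tr>
  <tr><td><strong>Verlag</strong></td><td>Stuttgart : Klett</td></tr>
</table>
"""


def cells_to_pair(row):
    """Return (key, value) if the row has exactly two td cells, else None."""
    cells = row.find_all("td")
    if len(cells) != 2:
        return None
    return cells[0].get_text(strip=True), cells[1].get_text(" ", strip=True)


def record_via_css(soup):
    """Pick only rows that contain a td without a colspan attribute."""
    record = {}
    for row in soup.select("tr:has(td:not([colspan]))"):
        pair = cells_to_pair(row)
        if pair is not None:
            record[pair[0]] = pair[1]
    return record


def record_via_table(soup):
    """Select the table by id first, then walk only its rows."""
    record = {}
    table = soup.find("table", id="fullRecordTable")
    if table is None:
        return record
    for row in table.find_all("tr"):
        pair = cells_to_pair(row)
        if pair is not None:
            record[pair[0]] = pair[1]
    return record


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(record_via_css(soup))
print(record_via_table(soup))
```

Once the rows are in a dict, a single field can be read by key, for example `record_via_table(soup)["Titel"]`. On the real page the broken rows may behave differently from this well-formed sample, so the two-cell check is worth keeping.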

