
I am trying to read every cell of the following HTML table into a DataFrame, but none of the numerical values are captured by my get_attribute call. I have also tried .get_attribute('td'), .get_attribute('tr') and .get_attribute('outerHTML'), but I still get the result below.
This is the code I am using:

bond_totals_table = driver.find_element(By.XPATH,'/html/body/form[2]/table/tbody/tr/td/table/body').get_attribute('td')
bond_totals_table = pd.read_html(bond_totals_table, flavor = 'bs4')
0   Increment Number    Action  Current Acres   Add Delete  Acres for Calculation   Adjusted Amount Status  Bond?
1   NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
2   NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
3   NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
4   NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
5   NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
6   NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
7   NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No

It seems the table used to be adjustable but no longer is, and the get_attribute function is somehow not picking up the values displayed in the grey cells.


2 Answers


  1. To read all the values of an HTML table, target the <table> element itself: use WebDriverWait with visibility_of_element_located() to wait until it is visible, then extract its outerHTML as follows:

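    from io import StringIO
    import pandas as pd
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # Wait for the table to be visible, then take its full markup (driver is the existing WebDriver)
    bond_totals_table_data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//html/body/form[2]/table/tbody/tr/td/table"))).get_attribute('outerHTML')
    bond_totals_table = pd.read_html(StringIO(bond_totals_table_data))  # returns a list of DataFrames
    print(bond_totals_table)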
    

  2. You can use Beautiful Soup with pandas. Here's an example of reading a table from a CDC page:

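    from io import StringIO
    import pandas as pd
    from bs4 import BeautifulSoup
    from selenium import webdriver

    with webdriver.Firefox() as browser:
        browser.get("https://www.cdc.gov/nchs/nhis/shs/tables.htm")
        html = browser.page_source
        soup = BeautifulSoup(html, "html.parser")
        # Select the table by its id, then let pandas parse its markup
        tbl = soup.select_one("#example")
        df = pd.read_html(StringIO(str(tbl)))
        print(df[0])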
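
A note on the output in the question, offered as a guess because the page markup is not shown: cells that come back as NaN, or as every choice run together ("Existing Modify New ..."), look like what pd.read_html produces when the cells hold form controls. It reads only text nodes, so an <input> contributes nothing (its number lives in the value attribute) and a <select> contributes the text of all its options. Below is a minimal, standard-library sketch that flattens such controls before building rows. The class name FormTableParser and the sample markup (including its numbers) are made up for illustration. It assumes a single, non-nested table, assumes the displayed values are present as value/selected attributes in the markup, and ignores radio buttons and checkboxes.

```python
from html.parser import HTMLParser


class FormTableParser(HTMLParser):
    """Collect table rows, reading form controls instead of raw text (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []                 # finished rows, each a list of cell strings
        self._row = None               # cells of the row currently being parsed
        self._cell = None              # text fragments of the cell currently being parsed
        self._in_select = False
        self._option_selected = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []
        elif tag == "input" and self._cell is not None:
            # A text box displays its `value` attribute; it has no inner text.
            if (attrs.get("type") or "text").lower() == "text":
                self._cell.append(attrs.get("value") or "")
        elif tag == "select":
            self._in_select = True
        elif tag == "option":
            # Remember whether this option carries the `selected` attribute.
            self._option_selected = "selected" in attrs

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append(" ".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "select":
            self._in_select = False
        elif tag == "option":
            self._option_selected = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._cell is None:
            return
        if self._in_select and not self._option_selected:
            return  # drop the text of options that are not selected
        self._cell.append(text)


# Made-up markup standing in for the outerHTML of the real table.
sample = """
<table>
  <tr><th>Increment Number</th><th>Action</th><th>Current Acres</th></tr>
  <tr>
    <td><input type="text" value="1" disabled></td>
    <td><select disabled><option>Existing</option><option selected>Modify</option></select></td>
    <td><input type="text" value="12.5" disabled></td>
  </tr>
</table>
"""

parser = FormTableParser()
parser.feed(sample)
print(parser.rows)
```

With real data you would feed the parser the table's outerHTML or driver.page_source instead of sample, then build the frame with pd.DataFrame(parser.rows[1:], columns=parser.rows[0]). If the page sets values through JavaScript rather than attributes, the markup will not contain them, and reading each control with Selenium's get_attribute('value') is the likelier route.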
    