Read all values of HTML table with Selenium

ArthurLanglois
February 27, 2023
283 views
1 vote
2 Answers

I am trying to read all elements of the following HTML table and convert it to a DataFrame, but none of the numerical values are being captured by my get_attribute call. I have also tried .get_attribute('td'), .get_attribute('tr') and .get_attribute('outerHTML'), but I still get the result below.
I have tried using the following code:

bond_totals_table = driver.find_element(By.XPATH,'/html/body/form[2]/table/tbody/tr/td/table/body').get_attribute('td')
bond_totals_table = pd.read_html(bond_totals_table, flavor = 'bs4')

0   Increment Number    Action  Current Acres   Add Delete  Acres for Calculation   Adjusted Amount Status  Bond?
1   NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
2   NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
3   NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
4   NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
5   NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
6   NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No
7   NaN Existing Modify New Closed Reactivate Reconcile NaN NaN NaN NaN NaN ACT INA PH1 PH2 PH3 TRM Yes No

It seems the table used to be adjustable but is not anymore, and the get_attribute function is somehow not getting at the values displayed in the grey cells.

Answers

- undetectedSelenium
- February 27, 2023 at 9:30 pm
- 0 votes
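To read all values of an HTML table, target the <table> element itself, use WebDriverWait with visibility_of_element_located() so the table has rendered before you read it, and extract its outerHTML. Note that get_attribute() takes an attribute or property name, so passing a tag name such as 'td' or 'tr' returns None: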
```
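from io import StringIO

import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

bond_totals_table_data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//html/body/form[2]/table/tbody/tr/td/table"))).get_attribute('outerHTML')
bond_totals_table = pd.read_html(StringIO(bond_totals_table_data))  # returns a list of DataFrames
print(bond_totals_table)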
```
References

You can find a couple of relevant detailed discussions in:
- Can’t find table selenium python
- BeautifulSoup and Pandas read_html is not pulling all of the rows in a table

- jwill
- February 27, 2023 at 10:37 pm
- 0 votes
You can use Beautiful Soup with pandas. Here’s an example of reading a table from a CDC page:
```
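from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

with webdriver.Firefox() as browser:
    browser.get("https://www.cdc.gov/nchs/nhis/shs/tables.htm")
    html = browser.page_source
    soup = BeautifulSoup(html, "html.parser")
    tbl = soup.select_one("#example")  # the <table> with id="example"
    # read_html returns a list of DataFrames; StringIO avoids the literal-HTML deprecation
    df = pd.read_html(StringIO(str(tbl)))
    print(df[0])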
```
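One possible reason for the output in the question: the NaN columns, and cells such as "Existing Modify New Closed Reactivate Reconcile", look like what you get when the cells hold form controls rather than plain text. A text box keeps its number in a `value` attribute, which a text-based table reader skips, and a dropdown contributes the text of every option. That is only a guess, since the page's markup is not shown. Under that assumption, here is a minimal sketch that reads the controls directly, using the standard library's `html.parser`. The class name `FormTableParser` and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class FormTableParser(HTMLParser):
    """Collect table rows, reading text inputs and selected options.

    Assumes one flat table with explicit closing tags. Radio buttons,
    checkboxes and nested tables are not handled in this sketch.
    """

    def __init__(self):
        super().__init__()
        self.rows = []              # finished rows, each a list of strings
        self._row = None            # cells of the row currently being read
        self._cell = None           # text fragments of the current cell
        self._in_option = False     # True while inside any <option>
        self._keep_option = False   # True only inside <option selected>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
        elif self._cell is None:
            return  # ignore anything outside a cell
        elif tag == "input":
            # Treat a missing type as a text box, as browsers do.
            if (attrs.get("type") or "text").lower() == "text":
                self._cell.append(attrs.get("value") or "")
        elif tag == "option":
            self._in_option = True
            self._keep_option = "selected" in attrs

    def handle_endtag(self, tag):
        if tag in ("option", "select"):
            self._in_option = False
        elif tag in ("td", "th") and self._cell is not None:
            self._row.append(" ".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        text = data.strip()
        if self._cell is None or not text:
            return
        if self._in_option and not self._keep_option:
            return  # skip the options that are not selected
        self._cell.append(text)


# Hypothetical markup standing in for the table's outerHTML.
sample_html = """
<table>
  <tr><th>Increment Number</th><th>Action</th><th>Current Acres</th></tr>
  <tr>
    <td><input type="text" value="1"></td>
    <td><select><option>Existing</option><option selected>Modify</option></select></td>
    <td><input value="12.5" readonly></td>
  </tr>
</table>
"""

parser = FormTableParser()
parser.feed(sample_html)
print(parser.rows)
```

If this matches your page, `pd.DataFrame(parser.rows[1:], columns=parser.rows[0])` turns the rows into a DataFrame. Keep in mind that outerHTML serializes attributes, so a value typed into a box after the page loaded may not appear in it. In that case, reading each input with Selenium's `get_attribute('value')` is likely the more reliable route.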