I am trying to read all the elements in the following HTML table and convert it to a DataFrame, but none of the numerical values are being captured by my get_attribute() call. I have also tried .get_attribute('td'), .get_attribute('tr') and .get_attribute('outerHTML'), but I still get the result below.
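I have tried the following code:
from io import StringIO  # pandas >= 2.1 deprecates passing literal HTML strings to read_html
# 'td' is a tag name, not an attribute, so .get_attribute('td') returns None; take the <table>'s outerHTML instead
table_html = driver.find_element(By.XPATH, '/html/body/form[2]/table/tbody/tr/td/table').get_attribute('outerHTML')
bond_totals_table = pd.read_html(StringIO(table_html), flavor='bs4')[0]  # read_html returns a list of DataFrames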
0 | Increment Number | Action | Current Acres | Add | Delete | Acres for Calculation | Adjusted Amount | Status | Bond?
1 | NaN | Existing Modify New Closed Reactivate Reconcile | NaN | NaN | NaN | NaN | NaN | ACT INA PH1 PH2 PH3 TRM | Yes No
2 | NaN | Existing Modify New Closed Reactivate Reconcile | NaN | NaN | NaN | NaN | NaN | ACT INA PH1 PH2 PH3 TRM | Yes No
3 | NaN | Existing Modify New Closed Reactivate Reconcile | NaN | NaN | NaN | NaN | NaN | ACT INA PH1 PH2 PH3 TRM | Yes No
4 | NaN | Existing Modify New Closed Reactivate Reconcile | NaN | NaN | NaN | NaN | NaN | ACT INA PH1 PH2 PH3 TRM | Yes No
5 | NaN | Existing Modify New Closed Reactivate Reconcile | NaN | NaN | NaN | NaN | NaN | ACT INA PH1 PH2 PH3 TRM | Yes No
6 | NaN | Existing Modify New Closed Reactivate Reconcile | NaN | NaN | NaN | NaN | NaN | ACT INA PH1 PH2 PH3 TRM | Yes No
7 | NaN | Existing Modify New Closed Reactivate Reconcile | NaN | NaN | NaN | NaN | NaN | ACT INA PH1 PH2 PH3 TRM | Yes No
It seems the table used to be editable but no longer is, and get_attribute() is somehow not picking up the values displayed in the grey cells.
Answers
To read all the values of the HTML table, you need to target the <table> element, inducing WebDriverWait for visibility_of_element_located(), and extract its outerHTML as follows:
References
You can find a couple of relevant detailed discussions in:
You can use Beautiful Soup with pandas. Here's an example of reading from a CDC table:
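A hedged sketch of the Beautiful Soup + pandas idea applied to the question's table. The repeated option lists in the question's output (for example "Existing Modify New Closed Reactivate Reconcile") suggest, though this is an assumption, that the grey cells hold <select> and <input> form controls, which pandas.read_html reduces to their text. This sketch walks the rows itself and, per cell, takes an <input>'s value attribute, a <select>'s selected <option>, or otherwise the cell text. The names cell_value and table_to_dataframe are hypothetical, and it assumes the target table contains no nested tables:

```python
from bs4 import BeautifulSoup
import pandas as pd


def cell_value(td):
    """Return what a cell displays, looking inside (assumed) form controls first."""
    field = td.find("input")
    if field is not None:
        return field.get("value", "")
    select = td.find("select")
    if select is not None:
        # A <select> shows the option marked "selected", otherwise its first option.
        chosen = select.find("option", selected=True) or select.find("option")
        return chosen.get_text(strip=True) if chosen is not None else ""
    return td.get_text(strip=True)


def table_to_dataframe(html):
    """Parse the first <table> in `html`, using its first row as the header."""
    table = BeautifulSoup(html, "html.parser").find("table")
    rows = [
        # recursive=False keeps only the row's own cells
        [cell_value(cell) for cell in tr.find_all(["td", "th"], recursive=False)]
        for tr in table.find_all("tr")
    ]
    return pd.DataFrame(rows[1:], columns=rows[0])
```

The HTML can come from the table's outerHTML in Selenium. This only sees values present in the markup; if a script changes a field after the page loads, the markup attribute may not match what is displayed, and reading each field with Selenium's element.get_attribute('value') would be needed instead.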