I’m using Selenium’s Chrome WebDriver in Python to extract all the text from the description section of a geocaching website (here’s a sample website if anyone wants to take a look). The text is stored in several <p> elements inside one <span> element. I cannot figure out how to take all the <p> elements and save their text as one string separated by spaces.
I tried both of the solutions below. The first one only output the text from the first <p> element, and the second one sometimes output only the first one and sometimes more (but never all). I couldn’t figure out why the second one is inconsistent in the number of elements it returns.
# First attempt: only outputs the text of the first <p> element
desc_span = driver.find_element(By.XPATH, '/html/body/form[1]/main/div/div/div[2]/div[9]/span')
p_elements = desc_span.find_elements(By.TAG_NAME, 'p')
desc = ' '.join(p_element.text for p_element in p_elements)
print(desc)

# Second attempt: outputs a varying number of elements
desc_div = driver.find_element(By.XPATH, '/html/body/form[1]/main/div/div/div[2]/div[9]')
all_elements = desc_div.find_elements(By.XPATH, '*')
desc = ' '.join(element.text for element in all_elements)
print(desc)
2 Answers
I think you should wait for the visibility of all elements located by the selector and probably change the selector as well.
Try the code below:
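A minimal sketch of the waiting approach, not necessarily the exact code this answer used. The function names, the timeout default, and the locator passed in are assumptions; only WebDriverWait and visibility_of_all_elements_located() are standard Selenium calls.

```python
def join_texts(texts):
    """Join an iterable of strings into one space-separated string."""
    return " ".join(texts)


def wait_and_join(driver, locator, timeout=10):
    """Wait until every element matching `locator` is visible, then join their text.

    `locator` is a (By.<strategy>, "selector") tuple chosen by the caller;
    the 10-second default timeout is an arbitrary assumption.
    """
    # Imported here so join_texts() stays usable without Selenium installed.
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # until() keeps polling and returns the list of matching elements
    # once at least one exists and all of them are visible.
    elements = WebDriverWait(driver, timeout).until(
        EC.visibility_of_all_elements_located(locator)
    )
    return join_texts(element.text for element in elements)
```

It could be called as wait_and_join(driver, (By.XPATH, '/html/body/form[1]/main/div/div/div[2]/div[9]/span//p')), which reuses the path from the question, or with a shorter selector of your choosing.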
Using your link (I am not logged in), the output is:
It's a "W" Thang. This is my first cache that I have submitted. Placed with permission. Magnetic
which corresponds to all the <p> tags in the description section. So you’re on the right track; you just need to wait until all elements are rendered.
The desired texts are within <p> tags which have an ancestor <div class="UserSuppliedContent">.
Solution
To extract all the text from the description section of the geocaching website into a list, you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following locator strategies:
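A minimal sketch of what the two locator strategies might look like, using CSS_SELECTOR and XPATH. Both selector strings are assumptions derived from the UserSuppliedContent ancestor mentioned above, and the function name and 20-second timeout are hypothetical.

```python
# Assumed selectors: any <p> that is a descendant of
# <div class="UserSuppliedContent">. Adjust them to the real page.
CSS_SELECTOR = "div.UserSuppliedContent p"
XPATH = "//div[@class='UserSuppliedContent']//p"


def description_paragraphs(driver, use_xpath=False, timeout=20):
    """Return the text of every description <p> as a list of strings."""
    # Imported inside the function so the module loads without Selenium.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Pick one of the two equivalent locator strategies.
    locator = (By.XPATH, XPATH) if use_xpath else (By.CSS_SELECTOR, CSS_SELECTOR)

    # Block until all matching elements are visible (or raise TimeoutException).
    elements = WebDriverWait(driver, timeout).until(
        EC.visibility_of_all_elements_located(locator)
    )
    return [element.text for element in elements]
```

With an already opened driver, print(description_paragraphs(driver)) would print the list of paragraph texts.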
Console Output:
Further, if you want to take all the <p> elements and save them as one string separated by spaces, you need to use join(), and you can use the following solution:
Console Output:
Note: You have to add the following imports:
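Returning to the join() step described above, here is a rough sketch of turning the collected paragraph texts into one string. It needs no imports of its own. The helper name, the whitespace cleanup, and the sample list are assumptions for illustration; in practice the list would come from the .text of the located <p> elements.

```python
def join_paragraphs(texts):
    """Collapse whitespace inside each paragraph and join the non-empty ones with spaces."""
    cleaned = (" ".join(text.split()) for text in texts)
    return " ".join(text for text in cleaned if text)


# Hypothetical paragraph texts; how the page splits them across <p> tags is assumed.
paragraphs = ["It's a \"W\" Thang.", "  Placed with\npermission. ", "", "Magnetic"]
print(join_paragraphs(paragraphs))
```

Skipping empty strings avoids double spaces when a <p> element contains no visible text.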