Html - How to write out the whole text from the elements inside an element with the Python Chrome WebDriver?

EverythingerTruten
August 7, 2023
382 views
1 vote
2 Answers

I'm using the Python Chrome WebDriver to extract all the text from the description section of a geocaching website (here's a sample page if anyone wants to take a look: https://www.geocaching.com/geocache/GC4ZJ9R). The text is split across several child elements inside one container element. I can't figure out how to take all of those elements and save their text as one string separated by spaces.

I tried both of the approaches below. The first one only printed the text of the first element. The second one sometimes printed only the first element's text and sometimes more, but never all of it. I couldn't figure out why the second one returns an inconsistent number of elements.

```
desc_span = driver.find_element(By.XPATH, '/html/body/form[1]/main/div/div/div[2]/div[9]/span')
p_elements = desc_span.find_elements(By.TAG_NAME, 'p')
desc = ' '.join(p_element.text for p_element in p_elements)
print(desc)
```

```
desc_div = driver.find_element(By.XPATH, '/html/body/form[1]/main/div/div/div[2]/div[9]')
all_elements = desc_div.find_elements(By.XPATH, '*')
desc = ' '.join(element.text for element in all_elements)
print(desc)
```

Answers

- Yaroslavm
- August 7, 2023 at 7:44 pm
- 0 votes
I think you should wait until all the elements matched by the selector are visible, and probably change the selector as well.

Try the code below:
```
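from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
wait = WebDriverWait(driver, 10)
# Wait until every <p> matched inside the description is visible, then return them
p_elements = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, '.UserSuppliedContent p')))
desc = ' '.join(p_element.text for p_element in p_elements)
print(desc)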
```
Using your link (I am not logged in), the output is `It's a "W" Thang. This is my first cache that I have submitted. Placed with permission. Magnetic`, which corresponds to all the `p` tags in the description section.
So you're on the right track; you just need to wait until all the elements are rendered.
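One caveat, offered as a sketch rather than as part of the answer above: `visibility_of_all_elements_located()` returns as soon as every element found *at that moment* is visible, so if the page keeps adding `<p>` tags afterwards, some can still be missed. `WebDriverWait.until()` simply calls whatever callable you pass it with the driver until it returns something truthy, so a custom condition can be used instead. `ElementsCountStable` below is a hypothetical name of my own, not a Selenium class; it waits until the number of matches is non-zero and unchanged between two consecutive polls.

```python
class ElementsCountStable:
    """Hypothetical wait condition for WebDriverWait.until().

    Succeeds once the number of elements matching `locator` is non-zero
    and equal to the count seen on the previous poll.
    """

    def __init__(self, locator):
        # locator is a (by, value) tuple, e.g. (By.CSS_SELECTOR, ".UserSuppliedContent p")
        self.locator = locator
        self.last_count = None  # count observed on the previous poll

    def __call__(self, driver):
        elements = driver.find_elements(*self.locator)
        count = len(elements)
        stable = count > 0 and count == self.last_count
        self.last_count = count
        # Returning a falsy value tells WebDriverWait to keep polling
        return elements if stable else False
```

Usage would look like `WebDriverWait(driver, 10).until(ElementsCountStable((By.CSS_SELECTOR, '.UserSuppliedContent p')))`. Two equal counts in a row is only a heuristic (the default poll interval is 0.5 s), not a guarantee that rendering has finished.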

- undetectedSelenium
- August 7, 2023 at 9:49 pm
- 0 votes
The desired texts are within `<p>` tags that have an ancestor `<div class="UserSuppliedContent">`.

Solution

To extract all the text from the description section of the geocaching website into a list, you need to use WebDriverWait with visibility_of_all_elements_located(), and you can use the following locator strategy:
```
driver.get(url='https://www.geocaching.com/geocache/GC4ZJ9R')
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.UserSuppliedContent p")))])
```
Console Output:
```
['It's a "W" Thang. This is my first cache that I have submitted. Placed with permission.', 'Magnetic']
```
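If the individual strings are not needed, another option (a sketch, not tested against the live page) is to read the rendered text of the whole `div.UserSuppliedContent` container in one call. Selenium's `.text` usually separates block elements such as `<p>` with line breaks, so a small hypothetical helper, `collapse_whitespace`, can turn that into a single space-separated string:

```python
def collapse_whitespace(text):
    """Hypothetical helper: join all whitespace-separated words with single spaces.

    Meant for the rendered .text of a container element, where the
    individual <p> blocks are usually separated by line breaks.
    """
    # str.split() with no argument splits on any run of whitespace
    # (spaces, tabs, newlines) and ignores leading/trailing whitespace.
    return " ".join(text.split())


# Possible usage with Selenium (requires a live driver, so shown as a comment):
# container = WebDriverWait(driver, 20).until(
#     EC.visibility_of_element_located((By.CSS_SELECTOR, "div.UserSuppliedContent")))
# print(collapse_whitespace(container.text))
```

Note that this also collapses repeated spaces or line breaks inside a single paragraph, which is usually acceptable for a short description like this one.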
Further, if you want to take all the `<p>` elements and save their text as one string separated by spaces, you need to use join(), as in the following solution:
```
driver.get(url='https://www.geocaching.com/geocache/GC4ZJ9R')
print(" ".join(my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.UserSuppliedContent p")))))
```
Console Output:
```
It's a "W" Thang. This is my first cache that I have submitted. Placed with permission. Magnetic
```
Note: you have to add the following imports:
```
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
```
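If some of the matched `<p>` tags are empty (or, with a plain `find_elements()` call, hidden), Selenium reports their `.text` as an empty string, and a plain `" ".join(...)` would then leave doubled spaces. A small hypothetical helper (`join_element_texts` is an illustrative name, not part of Selenium) can strip each text and skip the blanks. It works on any list of objects with a `.text` attribute, including the list returned by `WebDriverWait(...).until(...)`:

```python
def join_element_texts(elements, sep=" "):
    """Hypothetical helper: join the .text of each element with `sep`.

    `elements` can be any iterable of objects that have a `.text`
    attribute, e.g. Selenium WebElements returned by a wait condition.
    """
    # Strip surrounding whitespace from each element's text
    texts = (element.text.strip() for element in elements)
    # Drop empty strings so they don't produce doubled separators
    return sep.join(text for text in texts if text)
```

For example: `print(join_element_texts(WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.UserSuppliedContent p")))))`. Passing `sep="\n"` keeps one paragraph per line instead.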