Parse a dynamic HTML Page in Python

Alez
June 13, 2023
289 views
0 votes
2 Answers

I would like to scrape an HTML page whose content is not static but loaded with JavaScript.

I downgraded Selenium to version 3.3.0 so that I could use PhantomJS (v4.9.x no longer supports PhantomJS) and wrote this code:

```
from selenium import webdriver
driver = webdriver.PhantomJS('path-to-phantomJS')
driver.get('my_url')
p_element = driver.find_element_by_id(id_='my-id')
print(p_element)
```

The error I’m getting is:

```
selenium.common.exceptions.NoSuchElementException: Message:
"errorMessage":"Unable to find element with id 'my-id'"
```

The element I want to retrieve is the <section> tag with a certain id, along with all its subtags. The HTML looks like this:

```
<section id="my-id" class="my-class">...</section>
```

Thank you

Answers

- AymenKrifa
- June 13, 2023 at 6:53 pm
- 0 votes
This can happen for several reasons: the element may not exist yet when the code runs, or it may have a different ID. Assuming you have double-checked the ID, make sure the page has finished loading before you try to find the element. JavaScript-rendered content can take a little longer to appear. You can add a delay or an explicit wait to ensure that the element is available before you access it:
```
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

driver = webdriver.PhantomJS('path-to-phantomJS')
driver.get('my_url')
delay = 10  # Wait up to 10 seconds for the element to be present

try:
    wait = WebDriverWait(driver, delay)
    p_element = wait.until(EC.presence_of_element_located((By.ID, 'my-id')))
    print(p_element.text)
except TimeoutException:
    print("Timeout!")
```
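To make the idea of an explicit wait more concrete, here is a rough pure-Python sketch of the kind of polling loop such a wait performs: call a condition repeatedly until it returns something truthy or a timeout expires. It is only an illustration and not Selenium's actual implementation. The names `wait_until`, `WaitTimeout` and `FakePage` are made up for this example, and `FakePage` merely simulates an element that shows up after a delay so the snippet runs without a browser.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


class FakePage:
    """Stand-in for a page whose element appears only after a short delay."""

    def __init__(self, ready_after):
        self._ready_at = time.monotonic() + ready_after

    def find(self, element_id):
        # Return None until the simulated JavaScript has "rendered" the element.
        if time.monotonic() < self._ready_at:
            return None
        return '<section id="%s">...</section>' % element_id


page = FakePage(ready_after=0.2)
element = wait_until(lambda: page.find("my-id"), timeout=2.0, poll_interval=0.05)
print(element)
```

A single lookup right after `driver.get()` is like calling `page.find()` once, which fails if the content has not been rendered yet. Polling with a timeout tolerates the delay without always sleeping for the full duration.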
Hope this helps!

- undetectedSelenium
- June 13, 2023 at 10:04 pm
- 0 votes
This error message…
```
selenium.common.exceptions.NoSuchElementException: Message: "errorMessage":"Unable to find element with id 'my-id'"
```
…implies that the element wasn’t found within the HTML DOM.

A possible reason is that the desired element was not rendered within the viewport, because PhantomJS starts with a small viewport by default.

Solution

Maximize the PhantomJS window and use WebDriverWait with visibility_of_element_located() to locate the element, as follows:
```
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.PhantomJS('path-to-phantomJS')
driver.get('my_url')
driver.maximize_window()
p_element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.ID, "my-id")))
print(p_element)
```
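Note that `print(p_element)` shows the WebElement object itself, not its markup. Since the question asks for the `<section>` together with all its subtags, one option is to read the element's `outerHTML` through `get_attribute`. The sketch below assumes this works the same way with the PhantomJS driver. `element_markup` is a hypothetical helper, and `StubElement` is a made-up stand-in for a real WebElement so the snippet runs without a browser.

```python
def element_markup(element):
    """Return the element's own tag plus all nested markup as a string."""
    # With a real Selenium WebElement, get_attribute("outerHTML") asks the
    # browser for the serialized element, including its children.
    return element.get_attribute("outerHTML")


class StubElement:
    """Hypothetical stand-in for a WebElement so the sketch runs offline."""

    def __init__(self, outer_html):
        self._outer_html = outer_html

    def get_attribute(self, name):
        # Only the one attribute used in this sketch is simulated.
        return self._outer_html if name == "outerHTML" else None


stub = StubElement('<section id="my-id" class="my-class"><p>Hello</p></section>')
markup = element_markup(stub)
print(markup)
```

With the real driver you would pass `p_element` instead of `stub`. The returned string can then be handed to whatever HTML parser you prefer.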