JavaScript - Can't scrape links from the dynamic website using Selenium

faintglimmer
June 4, 2024
126 views
0 votes
2 Answers

I’m trying to collect links to personal profiles and contacts from the following website:
https://www.dlapiper.com/en-us/people#t=All&sort=relevancy&numberOfResults=100&f:CountriesID=[United%20Kingdom]

I’m scraping with Selenium via chromedriver, which normally works just fine. For this particular website, however, I can’t get to the source HTML where all the links to people’s profiles would be visible.

I wrote a standard script which would normally work for any other dynamic website.

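import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

links = []
driver = webdriver.Chrome()
driver.get('https://www.dlapiper.com/en-gb/people#t=All&sort=%40lastname%20ascending&f:CountriesID=[United%20Kingdom]')
time.sleep(5)  # give the cookie banner time to load
# Reject cookies so the banner is dismissed
cookies_button = driver.find_element(By.ID, "onetrust-reject-all-handler")
cookies_button.click()
time.sleep(5)  # give the search results time to render
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
# Collect the href of every <a> tag found in the page source
parse = soup.find_all('a')
for item in parse:
    links.append(item.get('href'))
print(links)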

However, the links from the people search block never appear in driver.page_source, even though I can find all the link elements when I press "Inspect" in Chrome. Increasing the time.sleep() values did not help.

I understand that a lot of JavaScript is executed on this page. Maybe I need to trigger some of it manually? Help would be much appreciated, as I don’t know JavaScript.

Answers

The lawyers’ contact details are in an iframe:

 1  Frame ID      myIframe
 2  Frame Name    Unused
 3  Frame Title   People Index Hosted Search Page
 4  Frame Source  https://www.dlapiper.com/en-US/coveosearchpages/people%20index%20hosted%20search%20page#t=All&sort=relevancy&f:CountriesID=[United%20States]
 5  Frame Domain  www.dlapiper.com
 6  Type          text/html
 7  Mode          CSS1Compat
 8  Language      en
 9  Encoding      UTF-8
10  Modified      06/03/2024 19:25:06
11  Load Time     2.52 seconds
12  Source Size   361 bytes
13  Position      0 - 291 pixels
14  Viewport      1903 x 1500 pixels

Is your script scanning inside child frames?
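
One way to act on that hint, as a rough sketch rather than a tested solution for this site: driver.page_source returns the HTML of whichever document the driver is currently focused on, so the driver has to switch into the frame before the source is read. The frame ID myIframe comes from the table above. The helper names, the 20-second timeout and the "wait until any a[href] exists" condition are assumptions made for illustration, and the standard-library HTMLParser stands in for BeautifulSoup only to keep the example free of extra dependencies.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without an href
                self.links.append(urljoin(self.base_url, href))


def links_from_html(html, base_url):
    """Return absolute URLs for all links found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def fetch_frame_html(url, frame_id="myIframe", timeout=20):
    """Open url, switch into the iframe and return the frame's HTML."""
    # Imported here so links_from_html can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        # Wait for the iframe to exist, then move the driver's focus into it.
        wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID, frame_id)))
        # Assumption: the results have rendered once the frame contains a link.
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[href]")))
        return driver.page_source  # now the frame's HTML, not the outer page's
    finally:
        driver.quit()
```

Calling fetch_frame_html() with the people-search URL from the question and passing the result to links_from_html(html, "https://www.dlapiper.com/") should then yield the links inside the frame, provided the frame ID is still myIframe. The explicit waits replace the fixed time.sleep() calls, so the script continues as soon as the frame content is present.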

- PassThru
- June 4, 2024 at 11:53 am
- 0 votes
Can you elaborate a bit on the extract you have just pasted? This looks new to me.

That table comes from the Classic browser, which exposes iframes and a lot of other technical information, including custom tags, hidden scripts, shadow DOMs, global identifiers, functions, and objects.

It can scrape pretty much anything on request, but it has no automated bot to set up. Still, it’s very powerful for inspecting websites and deciding how to set up your own bot.
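
For readers without that tool, a small part of the same information (which iframes a page contains) can be approximated from the outer page's HTML. The sketch below is only an illustration: the names IframeLister and list_iframes are made up here, it reads just the id, name, title and src attributes, and it does not look inside the frames or into shadow DOMs.

```python
from html.parser import HTMLParser


class IframeLister(HTMLParser):
    """Records selected attributes of every <iframe> tag in a document."""

    FIELDS = ("id", "name", "title", "src")

    def __init__(self):
        super().__init__()
        self.frames = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            found = dict(attrs)
            # Missing attributes are reported as None.
            self.frames.append({field: found.get(field) for field in self.FIELDS})


def list_iframes(html):
    """Return one dict per <iframe> found in the given HTML string."""
    lister = IframeLister()
    lister.feed(html)
    return lister.frames
```

With the script from the question, list_iframes(driver.page_source) could be called on the outer page to check whether the content being searched for sits in a child frame.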

Up until last year I had it for public download on a home site but I got tired of website maintenance after 6-7 years of hard work and pulled the plug!

Good luck with everything.
