I’m trying to collect links to personal profiles and contacts from the following website:
https://www.dlapiper.com/en-us/people#t=All&sort=relevancy&numberOfResults=100&f:CountriesID=[United%20Kingdom]
I’m scraping with Selenium via chromedriver, and normally it works just fine. For this particular website, however, I can’t get to the source HTML where all the links to people’s profiles would be visible.
I wrote a standard script which would normally work for any other dynamic website:
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

links = []
driver = webdriver.Chrome()
driver.get('https://www.dlapiper.com/en-gb/people#t=All&sort=%40lastname%20ascending&f:CountriesID=[United%20Kingdom]')
time.sleep(5)  # give the page time to load

# Dismiss the cookie banner
cookies_button = driver.find_element(By.ID, "onetrust-reject-all-handler")
cookies_button.click()
time.sleep(5)

# Parse the rendered HTML and collect every href
html = driver.page_source
time.sleep(5)
soup = BeautifulSoup(html, 'html.parser')
parse = soup.find_all('a')
for item in parse:
    links.append(item.get('href'))
print(links)
However, the links from the people search block never show up in driver.page_source, even though I can find all the link elements when I press "Inspect" in Chrome. I have tried increasing the time.sleep() values, but that did not help.
I understand that a lot of JavaScript is executed on this page. Maybe I need to trigger some of it manually? Help would be much appreciated, as I don’t know JavaScript.
2 Answers
The lawyer’s contact details are in an iframe…
Is your script scanning inside child frames?
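If the links really do sit inside an iframe (that is this answer's claim, not something verified here), driver.page_source only returns the document Selenium is currently focused on, so the script has to switch into each child frame before looking for links. A minimal sketch of that idea follows. The helper name collect_links and its max_depth parameter are made up for illustration, and it assumes a Selenium 4 driver that is already open on the page.

```python
TAG_NAME = "tag name"  # the same string that selenium's By.TAG_NAME holds


def collect_links(driver, max_depth=3):
    """Return the hrefs in the current document and in its nested iframes.

    max_depth limits how many levels of nested frames are entered
    (0 means only the document the driver is currently focused on).
    """
    links = []

    # Links in the document the driver is focused on right now.
    for anchor in driver.find_elements(TAG_NAME, "a"):
        href = anchor.get_attribute("href")
        if href:  # skip anchors that have no href attribute
            links.append(href)

    if max_depth > 0:
        for frame in driver.find_elements(TAG_NAME, "iframe"):
            driver.switch_to.frame(frame)  # move focus into the child frame
            try:
                links.extend(collect_links(driver, max_depth - 1))
            finally:
                # Always step back out, even if the frame raised an error.
                driver.switch_to.parent_frame()

    return links
```

In the script from the question, a call such as links = collect_links(driver) after the cookie banner is dismissed would stand in for the page_source and BeautifulSoup lines. If the result is still missing the profile links, the iframe theory probably does not apply to this page.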
That table comes from the Classic browser, which exposes iframes and offers lots of tech info among many other things… including custom tags, hidden scripts, shadow DOMs, global identifiers, functions, objects, and so forth.
It can scrape pretty much anything upon request, but there’s no automated bot as such to set up… even so, it’s very powerful for inspecting websites and making the right decisions when setting up your bot.
Up until last year I had it for public download on a home site but I got tired of website maintenance after 6-7 years of hard work and pulled the plug!
Good luck with everything.