
I’m trying to collect links to personal profiles and contacts from the following website:
https://www.dlapiper.com/en-us/people#t=All&sort=relevancy&numberOfResults=100&f:CountriesID=[United%20Kingdom]

I’m scraping with Selenium via chromedriver, and normally it works just fine. For this particular website, however, I can’t get to the source HTML in which the links to people’s profiles would be visible.

I wrote a standard script which would normally work for any other dynamic website.

import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

links = []
driver = webdriver.Chrome()
driver.get('https://www.dlapiper.com/en-gb/people#t=All&sort=%40lastname%20ascending&f:CountriesID=[United%20Kingdom]')
time.sleep(5)
# Dismiss the cookie banner
cookies_button = driver.find_element(By.ID, "onetrust-reject-all-handler")
cookies_button.click()
time.sleep(5)
# Grab the rendered source and collect the href of every <a> tag
html = driver.page_source
time.sleep(5)
soup = BeautifulSoup(html, 'html.parser')
parse = soup.find_all('a')
for item in parse:
    links.append(item.get('href'))
print(links)

However, the links from the people search block never appear in driver.page_source, even though I can find all the link elements when I press "inspect" in Chrome. I have tried increasing the time.sleep() delays, but that did not help.

I understand that a lot of JavaScript is executed on this page. Maybe I need to trigger some of it manually? Help would be much appreciated, as I don’t know JavaScript.


Answers


  1. The lawyer’s contact details are in an iframe…

    Frame ID        myIframe
    Frame Name      Unused
    Frame Title     People Index Hosted Search Page
    Frame Source    https://www.dlapiper.com/en-US/coveosearchpages/people%20index%20hosted%20search%20page#t=All&sort=relevancy&f:CountriesID=[United%20States]
    Frame Domain    www.dlapiper.com
    Type            text/html
    Mode            CSS1Compat
    Language        en
    Encoding        UTF-8
    Modified        06/03/2024 19:25:06
    Load Time       2.52 seconds
    Source Size     361 bytes
    Position        0 - 291 pixels
    Viewport        1903 x 1500 pixels
    

    Is your script scanning inside child frames?
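
A minimal sketch of what scanning inside the child frame might look like with Selenium, assuming the frame ID myIframe from the table above is still current. The names extract_links and fetch_frame_links are hypothetical helpers, not part of any library, and waiting for the first a[href] element is only a guess at when the search results have finished rendering.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors with a missing or empty href
                self.hrefs.append(href)


def extract_links(html, base_url):
    """Return absolute URLs for all anchors in an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    return [urljoin(base_url, href) for href in collector.hrefs]


def fetch_frame_links(url, frame_id="myIframe", timeout=20):
    """Open url, switch into the iframe, and return the links inside it."""
    # Imported here so extract_links stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        # Wait until the frame exists, then make it the active context.
        wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID, frame_id)))
        # Assumption: the first link appearing means the results have rendered.
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[href]")))
        # page_source now returns the frame's document, not the outer page.
        frame_url = driver.execute_script("return window.location.href")
        return extract_links(driver.page_source, frame_url)
    finally:
        driver.quit()
```

Calling fetch_frame_links with the people-search URL from the question should then return whatever links the frame contains. Filtering them down to profile pages is left out, since the profile URL pattern is not shown here.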

  2. Can you elaborate a bit on the extract you have just pasted? This looks new to me.

    That table comes from the Classic browser, which exposes iframes and offers lots of technical information, including custom tags, hidden scripts, shadow DOMs, global identifiers, functions, objects, and so forth.

    It can scrape pretty much anything on request, but there’s no automated bot as such to set up. Still, it’s very powerful for inspecting websites and making the right decisions when setting up your own bot.

    Up until last year I had it for public download on a home site but I got tired of website maintenance after 6-7 years of hard work and pulled the plug!

    Good luck with everything.
