 <div class="comments-post-meta__profile-info-wrapper display-flex">
    <a class="app-aware-link  inline-flex overflow-hidden t-16 t-black t-bold tap-target" target="_self" href="https://www.linkedin.com/in/ACoAAAAg-vkBuoZD8xeJW57GlPMiPRWUe-jvvSM" data-test-app-aware-link="">
      <h3 class="comments-post-meta__actor display-flex flex-column overflow-hidden t-12 t-normal t-black--light">
        <span class="comments-post-meta__name text-body-small-open t-black">
          <span class="comments-post-meta__name-text hoverable-link-text mr1">
            <span dir="ltr"><span aria-hidden="true"><!---->Nathan Greenhut<!----></span>
            <span class="visually-hidden"><!---->View Nathan Greenhut’s profile<!----></span>
          </span>
        </span>
</div>

I'm trying to scrape the names of the people who commented on a particular LinkedIn post.

I tried this code:

for i in soup.find_all("span",attrs = {"class" : "comments-post-meta__name-text hoverable-link-text mr1"}):
    print(i.find('span').get_text())

The output I got is:

Nathan GreenhutView Nathan Greenhut’s profile

But the output I want is:

Nathan Greenhut
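The concatenated output comes from what i.find('span') returns. Inside each name span, the first descendant span is the dir="ltr" wrapper, and get_text() joins every text node beneath it, including the screen-reader-only "View ... profile" label. The following is a minimal check, assuming a trimmed copy of the markup above and Python's built-in html.parser:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the comment markup from the question (assumption: a real
# LinkedIn page repeats this block once per commenter).
html = """
<span class="comments-post-meta__name-text hoverable-link-text mr1">
  <span dir="ltr"><span aria-hidden="true"><!---->Nathan Greenhut<!----></span>
  <span class="visually-hidden"><!---->View Nathan Greenhut’s profile<!----></span>
  </span>
</span>
"""

soup = BeautifulSoup(html, "html.parser")
name_span = soup.find("span", class_="comments-post-meta__name-text")

# find('span') returns the first descendant span: the dir="ltr" wrapper,
# which contains BOTH the visible name and the visually-hidden label.
first = name_span.find("span")
print(first.attrs)  # {'dir': 'ltr'}

# get_text() concatenates every text node under the wrapper (HTML comments are skipped).
print(first.get_text(strip=True))  # Nathan GreenhutView Nathan Greenhut’s profile
```

The fix is therefore to reach the inner aria-hidden span rather than stopping at the wrapper.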

2 Answers


  1. You can walk down the HTML structure with find_next():

    from bs4 import BeautifulSoup
    
    
    def main():
        html =' <div class="comments-post-meta__profile-info-wrapper display-flex"><a class="app-aware-link  inline-flex overflow-hidden t-16 t-black t-bold tap-target" target="_self" href="https://www.linkedin.com/in/ACoAAAAg-vkBuoZD8xeJW57GlPMiPRWUe-jvvSM" data-test-app-aware-link=""><h3 class="comments-post-meta__actor display-flex flex-column overflow-hidden t-12 t-normal t-black--light"><span class="comments-post-meta__name text-body-small-open t-black"><span class="comments-post-meta__name-text hoverable-link-text mr1"><span dir="ltr"><span aria-hidden="true"><!---->Nathan Greenhut<!----></span><span class="visually-hidden"><!---->View Nathan Greenhut’s profile<!----></span></span></span></div>'
        soup = BeautifulSoup(html, "lxml")
        comments_post = soup.find("span", "comments-post-meta__name-text hoverable-link-text mr1")
        comments_post_aria_hidden = comments_post.find("span", dir="ltr").find_next("span").text
        print(comments_post_aria_hidden)
    
    
    if __name__ == '__main__':
        main()
    

    Result:

    Nathan Greenhut
    
  2. You could select the element directly by its attribute:

    soup.find('span', {'aria-hidden': 'true'}).get_text(strip=True)
    

    or by CSS selector:

    soup.select_one('[aria-hidden="true"]').get_text(strip=True)
    

    or, if other elements on the page carry the same attribute, be more specific:

    soup.select_one('.comments-post-meta__profile-info-wrapper [aria-hidden="true"]').get_text(strip=True)
    
    
    from bs4 import BeautifulSoup
    
    html = '''
    <div class="comments-post-meta__profile-info-wrapper display-flex">
        <a class="app-aware-link  inline-flex overflow-hidden t-16 t-black t-bold tap-target" target="_self" href="https://www.linkedin.com/in/ACoAAAAg-vkBuoZD8xeJW57GlPMiPRWUe-jvvSM" data-test-app-aware-link="">
          <h3 class="comments-post-meta__actor display-flex flex-column overflow-hidden t-12 t-normal t-black--light">
            <span class="comments-post-meta__name text-body-small-open t-black">
              <span class="comments-post-meta__name-text hoverable-link-text mr1">
                <span dir="ltr"><span aria-hidden="true"><!---->Nathan Greenhut<!----></span>
                <span class="visually-hidden"><!---->View Nathan Greenhut’s profile<!----></span>
              </span>
            </span>
    </div>
    '''
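    # Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
    soup = BeautifulSoup(html, "html.parser")
    
    # print() is needed when running as a script (outside a notebook/REPL)
    print(soup.select_one('[aria-hidden="true"]').get_text(strip=True))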
    
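Both answers return a single name, because find() and select_one() stop at the first match. Since the goal is the name of everyone who commented, the following is a minimal sketch that collects all of them with select(), assuming each comment repeats the markup shown in the question. The class names are copied from the question's HTML (LinkedIn may change them), the helper commenter_names is hypothetical, and the second commenter is a made-up placeholder:

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment with two comments; "Jane Example" is a placeholder.
html = """
<div class="comments-post-meta__profile-info-wrapper display-flex">
  <span class="comments-post-meta__name-text hoverable-link-text mr1">
    <span dir="ltr"><span aria-hidden="true"><!---->Nathan Greenhut<!----></span>
    <span class="visually-hidden"><!---->View Nathan Greenhut’s profile<!----></span></span>
  </span>
</div>
<div class="comments-post-meta__profile-info-wrapper display-flex">
  <span class="comments-post-meta__name-text hoverable-link-text mr1">
    <span dir="ltr"><span aria-hidden="true"><!---->Jane Example<!----></span>
    <span class="visually-hidden"><!---->View Jane Example’s profile<!----></span></span>
  </span>
</div>
"""


def commenter_names(soup):
    """Return the visible name of every commenter, in page order (hypothetical helper)."""
    # select() returns ALL matches; scoping to the name span avoids picking up
    # unrelated aria-hidden elements elsewhere on the page.
    selector = '.comments-post-meta__name-text [aria-hidden="true"]'
    return [el.get_text(strip=True) for el in soup.select(selector)]


soup = BeautifulSoup(html, "html.parser")
print(commenter_names(soup))  # ['Nathan Greenhut', 'Jane Example']
```

Matching on the class token .comments-post-meta__name-text (rather than the full class string "comments-post-meta__name-text hoverable-link-text mr1") keeps the selector working if the other utility classes are reordered or changed.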