
I'm trying to scrape clap data from Medium. Let's say this is the link. When I inspect the page, it looks like this photo:

[Image: webScraper medium button]

My code looks like this:

import requests
from bs4 import BeautifulSoup

URL = "https://medium.com/@xdxxxx4713/basic-settings-of-nginx-aeace532534f"
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())

Only a dash appears in the output where the clap count should be. Is it possible to scrape the clap count without using Selenium? Once I can get the value with a plain HTTP request such as requests.get(URL), I can do the rest. Right now the response is empty where the clap count should be.

[Image: Output]

  • I tried the urllib library, but my links contain non-ASCII characters.
  • I tried BeautifulSoup's findChildren method.
  • I tried traversing the tree with BeautifulSoup's descendants.

2 Answers


  1. Chosen as BEST ANSWER

    As @esqew mentioned in the comments, there's an API for that, but it didn't work for me. I was inspired by the API code, though. Here's my code:

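        import requests

        # `pages` holds the URL of the Medium post
        additionalPage = requests.get(pages).content.decode("utf-8")
        # The page source embeds the count as "clapCount":<number>,
        claps = additionalPage.split('"clapCount":')[1]
        endIndex = claps.index(",")
        claps = int(claps[0:endIndex])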
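
The string splitting in the accepted answer raises an error when the marker is missing from the page. A minimal sketch of a more defensive variant, assuming the page source still embeds the count as `"clapCount":<integer>` (the helper name `extract_clap_count` and the sample string are made up for illustration):

```python
import re

# Assumption: the page source contains "clapCount":<integer>, as the
# accepted answer observed. Medium may change this markup at any time.
CLAP_PATTERN = re.compile(r'"clapCount":\s*(\d+)')


def extract_clap_count(html):
    """Return the first clapCount found in the page source, or None."""
    match = CLAP_PATTERN.search(html)
    return int(match.group(1)) if match else None


# Hypothetical fragment shaped like the data embedded in the page.
sample = '{"id":"aeace532534f","clapCount":4,"__typename":"Post"}'
print(extract_clap_count(sample))  # 4
```

Pass it the decoded response text, for example `extract_clap_count(requests.get(pages).text)`, and check for `None` before using the result.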
    

  2. It's possible. Try the code below:

    import requests
    
    data = [{"operationName":"ClapCountQuery","variables":{"postId":"aeace532534f"},"query":"query ClapCountQuery($postId: ID!) {\n  postResult(id: $postId) {\n    __typename\n    ... on Post {\n      id\n      clapCount\n      __typename\n    }\n  }\n}\n"}]
    r = requests.post('https://medium.com/_/graphql', json=data)
    print(r.json()[0]['data']['postResult']['clapCount'])
    

    This will return:

    4
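
The `postId` in the request above matches the trailing part of the URL from the question. To avoid hard-coding it, here is a rough sketch that derives the ID from a post URL. It assumes the ID is whatever follows the last hyphen of the final path segment, which holds for the question's URL but is not a documented guarantee; `post_id_from_url` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, unquote


def post_id_from_url(url):
    """Return the trailing ID of a Medium post URL.

    Assumption: the ID is the text after the last hyphen of the final
    path segment, as in the URL from the question.
    """
    # urlsplit drops any query string; unquote tolerates percent-encoded titles.
    path = unquote(urlsplit(url).path)
    slug = path.rstrip("/").rsplit("/", 1)[-1]
    return slug.rsplit("-", 1)[-1]


url = "https://medium.com/@xdxxxx4713/basic-settings-of-nginx-aeace532534f"
print(post_id_from_url(url))  # aeace532534f
```

The returned string can then be used as the value of `"postId"` in the `variables` dictionary.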
    