
I have the following code:

from bs4 import BeautifulSoup
import requests

URL = 'https://www.youtube.com/gaming/games'

response = requests.get(URL).text
soup = BeautifulSoup(response, 'html.parser')

elem = soup.find_all('a', class_ = 'yt-simple-endpoint focus-on-expand style-scope ytd-game-details-renderer')

print(elem)

I am trying to isolate all the individual games on https://www.youtube.com/gaming/games.

I would like to get just the game name and how many people are watching. My issue is that I can’t find the right find_all('tag', class_='...') combination.

I’ve tried calling soup.find_all with each of the following:

('a', class_ = 'yt-simple-endpoint focus-on-expand style-scope ytd-game-details-renderer')
('game', class_ = 'style-scope ytd-game-card-renderer')
(class_ = 'style-scope ytd-grid-renderer')
(id = 'items')

And many different variations.

If I just use find_all('div') I get random data. I really think (id = 'items') is my solution, but aside from 'div' I get the same response every time, a pair of brackets []. I’ve also tried searching the individual div class objects I get in the results, but so far I’m getting the same [] results or random data that I don’t need.

If I use find instead of find_all (elem = soup.find(id='items')) I get "None" as a response.

I’m looking at the live viewer count, which has an id of 'live-viewers-count', and it still prints [].


2 Answers


  1. You can’t really do this by searching for tags, because the page content is loaded dynamically with JavaScript.

    BeautifulSoup doesn’t run JavaScript.

    If you right-click the page and select "View page source", you will see that it is mostly compiled JavaScript.

    To scrape YouTube, I’d either use Selenium to run a headless web browser, or Js2Py if you need performance.

    … or simply use the YouTube API: https://developers.google.com/youtube/v3/docs ^_^’
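To illustrate why the searches in the question come back empty, here is a minimal sketch using only the standard library's html.parser module. The STATIC_HTML string is a hypothetical stand-in for what requests.get(URL).text returns, not the real YouTube markup: it carries a script payload but none of the elements that JavaScript would later build.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the raw response text: the markup ships a <script>
# payload, and the game cards would only be built later by JavaScript.
STATIC_HTML = """
<html><body>
  <div id="content"></div>
  <script>var ytInitialData = {"contents": {}};</script>
</body></html>
"""


class IdCollector(HTMLParser):
    """Record the id attribute of every start tag in the static markup."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id":
                self.ids.append(value)


collector = IdCollector()
collector.feed(STATIC_HTML)

print(collector.ids)             # ids present before any JavaScript runs
print("items" in collector.ids)  # the element the question searches for
```

A parser only sees what is in the downloaded text, so an element that exists only after scripts run in a browser cannot be found this way.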

  2. Update
    Here’s how to traverse the game data JSON elements.

    First, narrow down to game_data, which is a list of JSON elements.

    import json

    # `main` is the <script> text extracted in the original answer below
    game_data = (
        json.loads(main[20:-1])
        ['contents']
        ['twoColumnBrowseResultsRenderer']
        ['tabs'][0]
        ['tabRenderer']
        ['content']
        ['sectionListRenderer']
        ['contents'][0]
        ['itemSectionRenderer']
        ['contents'][0]
        ['shelfRenderer']
        ['content']
        ['gridRenderer']
        ['items']
    )
    

    Now iterate over the list. For each element, there’s a section of the data packet we’ll call details, which contains game name and views.

    Then use the paths I showed in my original answer to capture name and view count for each game.

    for game in game_data:
        details = (
            game
            ['gameCardRenderer']
            ['game']
            ['gameDetailsRenderer']
        )
        game_name = details['title']['simpleText']
        
        view_ct = details['liveViewersText']['runs'][0]['text']
        
        print(f"Game: {game_name} / Views: {view_ct}")
    

    Output

    Game: Valorant / Views: 100K
    Game: Grand Theft Auto V / Views: 61K
    Game: Dota 2 / Views: 57K
    Game: Minecraft / Views: 50K
    # ...
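The key paths above are deep, and a single missing or renamed key would raise KeyError or IndexError partway through the loop. One possible way to soften that is a small lookup helper. This is a sketch only: dig is a hypothetical helper name, and the game dictionary below is hand-written sample data shaped like one element of game_data.

```python
def dig(obj, *path, default=None):
    """Follow a sequence of dict keys / list indexes, returning default on a miss."""
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


# Hypothetical sample shaped like one element of game_data.
game = {
    "gameCardRenderer": {
        "game": {
            "gameDetailsRenderer": {
                "title": {"simpleText": "Valorant"},
                "liveViewersText": {"runs": [{"text": "100K"}]},
            }
        }
    }
}

details = dig(game, "gameCardRenderer", "game", "gameDetailsRenderer", default={})
name = dig(details, "title", "simpleText", default="unknown")
views = dig(details, "liveViewersText", "runs", 0, "text", default="n/a")
missing = dig(details, "noSuchKey", "simpleText", default="n/a")  # path that does not exist

print(f"Game: {name} / Views: {views}")
```

With a helper like this, a card that lacks a viewer count yields the default value instead of stopping the whole loop.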
    

    Original answer

    All of the data you need is stored as JSON in one of the <script> tags; it’s just a pain to follow the nested object down to the fields you need. You can see it’s all there if you look at soup.body.

    I had a few spare minutes just now, so this should get you started. It shows how to get the game name and live viewer count for the first game currently listed (‘Valorant’).

    import json
    
    # buried as JSON in a <script> inside <body>; index 13 is specific to the
    # page layout and may change
    main = soup.body.find_all('script')[13].contents[0]
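Both the script index and the main[20:-1] slice depend on the exact page layout. As a hedged alternative, the sketch below locates the JSON by pattern instead of by position. It assumes the script text has the form `var ytInitialData = {...};`, which the 20-character slice suggests but the page does not guarantee. The function name extract_initial_data and the sample string are made up for illustration.

```python
import json
import re

# Hypothetical script text, shaped like the payload this answer slices.
script_text = 'var ytInitialData = {"contents": {"title": "Gaming"}};'


def extract_initial_data(text):
    """Return the JSON object assigned to ytInitialData, or None if absent."""
    match = re.search(r"ytInitialData\s*=\s*", text)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (for example the trailing ';').
    data, _end = json.JSONDecoder().raw_decode(text, match.end())
    return data


data = extract_initial_data(script_text)
print(data["contents"]["title"])
```

In practice you could run this over the text of each script tag and keep the first result that is not None, rather than relying on a fixed index.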
    

    This is how you get to game name (you can iterate instead of indexing [0] to get all the games):

    # Game name
    print('Game:', json.loads(main[20:-1])
     ['contents']
     ['twoColumnBrowseResultsRenderer']
     ['tabs'][0]
     ['tabRenderer']
     ['content']
     ['sectionListRenderer']
     ['contents'][0]
     ['itemSectionRenderer']
     ['contents'][0]
     ['shelfRenderer']
     ['content']
     ['gridRenderer']
     ['items'][0]
     ['gameCardRenderer']
     ['game']
     ['gameDetailsRenderer']
     ['title']
     ['simpleText']
    )
    

    Output

    Game: Valorant
    

    And this is the viewer count:

    print('Live Viewers:', json.loads(main[20:-1])
     ['contents']
     ['twoColumnBrowseResultsRenderer']
     ['tabs'][0]
     ['tabRenderer']
     ['content']
     ['sectionListRenderer']
     ['contents'][0]
     ['itemSectionRenderer']
     ['contents'][0]
     ['shelfRenderer']
     ['content']
     ['gridRenderer']
     ['items'][0]
     ['gameCardRenderer']
     ['game']
     ['gameDetailsRenderer']
     ['liveViewersText']
     ['runs'][0]
     ['text'])
    

    Output

    Live Viewers: 100K
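The viewer count arrives as abbreviated text such as "100K". If you want to sort or compare games numerically, a conversion along these lines may help. This is a sketch under assumptions: parse_count is a hypothetical helper, and only the "K" suffix appears in the output above, so the "M" and "B" entries are guesses about the format.

```python
# Assumed suffixes: only "K" appears in the output above; "M" and "B" are guesses.
MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text):
    """Convert an abbreviated count such as '100K' or '1.5M' to an int."""
    cleaned = text.strip().upper().replace(",", "")
    if cleaned and cleaned[-1] in MULTIPLIERS:
        return round(float(cleaned[:-1]) * MULTIPLIERS[cleaned[-1]])
    return int(cleaned)


for sample in ["100K", "61K", "57K", "50K"]:
    print(sample, "->", parse_count(sample))
```

The result is approximate by nature, since the page has already rounded the figure before abbreviating it.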
    