
I am building a Telegram bot and need to extract links from HTML.
I want to get the href of each match listed on https://www.hltv.org/matches

My current code is:

     elif message.text == "Matches":
        url_news = "https://www.hltv.org/matches"
        response = requests.get(url_news)
        soup = BeautifulSoup(response.content, "html.parser")
        match_info = []
        match_items = soup.find("div", class_="upcomingMatchesSection")
        print(match_items)
        for item in match_items:
            match_info.append({
                    "link": item.find("div", class_="upcomingMatch").text,
                    "title": item["href"]

            })

I don't know how to get the links out of this markup. I'd appreciate any help.

2 Answers


  1. What happens?

    You iterate over match_items, but it holds only the section that contains the matches, not the matches themselves.

    How to fix?

    Select the upcomingMatches instead and iterate over them:

    match_items = soup.select("div.upcomingMatchesSection div.upcomingMatch")
    

    To get the URL, select the <a> inside each match:

    item.a["href"]
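
    To see the difference between the two selections without hitting the site, here is a minimal sketch that runs against a small inline snippet. The HTML is hypothetical and only mimics the class names mentioned above; the real page may be structured differently. It also guards against a match without an <a>, which is an added precaution rather than something the page is known to need.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: it only mimics the class names discussed above.
html = """
<div class="upcomingMatchesSection">
  <div class="upcomingMatch">
    <a class="match a-reset" href="/matches/1/team-a-vs-team-b">Team A vs Team B</a>
  </div>
  <div class="upcomingMatch">
    <a class="match a-reset" href="/matches/2/team-c-vs-team-d">Team C vs Team D</a>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag (the whole section), or None if nothing matches.
section = soup.find("div", class_="upcomingMatchesSection")

# select() returns a list with one Tag per match inside the section.
match_items = soup.select("div.upcomingMatchesSection div.upcomingMatch")

links = []
for item in match_items:
    anchor = item.a  # first <a> inside this match, or None if there is none
    if anchor is not None and anchor.has_attr("href"):
        links.append(anchor["href"])

print(links)
```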
    

    Example

    from bs4 import BeautifulSoup
    import requests
    
    
    url_news = "https://www.hltv.org/matches"
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}
    
    response = requests.get(url_news, headers=headers)
    soup = BeautifulSoup(response.content, "html.parser")
    match_info = []
    match_items = soup.select("div.upcomingMatchesSection div.upcomingMatch")
    
    for item in match_items:
        match_info.append({
            "title": item.get_text("|", strip=True),
            "link": item.a["href"],
        })
    print(match_info)
    

    Output

    [{'title': '09:00|bo3|1WIN|K23|Pinnacle Fall Series 2|Odds',
      'link': '/matches/2352066/1win-vs-k23-pinnacle-fall-series-2'},
     {'title': '09:00|bo3|INDE IRAE|Nemiga|Pinnacle Fall Series 2|Odds',
      'link': '/matches/2352067/inde-irae-vs-nemiga-pinnacle-fall-series-2'},
     {'title': '10:00|bo3|OPAA|Nexus|Malta Vibes Knockout Series 3|Odds',
      'link': '/matches/2352207/opaa-vs-nexus-malta-vibes-knockout-series-3'},
     {'title': '11:00|bo3|Checkmate|TBC|Funspark ULTI 2021 Asia Regional Series 3|Odds',
      'link': '/matches/2352092/checkmate-vs-tbc-funspark-ulti-2021-asia-regional-series-3'},
     {'title': '11:00|bo3|ORDER|Alke|ESEA Premier Season 38 Australia|Odds',
      'link': '/matches/2352122/order-vs-alke-esea-premier-season-38-australia'},...]
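
    The links in the output are relative, so a bot would probably need to turn them into full URLs before sending them. A minimal sketch using the standard library follows; BASE_URL and to_absolute are illustrative names, not part of the code above.

```python
from urllib.parse import urljoin

# Assumption: relative links should be resolved against the site root.
BASE_URL = "https://www.hltv.org"


def to_absolute(link: str) -> str:
    """Resolve a relative href such as '/matches/...' against BASE_URL."""
    return urljoin(BASE_URL, link)


print(to_absolute("/matches/2352066/1win-vs-k23-pinnacle-fall-series-2"))
# prints: https://www.hltv.org/matches/2352066/1win-vs-k23-pinnacle-fall-series-2
```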
    
  2. You can try this out.

    • All the match information sits inside <div> elements with the class name upcomingMatch.
    • Select all of those <div> elements and, from each one, extract the match link from the <a> tag with the class name match.

    Here is the code:

    import requests
    from bs4 import BeautifulSoup
    
    url_news = "https://www.hltv.org/matches"
    headers = {"User-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36"}
    response = requests.get(url_news,headers=headers)
    soup = BeautifulSoup(response.text, "lxml")
    match_items = soup.find_all("div", class_="upcomingMatch")
    
    for match in match_items:
        link = match.find('a', class_='match a-reset')['href']
        print(f'Link: {link}')
    
    Link: /matches/2352235/malta-vibes-knockout-series-3-quarter-final-1-malta-vibes-knockout-series-3
    Link: /matches/2352098/pinnacle-fall-series-2-quarter-final-2-pinnacle-fall-series-2
    Link: /matches/2352236/malta-vibes-knockout-series-3-quarter-final-2-malta-vibes-knockout-series-3
    Link: /matches/2352099/pinnacle-fall-series-2-quarter-final-3-pinnacle-fall-series-2
    .
    .
    .
    