
I have a for loop in Python that uses BeautifulSoup to extract data from a website and append it to a list.
I am trying to scrape the tags attached to each event, e.g. AI, Big Data, ML.

My code:

import requests
from bs4 import BeautifulSoup

URL = "https://aiml.events/"
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'lxml')

# Scrape Event Tags
event_tags_list = []
event_tag_div = soup.find_all('div', class_ = 'card-body')
for event_div in event_tag_div:
  event_span = event_div.find_all('span', class_  = 'badge badge-light badge-pill')
  for event_tags in event_span:
    print(event_tags.text)
     


I can fetch the tags, but each one comes out as a separate item. I want to group them by event.
Currently my list looks like this:

tag_list = ['Artificial Intelligence', 'Artificial Intelligence','Machine Learning', 'Healthcare', 'Artificial Intelligence','Public Sector' ] 

My expectation:

tag_list = ['Artificial Intelligence', 'Artificial Intelligence,Machine Learning, Healthcare', 'Artificial Intelligence,Public Sector' ] 

Any help is appreciated.
Sorry if the question is too basic.

2 Answers


  1. Replace the inner loop with a generator expression and join its items into one string per event.

    for event_div in event_tag_div:
        event_span = event_div.find_all('span', class_  = 'badge badge-light badge-pill')
        # One comma-separated string per event, appended to the list defined in the question
        event_tags_list.append(','.join(event_tag.text for event_tag in event_span))
    
  2. Here is the working solution:

    Code:

    import requests
    from bs4 import BeautifulSoup
    
    URL = "https://aiml.events/"
    page = requests.get(URL)
    soup = BeautifulSoup(page.content, 'lxml')
    
    # Scrape Event Tags
    event_tags_list = []
    event_tag_div = soup.find_all('div', class_ = 'card-body')
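    for event_div in event_tag_div:
        event_span = event_div.find_all('span', class_  = 'badge badge-light badge-pill')
        # Join all tags of one event into a single comma-separated string
        event_tags_list.append(','.join(event_tags.text for event_tags in event_span))
        # Printed inside the loop, so the growing list is shown after every event;
        # dedent this line to print only the final list
        print(event_tags_list)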
    

    OUTPUT:

    ['Artificial Intelligence', 'Artificial Intelligence,Machine Learning,Healthcare', 'Artificial Intelligence,Public Sector', 'Artificial Intelligence,Machine Learning,Data Analytics,Customer Experience,Chatbots,Automation', 'Artificial Intelligence,Machine Learning', 'Data Analytics,Finance', '', 'Artificial Intelligence,Big Data,Blockchain', 'Artificial Intelligence,Machine Learning,Data Analytics,Blockchain,Customer Experience,Chatbots,IoT,Automation,Digital Transformation,Privacy,5G']
    ['Artificial Intelligence', 'Artificial Intelligence,Machine Learning,Healthcare', 'Artificial Intelligence,Public Sector', 'Artificial Intelligence,Machine Learning,Data Analytics,Customer Experience,Chatbots,Automation', 'Artificial Intelligence,Machine Learning', 'Data Analytics,Finance', '', 'Artificial Intelligence,Big Data,Blockchain', 'Artificial Intelligence,Machine Learning,Data Analytics,Blockchain,Customer Experience,Chatbots,IoT,Automation,Digital Transformation,Privacy,5G', 'Artificial Intelligence,Machine Learning,Blockchain,Customer Experience,Chatbots,IoT,Digital Transformation,Privacy,5G']
    ['Artificial Intelligence', 'Artificial Intelligence,Machine Learning,Healthcare', 'Artificial Intelligence,Public Sector', 'Artificial Intelligence,Machine Learning,Data Analytics,Customer Experience,Chatbots,Automation', 'Artificial Intelligence,Machine Learning', 'Data Analytics,Finance', '', 'Artificial Intelligence,Big Data,Blockchain', 'Artificial Intelligence,Machine Learning,Data Analytics,Blockchain,Customer Experience,Chatbots,IoT,Automation,Digital Transformation,Privacy,5G', 'Artificial Intelligence,Machine Learning,Blockchain,Customer Experience,Chatbots,IoT,Digital Transformation,Privacy,5G', 'Computer Vision']
    

    … and so on, one line per event.
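
For anyone who wants to try the grouping without hitting the live site, here is a minimal sketch of the same idea run against a small inline HTML sample. The sample markup, the function name `group_event_tags`, and the `skip_empty` flag are assumptions made for illustration; only the class names `card-body` and `badge-pill` come from the question. It also uses Python's built-in `html.parser` instead of `lxml`, so no extra parser is required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the live page; only the class
# names are taken from the question.
SAMPLE_HTML = """
<div class="card-body">
  <span class="badge badge-light badge-pill">Artificial Intelligence</span>
</div>
<div class="card-body">
  <span class="badge badge-light badge-pill">Artificial Intelligence</span>
  <span class="badge badge-light badge-pill">Machine Learning</span>
  <span class="badge badge-light badge-pill">Healthcare</span>
</div>
<div class="card-body"></div>
<div class="card-body">
  <span class="badge badge-light badge-pill">Artificial Intelligence</span>
  <span class="badge badge-light badge-pill">Public Sector</span>
</div>
"""


def group_event_tags(html, skip_empty=False):
    """Return one comma-separated tag string per 'card-body' div."""
    soup = BeautifulSoup(html, "html.parser")
    grouped = []
    for card in soup.find_all("div", class_="card-body"):
        # Passing a single class name matches spans that carry it
        # among their classes, regardless of the other classes.
        tags = [
            span.get_text(strip=True)
            for span in card.find_all("span", class_="badge-pill")
        ]
        if skip_empty and not tags:
            continue  # optionally drop events that have no tags
        grouped.append(",".join(tags))
    return grouped


tag_list = group_event_tags(SAMPLE_HTML)
print(tag_list)
```

An event without tags yields an empty string, like the `''` entry in the output above. Passing `skip_empty=True` drops those entries if they are not wanted.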
