I want to build a catalog of courses by scraping data from a website. I want to visit each item at https://www.coursicle.com/harvard/courses/
and pull all the course names within each item. I am using the code below:
import requests
from bs4 import BeautifulSoup

def scrape_harvard_course_names():
    url = "https://www.coursicle.com/harvard/courses/"
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    tile_container = soup.find("div", id="tileContainer")
    links = tile_container.find_all("a")
    course_names = []
    for link in links:
        course_name = link.text
        course_names.append(course_name)
        # Click the link to get the course name
        response = requests.get(link["href"])
        soup = BeautifulSoup(response.content, "html.parser")
        course_name_from_link = soup.find("h1", class_="course-name").text
        course_names.append(course_name_from_link)
    return course_names

course_names = scrape_harvard_course_names()
for course_name in course_names:
    print(course_name)
In this code, soup.find("div", id="tileContainer") returns None, so the next line fails with an AttributeError and the script doesn't work. Is there a way to scrape this data?
Answers
You need to add headers (for example, a browser-like User-Agent) to the request.
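A minimal sketch of the header fix, assuming requests is used as in the question. The User-Agent string below is an illustrative assumption, not a value the site is known to require, and `make_session` / `fetch_html` are hypothetical helper names.

```python
import requests

# Assumption: an illustrative browser-like User-Agent; any current browser
# string should serve the same purpose.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0 Safari/537.36"
    ),
}


def make_session(headers=HEADERS):
    """Return a requests.Session that sends `headers` with every request."""
    session = requests.Session()
    session.headers.update(headers)
    return session


def fetch_html(session, url, timeout=10):
    """GET `url` using the session's headers and fail loudly on HTTP errors."""
    response = session.get(url, timeout=timeout)
    # Raises requests.HTTPError on 4xx/5xx instead of parsing an error page
    # and later getting None back from soup.find().
    response.raise_for_status()
    return response.text
```

Usage would be `html = fetch_html(make_session(), "https://www.coursicle.com/harvard/courses/")` followed by `BeautifulSoup(html, "html.parser")`. A Session is used so the same headers (and pooled connections) are reused for every course page the loop visits.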
Your code has another problem: the href paths are relative, so you need to turn them into absolute URLs before requesting them (import urljoin first: from urllib.parse import urljoin).
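A minimal sketch of resolving the tile hrefs with urljoin. The example hrefs such as "/harvard/courses/AFRAMER/" are hypothetical (check the real tiles for the actual format), and `absolute_links` is a hypothetical helper name.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.coursicle.com/harvard/courses/"


def absolute_links(base_url, hrefs):
    """Resolve each href against base_url, skipping empty/missing ones.

    Absolute hrefs come back unchanged; relative ones such as
    "/harvard/courses/AFRAMER/" or "AFRAMER/" (hypothetical examples)
    are resolved the same way a browser would resolve them.
    """
    return [urljoin(base_url, href) for href in hrefs if href]


# Inside the question's loop this would look like (assumption):
#   hrefs = [a.get("href") for a in tile_container.find_all("a")]
#   for course_url in absolute_links(BASE_URL, hrefs):
#       response = requests.get(course_url, headers=HEADERS)
```

Using `a.get("href")` instead of `a["href"]` avoids a KeyError on anchors without an href; the helper then drops those `None` entries.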
You can use this: