I’m trying to scrape all the information on an artist’s Billboard page relating to their singles and how they performed. I’m adapting a solution I’ve seen elsewhere. It works up to a point, but once I get past "peak pos" I don’t know how to include "peak date" and "wks" from the table. I’m basically trying to capture everything as it appears in the table on the website and eventually put it in a dataframe, but I can’t get the last two columns. Any pointers will be greatly appreciated. Thanks!
import requests
from bs4 import BeautifulSoup

url = requests.get('https://www.billboard.com/artist/john-lennon/chart-history/hsi/')
soup = BeautifulSoup(url.content, 'html.parser')
result = soup.find_all('div','o-chart-results-list-row')

for res in result:
    song = res.find('h3').text.strip()
    artist = res.find('h3').find_next('span').text.strip()
    debute = res.find('span').find_next('span').text.strip()
    peak = res.find('a').find_next('span').text.strip()
    #peak_date = ?
    #wks = ?
    print("song: "+str(song))
    print("artist: "+ str(artist))
    print("debute: "+ str(debute))
    print("peak: "+ str(peak))
    print("___________________________________________________")
song: (Just Like) Starting Over
artist: John Lennon
debute: 11.01.80
peak: 1
peak_date:
wks:
3 Answers
Try:
Prints:
I would check the page source to see where each column is located and take advantage of the class names. The peak date is in the next <a> element, and the number of weeks is in the next <span>, which has "artist-chart-row-week-on-chart" as its specific class name. The whole code to get what you want is below:
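A minimal sketch of that approach, run against a made-up HTML fragment rather than the live page. The fragment's layout and values are placeholders invented for illustration; only the row class and "artist-chart-row-week-on-chart" come from this thread, and `parse_row` is a hypothetical helper name. The real page may be nested differently, so check the source before relying on it.

```python
from bs4 import BeautifulSoup

# Made-up fixture with placeholder values; the real Billboard markup may differ.
SAMPLE_HTML = """
<div class="o-chart-results-list-row">
  <h3>Example Song</h3>
  <span>Example Artist</span>
  <span><a href="#">01.02.03</a></span>
  <span>4</span>
  <span><a href="#">05.06.07</a></span>
  <span class="artist-chart-row-week-on-chart">8</span>
</div>
"""


def parse_row(res):
    """Extract one chart row by chaining find/find_next (hypothetical helper)."""
    title = res.find('h3')
    # Same lookups as in the question.
    peak_tag = res.find('a').find_next('span')
    return {
        'song': title.text.strip(),
        'artist': title.find_next('span').text.strip(),
        'debut': res.find('span').find_next('span').text.strip(),
        'peak': peak_tag.text.strip(),
        # Peak date: the next <a> after the peak-position span.
        'peak_date': peak_tag.find_next('a').text.strip(),
        # Weeks: the next <span> carrying the class named in this answer.
        'wks': peak_tag.find_next(
            'span', class_='artist-chart-row-week-on-chart'
        ).text.strip(),
    }


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
rows = [parse_row(res)
        for res in soup.find_all('div', class_='o-chart-results-list-row')]
print(rows)
```

To use it on the live page, build `soup` from the `requests` response as in the question instead of from `SAMPLE_HTML`.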
There are generally several ways to access elements in an HTML document. One is chaining find/find_next like you did. This works and can be adapted to get the weeks and peak date that you are looking for.
However, a much better solution is to look for the elements directly by their class names. This allows your script to keep working even if the order of the elements changes, as long as the class names stay the same. It may look like this:
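A rough sketch of the class-name lookup, again on a made-up fragment with placeholder values. "artist-chart-row-week-on-chart" is the class mentioned in this thread; "artist-chart-row-peak-date" is an assumed name by analogy and should be checked against the page source. `text_by_class` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Made-up fixture; the weeks span deliberately comes BEFORE the peak date
# to show that class-based lookup does not depend on element order.
SAMPLE_ROW = """
<div class="o-chart-results-list-row">
  <h3>Example Song</h3>
  <span class="artist-chart-row-week-on-chart">8</span>
  <span class="artist-chart-row-peak-date"><a href="#">05.06.07</a></span>
</div>
"""


def text_by_class(row, class_name):
    """Return the stripped text of the first tag with this class, or None."""
    tag = row.find(class_=class_name)
    return tag.get_text(strip=True) if tag else None


row = BeautifulSoup(SAMPLE_ROW, 'html.parser').find(
    'div', class_='o-chart-results-list-row'
)
peak_date = text_by_class(row, 'artist-chart-row-peak-date')  # assumed class name
wks = text_by_class(row, 'artist-chart-row-week-on-chart')    # class from this thread
print(peak_date, wks)
```

Returning None for a missing class keeps the loop from crashing on rows that lack a column.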
The complete code would then be:
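One way the whole thing could be put together, as a sketch under assumptions: the debut, peak position and peak date class names below are guesses modelled on "artist-chart-row-week-on-chart" and need to be verified in the page source. `parse_chart_history` and `scrape` are hypothetical names, and the fixture values are placeholders.

```python
from bs4 import BeautifulSoup

ROW_CLASS = 'o-chart-results-list-row'  # row class used in the question
FIELD_CLASSES = {
    'debut': 'artist-chart-row-debut-date',     # assumed class name
    'peak': 'artist-chart-row-peak-pos',        # assumed class name
    'peak_date': 'artist-chart-row-peak-date',  # assumed class name
    'wks': 'artist-chart-row-week-on-chart',    # class named in this thread
}


def parse_chart_history(html):
    """Return one dict per chart row; missing fields become None."""
    soup = BeautifulSoup(html, 'html.parser')
    records = []
    for row in soup.find_all('div', class_=ROW_CLASS):
        title = row.find('h3')
        if title is None:
            continue  # not a song row
        artist_tag = title.find_next('span')
        record = {
            'song': title.get_text(strip=True),
            'artist': artist_tag.get_text(strip=True) if artist_tag else None,
        }
        for field, class_name in FIELD_CLASSES.items():
            tag = row.find(class_=class_name)
            record[field] = tag.get_text(strip=True) if tag else None
        records.append(record)
    return records


def scrape(url):
    """Fetch a chart-history page and parse it (not called in this sketch)."""
    import requests  # imported here so the parser can be tried offline
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return parse_chart_history(response.text)


# Made-up fixture with placeholder values; the second row lacks most columns.
SAMPLE_HTML = """
<div class="o-chart-results-list-row">
  <h3>Example Song A</h3>
  <span>Example Artist</span>
  <span class="artist-chart-row-debut-date"><a href="#">01.02.03</a></span>
  <span class="artist-chart-row-peak-pos">4</span>
  <span class="artist-chart-row-peak-date"><a href="#">05.06.07</a></span>
  <span class="artist-chart-row-week-on-chart">8</span>
</div>
<div class="o-chart-results-list-row">
  <h3>Example Song B</h3>
  <span>Example Artist</span>
  <span class="artist-chart-row-peak-pos">9</span>
</div>
"""

records = parse_chart_history(SAMPLE_HTML)
for record in records:
    print(record)
```

Since the result is a list of dicts, `pandas.DataFrame(records)` should turn it straight into the dataframe you mentioned.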