I have this working for one website, but the HTML on the second website is quite different. I have tried to find information online on how to adapt the code, but no luck.
I want to pull the top 3 results from each URL, append them to a .csv, then save. The website's code is in the image; my code is below. All help is appreciated.
from requests_html import HTMLSession
import pandas as pd

s = HTMLSession()
data = []

urls = [
    'https://www.businesswire.com/portal/site/home/search/?searchType=all&searchTerm=delix&searchPage=1',
]

for url in urls:
    r = s.get(url)
    content = r.html.find('div.bw-news-list li')
    for item in content[:3]:  # keep only the top 3 results per URL
        try:
            title = item.find('h3 a', first=True).text
        except Exception as e:
            print(f"Error getting title: {e}")
            title = ''
        try:
            date = item.find('span.bw-date', first=True).text
        except Exception as e:
            print(f"Error getting date: {e}")
            date = ''
        try:
            summary = item.find('span.bw-news-item-summary', first=True).text
        except Exception as e:
            print(f"Error getting summary: {e}")
            summary = ''
        try:
            # use a separate name so the outer loop's `url` is not overwritten
            link = item.find('h3 a', first=True).absolute_links.pop()
        except Exception as e:
            print(f"Error getting URL: {e}")
            link = ''
        entry_dict = {
            'Title': title,
            'Date': date,
            'Summary': summary,
            'URL': link,
        }
        data.append(entry_dict)

df = pd.DataFrame(data)
df.to_csv('BusinessWire_Information.csv', index=False)
print('Finished')
This is what the code looks like on the website:
Running it results in a blank CSV.
2 Answers
You don't see anything because the data is loaded from a different URL via JavaScript. Here is an example of how you can load the titles:
I think for this kind of page it's also better to use Selenium to load the dynamic content before scraping.
Here is example code I wrote for scraping Amazon:
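The code this answer referred to is missing from the dump; below is a hypothetical reconstruction of the approach it describes. The search-URL scheme and the CSS selectors are assumptions about Amazon's markup and will likely need updating; the Selenium import is kept inside the scraping function so the pure URL helper works without a browser installed.

```python
# Hedged sketch: headless Chrome via Selenium renders the dynamic content,
# then the top results are read from the loaded DOM.
from urllib.parse import quote_plus

def build_search_url(term):
    # Assumed Amazon search URL scheme.
    return "https://www.amazon.com/s?k=" + quote_plus(term)

def scrape_amazon(term, limit=3):
    """Load a search page in headless Chrome and pull the top results."""
    # Imported here so build_search_url() is usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.chrome.options import Options

    opts = Options()
    opts.add_argument("--headless=new")
    driver = webdriver.Chrome(options=opts)
    try:
        driver.get(build_search_url(term))
        rows = []
        # Result-card selector is a guess and may change as Amazon's markup does.
        cards = driver.find_elements(
            By.CSS_SELECTOR, 'div[data-component-type="s-search-result"]')
        for card in cards[:limit]:
            title = card.find_element(By.CSS_SELECTOR, "h2 a span").text
            link = card.find_element(By.CSS_SELECTOR, "h2 a").get_attribute("href")
            rows.append({"Title": title, "URL": link})
        return rows
    finally:
        driver.quit()
```

The same pattern applies to the BusinessWire page from the question: load it with driver.get(), wait for the results list to appear, then read the items with find_elements().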