import requests
from bs4 import BeautifulSoup
import pandas as pd
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36'}
questionlist = []
url = "https://seekingalpha.com/market-news?page=20"
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
questions = soup.find_all('article', {'class': 'mT-jA ga-jA Q-b8 R-cS R-df ks-IX R-cG R-dJ ks-IX R-cG R-dJ ks-I0 ks-I0 mT-NM'})
for page in range(1, 10):
    for item in questions:
        question = {
            'title': item.find('h3', {'class': 'km-X R-cw Q-cs km-IM V-gT V-g9 V-hj km-IO V-hY V-ib V-ip km-II R-fZ'}).text,
            'link': 'https://seekingalpha.com/market-news' + item.find('a', {'class': 'hq-ox R-fu'})['href'],
            'date': item.find('span', {'class': 'mU-uO mU-gE'}),
        }
        questionlist.append(question)
print(questionlist)
Why is my loop not working? I am scraping multiple pages, but the output contains a single page repeated multiple times.
2 Answers
The pagination is implemented with a request to an external URL via JavaScript (so BeautifulSoup doesn’t see the new pages). To get the other pages, you can simulate this request directly.
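A minimal sketch of that idea, assuming the page loads its items from a JSON endpoint. The endpoint URL, the `page` and `size` parameter names, and the `{"data": [{"title": ...}]}` response shape are all hypothetical placeholders here; the real ones have to be read from the browser's Network tab while paging through the site.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the URL the page really calls
# (visible in the browser's developer tools, Network tab).
API_URL = "https://example.com/api/news"


def api_url(page, size=25):
    """Build the request URL for one page (parameter names are assumptions)."""
    return f"{API_URL}?{urlencode({'page': page, 'size': size})}"


def extract_titles(payload):
    """Collect titles from a payload of the assumed shape {"data": [{"title": ...}]}."""
    return [item["title"] for item in payload.get("data", [])]


def fetch_json(url):
    """Request one page of results and decode the JSON body."""
    import requests  # third-party; imported lazily so the helpers above need only the stdlib

    headers = {"user-agent": "Mozilla/5.0"}
    r = requests.get(url, headers=headers, timeout=30)
    r.raise_for_status()
    return r.json()
```

With the real endpoint filled in, something like `[t for p in range(1, 10) for t in extract_titles(fetch_json(api_url(p)))]` would collect the titles from nine pages.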
As has been said, your loop needs a way to actually request a different web page on each iteration. In the question, the page is downloaded once, before the loop, so the same results are appended nine times. You could, for example, keep all the links in a list, update the link each time when the change is a simple page number, or tell your code to press a button to change the page.
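The "update the link each time" option might look roughly like the sketch below: the request moves inside the loop and the page number goes into the URL. The helper names are made up for illustration, and the parser matches plain `article`, `h3` and `a` tags instead of the generated class names from the question, which is an assumption. If the site fills in its pages through JavaScript, every URL may still return the same HTML, and this alone will not be enough.

```python
from urllib.parse import urlencode

BASE_URL = "https://seekingalpha.com/market-news"  # URL from the question


def page_url(page):
    """Build the URL for one page number."""
    return f"{BASE_URL}?{urlencode({'page': page})}"


def scrape_pages(pages, fetch, parse):
    """Fetch and parse every page in turn, collecting all items in one list."""
    results = []
    for page in pages:
        html = fetch(page_url(page))  # a new request on every iteration
        results.extend(parse(html))
    return results


def fetch_with_requests(url):
    """Download one page and return its HTML text."""
    import requests  # third-party; imported lazily

    headers = {"user-agent": "Mozilla/5.0"}
    r = requests.get(url, headers=headers, timeout=30)
    r.raise_for_status()
    return r.text


def parse_articles(html):
    """Extract title and link from each <article> (tag choice is an assumption)."""
    from bs4 import BeautifulSoup  # third-party; imported lazily

    soup = BeautifulSoup(html, "html.parser")
    items = []
    for article in soup.find_all("article"):
        heading = article.find("h3")
        link = article.find("a", href=True)
        if heading is None or link is None:
            continue  # skip articles that do not have the expected tags
        items.append({"title": heading.get_text(strip=True), "link": link["href"]})
    return items
```

Calling `scrape_pages(range(1, 10), fetch_with_requests, parse_articles)` would then request pages 1 to 9 one after another instead of reusing a single response.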
I recommend going through the following link to learn more, specifically the part titled "How to Scrape Multiple Web Pages":
Web Scraping tutorial