import requests
from bs4 import BeautifulSoup
import pandas as pd

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36'}

questionlist = []

url = "https://seekingalpha.com/market-news?page=20"

r = requests.get(url, headers=headers)

soup = BeautifulSoup(r.text, 'html.parser')

questions = soup.find_all('article', {'class': 'mT-jA ga-jA Q-b8 R-cS R-df ks-IX R-cG R-dJ ks-IX R-cG R-dJ ks-I0 ks-I0 mT-NM'})

for page in range(1, 10):
    for item in questions:
        question = {
        'title': item.find('h3', {'class': 'km-X R-cw Q-cs km-IM V-gT V-g9 V-hj km-IO V-hY V-ib V-ip km-II R-fZ'}).text,
        'link': 'https://seekingalpha.com/market-news' + item.find('a', {'class': 'hq-ox R-fu'})['href'],
        'date': item.find('span', {'class': 'mU-uO mU-gE'}),
        }
        questionlist.append(question)
    
print(questionlist)

Why is my loop not working? I am scraping multiple pages, but the output is the same single page repeated multiple times.

2 Answers


  1. The pagination is implemented with a request to an external URL via JavaScript, so BeautifulSoup doesn't see the new pages. To simulate this request you can do, for example:

    import requests
    
    api_url = "https://seekingalpha.com/api/v3/news"
    
    params = {
        "filter[category]": "market-news::all",
        "filter[since]": "0",
        "filter[until]": "0",
        "include": "author,primaryTickers,secondaryTickers",
        "isMounting": "true",
        "page[size]": 25,
        "page[number]": 22,
    }
    
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/113.0",
        'Referer': 'https://seekingalpha.com/market-news?page=22'
    }
    
    with requests.Session() as s:
        s.headers = headers
        # visit the regular page first so the session picks up the
        # cookies the API expects
        s.get('https://seekingalpha.com/market-news')
    
        for p in range(1, 5):  # <-- increase this range for more pages
            params['page[number]'] = p
    
            data = s.get(api_url, params=params).json()
            # print sample data
            for d in data["data"]:
                print(d["attributes"]["title"])
    

    Prints:

    
    ...
    
    Thermo Fisher tests for preeclampsia risk gets FDA nod
    QCR Holdings declares $0.06 dividend
    RingCentral repurchases ~$461M senior notes
    Amphenol slips as Credit Suisse downgrades on 'weakness' in certain markets
    Innovid receives NYSE notice on non-compliance
    Investors were net buyers of fund assets for the fourth consecutive week, adding $4.6B
    4 stocks to watch on Friday: Deere, Applied Materials and more
    TIO, PHIO and ALIM among pre-market losers
    
    ...
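
    If you want the same title/link/date fields as in your original script, you can collect them from the JSON into a DataFrame instead of parsing HTML. A minimal sketch, reusing api_url, params, and headers from above; note that publishOn and the links.self field are assumptions about the response shape, so print one item of data["data"] to verify them:

    import requests
    import pandas as pd

    rows = []
    with requests.Session() as s:
        s.headers = headers
        # visit the regular page first to pick up cookies
        s.get('https://seekingalpha.com/market-news')

        for p in range(1, 5):
            params['page[number]'] = p
            data = s.get(api_url, params=params).json()
            for d in data["data"]:
                rows.append({
                    'title': d["attributes"]["title"],
                    # 'publishOn' and the 'self' link are assumed field
                    # names; inspect one item of the response to confirm
                    'date': d["attributes"].get("publishOn"),
                    'link': 'https://seekingalpha.com' + d.get("links", {}).get("self", ""),
                })

    df = pd.DataFrame(rows)
    print(df.head())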
    
  2. As has been said, you need a way to tell your loop to actually scrape different web pages: for example, keep all the links in a list, update the link each time when the change is a simple page-number increment, or tell your code to press a button to change the page (a sketch of the page-number approach follows below).

    I recommend going through the following tutorial to learn more, specifically the part titled "How to Scrape Multiple Web Pages":
    Web Scraping tutorial
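
    As a minimal illustration of the page-number approach (the URL and selectors below are placeholders for a site that renders its listing server-side; for Seeking Alpha itself you would need the API approach from the first answer):

    import requests
    from bs4 import BeautifulSoup

    headers = {'User-Agent': 'Mozilla/5.0'}
    results = []

    for page in range(1, 10):
        # build and fetch a new URL inside the loop, so every
        # iteration actually requests a different page
        url = f"https://example.com/news?page={page}"
        r = requests.get(url, headers=headers)
        soup = BeautifulSoup(r.text, 'html.parser')

        # placeholder selector; use the tags/classes of the real site
        for item in soup.find_all('article'):
            title = item.find('h3')
            if title:
                results.append(title.text)

    print(results)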
