
I am trying to scrape data from a webpage that shows a limited number of records at a time and requires the user to click a button to navigate to the next set. The webpage does this by sending GET requests to itself.

I tried to write Python code that sends a GET request to the page to get the next set of results, planning to wrap it in a for loop to retrieve subsequent pages, but I always get the initial set back (apparently the website is ignoring my params).

This is the website I am targeting:
https://portaltransparencia.procempa.com.br/portalTransparencia/despesaLancamentoPesquisa.do?viaMenu=true&entidade=PROCEMPA

This is my code:

import requests

url = "https://portaltransparencia.procempa.com.br/portalTransparencia/despesaLancamentoPesquisa.do?viaMenu=true&entidade=PROCEMPA"

r_params = {
    "perform": "view",
    "actionForward": "success",
    "validate": True,
    "pesquisar": True,
    "defaultSearch.pageSize":23,
    "defaultSearch.currentPage": 2
    }
page = requests.get(url, params=r_params)

I expected this to return the data from the 2nd page, but the response still contains the first page.
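
A quick way to see exactly what is being sent is to build the request without sending it and print the final URL. This is only a minimal diagnostic sketch using the same `url` and `r_params` as above; it does not contact the site, and it does not by itself show which parameters the server actually honors.

```python
import requests

url = "https://portaltransparencia.procempa.com.br/portalTransparencia/despesaLancamentoPesquisa.do?viaMenu=true&entidade=PROCEMPA"

r_params = {
    "perform": "view",
    "actionForward": "success",
    "validate": True,
    "pesquisar": True,
    "defaultSearch.pageSize": 23,
    "defaultSearch.currentPage": 2,
}

# Build the request locally (nothing is sent) to inspect the merged query string.
prepared = requests.Request("GET", url, params=r_params).prepare()
print(prepared.url)
```

The printed URL keeps the original `viaMenu=true&entidade=PROCEMPA` query and appends the extra parameters after it. Note that Python booleans are serialized as `True` (capital T), while the existing query string uses lowercase `true`. Whether the server cares about that difference is an assumption that would need to be checked, for example by passing the string `"true"` instead and comparing against the request the browser sends in its network tab.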

2 Answers


  1. Chosen as BEST ANSWER

    I edited the code from the previous answer into the version that actually worked for this example.

    (I only had to adjust the locator for the button.)

    # Find the "Next Page" button and click it
    next_button = driver.find_element(By.ID, "cmdProximo")
    next_button.click()
    

  2. The issue is most likely that the website uses client-side JavaScript to handle pagination and fetch the next set of records. In that case, sending a GET request with query parameters may not be enough, because the website might not respond to those parameters the way you expect.

    To scrape websites that rely on JavaScript to load content dynamically, you typically need a tool like Selenium, which automates interactions with a web page, including clicking buttons and handling JavaScript events.

    Here’s an example of how you might use Selenium to interact with the website and retrieve data:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    
    # Initialize the WebDriver
    driver = webdriver.Chrome()  # You'll need to have ChromeDriver installed
    
    # Open the URL
    url = "https://portaltransparencia.procempa.com.br/portalTransparencia/despesaLancamentoPesquisa.do?viaMenu=true&entidade=PROCEMPA"
    driver.get(url)
    
    # Find the "Next Page" button and click it
    next_button = driver.find_element(By.XPATH, "//button[@title='Próxima Página']")
    next_button.click()
    
    # Now you are on the next page with the next set of records
    # You can scrape the data from this page as needed
    
    # When you are done, you can close the WebDriver
    driver.quit()
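
To walk through several pages rather than just one, the click can be repeated in a loop. The following is a minimal sketch, not a tested scraper for this site: the helper name `collect_pages` and the `max_pages` limit are hypothetical, the `cmdProximo` ID is the one reported in the accepted answer, and it assumes the button disappears on the last page (if the site only disables it, the stop condition needs adjusting). A real run may also need an explicit wait after each click so the new page has finished loading.

```python
def collect_pages(driver, by, value, max_pages=3):
    """Return the HTML of up to max_pages consecutive result pages."""
    pages = [driver.page_source]
    while len(pages) < max_pages:
        # Re-locate the button on every page: the previous element
        # reference goes stale once the page changes.
        buttons = driver.find_elements(by, value)
        if not buttons:
            break  # no "next" button found, assume this was the last page
        buttons[0].click()
        pages.append(driver.page_source)
    return pages


def main():
    # Imported here so the helper above can be exercised without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    url = "https://portaltransparencia.procempa.com.br/portalTransparencia/despesaLancamentoPesquisa.do?viaMenu=true&entidade=PROCEMPA"
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        pages = collect_pages(driver, By.ID, "cmdProximo", max_pages=3)
        print(f"Collected {len(pages)} pages")
    finally:
        driver.quit()  # always release the browser, even on errors
```

Calling `main()` opens Chrome and gathers the raw HTML of the first few pages; each entry in `pages` can then be parsed for the table rows as needed.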
    
    