I am trying to scrape data from a webpage that shows a limited number of records and requires the user to click a button to navigate to the next set of records. The webpage achieves that by sending GET requests to itself.
I tried to write Python code that sends a GET request to the page, hoping to get the next set of results, and a for loop to retrieve subsequent results, but I always get the initial set (apparently the website is ignoring my params).
This is the website I am targeting:
https://portaltransparencia.procempa.com.br/portalTransparencia/despesaLancamentoPesquisa.do?viaMenu=true&entidade=PROCEMPA
This is my code:
url = "https://portaltransparencia.procempa.com.br/portalTransparencia/despesaLancamentoPesquisa.do?viaMenu=true&entidade=PROCEMPA"
r_params = {
"perform": "view",
"actionForward": "success",
"validate": True,
"pesquisar": True,
"defaultSearch.pageSize":23,
"defaultSearch.currentPage": 2
}
page = requests.get(url, params=r_params)
I expected this to generate a response with data from the 2nd page, but it responds with data from the first page.
2 Answers
I just edited my previous answer with the code that effectively worked for this example (I only had to adjust the locator for the button).
The issue you’re facing is likely that the website you’re trying to scrape uses client-side JavaScript to handle pagination and retrieve the next set of records. Sending a GET request with query parameters may not be sufficient in this case because the website might not respond to those parameters the way you expect.
To scrape data from such websites that rely on JavaScript to load content dynamically, you would typically need to use a tool like Selenium, which can automate interactions with a web page, including clicking buttons and handling JavaScript events.
Here’s an example of how you might use Selenium to interact with the website and retrieve data:
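Below is a minimal sketch, assuming the results are rendered in an HTML table and that the pagination control can be located by its link text. The "Próxima" link text and the table lookup are assumptions, not details confirmed from the page, so check the real element ids, names, or link texts with your browser's developer tools before relying on it.

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = ("https://portaltransparencia.procempa.com.br/portalTransparencia/"
       "despesaLancamentoPesquisa.do?viaMenu=true&entidade=PROCEMPA")

driver = webdriver.Chrome()  # start a Chrome session (Chrome must be installed)
driver.get(url)
wait = WebDriverWait(driver, 10)

all_rows = []
for _ in range(3):  # number of pages to collect; adjust as needed
    # wait until a results table is present, then read its rows
    table = wait.until(EC.presence_of_element_located((By.TAG_NAME, "table")))
    for row in table.find_elements(By.TAG_NAME, "tr"):
        cells = [td.text for td in row.find_elements(By.TAG_NAME, "td")]
        if cells:
            all_rows.append(cells)

    # hypothetical locator for the "next page" button -- inspect the page
    # to find the real id, name, or link text of the pagination control
    try:
        driver.find_element(By.LINK_TEXT, "Próxima").click()
    except Exception:
        break  # no further pages, or the locator needs adjusting
    time.sleep(2)  # crude wait for the next page; a staleness check would be more robust

driver.quit()
print(len(all_rows), "rows collected")

Once the button locator is adjusted to match the actual page, the same loop can feed the collected cells into whatever structure you need (a CSV file or a pandas DataFrame, for example).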