I am trying to scrape the first ten items from this website using Python with Selenium and BeautifulSoup. It seems the table is loaded by some jQuery script. I am honestly stumped about where to start, as the tutorials and guides don't match up with this website.
For example, a lot of them say to check the Network tab in the browser's inspector to find the XHR data. This website, however, doesn't load anything of value in the XHR tab, only in the JS tab. I found the request URL https://www.anime-planet.com/dist/3p/jquery.min.js?t=1657108207 but it doesn't seem to help me.
Am I overthinking things, and should I just scrape from the HTML directly? Any advice would be much appreciated.
2 Answers
This table is NOT loaded from jQuery. It is server-rendered and easily scrapable. You only need requests and beautifulsoup; Selenium is unnecessary.
With some quick DOM inspection, this should be pretty simple. You can do something like this:
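A minimal sketch of the requests + BeautifulSoup approach, not the answerer's original snippet. It assumes the items sit in the first `<table>` on the page, with one `<tr>` per item and `<td>` data cells; check the real markup in the inspector and adjust the selectors. The function names `first_rows` and `fetch_html`, and the browser-like User-Agent header, are illustrative assumptions.

```python
import requests
from bs4 import BeautifulSoup


def first_rows(html, limit=10):
    """Return the cell texts of the first `limit` data rows of the first table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # rows made only of <th> cells (headers) are skipped
            rows.append(cells)
        if len(rows) == limit:
            break
    return rows


def fetch_html(url):
    """Download the page; the HTML already contains the server-rendered table."""
    # Assumption: some sites reject the default requests User-Agent.
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()
    return response.text
```

Calling `first_rows(fetch_html(url))` with the address of the page holding the table gives a list of rows, each a list of cell strings.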
Here is a solution based on pandas & requests:
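A minimal sketch of what a pandas + requests version could look like, not the answerer's original code. It assumes the items are in the first `<table>` of the page and that an HTML parser backend for `pandas.read_html` (lxml, or bs4 with html5lib) is installed. The function names and the User-Agent header are illustrative assumptions.

```python
from io import StringIO

import pandas as pd
import requests


def first_table_rows(html, limit=10):
    """Parse the first <table> in `html` and return its first `limit` rows."""
    # read_html returns one DataFrame per table found in the document.
    tables = pd.read_html(StringIO(html))
    return tables[0].head(limit)


def fetch_html(url):
    """Download the server-rendered page as text."""
    # Assumption: some sites reject the default requests User-Agent.
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()
    return response.text
```

Printing `first_table_rows(fetch_html(url))` shows the first ten rows as a DataFrame. Note that `read_html` raises `ValueError` when the page contains no table.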
Result printed in terminal:
Relevant pandas docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_html.html
And for requests: https://requests.readthedocs.io/en/latest/