I’m trying to scrape this site. I used the following code:
import requests
import json
from bs4 import BeautifulSoup
api_url = 'https://seniorcarefinder.com/Providers/List'
# Send the search filters as a JSON request body
headers = {
    "Content-Type": "application/json; charset=utf-8",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:108.0) Gecko/20100101 Firefox/108.0",
}
# Filters for the first page of results
body_first_page = {
    "Services": ["Independent Living", "Assisted Living", "Long-Term Care / Skilled Nursing",
                 "Home Care (Non-Medical)", "Home Health Care (Medicare-Certified)", "Hospice",
                 "Adult Day Services", "Active Adult Living"],
    "StarRatings": [],
    "PageNumber": 1,
    "Location": "Colorado Springs, CO",
    "Geography": {"Latitude": 38.833882, "Longitude": -104.821363},
    "ProximityInMiles": 30,
    "SortBy": "Verified",
}
res = requests.post(api_url, data=json.dumps(body_first_page), headers=headers)
soup = BeautifulSoup(res.text, 'html.parser')
However, the resulting soup is just JSON, not HTML, so I cannot parse it with BeautifulSoup's .find() methods. How can I get the normal HTML instead, so that I can parse it using bs4's .find() and .find_all() methods?
2 Answers
I’d recommend just using the JSON and converting it to a dict, since a dict gives you a nested structure you can navigate much like the tree BS4 builds from HTML.
With the json library, you can convert JSON to a dict and then use regular .get() calls to find the information you’re looking for (requests can also do the conversion for you with res.json()): https://www.w3schools.com/python/python_json.asp
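A minimal sketch of the dict approach, assuming a response shape that the question does not show: the keys "TotalCount", "Results", "Name" and "Services" below are hypothetical placeholders, so print res.json() first and adjust them to the real structure.

```python
import json

# Hypothetical response text; the real JSON returned by the endpoint
# may use different keys, so treat these names as placeholders.
raw = """
{
  "TotalCount": 2,
  "Results": [
    {"Name": "Example Provider A", "Services": ["Assisted Living"]},
    {"Name": "Example Provider B", "Services": ["Hospice", "Home Care (Non-Medical)"]}
  ]
}
"""

# json.loads turns JSON text into plain Python dicts and lists.
# With requests, res.json() does the same thing on the response.
data = json.loads(raw)

# .get() returns a default instead of raising KeyError for a missing key,
# which helps while you are still exploring the structure.
providers = data.get("Results", [])
names = [p.get("Name") for p in providers]
hospices = [p.get("Name") for p in providers if "Hospice" in p.get("Services", [])]

print(names)
print(hospices)
```

In the question's code, the equivalent would be data = res.json() in place of the BeautifulSoup line.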
Why not use this structured data? With pandas you can simply create a DataFrame from it.
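A rough sketch of the pandas route, again assuming a hypothetical "Results" list of provider records (the real key names are not shown in the question). pd.json_normalize flattens nested dicts into dotted column names, which suits API responses like this one.

```python
import json

import pandas as pd

# Hypothetical records; in the question this would be res.json()["Results"]
# or whatever key actually holds the provider list.
raw = """
{
  "Results": [
    {"Name": "Example Provider A", "Address": {"City": "Colorado Springs", "State": "CO"}},
    {"Name": "Example Provider B", "Address": {"City": "Monument", "State": "CO"}}
  ]
}
"""
data = json.loads(raw)

# json_normalize turns each dict into a row; nested dicts become
# columns such as "Address.City" and "Address.State".
df = pd.json_normalize(data["Results"])
print(df)
```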