
BeautifulSoup output is JSON, not HTML, so I cannot parse it using bs4's .find methods

MohamedHedeya
December 14, 2022
2 Answers

I’m trying to scrape this site. I used the following code:

import requests
import json
from bs4 import BeautifulSoup

api_url ='https://seniorcarefinder.com/Providers/List'

headers= {
    "Content-Type":"application/json; charset=utf-8",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:108.0) Gecko/20100101 Firefox/108.0"}

body_first_page = {
    "Services": ["Independent Living", "Assisted Living", "Long-Term Care / Skilled Nursing",
                 "Home Care (Non-Medical)", "Home Health Care (Medicare-Certified)", "Hospice",
                 "Adult Day Services", "Active Adult Living"],
    "StarRatings": [],
    "PageNumber": 1,
    "Location": "Colorado Springs, CO",
    "Geography": {"Latitude": 38.833882, "Longitude": -104.821363},
    "ProximityInMiles": 30,
    "SortBy": "Verified",
}
res = requests.post(api_url,data=json.dumps(body_first_page),headers=headers)
soup = BeautifulSoup(res.text,'html.parser')

However, the response is JSON rather than HTML, so the resulting soup can't be searched with BeautifulSoup's .find methods. How can I get it as regular HTML so that I can parse it with bs4's .find() and .find_all() methods?

Answers

- coniferous
- December 14, 2022 at 9:02 pm
I'd recommend just using the JSON directly and converting it to a dict, since that gives you a nested structure you can navigate much like the tree BS4 builds from HTML.

With the json library you can convert JSON to a dict (requests can also do this for you via res.json()) and then use regular .get() calls to find the info you're looking for.

https://www.w3schools.com/python/python_json.asp
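A minimal sketch of that approach, using only the standard library. The response text below is made up for illustration: the top-level "Results" key matches the other answer, but the "Name" and "City" fields are assumptions, not the API's confirmed field names.

```python
import json

# Hypothetical response text shaped like the API's output: a top-level
# "Results" list. The "Name" and "City" keys are assumptions for illustration.
raw = (
    '{"Results": ['
    '{"Name": "Example Care Home", "City": "Colorado Springs"}, '
    '{"Name": "Sample Living", "City": null}'
    ']}'
)

data = json.loads(raw)  # JSON text -> Python dict (res.json() does the same)

rows = []
for provider in data.get("Results", []):
    # .get() returns None instead of raising KeyError when a key is missing;
    # "or" substitutes a default for missing or null values
    name = provider.get("Name")
    city = provider.get("City") or "unknown"
    rows.append((name, city))

print(rows)  # [('Example Care Home', 'Colorado Springs'), ('Sample Living', 'unknown')]
```

With the real response you would replace json.loads(raw) with res.json() and inspect data.keys() first to see which fields the records actually contain.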


Why not use this structured data directly? With pandas you can simply create a DataFrame:

pd.DataFrame(
    requests.post(api_url,data=json.dumps(body_first_page),headers=headers)
    .json()['Results']
)

Example

import pandas as pd
import requests
import json
api_url ='https://seniorcarefinder.com/Providers/List'

headers= {
    "Content-Type":"application/json; charset=utf-8",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:108.0) Gecko/20100101 Firefox/108.0"}

body_first_page = {
    "Services": ["Independent Living", "Assisted Living", "Long-Term Care / Skilled Nursing",
                 "Home Care (Non-Medical)", "Home Health Care (Medicare-Certified)", "Hospice",
                 "Adult Day Services", "Active Adult Living"],
    "StarRatings": [],
    "PageNumber": 1,
    "Location": "Colorado Springs, CO",
    "Geography": {"Latitude": 38.833882, "Longitude": -104.821363},
    "ProximityInMiles": 30,
    "SortBy": "Verified",
}
# .json() parses the JSON response into a dict; 'Results' holds the list of records
df = pd.DataFrame(
    requests.post(api_url, data=json.dumps(body_first_page), headers=headers).json()['Results']
)
print(df.head())
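The request body includes a "PageNumber" field, which suggests results come back one page at a time. The sketch below collects records across pages by re-sending the body with an incremented PageNumber. It assumes, without having verified it against the site, that a page past the end returns an empty "Results" list; max_pages is a safety cap in case that assumption is wrong. The iter_results and fake_fetch names are hypothetical, and fake_fetch stands in for the real POST request so the example runs offline.

```python
import copy
from typing import Callable, Iterator


def iter_results(
    fetch_page: Callable[[dict], dict],
    first_body: dict,
    max_pages: int = 50,
) -> Iterator[dict]:
    """Yield records from successive pages by incrementing "PageNumber".

    Assumption (not verified against the site): a page past the end
    comes back with an empty "Results" list, which ends the loop.
    """
    start = first_body.get("PageNumber", 1)
    for page in range(start, start + max_pages):
        body = copy.deepcopy(first_body)  # leave the caller's dict untouched
        body["PageNumber"] = page
        results = fetch_page(body).get("Results", [])
        if not results:
            break
        yield from results


# Hypothetical offline stand-in for the real POST request: two pages of fake records.
FAKE_PAGES = {1: [{"Name": "A"}, {"Name": "B"}], 2: [{"Name": "C"}]}


def fake_fetch(body: dict) -> dict:
    return {"Results": FAKE_PAGES.get(body["PageNumber"], [])}


body = {"PageNumber": 1, "Location": "Colorado Springs, CO"}
all_records = list(iter_results(fake_fetch, body))
print(all_records)  # [{'Name': 'A'}, {'Name': 'B'}, {'Name': 'C'}]
```

In real use, fetch_page could be lambda b: requests.post(api_url, data=json.dumps(b), headers=headers).json(), and the combined records passed to pd.DataFrame(list(iter_results(fetch_page, body_first_page))).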
