I have scraped JSON from a website. When I try to iterate through it I get a KeyError, but I’m unsure why. The loop stays within the length of the JSON. Any ideas as to what is going on?

import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0"
}
# Parentheses join the adjacent string literals into a single URL.
url = (
    "https://employment.ucsd.edu/jobs?page_size=250&page_number=1&keyword=clinical%20lab%20scientist&location_city"
    "=Remote&location_city=San%20Diego&location_city=Encinitas&location_city=Murrieta&location_city=La%20Jolla"
    "&location_city=Not%20Specified&location_city=Vista&sort_by=score&sort_order=DESC "
)
request = requests.get(url, headers=headers)
response = BeautifulSoup(request.text, "html.parser")
all_data = response.find_all("script", {"type": "application/ld+json"})
df = pd.DataFrame(columns=("Title", "Department", "Salary Range", "Appointment Percent", "URL"))
for data in all_data:
    jsn = json.loads(data.string)
    jsn_length = len(jsn['itemListElement'])
    # print(json.dumps(jsn, indent=4))
    n = 0
    while n < jsn_length:
        # print(jsn['itemListElement'][n])
        print(n)
        df['URL'] = jsn['itemListElement'][n]
        n += 1

Edit: response
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2022.1\plugins\python\helpers\pydev\pydevd.py", line 1491, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2022.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/Will/PycharmProjects/UCSD_JOB_SCRAPE/main.py", line 19, in <module>
    jsn_length = len(jsn['itemListElement'])
KeyError: 'itemListElement'
2 Answers
Element number 250 in the JSON you referenced really doesn’t seem to have an itemListElement key. The safest thing is probably to explicitly check for it, e.g.:
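A minimal sketch of such a check, using only the standard library. The helper name `iter_list_items` and the sample strings are hypothetical; the strings stand in for whatever `data.string` returns for each scraped `script` tag, and the real entries may be shaped differently.

```python
import json


def iter_list_items(json_blobs):
    """Yield each entry of 'itemListElement' from the JSON strings that have one."""
    for blob in json_blobs:
        jsn = json.loads(blob)
        # Not every JSON-LD block is an item list, so skip the ones
        # that lack the key instead of indexing into them blindly.
        if not isinstance(jsn, dict) or "itemListElement" not in jsn:
            continue
        yield from jsn["itemListElement"]


# Hypothetical sample data: one block with the key, one without.
sample_blobs = [
    '{"@type": "ItemList", "itemListElement": [{"url": "https://example.com/job/1"}]}',
    '{"@type": "WebSite", "name": "no list here"}',
]

items = list(iter_list_items(sample_blobs))
print(items)
```

In the original loop the same idea amounts to testing `"itemListElement" in jsn` before calling `len(jsn['itemListElement'])`.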
To get the list of URLs into a DataFrame, you can use the following example:
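One possible sketch, not necessarily the code this answer originally contained. It assumes each entry of `itemListElement` is a dict with a `url` key, which is a guess about the page's JSON-LD rather than something confirmed here; `urls_to_frame` and the sample strings are hypothetical names and data.

```python
import json

import pandas as pd


def urls_to_frame(json_blobs):
    """Collect the 'url' of every list item into a one-column DataFrame."""
    urls = []
    for blob in json_blobs:
        jsn = json.loads(blob)
        if not isinstance(jsn, dict):
            continue
        # .get() with a default skips blocks that have no 'itemListElement'.
        for item in jsn.get("itemListElement", []):
            # Assumption: each item is a dict that may carry a 'url' key.
            url = item.get("url") if isinstance(item, dict) else None
            if url is not None:
                urls.append(url)
    # Build the DataFrame once, after all URLs have been gathered.
    return pd.DataFrame({"URL": urls})


# Hypothetical sample data standing in for the scraped <script> contents.
sample_blobs = [
    '{"itemListElement": [{"url": "https://example.com/job/1"},'
    ' {"url": "https://example.com/job/2"}]}',
    '{"@type": "WebSite"}',
]

df = urls_to_frame(sample_blobs)
print(df)
```

Gathering the URLs in a list and constructing the DataFrame at the end avoids `df['URL'] = ...` inside the loop, which assigns to the whole column rather than appending a row.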
Prints: