I'm trying to retrieve values from different levels of a JSON file. Right now I do it in a rather clumsy way: I get the values from one dictionary nested inside another by looping with for loops. I want to collect every "title" and "question" and put them in a list or a pandas DataFrame. Is there a simpler way to retrieve the values I need? And how should JSON files be handled efficiently in general?
Thanks a lot to anyone who answers the question :)
Here's a piece of the JSON:
{
    "contact": "xxx",
    "version": 1.0,
    "data": [
        {
            "title": "anges-musiciens-(national-gallery)",
            "paragraphs": [
                {
                    "qas": [
                        {
                            "answers": [
                                {
                                    "text": "La Vierge aux rochers"
                                }
                            ],
                            "question": "Que concerne principalement les documents ?"
                        }
                    ]
                }
            ]
        }
    ]
}
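import json

# Current approach: walk each level with nested for loops.
# "data.json" is a placeholder for the actual file path.
with open("data.json", encoding="utf-8") as f:
    data = json.load(f)

titles = []
questions = []
for i in data["data"]:
    titles.append(i["title"])
    for p in i["paragraphs"]:
        for q in p["qas"]:
            questions.append(q["question"])
print(titles)
print(questions)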
3 Answers
You can use recursion to perform a depth-first search on the nested structure:
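A minimal sketch of that recursive idea, not necessarily the code the answer originally showed. The function name `find_values` and the inline `data` sample (trimmed from the question's JSON) are assumptions made for illustration:

```python
def find_values(node, wanted_keys):
    """Depth-first walk; returns {key: [values, ...]} for each wanted key."""
    found = {key: [] for key in wanted_keys}

    def visit(obj):
        if isinstance(obj, dict):
            for key, value in obj.items():
                if key in found:
                    found[key].append(value)
                visit(value)  # keep descending into nested values
        elif isinstance(obj, list):
            for element in obj:
                visit(element)
        # strings, numbers, etc. are leaves: nothing to do

    visit(node)
    return found


# Sample trimmed from the JSON in the question
data = {
    "contact": "xxx",
    "version": 1.0,
    "data": [
        {
            "title": "anges-musiciens-(national-gallery)",
            "paragraphs": [
                {
                    "qas": [
                        {
                            "answers": [{"text": "La Vierge aux rochers"}],
                            "question": "Que concerne principalement les documents ?",
                        }
                    ]
                }
            ],
        }
    ],
}

result = find_values(data, ("title", "question"))
titles = result["title"]
questions = result["question"]
```

Because the walk matches on key names only, it would also pick up a "title" key sitting at some other depth, which may or may not be what you want.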
If the structure is regular (i.e. always the same hierarchy patterns and no missing keys when a dictionary is present), then you can obtain your results with nested list comprehensions:
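For the regular case, something along these lines should work. It is a sketch that assumes every entry has "title" and "paragraphs" and every paragraph has "qas", as in the sample from the question; the inline `data` dictionary is a trimmed copy of that sample:

```python
# Sample trimmed from the JSON in the question
data = {
    "data": [
        {
            "title": "anges-musiciens-(national-gallery)",
            "paragraphs": [
                {
                    "qas": [
                        {
                            "answers": [{"text": "La Vierge aux rochers"}],
                            "question": "Que concerne principalement les documents ?",
                        }
                    ]
                }
            ],
        }
    ]
}

# One title per entry
titles = [entry["title"] for entry in data["data"]]

# One question per item in "qas"; the for clauses read in the same
# order as the equivalent nested for loops
questions = [
    qa["question"]
    for entry in data["data"]
    for paragraph in entry["paragraphs"]
    for qa in paragraph["qas"]
]
```

A missing key raises KeyError here, which is why this form only suits a structure you know to be regular.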
If the structure is not regular, you will need to keep track of new entries as you progress deeper and deeper in the structure. You can do this with a list (or a queue):
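One way to sketch the queue-based version, assuming a `collections.deque` as the work queue. The function name `collect` and the second, irregular entry in the sample are hypothetical, added only to show that missing keys are tolerated:

```python
from collections import deque


def collect(root, wanted_keys):
    """Iterative walk; returns {key: [values, ...]} for each wanted key."""
    found = {key: [] for key in wanted_keys}
    pending = deque([root])  # nodes still waiting to be inspected
    while pending:
        node = pending.popleft()
        if isinstance(node, dict):
            for key, value in node.items():
                if key in found:
                    found[key].append(value)
                pending.append(value)  # inspect nested values later
        elif isinstance(node, list):
            pending.extend(node)
        # other types are leaves and are simply dropped
    return found


data = {
    "data": [
        {
            "title": "anges-musiciens-(national-gallery)",
            "paragraphs": [
                {
                    "qas": [
                        {
                            "answers": [{"text": "La Vierge aux rochers"}],
                            "question": "Que concerne principalement les documents ?",
                        }
                    ]
                }
            ],
        },
        # Hypothetical irregular entry: no "paragraphs" key at all
        {"title": "entry-without-paragraphs"},
    ]
}

result = collect(data, ("title", "question"))
titles = result["title"]
questions = result["question"]
```

With `popleft()` the traversal is breadth-first; using `pop()` instead would turn it into an iterative depth-first search. Either way no recursion limit applies.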
output:
If you want to return a DataFrame:
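A possible sketch for the DataFrame case, assuming pandas is installed and the structure is regular. It builds one row per question and repeats the parent title on each row; that row layout is an assumption, since the question does not say how titles and questions should be paired:

```python
import pandas as pd

# Sample trimmed from the JSON in the question
data = {
    "data": [
        {
            "title": "anges-musiciens-(national-gallery)",
            "paragraphs": [
                {
                    "qas": [
                        {
                            "answers": [{"text": "La Vierge aux rochers"}],
                            "question": "Que concerne principalement les documents ?",
                        }
                    ]
                }
            ],
        }
    ]
}

# One dictionary per question, carrying the title of its entry
rows = [
    {"title": entry["title"], "question": qa["question"]}
    for entry in data["data"]
    for paragraph in entry["paragraphs"]
    for qa in paragraph["qas"]
]

df = pd.DataFrame(rows, columns=["title", "question"])
```

Passing `columns` explicitly keeps the column order fixed even when `rows` is empty.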