I was wondering why I get a key error when i use record_path on this data set. It comes from the FDA api and I grabbed the first two results.
data = {
"meta": {
"disclaimer": "Do not rely on openFDA to make decisions regarding medical care. While we make every effort to ensure that data is accurate, you should assume all results are unvalidated. We may limit or otherwise restrict your access to the API in line with our Terms of Service.",
"terms": "https://open.fda.gov/terms/",
"license": "https://open.fda.gov/license/",
"last_updated": "2024-11-15",
"results": {
"skip": 0,
"limit": 2,
"total": 118943
}
},
"results": [
{
"product_ndc": "73647-062",
"generic_name": "MENTHOL, CAMPHOR",
"labeler_name": "Just Brands LLC",
"brand_name": "JUST CBD - CBD AND THC ULTRA RELIEF",
"active_ingredients": [
{
"name": "CAMPHOR (SYNTHETIC)",
"strength": "2 g/100g"
},
{
"name": "MENTHOL",
"strength": "6 g/100g"
}
],
"finished": true,
"packaging": [
{
"package_ndc": "73647-062-04",
"description": "113 g in 1 BOTTLE, PUMP (73647-062-04)",
"marketing_start_date": "20230314",
"sample": false
}
],
"listing_expiration_date": "20251231",
"openfda": {
"manufacturer_name": [
"Just Brands LLC"
],
"spl_set_id": [
"f664eb79-8897-3a49-e053-2995a90a37b4"
],
"is_original_packager": [
true
],
"unii": [
"5TJD82A1ET",
"L7T10EIP3A"
]
},
"marketing_category": "OTC MONOGRAPH DRUG",
"dosage_form": "GEL",
"spl_id": "16c906dd-6989-9a79-e063-6394a90afa71",
"product_type": "HUMAN OTC DRUG",
"route": [
"TOPICAL"
],
"marketing_start_date": "20230314",
"product_id": "73647-062_16c906dd-6989-9a79-e063-6394a90afa71",
"application_number": "M017",
"brand_name_base": "JUST CBD - CBD AND THC ULTRA RELIEF"
},
{
"product_ndc": "0591-4039",
"marketing_end_date": "20250930",
"generic_name": "CLOBETASOL PROPIONATE",
"labeler_name": "Actavis Pharma, Inc.",
"brand_name": "CLOBETASOL PROPIONATE",
"active_ingredients": [
{
"name": "CLOBETASOL PROPIONATE",
"strength": ".05 g/mL"
}
],
"finished": true,
"packaging": [
{
"package_ndc": "0591-4039-46",
"description": "1 BOTTLE in 1 CARTON (0591-4039-46) / 59 mL in 1 BOTTLE",
"marketing_start_date": "20150828",
"marketing_end_date": "20250930",
"sample": false
},
{
"package_ndc": "0591-4039-74",
"description": "1 BOTTLE in 1 CARTON (0591-4039-74) / 125 mL in 1 BOTTLE",
"marketing_start_date": "20150828",
"marketing_end_date": "20250930",
"sample": false
}
],
"openfda": {
"manufacturer_name": [
"Actavis Pharma, Inc."
],
"rxcui": [
"861512"
],
"spl_set_id": [
"907e425a-720a-4180-b97c-9e25008a3658"
],
"is_original_packager": [
true
],
"unii": [
"779619577M"
]
},
"marketing_category": "NDA AUTHORIZED GENERIC",
"dosage_form": "SPRAY",
"spl_id": "33a56b8b-a9a6-4287-bbf4-d68ad0c59e07",
"product_type": "HUMAN PRESCRIPTION DRUG",
"route": [
"TOPICAL"
],
"marketing_start_date": "20150828",
"product_id": "0591-4039_33a56b8b-a9a6-4287-bbf4-d68ad0c59e07",
"application_number": "NDA021835",
"brand_name_base": "CLOBETASOL PROPIONATE",
"pharm_class": [
"Corticosteroid Hormone Receptor Agonists [MoA]",
"Corticosteroid [EPC]"
]
}
]
}
under the results key there are 2 nested keys named ‘active_ingredients’ and ‘packaging’
when i normalize i get
result = pd.json_normalize(data['results'], record_path=["packaging"],meta=['product_ndc'])
the expected columns
package_ndc description marketing_start_date sample marketing_end_date product_ndcs
but when i add active_ingredients to the record_path list i get a key error. The same goes for meta as well. When i add the other columns like ‘brand_name’ and ‘generic_name’ to the meta list, I get a key error. to see the keys
this doesnt work
result = pd.json_normalize(data['results'], record_path=["packaging","active_ingredients"],meta=['product_ndc','brand_name','generic_name'])
Thanks for any help
2
Answers
When you specify multiple record_path entries (like
"packaging"
and"active_ingredients"
), pandas expects that the secondrecord_path
("active_ingredients"
) exists within every element of the first record_path ("packaging"
), but, in your data,active_ingredients
is not a nested property of packagingDo this to solve this
which gives
Reading through the documentation for pandas.json_normalize the record_path takes a str variable.
Thus you would need to create two dataframes for each record path and then merge them together based on a common field.