I’m working on a project for a shipping container company where we need to process data from an external API that provides information about container shipments. The JSON data structure returned by the API is highly dynamic, with certain keys and nested objects that can change depending on the type of shipment or the data being provided.
{
  "container_id": "ABC123",
  "status": "In Transit",
  "location": {
    "latitude": "40.7128",
    "longitude": "-74.0060",
    "city": "New York"
  },
  "cargo": {
    "type": "Electronics",
    "weight": "500kg"
  },
  "events": [
    {
      "event_type": "Departure",
      "timestamp": "2024-08-29T08:00:00Z"
    },
    {
      "event_type": "Customs Clearance",
      "timestamp": "2024-08-30T12:00:00Z"
    }
  ]
}
In another response, the structure might vary, with some keys missing or new ones added:
{
  "container_id": "DEF456",
  "status": "Delivered",
  "location": {
    "city": "Los Angeles"
  },
  "events": [
    {
      "event_type": "Arrival",
      "timestamp": "2024-08-31T14:00:00Z"
    }
  ]
}
I need a Pythonic way to handle this kind of dynamic JSON data efficiently. Specifically, I’m looking for strategies to:
- Safely access deeply nested keys without causing errors if they don’t exist.
- Iterate over the JSON to extract specific information related to shipping status, location, and events, regardless of the structure’s variability.
- Ensure that the code remains readable and performant, especially when processing large amounts of shipment data.
What are some best practices or libraries that can help manage this type of complex and dynamic JSON parsing in Python, particularly in the context of a shipping or logistics company?
Answers
You can use Python’s dict.get() method to safely access deeply nested keys: it returns None, or a default you pass as the second argument, instead of raising KeyError when a key is missing.
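A minimal sketch of that approach applied to the two sample payloads. The names get_path, summarize and summarize_all are hypothetical helpers, not part of any library: get_path generalizes chained .get() calls so the same lookup also works through list indices, and summarize_all is a generator, so a large batch of shipments can be processed one record at a time.

```python
from typing import Any, Iterable, Iterator, Union

Key = Union[str, int]  # a dict key or a list index

# The two sample responses from the question.
SAMPLES = [
    {
        "container_id": "ABC123",
        "status": "In Transit",
        "location": {"latitude": "40.7128", "longitude": "-74.0060", "city": "New York"},
        "cargo": {"type": "Electronics", "weight": "500kg"},
        "events": [
            {"event_type": "Departure", "timestamp": "2024-08-29T08:00:00Z"},
            {"event_type": "Customs Clearance", "timestamp": "2024-08-30T12:00:00Z"},
        ],
    },
    {
        "container_id": "DEF456",
        "status": "Delivered",
        "location": {"city": "Los Angeles"},
        "events": [{"event_type": "Arrival", "timestamp": "2024-08-31T14:00:00Z"}],
    },
]


def get_path(data: Any, *keys: Key, default: Any = None) -> Any:
    """Follow keys/indices through nested dicts and lists.

    Returns `default` as soon as a step is missing or has an unexpected type,
    instead of raising KeyError, IndexError or TypeError.
    """
    current = data
    for key in keys:
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif (
            isinstance(current, list)
            and isinstance(key, int)
            and -len(current) <= key < len(current)
        ):
            current = current[key]
        else:
            return default
    return current


def summarize(shipment: dict) -> dict:
    """Flatten the fields of interest; absent ones come back as None."""
    return {
        "container_id": shipment.get("container_id"),
        "status": shipment.get("status"),
        "city": get_path(shipment, "location", "city"),
        "latitude": get_path(shipment, "location", "latitude"),
        "cargo_type": get_path(shipment, "cargo", "type"),
        # Index -1 is the last event; assumes the API lists events oldest first.
        "last_event": get_path(shipment, "events", -1, "event_type"),
    }


def summarize_all(shipments: Iterable[dict]) -> Iterator[dict]:
    """Yield one summary per record, so large batches can be streamed."""
    for shipment in shipments:
        yield summarize(shipment)


if __name__ == "__main__":
    # Plain chained .get() works when each level is a dict or missing,
    # but fails with AttributeError if an intermediate value is None.
    print(SAMPLES[1].get("cargo", {}).get("type"))  # None
    for row in summarize_all(SAMPLES):
        print(row)
```

Because summarize_all is lazy, it can be fed from any iterable, for example records decoded one by one from a file, without first building a second full list of results.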
I would use Pydantic to get strongly typed, validated models (the data model here is derived using QuickType, but you could write it by hand or generate it from a spec you have). Instead of working with dicts, you’d work with actual Python objects, and malformed data fails during parsing rather than later, when you try to work with it.
You could then access those objects with, e.g., cont.location.city or
for ev in cont.events: print(ev.timestamp.year)
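A minimal sketch of hand-written models for the two sample payloads, assuming Pydantic v2 (model_validate and model_validate_json); code generated by QuickType may look different. Keys that only appear in some responses are declared Optional with a default, and keys not declared in a model are ignored by Pydantic's default configuration. The helper try_parse is hypothetical, shown only to illustrate catching ValidationError per record.

```python
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class Location(BaseModel):
    # Numeric strings such as "40.7128" are coerced to float in the default (lax) mode.
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    city: Optional[str] = None


class Cargo(BaseModel):
    type: Optional[str] = None
    weight: Optional[str] = None  # e.g. "500kg", kept as text in this sketch


class Event(BaseModel):
    event_type: str
    timestamp: datetime  # ISO 8601 strings like "2024-08-29T08:00:00Z" are parsed


class Container(BaseModel):
    # Undeclared keys are ignored by default, so new API fields don't break parsing.
    container_id: str
    status: str
    location: Optional[Location] = None
    cargo: Optional[Cargo] = None
    events: List[Event] = Field(default_factory=list)


def try_parse(raw: dict) -> Optional[Container]:
    """Hypothetical helper: validate one record; report and skip it if invalid."""
    try:
        return Container.model_validate(raw)
    except ValidationError as exc:
        for err in exc.errors():
            print("rejected", raw.get("container_id"), err["loc"], err["msg"])
        return None


RAW_DEF456 = """
{"container_id": "DEF456", "status": "Delivered",
 "location": {"city": "Los Angeles"},
 "events": [{"event_type": "Arrival", "timestamp": "2024-08-31T14:00:00Z"}]}
"""

if __name__ == "__main__":
    cont = Container.model_validate_json(RAW_DEF456)
    print(cont.location.city)  # Los Angeles
    print(cont.cargo)  # None, because the key was absent
    for ev in cont.events:
        print(ev.timestamp.year)  # 2024

    # A missing "status" and an unparseable timestamp are caught at parse time.
    try_parse({"container_id": "XYZ789",
               "events": [{"event_type": "Arrival", "timestamp": "not a date"}]})
```

Since location and cargo are Optional, code that handles arbitrary responses should check them for None before reading nested attributes such as cont.location.city.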