
I’m working on a project for a shipping container company where we need to process data from an external API that provides information about container shipments. The JSON data structure returned by the API is highly dynamic, with certain keys and nested objects that can change depending on the type of shipment or the data being provided.

{
    "container_id": "ABC123",
    "status": "In Transit",
    "location": {
        "latitude": "40.7128",
        "longitude": "-74.0060",
        "city": "New York"
    },
    "cargo": {
        "type": "Electronics",
        "weight": "500kg"
    },
    "events": [
        {
            "event_type": "Departure",
            "timestamp": "2024-08-29T08:00:00Z"
        },
        {
            "event_type": "Customs Clearance",
            "timestamp": "2024-08-30T12:00:00Z"
        }
    ]
}

In another response, the structure might vary, with some keys missing or new ones added:

{
    "container_id": "DEF456",
    "status": "Delivered",
    "location": {
        "city": "Los Angeles"
    },
    "events": [
        {
            "event_type": "Arrival",
            "timestamp": "2024-08-31T14:00:00Z"
        }
    ]
}

I need a Pythonic way to handle this kind of dynamic JSON data efficiently. Specifically, I’m looking for strategies to:

  • Safely access deeply nested keys without causing errors if they don’t exist.
  • Iterate over the JSON to extract specific information related to shipping status, location, and events, regardless of the structure’s variability.
  • Ensure that the code remains readable and performant, especially when processing large amounts of shipment data.

What are some best practices or libraries that can help manage this type of complex and dynamic JSON parsing in Python, particularly in the context of a shipping or logistics company?

2 Answers


  1. You can chain Python’s dict.get() calls, passing an empty dict as the default for each intermediate level, to safely access nested keys that may be missing.

    Example:

    data1 = {
        "container_id": "ABC123",
        "location": {
            "latitude": "40.7128",
            "longitude": "-74.0060",
            "city": "New York"
        }
    }
    
    data2 = {
        "container_id": "DEF456",
        "location": {
            "city": "Los Angeles"
        }
    }
    
    print("Data 1:")
    # The {} default keeps the chained .get() from failing when "location" is absent.
    latitude = data1.get('location', {}).get('latitude')
    if latitude is not None:
        print("Latitude: " + latitude)
    else:
        print('Latitude not found')
    
    print("Data 2:")
    longitude = data2.get('location', {}).get('longitude')
    if longitude is not None:
        print("Longitude: " + longitude)
    else:
        print('Longitude not found')
    
    

    Output:

    Data 1:
    Latitude: 40.7128
    Data 2:
    Longitude not found
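
Chained .get() calls get verbose for deeper paths, and they do not cover list positions such as the first entry in "events". One possible generalization is a small path-walking helper. This is only a sketch: the name deep_get is hypothetical (it is not part of the standard library), and it assumes the data contains only dicts and lists, as json.loads() produces.

```python
from typing import Any

_MISSING = object()  # sentinel, so a stored None is not mistaken for "absent"


def deep_get(data: Any, *path: Any, default: Any = None) -> Any:
    """Follow `path` through nested dicts/lists; return `default` if a step is missing."""
    current = data
    for key in path:
        if isinstance(current, dict):
            current = current.get(key, _MISSING)
        elif isinstance(current, list) and isinstance(key, int):
            # Only index the list when the position actually exists.
            current = current[key] if -len(current) <= key < len(current) else _MISSING
        else:
            # Reached a scalar or None before the path was used up.
            return default
        if current is _MISSING:
            return default
    return current


shipment = {
    "container_id": "DEF456",
    "location": {"city": "Los Angeles"},
    "events": [{"event_type": "Arrival", "timestamp": "2024-08-31T14:00:00Z"}],
}

city = deep_get(shipment, "location", "city")
latitude = deep_get(shipment, "location", "latitude", default="unknown")
first_event = deep_get(shipment, "events", 0, "event_type")
cargo_weight = deep_get(shipment, "cargo", "weight")
print(city, latitude, first_event, cargo_weight)
# Los Angeles unknown Arrival None
```

Unlike data.get('location', {}), this also returns the default when an intermediate key is present but holds null (None in Python) instead of raising AttributeError.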
    
  2. I would use Pydantic to define strongly typed, validated models. (The data model here was derived using QuickType, but you could write it by hand or generate it from a spec you have.) Instead of working with dicts, you’d work with actual Python objects, and malformed data fails at parsing time rather than later, when you try to work with it.

    from datetime import datetime
    from typing import Optional
    
    import pydantic
    
    examples = [
        # (from OP's question, elided here)
    ]
    
    
    class Cargo(pydantic.BaseModel):
        type: str
        weight: str
    
    
    class Event(pydantic.BaseModel):
        event_type: str
        timestamp: datetime
    
    
    class Location(pydantic.BaseModel):
        city: str
        latitude: Optional[str] = None
        longitude: Optional[str] = None
    
    
    class ShippingContainer(pydantic.BaseModel):
        container_id: str
        status: str
        location: Location
        events: list[Event]
        cargo: Optional[Cargo] = None
    
    
    for example in examples:
        cont = ShippingContainer.model_validate(example)
        print(cont)
    

    This prints:

    container_id='ABC123' status='In Transit' location=Location(city='New York', latitude='40.7128', longitude='-74.0060') events=[Event(event_type='Departure', timestamp=datetime.datetime(2024, 8, 29, 8, 0, tzinfo=TzInfo(UTC))), Event(event_type='Customs Clearance', timestamp=datetime.datetime(2024, 8, 30, 12, 0, tzinfo=TzInfo(UTC)))] cargo=Cargo(type='Electronics', weight='500kg')
    
    container_id='DEF456' status='Delivered' location=Location(city='Los Angeles', latitude=None, longitude=None) events=[Event(event_type='Arrival', timestamp=datetime.datetime(2024, 8, 31, 14, 0, tzinfo=TzInfo(UTC)))] cargo=None
    

    You can then access the data as attributes, e.g. cont.location.city, or loop over events with for ev in cont.events: print(ev.timestamp.year).
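
To make "break during parsing" concrete, here is a hedged sketch of validating a batch record by record and collecting failures instead of stopping at the first bad shipment. It assumes Pydantic v2 (the same model_validate API used above). The function name parse_shipments, the trimmed-down models, and the second sample record are made up for illustration.

```python
from typing import Optional

import pydantic


class Location(pydantic.BaseModel):
    city: str
    latitude: Optional[str] = None
    longitude: Optional[str] = None


class ShippingContainer(pydantic.BaseModel):
    container_id: str
    status: str
    location: Location


def parse_shipments(records):
    """Validate each raw dict; return (valid models, [(index, error details), ...])."""
    valid, rejected = [], []
    for index, record in enumerate(records):
        try:
            valid.append(ShippingContainer.model_validate(record))
        except pydantic.ValidationError as exc:
            # exc.errors() describes each failing field: its location, type, and message.
            rejected.append((index, exc.errors()))
    return valid, rejected


raw = [
    {"container_id": "DEF456", "status": "Delivered", "location": {"city": "Los Angeles"}},
    # Made-up record for illustration: the required "status" key is missing.
    {"container_id": "GHI789", "location": {"city": "Chicago"}},
]

valid, rejected = parse_shipments(raw)
print([c.container_id for c in valid])
for index, errors in rejected:
    print(index, [err["loc"] for err in errors])
```

By default, Pydantic ignores keys the model does not declare, so new fields appearing in the API response will not cause validation to fail; only missing required fields or wrongly typed values end up in the rejected list.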
