
The data is a long list of dictionaries from a JSON file. Each dictionary has the same keys but values of multiple types, and sometimes these values are null. I need to know the type of each value so I can initialize the appropriate variables elsewhere.

An example of the data would be:

[{"Name": "null, "Age": 23, "Wage": 16.5},
{"Name": "jason", "Age": null, "Wage": 22.5},
{"Name": "blake", "Age": null, "Wage": 23.8},
{"Name": null, "Age": 26, "Wage": null}]

I'm trying to get the resulting type of each key, which here would be
<string, int, float>.

Since the JSON can often have 100,000+ elements, as opposed to the 4 in the example, I was not sure whether it makes sense to just iterate until all types are determined, or whether there is a more efficient way; a rough sketch of what I mean is below. I am currently working in both Python and C++.
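
A rough Python sketch of the early-exit iteration I had in mind, assuming every dictionary has the same keys:

    def infer_types(records):
        """Record the first non-null type seen for each key,
        stopping once every key has been resolved."""
        types = {}
        for record in records:
            for key, value in record.items():
                if key not in types and value is not None:
                    types[key] = type(value)
            if len(types) == len(record):
                break  # all keys resolved, no need to scan further
        return types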

2 Answers


  1. To start working with JSON in Python, import the json library; json.loads() converts a JSON string to Python objects and handles the datatype conversions automatically. If you are fetching your data with something like the requests library, you can use the .json() method on the response, as demoed in the link below.

    import json
    
    result = json.loads('{"Name": null, "Age": 23, "Wage": 16.5}')
    print(result)
    
    # {'Name': None, 'Age': 23, 'Wage': 16.5}
    

    https://www.geeksforgeeks.org/response-json-python-requests/#
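
    If you go the requests route, a minimal sketch would look like this (the endpoint URL is just a placeholder):

    import requests
    
    # Hypothetical endpoint; .json() parses the response body and
    # converts JSON null to Python None along the way
    response = requests.get("https://example.com/api/data")
    data = response.json()   # e.g. a list of dicts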

    As for determining data types in Python, you can use the built-in function type(). You can find all the built-in datatypes in the link below.

    print(type('example'))
    
    # <class 'str'>
    

    https://www.w3schools.com/python/python_datatypes.asp
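
    One thing to watch out for: JSON null becomes None, whose type is NoneType rather than the key's real type, so you need to skip None values while inferring. For example, against one parsed record:

    record = {'Name': None, 'Age': 23, 'Wage': 16.5}
    
    for key, value in record.items():
        print(key, type(value))
    
    # Name <class 'NoneType'>
    # Age <class 'int'>
    # Wage <class 'float'>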

  2. I'd use pandas until it stops working.

    data = [{"Name": None, "Age": 23, "Wage": 16.5},
            {"Name": "jason", "Age": None, "Wage": 22.5},
            {"Name": "blake", "Age": None, "Wage": 23.8},
            {"Name": None, "Age": 26, "Wage": None}]
    
    import pandas as pd
    
    df = pd.DataFrame(data)
    print(df.dtypes)
    
    > Name     object  
    > Age     float64  
    > Wage    float64  
    > dtype: object
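
    Note that Age comes back as float64: pandas coerces integer columns that contain missing values to float. If you want proper nullable dtypes instead, convert_dtypes() (pandas 1.0+) should recover them, roughly:

    print(df.convert_dtypes().dtypes)
    
    # Name     string
    # Age       Int64
    # Wage    Float64
    # dtype: object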
    

    Or you can use polars, which will return

    [Utf8, Int64, Float64]
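
    For the in-memory data list above, that would be something like:

    import polars as pl
    
    df = pl.DataFrame(data)
    print(df.dtypes)
    
    # [Utf8, Int64, Float64]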
    

    These tools should handle 100,000 rows pretty easily.


    You probably want to read your file using the library's read function, rather than reading the JSON into a Python list of dicts first. Try

    import polars as pl
    
    df = pl.read_json("your_file.json")
    
    print(df.dtypes)
    

    If you're using newline-delimited JSON, you can do a lazy read, which should (?) avoid any memory concerns for large files.

    pl.scan_ndjson("your_file.json").dtypes
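
    (If I remember right, schema inference on a scan only samples the leading rows — polars exposes an infer_schema_length parameter for this — so the dtypes can be read without materializing the whole file.)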
    