
I have a dataframe that comes from an API. Its string representation looks like this:

0

2023-04-17 4.82

2023-04-18 4.82

2023-04-19 4.82

2023-04-20 4.82

2023-04-21 4.81

When I call df.to_json(orient='records'), the dates are dropped and the output looks like this (no good at all):

[{"0":4.82},{"0":4.82},{"0":4.82},{"0":4.82},{"0":4.81}]

What I’d really like is for it to look like this:

[{"Date":"2023-04-17", "Rate":4.82},{"Date":"2023-04-18", "Rate":4.82},{"Date":"2023-04-19", "Rate":4.82},{"Date":"2023-04-20", "Rate":4.82},{"Date":"2023-04-21", "Rate":4.81}]

There could potentially be many rows of data, so I need the conversion to perform well.

2 Answers


  1. You can achieve the desired output by first converting the dataframe to a list of dictionaries and then using the json module to convert it to JSON format with the desired structure.

    Here’s an example code snippet that should work:

    import json

    # assuming your dataframe is named `df`, with the dates in the index
    # and the rates in its only column (labelled 0)
    dates = df.index
    rates = df.iloc[:, 0]

    # pair each date with its rate and build one dictionary per row
    formatted_data = []
    for date, rate in zip(dates, rates):
        # str(...)[:10] keeps "YYYY-MM-DD" for both string and datetime indexes
        formatted_data.append({"Date": str(date)[:10], "Rate": float(rate)})

    # convert the formatted data to JSON
    json_data = json.dumps(formatted_data)

    # print the resulting JSON string
    print(json_data)


    This code pairs each date in the dataframe's index with the matching value in its only column and creates a new dictionary with "Date" and "Rate" keys for each pair. Finally, it uses the json.dumps method to convert the list of dictionaries to a JSON string. Calling df.to_dict(orient='records') directly would not help here: like df.to_json(orient='records'), it drops the index, which is where the dates live.

    This approach is easy to follow, but it builds every record in a Python-level loop, so on large dataframes expect it to be slower than letting pandas serialize the data itself with to_json.
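
If the loop becomes a bottleneck, the same result can probably be reached without iterating in Python at all. The following is a minimal sketch, assuming the dates sit in the index and the rates in a single column labelled 0, as the question's output suggests. The sample dataframe and the names `out` and `json_data` are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the API dataframe:
# dates in the index, rates in a single column labelled 0.
df = pd.DataFrame(
    {0: [4.82, 4.82, 4.82, 4.82, 4.81]},
    index=["2023-04-17", "2023-04-18", "2023-04-19", "2023-04-20", "2023-04-21"],
)

# Name the index, move it into a regular column, and label the value column.
out = df.rename_axis("Date").reset_index().rename(columns={0: "Rate"})

# Normalise the dates to ISO strings; this works whether the index held
# strings or datetimes, and avoids to_json emitting epoch timestamps.
out["Date"] = pd.to_datetime(out["Date"]).dt.strftime("%Y-%m-%d")

# Let pandas serialise the whole frame in one call.
json_data = out.to_json(orient="records")
print(json_data)
```

The key step is `reset_index()`: `orient='records'` only ever serializes columns, so the dates have to become a column before the call.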

  2. Do the conversion while parsing the API response. This allows you to take advantage of any vectorization which pandas provides, particularly if you end up doing some more interesting things with the data.

    import io
    import pandas as pd
    data = '''0
    
    2023-04-17 4.82
    
    2023-04-18 4.82
    
    2023-04-19 4.82
    
    2023-04-20 4.82
    
    2023-04-21 4.81'''
    df = pd.read_csv(
        io.StringIO(data),
        sep=' ',
        names=['Date', 'Rate'],
        dtype={'Date': str, 'Rate': float},
        skiprows=[0]
    )
    df.to_json(orient='records')
    
    '[{"Date":"2023-04-17","Rate":4.82},{"Date":"2023-04-18","Rate":4.82},{"Date":"2023-04-19","Rate":4.82},{"Date":"2023-04-20","Rate":4.82},{"Date":"2023-04-21","Rate":4.81}]'
    

    Explanation of arguments to pd.read_csv():

    1. io.StringIO(data): the text of the API response, wrapped in a file-like object so that read_csv can read it
    2. sep=' ': the separator between cells
    3. names=['Date', 'Rate']: column headers
    4. dtype={'Date': str, 'Rate': float}: types to cast each column’s values to
    5. skiprows=[0]: which rows to omit; this omits the first row, since it’s just a 0 and we don’t want it in the result

    Empty rows are skipped by default.
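
One caveat with sep=' ': if the response text pads its columns with several spaces, as a printed dataframe usually does, a single-space separator will split each row into extra empty cells. The sketch below is one way to handle that, assuming the response looks like the hypothetical `raw` string shown. It swaps in the regular expression `\s+` so that any run of whitespace counts as one separator; the other arguments are unchanged.

```python
import io
import pandas as pd

# Hypothetical response text with padded, aligned columns.
raw = """\
               0
2023-04-17  4.82
2023-04-18  4.82
2023-04-21  4.81
"""

rates = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",                           # any run of whitespace separates cells
    names=["Date", "Rate"],               # column headers
    dtype={"Date": str, "Rate": float},   # keep dates as text, rates as floats
    skiprows=[0],                         # drop the header line containing just 0
)

rates_json = rates.to_json(orient="records")
print(rates_json)
```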
