I have a dataframe that comes from an API. Its string version looks like this:
0
2023-04-17 4.82
2023-04-18 4.82
2023-04-19 4.82
2023-04-20 4.82
2023-04-21 4.81
When I call df.to_json(orient='records') it looks like this (no good at all):
[{"0":4.82},{"0":4.82},{"0":4.82},{"0":4.82},{"0":4.81}]
What I’d really like is for it to look like this:
[{"Date":"2023-04-17", "Rate":4.82},{"Date":"2023-04-18", "Rate":4.82},{"Date":"2023-04-19", "Rate":4.82},{"Date":"2023-04-20", "Rate":4.82},{"Date":"2023-04-21", "Rate":4.81}]
There could potentially be many rows of data, so I need the conversion to perform well.
2 Answers
You can achieve the desired output by first converting the dataframe to a list of dictionaries and then using the json module to convert it to JSON format with the desired structure.
Here’s an example code snippet that should work:
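A minimal sketch of such a snippet, reconstructed from the description that follows rather than taken from the original answer. It assumes the dates sit in the dataframe's index (as datetimes) and the rates in a single column labelled 0, as the printed frame suggests; the sample df and the names flat, records, and formatted are illustrative.

```python
import json

import pandas as pd

# Assumed stand-in for the API frame: dates in the index, rates in column 0.
df = pd.DataFrame(
    [4.82, 4.82, 4.82, 4.82, 4.81],
    index=pd.to_datetime(
        ["2023-04-17", "2023-04-18", "2023-04-19", "2023-04-20", "2023-04-21"]
    ),
)

# Move the index into a regular column so that to_dict keeps the dates,
# then give both columns the names wanted in the output.
flat = df.reset_index()
flat.columns = ["Date", "Rate"]

# One dictionary per row.
records = flat.to_dict(orient="records")

# Rebuild each record, formatting the timestamp as YYYY-MM-DD.
formatted = [
    {"Date": rec["Date"].strftime("%Y-%m-%d"), "Rate": rec["Rate"]}
    for rec in records
]

json_str = json.dumps(formatted)
print(json_str)
```

If the index already holds plain strings rather than datetimes, the strftime call would not apply and rec["Date"] could be used directly.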
This code first converts the dataframe to a list of dictionaries using the to_dict method with the orient parameter set to 'records'. It then iterates over each record in the list and builds a new dictionary with "Date" and "Rate" keys, formatting the values as required. Finally, it uses json.dumps to convert the formatted data to a JSON string.
This approach should perform reasonably well on large dataframes, since to_dict and json.dumps are efficient built-ins, although the reformatting step is still a Python-level pass over every record.
Do the conversion while parsing the API response. This allows you to take advantage of any vectorization which pandas provides, particularly if you end up doing some more interesting things with the data.

Explanation of arguments to pd.read_csv():
io.StringIO(data): data from the API response, should be obvious
sep=' ': separator between cells, should be obvious
names=['Date', 'Rate']: column headers
dtype={'Date': str, 'Rate': float}: types to cast each column's values to
skiprows=[0]: which rows to omit; this omits the first row since it's just a 0 and we don't want it in the result

Empty rows are skipped by default.
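
The argument list above corresponds to a call along these lines. This is a sketch only: the contents of data are an assumption based on the string shown in the question (a lone 0 on the first line, then space-separated date and rate pairs), and the final to_json call is added here to show the resulting records.

```python
import io

import pandas as pd

# Assumed raw text of the API response, mirroring the question's printout.
data = (
    "0\n"
    "2023-04-17 4.82\n"
    "2023-04-18 4.82\n"
    "2023-04-19 4.82\n"
    "2023-04-20 4.82\n"
    "2023-04-21 4.81\n"
)

df = pd.read_csv(
    io.StringIO(data),                    # treat the response text as a file
    sep=" ",                              # cells are separated by one space
    names=["Date", "Rate"],               # column headers
    dtype={"Date": str, "Rate": float},   # keep dates as strings, rates as floats
    skiprows=[0],                         # drop the leading "0" line
)

# With named columns, the records orientation produces the desired keys.
json_str = df.to_json(orient="records")
print(json_str)
```

Because the dates are kept as strings, to_json writes them unchanged instead of converting them to epoch timestamps.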