I have a data frame which has 2 columns Id and Loan_Info. Loan_Info column stores data into JSON format which I need to convert it into data frame where every key would be new column and corresponding key value would represent the data for that column. My data frame looks like this.
Id | Loan_Info |
---|---|
1 | {‘111222-aaabbb’: [{‘loan_amt’: ‘100000’, ‘credit_amt’: ‘291000’, ‘loan_date’: ’01-11-2019′, ‘Bank_Name’: ‘HDFC’}, {‘loan_amt’: ‘200000’, ‘credit_amt’: ‘5850000’, ‘loan_date’: ’02-11-2019′, ‘Bank_Name’: ‘ICICI’}], ‘111333-cccddd’: [{‘loan_amt’: ‘14000’, ‘credit_amt’: ‘3080000’, ‘loan_date’: ’26-04-2023′, ‘Bank_Name’: ‘HDFC’}]} |
2 | {‘2226473-andis’: [{‘loan_amt’: ‘14000’, ‘credit_amt’: ‘3080000’, ‘loan_date’: ’26-04-2023′, ‘Bank_Name’: ‘IDFC’}]} |
So my final output data should appear like this as shown below:
Id | id_test | loan_amt | credit_amt | loan_date | Bank_Name |
---|---|---|---|---|---|
1 | 111222-aaabbb | 100000 | 291000 | 01-11-2019 | HDFC |
1 | 111222-aaabbb | 200000 | 5850000 | 02-11-2019 | ICICI |
1 | 111333-cccddd | 14000 | 3080000 | 26-04-2023 | HDFC |
2 | 2226473-andis | 14000 | 3080000 | 26-04-2023 | IDFC |
If JSON format column is in presented in dictionary type then I am able to create above format data however this only works with Dictionary type data and not when data frame is given which is given below
data = {'111222-aaabbb': [
{'loan_amt': '100000', 'credit_amt': '291000', 'loan_date': '01-11-2019', 'Bank_Name': 'HDFC'},
{'loan_amt': '200000', 'credit_amt': '5850000', 'loan_date': '02-11-2019', 'Bank_Name': 'ICICI'}],
'111333-cccddd': [{'loan_amt': '14000', 'credit_amt': '3080000', 'loan_date': '26-04-2023', 'Bank_Name': 'HDFC'}],
'2226473-andis': [{'loan_amt': '14000', 'credit_amt': '3080000', 'loan_date': '26-04-2023', 'Bank_Name': 'IDFC'}]}
# Convert JSON to DataFrame
dfs = []
for key in data.keys():
df = pd.json_normalize(data[key])
df['entity'] = key
dfs.append(df)
# Concatenate DataFrames
result = pd.concat(dfs, ignore_index=True)
result.head()
However the above code does not work in a data frame. I have 10000+ observation data frame and need to achieve above result. Kindly help me with this.
Any help is appreciated.
3
Answers
If I understood correctly you are trying to transform a DataFrame with JSON data in the "Loan_Info" column into a new DataFrame where each key-value pair in the JSON is extracted into separate columns. To achieve this, you can follow a similar approach as the dictionary-based one, but you need to apply this transformation to each row of the original DataFrame.
You can do this using the pandas library:
Replace the sample data with your actual DataFrame, and this code should extract the necessary information from the "Loan_Info" column and create the desired DataFrame with separate columns for each key-value pair within the JSON structure.
Use list with dict comprehension for list of dictionaries with add
id
andid_test
keys and last pass toDataFrame
constructor:EDIT: You can convert strings values to dictionaries by
ast.literal_eval
:Performance in repeated sample data with
concat
andjson_normalize
is slow:EDIT1: If there is multiple columns in DataFrame use:
You can use a variation of your original approach:
If you have strings:
Output: