I have a .csv log file containing three json lines for each client. For example:
{"name":"John","phone":"08847","politic":"on","ville":"LA","isTest":"false","source":"t3_1"}
{"data":{"name":"John","phone":"+8847","city":"LA","source":"t3_1","cameF":"a1"},"token":"bd67a","isTest":false}
{"data":{"responseId":"R_2hs","city":"LA","cameF":"cpl_agency2","source":"t3_1"},"success":true,"ts":1721394844,"message":null}
I don’t understand how you can combine three separate lines into one and then read the json.
Starting reading the file:
import csv
import datetime
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/in/post.csv',
delimiter=',',
header=None,
skiprows=range(0, 42), # skip test row
names=['t1', 't2', 't3', 't4', 't5', 't6', 't7', 't8', 't9']
)
display(df)
t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 | t9 | |
---|---|---|---|---|---|---|---|---|---|
0 | "name":"John" | "phone":"08847" | "politic":"on" | "ville":"LA" | "isTest":"false" | "source":"t3_1" | Nan | Nan | Nan |
1 | {"data":{"name":"John" | phone:"+08847" | city:"LA" | source:"t3_1" | cameF:"a1"} | token:"bd67a" | isTest:false} | NaN | NaN |
2 | {"data:{"responseId":"R_2hs" | city:"LA" | cameF:"a1" | source:"t3_1"} | success:true | ts:1721394844 | message:null | NaN | NaN |
…
… | … | … | … | … | … | … | … | … |
---|---|---|---|---|---|---|---|---|
5001 | "name":"Michael" | "phone":"09947" | "politic":"on" | "ville":"FE" | "isTest":"false" | "source":"t3_1" | Nan | Nan |
5002 | {"data":{"name":"Michael" | phone:"+09947" | city:"FE" | source:"t3_1" | cameF:"a1"} | token:"bd90d" | isTest:false} | NaN |
5003 | {"data:{"responseId":"R_3gn" | city:"FE" | cameF:"a1" | source:"t3_1"} | success:true | ts:1822404844 | message:null} | NaN |
I want to get a dataframe like this. Combine three lines into one:
name | phone | politic | ville | isTest | source | name | phone | city | source | cameF | token | rId | city | cameF | source | success | ts | message |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
John | 08847 | on | LA | false | t3_1 | John | +08847 | LA | t3_1 | a1 | bd67a | R_2hs | LA | a1 | t3_1 | true | 1721394844 | null |
2
Answers
I wasn’t sure about this, but assuming your csv file has one column that is effectively the three lines you showed at the beginning of your post repeating (with a new client every three rows), this code should work: