skip to Main Content

I have a .csv log file containing three json lines for each client. For example:

{"name":"John","phone":"08847","politic":"on","ville":"LA","isTest":"false","source":"t3_1"}
{"data":{"name":"John","phone":"+8847","city":"LA","source":"t3_1","cameF":"a1"},"token":"bd67a","isTest":false}
{"data":{"responseId":"R_2hs","city":"LA","cameF":"cpl_agency2","source":"t3_1"},"success":true,"ts":1721394844,"message":null}

I don’t understand how you can combine three separate lines into one and then read the json.

Starting reading the file:

import csv
import datetime
import pandas as pd

df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/in/post.csv',
                          delimiter=',',
                          header=None,
                          skiprows=range(0, 42), # skip test row
                          names=['t1', 't2', 't3', 't4', 't5', 't6', 't7', 't8', 't9']
                 )

display(df)
t1 t2 t3 t4 t5 t6 t7 t8 t9
0 "name":"John" "phone":"08847" "politic":"on" "ville":"LA" "isTest":"false" "source":"t3_1" Nan Nan Nan
1 {"data":{"name":"John" phone:"+08847" city:"LA" source:"t3_1" cameF:"a1"} token:"bd67a" isTest:false} NaN NaN
2 {"data:{"responseId":"R_2hs" city:"LA" cameF:"a1" source:"t3_1"} success:true ts:1721394844 message:null NaN NaN

5001 "name":"Michael" "phone":"09947" "politic":"on" "ville":"FE" "isTest":"false" "source":"t3_1" Nan Nan
5002 {"data":{"name":"Michael" phone:"+09947" city:"FE" source:"t3_1" cameF:"a1"} token:"bd90d" isTest:false} NaN
5003 {"data:{"responseId":"R_3gn" city:"FE" cameF:"a1" source:"t3_1"} success:true ts:1822404844 message:null} NaN

I want to get a dataframe like this. Combine three lines into one:

name phone politic ville isTest source name phone city source cameF token rId city cameF source success ts message
John 08847 on LA false t3_1 John +08847 LA t3_1 a1 bd67a R_2hs LA a1 t3_1 true 1721394844 null

2

Answers


  1. import json
    
    with open('/content/drive/MyDrive/Colab Notebooks/in/post.csv') as f:
        data = dict()
        data.update(json.loads(next(f)))
        data.update(json.loads(next(f)))
        data.update(json.loads(next(f)))
        print(data)
    
    
    Login or Signup to reply.
  2. I wasn’t sure about this, but assuming your csv file has one column that is effectively the three lines you showed at the beginning of your post repeating (with a new client every three rows), this code should work:

    import json
    
    import pandas as pd
    from IPython.display import display
    
    df = pd.DataFrame(
        columns=[
            "name",
            "phone",
            "politic",
            "ville",
            "isTest",
            "source",
            "name",
            "phone",
            "city",
            "source",
            "cameF",
            "token",
            "isTest",
            "rId",
            "city",
            "cameF",
            "source",
            "success",
            "ts",
            "message",
        ]
    )
    
    with open("test.csv") as f:
        l = []
        count = 0
        for row in f.readlines():
            d = json.loads(row)
            for k, v in d.items():
                if k == "data":
                    for _, data_v in v.items():
                        l.append(data_v)
                else:
                    l.append(v)
            count += 1
            if count == 3:
                df = df._append(pd.Series(l, index=df.columns), ignore_index=True)
                l, count = [], 0
    
    display(df)
    
       name  phone politic ville isTest source  name  phone city source cameF  token isTest    rId city        cameF source success          ts message
    0  John  08847      on    LA  false   t3_1  John  +8847   LA   t3_1    a1  bd67a  False  R_2hs   LA  cpl_agency2   t3_1    True  1721394844    None
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search