skip to Main Content

I have a DataFrame with these columns: id, jan_x, jan_y, feb_x, feb_y, ..., dec_x, dec_y that I would like export as a json that is structured like this:

{
    "id1": [
        [jan_x, feb_x, ..., dec_x],
        [jan_y, feb_y, ..., dec_y]
    ],
    "id2": [
        [jan_x, feb_x, ..., dec_x],
        [jan_y, feb_y, ..., dec_y]
    ]
}

The initial keys e.g., id1 correspond to an id in column id of my dataframe. Without any custom parsing function is there a straight forward functional way of achieving this? I have tried dumping it as a json but the desired list structure isn’t captured.

Here is a sample data frame with just two months.

data = {'id': ['1', '2', '3', '4'],
        'jan_x': [1, 2, 3, 4],
        'jan_y': [5, 6, 7, 8],
        'feb_x': [9, 10, 11 12],
        'feb_y': [13 14, 15, 16]}
  
df = pd.DataFrame(data)

Sample Output:

{
    "1": [
        [1, 9],
        [5, 13]
    ],
    "2": [
        [2, 10],
        [6, 14]
    ],
    "3": [
        [3, 11],
        [7, 15]
    ],
    "4": [
        [4, 12],
        [8, 16]
    ]
}

2

Answers


  1. Update:

    Use to_dict('list') after aggregate by common suffix (‘x’, ‘y’, …):

    import json
    
    out = (df.rename(columns=lambda x: x.split('_')[-1])
             .set_index('id').groupby(level=0, axis=1, sort=False)
             .agg(lambda x: x.values.tolist()).T.to_dict('list'))
    jd = json.dumps(out, indent=4)
    

    Output:

    >>> out
    {'1': [[1, 9], [5, 13]],
     '2': [[2, 10], [6, 14]],
     '3': [[3, 11], [7, 15]],
     '4': [[4, 12], [8, 16]]}
    
    Login or Signup to reply.
  2. I am not sure what you mean by "any custom parsing function", but here is my attempt.

    The solution uses iterrows go iterate through rows. For each row, I take the columns that ends with _x using x_columns variable that I defined above, and the same for columns that ends with _y. row[x_columns] gives you the values as a pandas Series, therefore you need to convert it to a list before adding to fit your desired format. I append all those to a result variable and print the result variable at the end.

    import pandas as pd
    
    # Create a dataframe for demo
    data={"id":[15,12,13,22,32,11], "jan_x":[15,12,13,22,32,11],"jan_y":[12,21,23,22,56,11], "feb_x":[15,12,13,22,32,11],"feb_y":[12,21,23,22,56,11]}
    df=pd.DataFrame(data)
    
    # Get columns that ends with _x and _y
    x_columns = [col for col in df if col.endswith('_x')]
    y_columns = [col for col in df if col.endswith('_y')]
    
    # Iterate through rows
    results = []
    for index, row in df.iterrows():
      results.append({
        "id"+str(row['id']): [
            list(row[x_columns]),
            list(row[y_columns])
            ]
        })
        
    print(result)
    

    Here is my example input:

       id  jan_x  jan_y  feb_x  feb_y
    0  15     15     12     15     12
    1  12     12     21     12     21
    2  13     13     23     13     23
    3  22     22     22     22     22
    4  32     32     56     32     56
    5  11     11     11     11     11
    

    and here is the output:

    [{'id15': [[15, 15], [12, 12]]}, 
     {'id12': [[12, 12], [21, 21]]}, 
     {'id13': [[13, 13], [23, 23]]}, 
     {'id22': [[22, 22], [22, 22]]}, 
     {'id32': [[32, 32], [56, 56]]}, 
     {'id11': [[11, 11], [11, 11]]}
     ]
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search