skip to Main Content

I am bit rusty in pandas to json transforms.
I have a pandas data frame like this for a fictional library:

userId visitId readingTime Books BookType
u1 1 300 book1,book2,book3 Fiction
u2 1 400 book4,book5 Horror
u2 2 250 book6 Romance

Need to create a json that is like:

{
  "visitSummary": {
    "u1": [
      {
        "readingTime": 300,
        "Books": [
          "book1",
          "book2",
          "book3"
        ],
        "BookType": "Fiction"
      }
    ],
    "u2": [
      {
        "readingTime": 400,
        "Books": [
          "book4",
          "book5"
        ],
        "BookType": "Horror"
      },
      {
        "readingTime": 250,
        "Books": [
          "book6"
        ],
        "BookType": "Romance"
      }
    ]
  }
}

I was thinking to do it using nested loops and processing each row. I am hoping, there is a simpler pythonic way to do it.

Using Python 3.10 and pandas 2.1.4

4

Answers


  1. You don’t need nested loops, just one loop would do. Here’s one way to do this:

    import pandas as pd
    import json
    
    # Assuming this is your DataFrame
    data = {
        'userId': ['u1', 'u2', 'u2'],
        'visitId': [1, 1, 2],
        'readingTime': [300, 400, 250],
        'Books': ['book1,book2,book3', 'book4,book5', 'book6'],
        'BookType': ['Fiction', 'Horror', 'Romance']
    }
    
    df = pd.DataFrame(data)
    
    json_data = {}
    for _, row in df.iterrows():
        user_id = row['userId']
        if user_id not in json_data:
            json_data[user_id] = []
    
        books = row['Books'].split(',')
        visit_data = {
            'readingTime': row['readingTime'],
            'Books': books,
            'BookType': row['BookType']
        }
        json_data[user_id].append(visit_data)
    
    # Wrap the data with your main key
    final_json = {'visitSummary': json_data}
    
    with open('output.json', 'w') as json_file:
        json.dump(final_json, json_file, indent=2)
    

    Output:

    {
      "visitSummary": {
        "u1": [
          {
            "readingTime": 300,
            "Books": [
              "book1",
              "book2",
              "book3"
            ],
            "BookType": "Fiction"
          }
        ],
        "u2": [
          {
            "readingTime": 400,
            "Books": [
              "book4",
              "book5"
            ],
            "BookType": "Horror"
          },
          {
            "readingTime": 250,
            "Books": [
              "book6"
            ],
            "BookType": "Romance"
          }
        ]
      }
    }
    
    Login or Signup to reply.
  2. Try:

    import json
    
    # if values in Books are strings:
    df["Books"] = df["Books"].str.split(",")
    
    out = {}
    for user_id, group in df.groupby("userId"):
        out[user_id] = []
        for reading_time, books, book_type in zip(
            group["readingTime"], group["Books"], group["BookType"]
        ):
            out[user_id].append(
                {"readingTime": reading_time, "Books": books, "BookType": book_type}
            )
    
    out = {"visitSummary": out}
    print(out)
    
    
    with open("data.json", "w") as f_out:
        json.dump(out, f_out, indent=4)
    

    Prints:

    {
        "visitSummary": {
            "u1": [
                {
                    "readingTime": 300,
                    "Books": ["book1", "book2", "book3"],
                    "BookType": "Fiction",
                }
            ],
            "u2": [
                {"readingTime": 400, "Books": ["book4", "book5"], "BookType": "Horror"},
                {"readingTime": 250, "Books": ["book6"], "BookType": "Romance"},
            ],
        }
    }
    

    and saves data.json file.

    Login or Signup to reply.
  3. With split, groupby & to_dict :

    out = {
        "visitSummary":
        (df.assign(Books=df["Books"].str.split(","))
           .groupby("userId").apply(lambda g: g.drop(
             columns=["userId", "visitId"]).to_dict("records")).to_dict())
    }
    

    Output :

    import json; print(json.dumps(out, indent=4))
    {
        "visitSummary": {
            "u1": [
                {
                    "readingTime": 300,
                    "Books": [
                        "book1",
                        "book2",
                        "book3"
                    ],
                    "BookType": "Fiction"
                }
            ],
            "u2": [
                {
                    "readingTime": 400,
                    "Books": [
                        "book4",
                        "book5"
                    ],
                    "BookType": "Horror"
                },
                {
                    "readingTime": 250,
                    "Books": [
                        "book6"
                    ],
                    "BookType": "Romance"
                }
            ]
        }
    }
    
    Login or Signup to reply.
  4. Split the Books column into a list of strings using Series.str.split(). You can then use .groupby(), .apply() and .to_json() to get the desired output:

    df["Books"] = df["Books"].str.split(",")
    groupby_func = lambda x: json.loads(x.drop(["userId", "visitId"], axis=1).to_json(orient="records", index=False)
    groupby_result = df.groupby("userId").apply(groupby_func)).to_json(orient="index")
    {"visitSummary": json.loads(groupby_result)}
    

    This outputs:

    {
        "visitSummary": {
            "u1": [
                {
                    "readingTime": "300",
                    "Books": [
                        "book1",
                        "book2",
                        "book3"
                    ],
                    "BookType": "Fiction"
                }
            ],
            "u2": [
                {
                    "readingTime": "400",
                    "Books": [
                        "book4",
                        "book5"
                    ],
                    "BookType": "Horror"
                },
                {
                    "readingTime": "250",
                    "Books": [
                        "book6"
                    ],
                    "BookType": "Romance"
                }
            ]
        }
    }
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search