Question posted in Json
Our archive of expertly curated questions and answers provides insights and solutions to common problems related to this popular data interchange format. From parsing and manipulating JSON data to integrating it with various programming languages and web services, our archive has got you covered. Start exploring today and take your JSON skills to the next level

How do you convert this Excel table/sheet into an array of JSON objects using Python3?

ALXD
July 12, 2023
157 views
0 votes
2 Answers

So I have a situation where I need to convert an Excel sheet into an array of Json objects using Python3.

The Excel sheet looks like the following:

CARTS	ITEMS
Cart A	Lemons
	Apples
	Strawberries
Cart B	Chocolate
	Cake
	Biscuits

etc etc…

Now, the expectation is that items can be added to each "Cart". So for example, someone will go in and add an additional row into Cart A where the cell in the CARTS column will remain empty, and the respective cell in the ITEMS column will contain the new item (i.e Banana). The row will shift down so it doesn’t overlap Cart B of course.

What would need to be done is to convert this sheet/table into an array of JSON objects that look like this:

[ 
    {
       "CARTS": "Cart A",

       "ITEMS": ["Lemons", "Apples", "Strawberries"]        
    },
    
    {
       "CARTS": "Cart B",

       "ITEMS": ["Chocolate", "Cakes", "Biscuits"]
    }

]

Using Python3. What is the best approach/solution for this? Sadly I’m a beginner in Python at this point so not aware of all its functionality with excel etc.

I attempted a solution but it’s not working. Where I’m struggling is the empty cells and converting items into a JSON array etc

Answers

There is probably a faster and more pythonic way but this one should do and is quite understandable:

import pandas as pd

df = pd.read_excel("sample.xlsx")

sample = []
for i, line in df.iterrows():
    if isinstance(line[0], str):  # to handle empty rows, read as float by pandas by default
        sample.append({"CARTS": line[0], "ITEMS": [line[1]]})
    else:
        sample[-1]["ITEMS"].append(line[1])
print(sample)

# [
#     {"CARTS": "Cart A", "ITEMS": ["Lemons", "Apples", "Strawberries"]},
#     {"CARTS": "Cart B", "ITEMS": ["Chocolate", "Cake", "Biscuits"]},
# ]

You need to install pandas library and openpyxl to be able to read excel files.

pip install pandas openpyxl

Using pandas groupby:

import pandas as pd
import pprint

df = pd.read_excel(r"...")
# Fill to group easily
df.fillna(method="ffill", inplace=True)

d = []

# Iterate over the rows of the DataFrame
for name, group in df.groupby("CARTS"):
    d.append({
        "CARTS": name,
        "ITEMS": list(group["ITEMS"])
    })

# Print the output list in the desired format
pprint.pprint(d, indent=2)

Output:

$ python3 parser.py
>>>
[ {'CARTS': 'Cart A', 'ITEMS': ['Lemons', 'Apples', 'Strawberries']},
  {'CARTS': 'Cart B', 'ITEMS': ['Chocolate', 'Cake', 'Biscuits']}]

Please signup or login to give your own answer.

Click here to cancel reply.