skip to Main Content

So I have a situation where I need to convert an Excel sheet into an array of Json objects using Python3.

The Excel sheet looks like the following:

CARTS ITEMS
Cart A Lemons
Apples
Strawberries
Cart B Chocolate
Cake
Biscuits

etc etc…

Now, the expectation is that items can be added to each "Cart". So for example, someone will go in and add an additional row into Cart A where the cell in the CARTS column will remain empty, and the respective cell in the ITEMS column will contain the new item (i.e Banana). The row will shift down so it doesn’t overlap Cart B of course.

What would need to be done is to convert this sheet/table into an array of JSON objects that look like this:

[ 
    {
       "CARTS": "Cart A",

       "ITEMS": ["Lemons", "Apples", "Strawberries"]        
    },
    
    {
       "CARTS": "Cart B",

       "ITEMS": ["Chocolate", "Cakes", "Biscuits"]
    }

]  
        

Using Python3. What is the best approach/solution for this? Sadly I’m a beginner in Python at this point so not aware of all its functionality with excel etc.

I attempted a solution but it’s not working. Where I’m struggling is the empty cells and converting items into a JSON array etc

2

Answers


  1. There is probably a faster and more pythonic way but this one should do and is quite understandable:

    import pandas as pd
    
    df = pd.read_excel("sample.xlsx")
    
    sample = []
    for i, line in df.iterrows():
        if isinstance(line[0], str):  # to handle empty rows, read as float by pandas by default
            sample.append({"CARTS": line[0], "ITEMS": [line[1]]})
        else:
            sample[-1]["ITEMS"].append(line[1])
    print(sample)
    
    # [
    #     {"CARTS": "Cart A", "ITEMS": ["Lemons", "Apples", "Strawberries"]},
    #     {"CARTS": "Cart B", "ITEMS": ["Chocolate", "Cake", "Biscuits"]},
    # ]
    

    You need to install pandas library and openpyxl to be able to read excel files.

    pip install pandas openpyxl
    
    Login or Signup to reply.
  2. Using pandas groupby:

    import pandas as pd
    import pprint
    
    df = pd.read_excel(r"...")
    # Fill to group easily
    df.fillna(method="ffill", inplace=True)
    
    d = []
    
    # Iterate over the rows of the DataFrame
    for name, group in df.groupby("CARTS"):
        d.append({
            "CARTS": name,
            "ITEMS": list(group["ITEMS"])
        })
    
    # Print the output list in the desired format
    pprint.pprint(d, indent=2)
    

    Output:

    $ python3 parser.py
    >>>
    [ {'CARTS': 'Cart A', 'ITEMS': ['Lemons', 'Apples', 'Strawberries']},
      {'CARTS': 'Cart B', 'ITEMS': ['Chocolate', 'Cake', 'Biscuits']}]
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search