skip to Main Content

I have a spreadsheet like the following:
enter image description here

As you can see, there are multiple "tags" columns like this: "tags_0", "tags_1", "tags_2".
And they can be more.

I’m trying to find all the "tags", and put them inside a list using panda’s data frame. And eventually, put them inside an array of "tags" inside a json file.

I thought of using regex, but I can’t find a way to apply it.

This is the function I’m using to output the json file. I added the tags array for reference:

def convert_products():
    read_exc = pd.read_excel('./data/products.xlsx')
    df = pd.DataFrame(read_exc)
    all_data = []

    for i in range(len(df)):
        js = {
            "sku": df['sku'][i],
            "brand": df['brand'][i],
            "tags": [?]
        }

        all_data.append(js)

    json_object = json.dumps(all_data, ensure_ascii=False, indent=2)

    with open("./data/products.json", "w", encoding='utf-8') as outfile:
        outfile.write(json_object)

How can I achieve this?

Thanks

3

Answers


  1. First You can get all the columns as a list

    list(df.columns.values)
    

    Now you can search for all columns names which contains tags_ inside this list, once you get all the columns names which is for tags, you can loop through this list and retrieve specific tag value for specific row and put inside a list
    And can pass into json object.

    For each row in dataframe:
        tagList =[]
        for tagColumn in tagColumnList:
            tagList.append(df[tagColumn][i])
        
       .... Your code for creating json object...
     Pass this tagList for tags key in json object
            
    
    Login or Signup to reply.
  2. You can achieve that in a much easier way by doing something like this…

    df = pd.read_excel('your_file.xlsx')
    
    tags_columns = [col for col in df.columns if col.startswith("tags_")]
    
    df["tags"] = df[tags_columns].values.tolist()
    
    df[["sku","brand","tags"]].to_json("test.json",orient="records")
    

    You can try other json orientation if you want: ["index","columns","split","records","values","table"]. Check them in pandas documentation

    Login or Signup to reply.
  3. You are probably looking for filter:

    out = pd.concat([df[['sku', 'brand']], 
                     df.filter(regex='^tags_').agg(list, axis=1).rename('tags')],
                    axis=1).to_json(orient='records', indent=2)
    print(out)
    
    # Output
    [
      {
        "sku":"ADX112",
        "brand":"ADX",
        "tags":[
          "art",
          "frame",
          "painting"
        ]
      }
    ]
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search