
I have a folder structure that looks like this:

data
  ABC
    monday_data
      monday.json
    tuesday_data
      tuesday.json
  YXZ
    wednesday_data
      wednesday.json
    etc.

I want to unpack all of these JSON files into a pandas DataFrame in Python.

I have spent a lot of time trying to get this to work, but without success.

What would be the most efficient way to do this?

2 Answers


  1. You can use rglob from pathlib.Path to get the paths of all files under a directory that end with a certain extension:

    from pathlib import Path
    
    for path in Path('data').rglob('*.json'):
        print(path)
    

    Outputs

    data\ABC\monday_data\monday.json
    data\ABC\tuesday_data\tuesday.json
    data\YXZ\wednesday_data\wednesday.json
    

    Now you can simply read this data into a DataFrame according to your requirements.
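
    A minimal sketch of that step, assuming each file contains JSON that pandas.read_json can parse directly (adjust to your actual schema):

    import pandas as pd
    from pathlib import Path
    
    # read each JSON file found by rglob into its own DataFrame
    frames = [pd.read_json(path) for path in Path('data').rglob('*.json')]
    
    # stack them into a single DataFrame with a fresh index
    df = pd.concat(frames, ignore_index=True)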

  2. import os
    import glob
    import pandas as pd
    
    # set the path to the directory where the JSON files are located
    path = 'data/'
    
    # use glob to find all the JSON files in the directory + its subdirectories
    json_files = glob.glob(os.path.join(path, '**/*.json'), recursive=True)
    

    This is how you can get all the paths to your JSON files.
    I am not sure how you want to load all of them into a DataFrame.

    You can try something like this:

    # create an empty list to store the dataframes
    dfs = []
    
    # loop over the JSON files and read each file into a dataframe
    for file in json_files:
        df = pd.read_json(file)
        dfs.append(df)
    
    # concatenate the dataframes into a single dataframe
    df = pd.concat(dfs, ignore_index=True)
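
    If your JSON is nested rather than flat records, pd.read_json may not give you the table you want. A sketch of one alternative, assuming each file parses to a dict or list of dicts (the nesting itself is an assumption about your data):

    import json
    
    # flatten nested JSON into columns before concatenating
    dfs = []
    for file in json_files:
        with open(file) as f:
            dfs.append(pd.json_normalize(json.load(f)))
    
    df = pd.concat(dfs, ignore_index=True)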
    