
I’m very new to Python and am working on a personal project using JSON files downloaded from Facebook, all from one Messenger chat. I keep getting an error saying df is not defined, even though I define df.

I have multiple JSON files that I am trying to read into one dataframe. I created a loop to do this and define df inside that loop. When I then call df to see it, it says it’s not defined. My code is below:

import pandas as pd
import json, glob, os
import numpy as np

file_path = "work/Desktop/fb data/jsonmssg/message_1.json"

file_dir = "work/Desktop/fb data/jsonmssg"

json_pattern = os.path.join(file_dir, '*.json')

file_list = glob.glob(json_pattern)

dfs = []

for f in file_list:
    with open(file) as file:
        chat_history = json.loads(file.read()) 
        
        json_data = pd.json_normalize(chat_history['messages'])
        
        dfs.append(json_data)
        
        df = pd.concat(dfs)

       
df
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-74-00cf07b74dcd> in <module>
----> 1 df

NameError: name 'df' is not defined

Does anyone know how I can fix this?

I tried creating a loop that would go through all JSON files in the same directory, expecting it to concatenate them into one dataframe. It didn’t work.

2 Answers


  1. First of all, if your files are spread across different directories, including directories inside directories, you should use a "**" pattern together with the recursive flag. I’m unsure if that is your case, though.

    glob.glob('mypath/**/*.json', recursive=True)
    

    Secondly, because of the way you define your path, the pathname that you pass to glob.glob() could be invalid. Different operating systems use different path separators, and that affects what os.path.join() produces. For example, on my OS os.path.join('work/Desktop/fb data/jsonmssg', '*.json') results in 'work/Desktop/fb data/jsonmssg\*.json'. Note the mixed '/' and '\' separators.

    You could rely on os.path.join() to build a proper pathname for your OS. For example:

    os.path.join('work', 'Desktop', 'fb data', 'jsonmssg', '*.json')
    

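    Before proceeding, be sure to check that the directory exists with os.path.exists(file_dir), and that file_list is not empty. Checking os.path.exists(json_pattern) would not help, because a wildcard pattern is not itself a file. Also note that a relative path like this one is resolved against the current working directory. If file_list is empty, the loop body never runs and df is never assigned, which produces exactly this NameError.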
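
The checks suggested in this answer can be sketched roughly as follows. This is only an illustration under assumptions: the helper name find_json_files and the file names in the demo are made up, and the demo runs against a temporary directory rather than the asker's real folder. It uses only the standard library and fails loudly when the directory is missing or nothing matches, instead of letting a later loop silently do nothing.

```python
import glob
import os
import tempfile


def find_json_files(file_dir, recursive=False):
    """Return sorted paths of *.json files in file_dir (hypothetical helper)."""
    if not os.path.isdir(file_dir):
        # A relative path is resolved against the current working directory
        raise FileNotFoundError(f"Not a directory: {os.path.abspath(file_dir)}")
    if recursive:
        # "**" matches zero or more nested directories when recursive=True
        pattern = os.path.join(file_dir, "**", "*.json")
    else:
        pattern = os.path.join(file_dir, "*.json")
    file_list = sorted(glob.glob(pattern, recursive=recursive))
    if not file_list:
        # An empty list would make a later "for f in file_list" loop a no-op
        raise FileNotFoundError(f"No files match {pattern}")
    return file_list


# Demo on a throwaway directory with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "nested"))
    for rel in ("message_1.json", os.path.join("nested", "message_2.json")):
        with open(os.path.join(tmp, rel), "w", encoding="utf-8") as fh:
            fh.write("{}")
    top_level = [os.path.relpath(p, tmp) for p in find_json_files(tmp)]
    everything = [os.path.relpath(p, tmp) for p in find_json_files(tmp, recursive=True)]

print(top_level)   # only the file directly inside the directory
print(everything)  # also includes the file in the nested directory
```

With a helper like this, an unexpected working directory or a mistyped folder name shows up as an explicit FileNotFoundError at the point where the files are listed.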

  2. First of all, I’m not sure glob is your best bet here. At the very least, it’s not what I would have done… you only need glob when you want "**" to match nested directories.

    How about:

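    import os
    import fnmatch
    import json
    import pandas as pd

    file_dir = "work/Desktop/fb data/jsonmssg"

    # Full paths of the non-empty *.json files directly inside file_dir
    json_file_list = [
        direntry.path for direntry in os.scandir(file_dir)
        if direntry.stat().st_size > 0 and fnmatch.fnmatch(direntry.name, "*.json")
    ]

    dfs = []
    for json_filename in json_file_list:
        try:
            with open(json_filename, encoding="utf-8") as fh:
                deserialized = json.load(fh)
        except json.JSONDecodeError:
            continue  # json.load raises on invalid JSON; it does not return None
        dfs.append(pd.json_normalize(deserialized["messages"]))

    df = pd.concat(dfs, ignore_index=True)  # raises ValueError if dfs is empty
    print(df.to_string())  # Just to prove it worked :)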
    

    Hope that helps simplify your life 🙂
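
To see what an os.scandir plus fnmatch filter like the one in this answer selects, here is a small sketch run against a temporary directory. The helper name nonempty_json_paths and all file names are made up for illustration. It shows that empty files, non-JSON files, and files in subdirectories are left out, since os.scandir does not descend into nested directories.

```python
import fnmatch
import os
import tempfile


def nonempty_json_paths(file_dir):
    """Full paths of non-empty *.json files directly inside file_dir (no recursion)."""
    with os.scandir(file_dir) as entries:
        return sorted(
            entry.path  # entry.name alone would lose the directory part
            for entry in entries
            if entry.is_file()
            and entry.stat().st_size > 0
            and fnmatch.fnmatch(entry.name, "*.json")
        )


# Demo with made-up files: one valid candidate and three that should be skipped
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "nested"))
    contents = {
        "message_1.json": '{"messages": []}',
        "empty.json": "",
        "notes.txt": "not json",
        os.path.join("nested", "message_2.json"): '{"messages": []}',
    }
    for rel, text in contents.items():
        with open(os.path.join(tmp, rel), "w", encoding="utf-8") as fh:
            fh.write(text)
    selected = [os.path.basename(p) for p in nonempty_json_paths(tmp)]

print(selected)
```

Returning entry.path rather than entry.name matters here: the bare name only opens correctly when the current working directory happens to be file_dir.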
