I want to load JSON collected from the Twitter API into Python. Here is a sample of the file:

{"created_at":"Mon Apr 22 18:17:09 +0000 2019","id":1120391103813910529,"id_str":"1120391103813910529","text":"On peut dire que la base de cette 8e saison est en place \ud83d\ude4c #GOTS8E2","source":"\u003ca href=\"http://twitter.com/download/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":243071138,"id_str":"243071138","name":"Mr B","screen_name":"skeyos","location":"Namur","url":null,"description":null,"translator_type":"none","protected":false,"verified":false,"followers_count":197,"friends_count":1811,"listed_count":6,"favourites_count":7826,"statuses_count":8044,"created_at":"Wed Jan 26 06:49:05 +0000 2011","utc_offset":null,"time_zone":null,"geo_enabled":true,"lang":"fr","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/493833348167770112/aGLGemZ5_normal.jpeg","profile_image_url_https":"https://pbs.twimg.com/profile_images/493833348167770112/aGLGemZ5_normal.jpeg","profile_banner_url":"https://pbs.twimg.com/profile_banners/243071138/1406574068","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"GOTS8E2","indices":[59,67]}],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"fr","timestamp_ms":"1555957029666"}

{"created_at":"Mon Apr 22 18:17:14 +0000 2019","id":1120391124722565123,"id_str":"1120391124722565123","text":"...

I am trying the following code:

with open('tweets.json') as tweet_data:
    json_data = json.load(tweet_data)

But I get the following error:

JSONDecodeError: Extra data: line 3 column 1 (char 2149)

Unfortunately, I can't edit the JSON file by hand, as it is really big. I need to figure out how to read it into Python. Any help would be greatly appreciated!

Edit: It works with the following code:

import json

dat = []
with open('data_tweets_E2.json', 'r') as f:
    for line in f.readlines():
        if not line.strip():  # skip empty lines
            continue

        json_data = json.loads(line)  # each non-empty line is one JSON object
        dat.append(json_data)
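
Because the file is large, a variation on the loop above can iterate over the file object directly instead of calling readlines(), so only one line is held in memory at a time. This is a minimal sketch, assuming one tweet per line as in the sample; iter_tweets is a hypothetical helper name, and the demo writes a tiny stand-in file rather than reading the real data.

```python
import json
import os
import tempfile


def iter_tweets(path):
    """Yield one parsed JSON object per non-empty line of the file at path."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:  # file objects iterate lazily, line by line
            line = line.strip()
            if not line:  # skip the blank lines between tweets
                continue
            yield json.loads(line)


# Tiny stand-in for the real file: two objects separated by a blank line.
sample = '{"id": 1, "lang": "fr"}\n\n{"id": 2, "lang": "en"}\n'
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

tweets = list(iter_tweets(sample_path))  # consume the generator fully
os.remove(sample_path)                   # clean up the stand-in file
print(len(tweets), [t["id"] for t in tweets])
```

Collecting everything with list() gives the same result as the loop above; processing each tweet inside a for loop over iter_tweets(...) avoids building the full list at all.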

3 Answers


  1. Every line contains a new object, so try parsing them line by line.

    import json
    
    with open('tweets.json', 'r') as f:
        for l in f.readlines():
            if not l.strip():  # skip empty lines
                continue
    
            json_data = json.loads(l)
            print(json_data)
    
  2. Each line contains a separate JSON object; parse them and store them in a list:

    import json

    with open('tweets.json', 'r') as tweet_data:
        # keep only non-empty lines; json.loads would fail on a blank one
        values = [json.loads(line) for line in tweet_data.readlines()
                  if line.strip()]
    
  3. Here is the code. You need to install pandas first, of course. If the solution helped you, please mark this answer with the green check.

    import json
    import pandas as pd
    
    with open('tweets.json') as json_file:
        data_list = json.load(json_file)
    
    tweet_data_frame = pd.DataFrame.from_dict(data_list)
    print(tweet_data_frame)
    print(data_list)
    

    As you can see, print(data_list) prints a list and print(tweet_data_frame) prints a DataFrame.

    If you want to see the types of these variables, use type(), for example print(type(data_list)).

    Important: the code above expects the file to hold a single JSON document, and your file does not. If a file contains several JSON objects, they need to be in an array, like [{"example":"value"},{"example":"value"}]. Your file instead has one object per line, which is why json.load fails on it. Try the code with a file in the array form.
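
To illustrate the point about arrays: the standard-library parser accepts exactly one top-level value per call, so two objects in a row raise the same kind of "Extra data" error as in the question, while the same objects wrapped in an array parse fine. The following is a minimal sketch using made-up one-field objects in place of real tweets; lines_to_array is a hypothetical helper name, not part of any library.

```python
import json

# Two objects separated by a blank line, laid out like the file in the question.
raw = '{"id": 1}\n\n{"id": 2}\n'

try:
    json.loads(raw)  # one call parses one document, so this fails
    error_message = None
except json.JSONDecodeError as exc:
    error_message = exc.msg  # short description of the failure, without position


def lines_to_array(text):
    """Parse one JSON object per non-empty line and return them as a list."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


objects = lines_to_array(raw)
as_array = json.dumps(objects)     # a single valid JSON document (an array)
round_trip = json.loads(as_array)  # parses without error
print(error_message)
print(as_array)
```

A list built this way has the same shape as data_list in the answer above, so it could be handed to pandas in the same manner. pandas.read_json also accepts lines=True for files that hold one object per line, which may be worth trying on the original file.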
