I am trying to build a data frame from the list of tweets below. However, none of the solutions I found online help.
- When I try to save the search results to a JSON file, I get
'dict' object has no attribute '_json'
using the code below:
def write_tweets(tweets, filename):
    ''' Function that appends tweets to a file. '''
    with open(filename, 'a') as f:
        for tweet in tweets:
            json.dump(tweet._json, f)
            f.write('\n')

write_tweets(searched_tweets, "data.json")
- Trying to transform my results into a dataframe also fails:
DataSet['tweetID'] = [tweet.id for tweet in searched_tweets]
My full code is below. It produces searched_tweets, which is a list.
import tweepy
import pandas as pd
import json

df = pd.read_excel("dataNLP.xlsx")
IDs = df["TweetID"].tolist()
def load_api():
    ''' Function that loads the twitter API after authorizing
    the user. '''
    # API keys
    consumer_key = "--"
    consumer_secret = "--"
    access_token = "-----"
    access_token_secret = "-"
    # Pass our consumer key and consumer secret to Tweepy's user authentication handler
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    # Pass our access token and access secret to Tweepy's user authentication handler
    auth.set_access_token(access_token, access_token_secret)
    # Load the twitter API via tweepy
    return tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True, parser=tweepy.parsers.JSONParser())
# Connect
api = load_api()
# Creating a twitter API wrapper using tweepy
i = 0
# Step of 100 IDs per request, as per the Twitter API
step = 100
searched_tweets = []
cant_find_tweets_for_those_ids = []
cant_find_tweets_for_those_ids_whole = []
while i <= len(IDs):
    for each_id in IDs[i:(i+step)]:
        try:
            new_tweets = api.statuses_lookup(IDs[i:(i+step)])
            print("found", len(new_tweets), "tweets")
            searched_tweets.extend(new_tweets)
            print("added", len(searched_tweets), "in searched_tweets")
            i = i + step + 1
        except Exception as e:
            cant_find_tweets_for_those_ids.append(each_id)
            cant_find_tweets_for_those_ids_whole.extend(cant_find_tweets_for_those_ids)
Example IDs: 597576902212063232, 565586175864610817.
The expected dataframe could be something with the following fields:
ID, text, user_location, hashtags, followers count, friends count, retweet count.
Could someone explain what I am doing wrong, or how I can get a workable dataframe from the searched_tweets list of JSON objects?
An element of the searched_tweets list:
{'truncated': False, 'in_reply_to_user_id': 297535251, 'place': None, 'retweet_count': 0, 'created_at': 'Mon Feb 23 20:28:36 +0000 2015', 'in_reply_to_screen_name': 'OutworldDOTA2', 'source': '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>', 'favorited': False, 'contributors': None, 'is_quote_status': False, 'geo': None, 'id': 569957017655226369, 'in_reply_to_status_id_str': '569956825057120256', 'in_reply_to_status_id': 569956825057120256, 'coordinates': None, 'in_reply_to_user_id_str': '297535251', 'id_str': '569957017655226369', 'lang': 'en', 'user': {'description': 'Founder, Online Abuse Prevention Initiative. v (gaming account: @grandma_kj)', 'default_profile': False, 'profile_sidebar_border_color': '181A1E', 'name': 'needlessly obscenity-laced', 'time_zone': 'Pacific Time (US & Canada)', 'profile_banner_url': 'link', 'screen_name': 'randileeharper', 'favourites_count': 66157, 'translator_type': 'regular', 'contributors_enabled': False, 'created_at': 'Sat Feb 23 07:27:19 +0000 2008', 'protected': False, 'notifications': False, 'profile_background_color': '1A1B1F', 'following': False, 'id_str': '13857342', 'location': 'Portland, OR', 'entities': {'description': {'urls': [{'url': 'link, 'expanded_url': 'link', 'indices': [45, 68], 'display_url': 'patreon.com/freebsdgirl'}]}, 'url': {'urls': [{'url': 'link': 'link', 'indices': [0, 23], 'display_url': 'blog.randi.io'}]}}, 'id': 13857342, 'utc_offset': -28800, 'has_extended_profile': True, 'profile_sidebar_fill_color': '252429', 'profile_image_url': 'link', 'friends_count': 787, 'verified': True, 'link': 'link', 'profile_background_image_url': 'link', 'profile_link_color': '2FC2EF', 'profile_text_color': '666666', 'is_translator': False, 'lang': 'en', 'geo_enabled': True, 'statuses_count': 123525, 'profile_image_url_link', 'default_profile_image': False, 'url': 'link', 'listed_count': 901, 'followers_count': 20638, 'follow_request_sent': False, 'profile_use_background_image': True, 
'profile_background_tile': False, 'is_translation_enabled': False}, 'text': '@OutworldDOTA2 i'm very entertained that all it takes is "155 IQ" for me to know precisely who is being discussed.', 'retweeted': False, 'entities': {'hashtags': [], 'urls': [], 'symbols': [], 'user_mentions': [{'id_str': '297535251', 'screen_name': 'OutworldDOTA2', 'name': 'Follow Your Leader', 'indices': [0, 14], 'id': 297535251}]}, 'favorite_count': 0}
2 Answers
Thank you for taking the time. Both solutions give the same error: 'list' object has no attribute '_json'. searched_tweets is a list of tweets. I want to be able to save it in JSON format and then transform it into a dataframe. The only thing that seems to work is from pandas.io.json import
I am not sure, though, how I can transform it and save it as a JSON file.
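Your code creates the API with parser=tweepy.parsers.JSONParser(), so every tweet it returns is already a dictionary. A dictionary has no _json attribute, hence the error message. Personally, that's the way I prefer to do it.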
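As the error message suggests, each element of searched_tweets is already a plain dictionary, so one possible fix is to dump the tweet itself rather than tweet._json. The sketch below assumes that is the case; read_tweets is a hypothetical helper added here only to show how the file could be read back.

```python
import json

def write_tweets(tweets, filename):
    """Append tweets (plain dicts) to a file, one JSON object per line."""
    with open(filename, "a", encoding="utf-8") as f:
        for tweet in tweets:
            json.dump(tweet, f)  # the tweet is already a dict, no ._json needed
            f.write("\n")

def read_tweets(filename):
    """Hypothetical helper: load the file back into a list of dicts."""
    with open(filename, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Calling write_tweets(searched_tweets, "data.json") should then work on the list from the question, and read_tweets("data.json") gives the dictionaries back.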
If you just want a list of tweets then declare a list and add each tweet dictionary to it.
You can then access the tweet attributes by list index, and dictionary key.
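A small sketch of what that access could look like. The single stand-in tweet below only copies a few values from the example element in the question; real results carry many more keys.

```python
# Stand-in for the list returned by the lookup (values taken from the
# example element in the question, trimmed to a few keys).
searched_tweets = [
    {
        "id": 569957017655226369,
        "retweet_count": 0,
        "user": {"location": "Portland, OR", "followers_count": 20638},
        "entities": {"hashtags": []},
    }
]

# Declare a list and add each tweet dictionary to it.
tweets = []
for tweet in searched_tweets:
    tweets.append(tweet)

# Access by list index first, then by dictionary key.
first = tweets[0]
tweet_id = first["id"]
location = first["user"]["location"]  # nested dict: user -> location
hashtags = [h["text"] for h in first["entities"]["hashtags"]]
```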
If you want to save the list to a file, you can use pickle.
When you want to load the file, you can load it straight back into a list of dictionaries.
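A minimal sketch of the pickle round trip, assuming the tweets are plain dictionaries. The function names save_tweets and load_tweets are made up for this example.

```python
import pickle

def save_tweets(tweets, filename):
    """Write the whole list of tweet dicts to a binary pickle file."""
    with open(filename, "wb") as f:
        pickle.dump(tweets, f)

def load_tweets(filename):
    """Load the list of tweet dicts back from the pickle file."""
    with open(filename, "rb") as f:
        return pickle.load(f)
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.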
I am not sure this is exactly what you were looking for, but I hope you can use some of it.
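For the dataframe itself, one possible approach is to flatten each tweet dictionary into one row with the fields listed in the question and hand the rows to pandas. This is only a sketch: flatten_tweet and tweets_to_dataframe are hypothetical names, and the column names are an assumption based on the fields the question asks for.

```python
def flatten_tweet(tweet):
    """Pick the requested fields out of one tweet dict into a flat row."""
    user = tweet.get("user", {})
    entities = tweet.get("entities", {})
    return {
        "ID": tweet.get("id"),
        "text": tweet.get("text"),
        "user_location": user.get("location"),
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "followers_count": user.get("followers_count"),
        "friends_count": user.get("friends_count"),
        "retweet_count": tweet.get("retweet_count"),
    }

def tweets_to_dataframe(tweets):
    """Build a DataFrame with one row per tweet."""
    import pandas as pd  # imported here so flatten_tweet works without pandas
    return pd.DataFrame([flatten_tweet(t) for t in tweets])
```

With the list from the question, tweets_to_dataframe(searched_tweets) should give one row per tweet. Missing keys end up as None rather than raising an error.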