I would like to extract any rows whose author.description contains the keyword "doctor". I think something like .iloc could work for this, but I am unsure how to select that particular column.
Any help is appreciated.
Note: I am using Twitter API V2. If anyone knows any hacks that avoid opening the file and removing columns, let me know. I've attempted the following within the query_params:
-bio:doctor and -bio_contains:doctor, but they do not work.
import requests
import expansions  # helper module used to flatten the expansions in the API response
import os
import json
import pandas as pd
import csv
import sys
import time

bearer_token = "bearer token"
search_url = "https://api.twitter.com/2/tweets/search/all"

query_params = {'query': 'vaccine -is:retweet -is:verified -baby -lotion -shampoo lang:en has:geo place_country:US',
                'tweet.fields': 'created_at,lang,text,geo,author_id,id,public_metrics,referenced_tweets',
                'expansions': 'geo.place_id,author_id',
                'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type',
                'user.fields': 'description,username,id',
                'start_time': '2021-01-20T00:00:01.000Z',
                'end_time': '2021-02-17T23:30:00.000Z',
                'max_results': '10'}


def create_headers(bearer_token):
    headers = {"Authorization": "Bearer {}".format(bearer_token)}
    return headers


def connect_to_endpoint(url, headers, params):
    # use the url argument rather than the global search_url
    response = requests.request("GET", url, headers=headers, params=params)
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.json()


def main():
    headers = create_headers(bearer_token)
    json_response = connect_to_endpoint(search_url, headers, query_params)
    json_response = expansions.flatten(json_response)
    df = pd.json_normalize(json_response['data'])
    df.to_csv("myfile.csv", encoding="utf-8-sig")


if __name__ == "__main__":
    main()
Answer:
I think something like this should be what you are looking for.
Since you already use pandas to create a DataFrame (df) of the normalized JSON data, you can do something like the following.
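A minimal sketch, assuming the flattened column is named author.description (as produced by pd.json_normalize on the flattened response) and that the match should be case-insensitive:

    # keep only the rows whose author.description mentions "doctor";
    # na=False treats missing descriptions as non-matches
    doctor_df = df[df['author.description'].str.contains('doctor', case=False, na=False)]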
This stores every row whose 'author.description' column contains the keyword 'doctor' in its own DataFrame, so it does not modify the original dataset, and you can filter or further analyze the new DataFrame separately.
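If the goal is instead to exclude those accounts (which is what the -bio:doctor attempt in the query suggests), the same boolean mask can be inverted with ~; no_doctor_df is just an illustrative name:

    # keep only the rows whose author.description does NOT mention "doctor"
    no_doctor_df = df[~df['author.description'].str.contains('doctor', case=False, na=False)]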
For further explanation of what’s going on above:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html