
I would like to extract any rows whose author.description contains the keyword "doctor". I think something like .iloc could work for this, but I am unsure how to select this particular column.
Any help is appreciated.

Note: I am using Twitter API v2. If anyone knows a hack that avoids opening the file and removing columns, let me know. I've attempted the following within the query params:
-bio:doctor and -bio_contains:doctor, and they do not work.


import requests
import expansions
import os
import json
import pandas as pd
import csv
import sys
import time

bearer_token = "bearer token"

search_url = "https://api.twitter.com/2/tweets/search/all"

query_params = {'query': 'vaccine -is:retweet -is:verified -baby -lotion -shampoo lang:en has:geo place_country:US',
                'tweet.fields':'created_at,lang,text,geo,author_id,id,public_metrics,referenced_tweets',
                'expansions':'geo.place_id,author_id', 
                'place.fields':'contained_within,country,country_code,full_name,geo,id,name,place_type',
                'user.fields':'description,username,id',
                'start_time':'2021-01-20T00:00:01.000Z',
                'end_time':'2021-02-17T23:30:00.000Z',
                'max_results':'10'}


def create_headers(bearer_token):
    headers = {"Authorization": "Bearer {}".format(bearer_token)}
    return headers


def connect_to_endpoint(url, headers, params):
    # use the url parameter rather than the global search_url
    response = requests.request("GET", url, headers=headers, params=params)

    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.json()


def main():
    headers = create_headers(bearer_token)
    json_response = connect_to_endpoint(search_url, headers, query_params)
    json_response = expansions.flatten(json_response) 
    df = pd.json_normalize(json_response['data'])
    df.to_csv("myfile.csv", encoding="utf-8-sig")


if __name__ == "__main__":
    main()

2 Answers


  1. I think something like this should be what you are looking for:

    import pandas as pd
    
    
    my_data = pd.DataFrame(
        {'geo.id': ['lkajsdf', 'alksjdf', 'assssddf'], 
         'author.description': ['Hey, I am a doctor', 'I am also a doctor', 'Me? I am a lawyer']})
    
    drop = my_data['author.description'].str.contains('doctor|hospital')
    result = my_data[~drop]  # ~ inverts the boolean mask
    result
    

    Result

         geo.id author.description
    2  assssddf  Me? I am a lawyer
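    Note that the above *drops* the matching rows. If you instead want to *extract* the rows containing "doctor" (as the question asks), index with the mask directly rather than its negation; a minimal sketch using the same toy data:

    ```python
    import pandas as pd

    # Same toy data as above
    my_data = pd.DataFrame(
        {'geo.id': ['lkajsdf', 'alksjdf', 'assssddf'],
         'author.description': ['Hey, I am a doctor', 'I am also a doctor', 'Me? I am a lawyer']})

    mask = my_data['author.description'].str.contains('doctor|hospital')
    doctors = my_data[mask]  # keeps only the rows whose description matches
    print(doctors)
    ```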
  2. Since you use pandas to create a dataframe (df) of normalized json data, you can do something like:

    authorDescptionContainingDoctor = df[df['author.description'].str.contains('doctor')]
    

    This stores every row whose ‘author.description’ column contains the keyword ‘doctor’ into its own df, so it does not modify the original dataset, and you can separately filter or further analyze the new df.

    For further explanation of what’s going on above:
    https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
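    One caveat worth adding (my own note, not part of the answer above): `str.contains` is case-sensitive by default, and it returns NaN for rows with a missing description, which makes the boolean indexing fail. Passing `case=False` and `na=False` handles both; a minimal sketch with hypothetical data:

    ```python
    import pandas as pd

    # Hypothetical sample mimicking the normalized Twitter data;
    # one description is missing (None), as happens with empty bios.
    df = pd.DataFrame({
        'author.description': ['I am a Doctor', 'I am a lawyer', None],
    })

    # na=False treats missing descriptions as non-matches instead of NaN,
    # and case=False also catches "Doctor", "DOCTOR", etc.
    doctors = df[df['author.description'].str.contains('doctor', case=False, na=False)]
    print(doctors)
    ```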
