I would like to extract any rows whose author.description contains the keyword "doctor". I think something like .iloc could work for this, but I am unsure how to select that particular column.
Any help is appreciated.
Note: I am using Twitter API V2. If anyone knows any hacks that avoid opening the file and removing columns, let me know. I've attempted the following within the query_params:
-bio:doctor and -bio_contains:doctor, but they do not work.
import requests
import expansions  # helper module used to flatten the expansions in the API response
import os
import json
import pandas as pd
import csv
import sys
import time

bearer_token = "bearer token"
search_url = "https://api.twitter.com/2/tweets/search/all"

query_params = {'query': 'vaccine -is:retweet -is:verified -baby -lotion -shampoo lang:en has:geo place_country:US',
                'tweet.fields': 'created_at,lang,text,geo,author_id,id,public_metrics,referenced_tweets',
                'expansions': 'geo.place_id,author_id',
                'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type',
                'user.fields': 'description,username,id',
                'start_time': '2021-01-20T00:00:01.000Z',
                'end_time': '2021-02-17T23:30:00.000Z',
                'max_results': '10'}


def create_headers(bearer_token):
    headers = {"Authorization": "Bearer {}".format(bearer_token)}
    return headers


def connect_to_endpoint(url, headers, params):
    # use the url argument rather than the global search_url
    response = requests.request("GET", url, headers=headers, params=params)
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.json()


def main():
    headers = create_headers(bearer_token)
    json_response = connect_to_endpoint(search_url, headers, query_params)
    json_response = expansions.flatten(json_response)
    df = pd.json_normalize(json_response['data'])
    df.to_csv("myfile.csv", encoding="utf-8-sig")


if __name__ == "__main__":
    main()
Answer:
I think something like this should be what you are looking for.
Since you already use pandas to create a DataFrame (df) of the normalized JSON data, you can do something like the following.
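A minimal sketch, assuming the flattened column is named author.description (as produced by pd.json_normalize on the flattened response) and that the match should be case-insensitive:

    # keep only the rows whose author.description mentions "doctor";
    # na=False treats missing descriptions as non-matches
    doctor_df = df[df['author.description'].str.contains('doctor', case=False, na=False)]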
This stores every row whose 'author.description' column contains the keyword 'doctor' in its own DataFrame, so it does not modify the original dataset, and you can filter or further analyze the new DataFrame separately.
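If the goal is instead to exclude those accounts (which is what the -bio:doctor attempt in the query suggests), the same boolean mask can be inverted with ~; no_doctor_df is just an illustrative name:

    # keep only the rows whose author.description does NOT mention "doctor"
    no_doctor_df = df[~df['author.description'].str.contains('doctor', case=False, na=False)]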
For further explanation of what’s going on above:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html