How to remove @user, hashtag, and links from tweet text and put it into dataframe in python - Twitter API

AbbiKRK
March 2, 2021
156 views
0 votes
2 Answers

I’m a beginner at Python and I’m trying to gather data from Twitter using the API. I want to collect the username, the date, and the cleaned tweet text (without @usernames, hashtags, and links) and then put them into a DataFrame.

I found a way to do this using: ' '.join(re.sub("(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"," ",tweet.text).split()) but when I use it in my code, it raises NameError: name 'tweet' is not defined

Here is my code:

tweets = tw.Cursor(api.search, q=keyword, lang="id", since=date).items()

raw_tweet = ' '.join(re.sub("(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"," ",tweet.text).split())

data_tweet = [[tweet.user.screen_name, tweet.created_at, raw_tweet] for tweet in tweets]

dataFrame = pd.DataFrame(data=data_tweet, columns=['user', "date", "tweet"])

I know the problem is in data_tweet, but I don’t know how to fix it. Please help.

Thank you.

Tags: python twitter

Answers

- IsmailHafeez
- March 2, 2021 at 10:43 am
- 0 votes
The problem is actually in the second line:
```
raw_tweet = ' '.join(re.sub("(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"," ",tweet.text).split())
```
Here, you are using tweet.text. However, you have not defined what tweet is yet, only tweets. Also, from reading your third line where you actually define tweet:
```
for tweet in tweets
```
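I’m assuming you want tweet to be each value you get while iterating through tweets.
If so, both lines have to run inside the same loop. Each row also needs to be appended to a list created before the loop, otherwise data_tweet only keeps the last tweet.
So:
```
data_tweet = []  # one row per tweet
for tweet in tweets:
    # replace @mentions, non-alphanumeric characters and links with spaces, then collapse whitespace
    raw_tweet = ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet.text).split())
    data_tweet.append([tweet.user.screen_name, tweet.created_at, raw_tweet])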
```
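For anyone who wants to try this without calling the API, here is a minimal sketch of the same idea. The `SimpleNamespace` objects are only stand-ins I made up for the tweet objects returned by the cursor, and `clean_text` is a hypothetical helper name. Note that this pattern removes the `#` character but keeps the hashtag word itself.

```python
import re
from types import SimpleNamespace

# Pattern from the question: @mentions, non-alphanumeric characters, links
PATTERN = re.compile(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)")

def clean_text(text):
    """Replace every match with a space, then collapse repeated whitespace."""
    return " ".join(PATTERN.sub(" ", text).split())

# Stand-in tweets (assumption: real tweets expose the same three attributes)
tweets = [
    SimpleNamespace(
        user=SimpleNamespace(screen_name="andi"),
        created_at="2021-03-01",
        text="@budi Halo dunia! #python https://t.co/abc123",
    ),
    SimpleNamespace(
        user=SimpleNamespace(screen_name="citra"),
        created_at="2021-03-02",
        text="Selamat pagi @citra, cek http://example.com/x sekarang",
    ),
]

# The cleaning happens inside the comprehension, where `tweet` is defined
data_tweet = [
    [tweet.user.screen_name, tweet.created_at, clean_text(tweet.text)]
    for tweet in tweets
]
```

`data_tweet` can then be passed to `pd.DataFrame(data=data_tweet, columns=['user', 'date', 'tweet'])` exactly as in the question.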

- mpriya
- December 11, 2021 at 6:26 am
- 0 votes
You can also use a regex to remove any words that start with ‘@’ (usernames) or ‘http’ (links) in a predefined function and apply that function to the pandas DataFrame column:
```
import re

def remove_usernames_links(tweet):
    tweet = re.sub(r'@[^\s]+','',tweet)
    tweet = re.sub(r'http[^\s]+','',tweet)
    return tweet
df['tweet'] = df['tweet'].apply(remove_usernames_links)
```
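The question also asks for hashtags to be removed, which this function leaves in place. A possible variant, as a sketch only: the function name is hypothetical, and it assumes that any whitespace-delimited word starting with `@` or `#` should be dropped entirely.

```python
import re

def remove_usernames_hashtags_links(tweet):
    """Drop links, @mentions and #hashtags, then tidy the leftover spaces."""
    tweet = re.sub(r'http\S+', '', tweet)   # links
    tweet = re.sub(r'[@#]\S+', '', tweet)   # @mentions and #hashtags
    return ' '.join(tweet.split())          # collapse repeated whitespace

cleaned = remove_usernames_hashtags_links(
    "@andi Belajar #python di https://example.com/kursus hari ini"
)
```

It can be applied to a column the same way, with `df['tweet'].apply(...)`.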
If you encounter the error "expected string or bytes-like object", wrap the value in str():
```
import re

def remove_usernames_links(tweet):
    # str() guards against non-string values such as NaN
    tweet = re.sub(r'@[^\s]+','',str(tweet))
    tweet = re.sub(r'http[^\s]+','',str(tweet))
    return tweet
df['tweet'] = df['tweet'].apply(remove_usernames_links)
```
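To see where that error comes from, here is a small sketch with plain Python values instead of a DataFrame column (the sample values are made up). `re.sub` raises `TypeError` when it is handed something that is not a string, which is typically what a missing value looks like in a column.

```python
import re

def remove_usernames_links(tweet):
    tweet = re.sub(r'@[^\s]+', '', str(tweet))
    tweet = re.sub(r'http[^\s]+', '', str(tweet))
    return tweet

# A string, a NaN and a None, as might appear in a column with missing text
values = ["@andi lihat http://example.com ya", float("nan"), None]

# Without str(), a non-string value makes re.sub raise TypeError
try:
    re.sub(r'@[^\s]+', '', values[1])
    raised = False
except TypeError:
    raised = True

# With str(), every value goes through, but missing values become text
cleaned = [remove_usernames_links(v) for v in values]
```

One thing to keep in mind: `str()` turns missing values into the literal strings `'nan'` and `'None'`, so it may be worth dropping or filling those rows before cleaning.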
Credit: https://www.datasnips.com/59/remove-usernames-http-links-from-tweet-data/