I want to scrape only Urdu-language tweets for my project using Python. I started researching how to scrape tweets from Twitter, and so far I have found three prominent approaches:
- Tweepy (using the Twitter API)
- Twint (which scrapes Twitter without using the API)
- Selenium
However, I still can't figure out how to target only Urdu-language tweets. I would be very grateful for any help, guidance, or leads. Thanks!
Answers
After researching the topic further, I found two ways. First, with Twint you can set the tweet language through the `Lang` option of its config object (`c.Lang = 'tweet_language_code'` on a `twint.Config()` instance).
(Note: this method didn't work for me, so I tried the other approach.)
Second, use the snscrape module and set the language in the search query with the `lang:` operator (e.g. `lang:ur` for Urdu). This works nicely.
The above snippet will give you 50K tweets in English.
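A minimal sketch of the snscrape approach, targeting Urdu instead of English. It assumes snscrape's `snscrape.modules.twitter.TwitterSearchScraper` class and the tweet attributes `date`, `id`, `user.username`, `lang` and `rawContent` (older snscrape releases call the text `content`). The helper names `build_query`, `tweet_to_row` and `scrape_tweets` are hypothetical, and since scraping depends on Twitter's web endpoints, it may stop working when those change.

```python
import itertools
from typing import Any, Dict, List


def build_query(keywords: str = "", lang: str = "ur") -> str:
    """Combine optional keywords with Twitter's `lang:` search operator."""
    parts = [keywords.strip()] if keywords.strip() else []
    parts.append(f"lang:{lang}")  # "ur" is the language code Twitter uses for Urdu
    return " ".join(parts)


def tweet_to_row(tweet: Any) -> Dict[str, Any]:
    """Flatten a snscrape tweet object into a plain dict."""
    # Newer snscrape releases expose the text as `rawContent`; older ones as `content`.
    text = getattr(tweet, "rawContent", None) or getattr(tweet, "content", None)
    return {
        "date": tweet.date,
        "id": tweet.id,
        "user": tweet.user.username,
        "lang": tweet.lang,
        "text": text,
    }


def scrape_tweets(query: str, limit: int = 1000) -> List[Dict[str, Any]]:
    """Return at most `limit` tweets matching `query` (requires snscrape)."""
    # Imported here so the helpers above can be used without snscrape installed.
    import snscrape.modules.twitter as sntwitter

    scraper = sntwitter.TwitterSearchScraper(query)
    # islice stops the (potentially very long) generator after `limit` tweets.
    return [tweet_to_row(t) for t in itertools.islice(scraper.get_items(), limit)]
```

For example, `rows = scrape_tweets(build_query("cricket"), limit=500)` should collect up to 500 Urdu tweets matching "cricket"; `pandas.DataFrame(rows)` turns them into a table if you want to save them as CSV.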
Note: If you use the official Twitter API (e.g. through Tweepy), accessing tweets older than one week requires Academic Research access; the standard API only returns tweets from the past week.