
I want to scrape only Urdu-language tweets for my project using Python. While researching how to scrape tweets, I have found three prominent approaches so far:

  1. Tweepy, which uses the Twitter API
  2. Twint, which scrapes Twitter without the official API
  3. Selenium

However, I still can't figure out how to specifically target Urdu-language tweets. I would be very grateful for any help, guidance, or leads. Thanks.

2 Answers


  1. Chosen as BEST ANSWER

    After researching the topic further, I found two ways. First, you can set the tweet language in Twint through the Lang attribute of the config object (c.Lang = 'tweet_language_code'):

    import twint
    c = twint.Config()
    c.Username = "elonmusk"
    c.Limit = 100
    c.Store_csv = True
    c.Output = "none3.csv"
    c.Lang = "en"  # "en" is the code for English; use "ur" for Urdu
    twint.run.Search(c)
    

    (Note: the above method did not work for me, so I tried other methods.)

    Second, use the snscrape module and set the language in the search query. (This works nicely.)

    import snscrape.modules.twitter as sntwitter

    query = 'lang:ur'  # 'ur' is the language code for Urdu
    # limit = 10
    # get_items() returns a generator; iterate over it to fetch the tweets
    urduTweets = sntwitter.TwitterSearchScraper(query).get_items()
    

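  2. With Tweepy, pass the language code through the lang parameter of the search call. Here count is the page size, which the standard search API caps at 100, and since_id is left out because it expects a tweet ID rather than a date:

    for tweet in tweepy.Cursor(api.search_tweets, q=keyword, lang='en', count=100).items(50000):
        print(tweet.text)  # process each tweet here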
    

    The above snippet will give you up to 50,000 tweets in English; set lang='ur' to get Urdu tweets instead.

    *Note: To access tweets older than one week, you need Academic Research access to the Twitter API; the standard API only returns tweets from the past week.
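
The snscrape approach from the first answer can be sketched a little more fully. This is only an illustration under some assumptions: build_query, take, and fetch_tweets are hypothetical helper names, not part of snscrape, and the since:/until: operators are ordinary Twitter search operators added here alongside lang:. The fetch step needs snscrape installed and network access, so it is kept in its own function.

```python
import itertools


def build_query(lang, keyword=None, since=None, until=None):
    """Build a Twitter search query string restricted to one language.

    Hypothetical helper: it only assembles search operators into a string.
    """
    parts = []
    if keyword:
        parts.append(keyword)
    parts.append(f"lang:{lang}")  # e.g. lang:ur for Urdu
    if since:
        parts.append(f"since:{since}")  # dates as YYYY-MM-DD
    if until:
        parts.append(f"until:{until}")
    return " ".join(parts)


def take(items, limit):
    """Return at most `limit` items from a (possibly very long) iterator."""
    return list(itertools.islice(items, limit))


def fetch_tweets(query, limit=10):
    """Fetch up to `limit` tweets; needs snscrape and network access."""
    # Imported here so the helpers above work without snscrape installed.
    import snscrape.modules.twitter as sntwitter

    scraper = sntwitter.TwitterSearchScraper(query)
    # get_items() is a generator, so cap it instead of exhausting it.
    return take(scraper.get_items(), limit)


urdu_query = build_query("ur", since="2021-01-01")
print(urdu_query)  # lang:ur since:2021-01-01
```

Calling fetch_tweets(urdu_query, limit=100) would then return a list of at most 100 tweet objects, assuming the scraper can still reach Twitter's search.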
