
I have a dataset of tweets. Most of them are in Persian, and some are in Arabic.
I want to find the Arabic tweets.
Is there an API or a tool that can do this for me?
To be more specific, I want a language detector that classifies each tweet as Persian or Arabic.
Thanks.

2 Answers


  1. You can try langdetect:

    # In a notebook cell; from a shell, run `pip install langdetect` without the `!`
    ! pip install langdetect
    from langdetect import detect
    

    You can then wrap it in a function, like this:

    def detecting(x):
        # Return the language code langdetect guesses for the text, e.g. 'fa' or 'ar'
        return detect(x)
    

    Then you can store the results in another column, so you can see the detected language of each tweet:

    df['detect'] = df['tweet_language'].apply(detecting)
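
One caveat with applying `detect` directly to tweets: langdetect raises a `LangDetectException` when a text has nothing it can work with (for example a tweet that is only a link), and its results can vary between runs unless `DetectorFactory.seed` is set. Below is a minimal sketch of a more defensive wrapper. The names `clean_tweet`, `safe_detect`, and `langdetect_detector` are made up for illustration, and the cleaning rule (dropping URLs and @mentions) is an assumption about what the tweets contain, not something from the question.

```python
import re

# Assumption: URLs and @mentions carry no language signal, so drop them first.
URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")


def clean_tweet(text):
    """Remove URLs and @mentions and trim surrounding whitespace."""
    return URL_OR_MENTION.sub(" ", text).strip()


def safe_detect(text, detect_fn):
    """Return detect_fn's language code for the tweet, or 'unknown' on failure."""
    cleaned = clean_tweet(text)
    if not cleaned:
        return "unknown"
    try:
        return detect_fn(cleaned)
    except Exception:
        # langdetect raises LangDetectException when it finds no usable features.
        return "unknown"


def langdetect_detector():
    """Return langdetect's detect function with a fixed seed for repeatable results.

    Requires `pip install langdetect`; imported here so the helpers above
    can be used and tested without it.
    """
    from langdetect import DetectorFactory, detect

    DetectorFactory.seed = 0
    return detect
```

With the DataFrame from the answer, this could be used as `detect_fn = langdetect_detector()` followed by `df['detect'] = df['tweet_language'].apply(lambda t: safe_detect(t, detect_fn))`.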
    

    Hope this helps!!!!

  2. There are several options listed in this answer:

    https://stackoverflow.com/a/47106810/9204500

    If you are looking for Persian tweets, in my experience you will also end up with some Dari, Pashto, Urdu, Arabic, Kurdish, and Azeri tweets. None of these tools identifies Persian reliably, especially when Dari, Azeri, and Kurdish tweets are mixed in.
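
For the narrower Persian-versus-Arabic case in the question, a rough character-based check might serve as a second opinion next to a detection library. This is only a heuristic sketch, not something taken from the linked post: it counts letters that are characteristic of each script (Persian uses peh, tcheh, jeh, gaf, keheh, and Farsi yeh; Arabic text typically uses teh marbuta, Arabic kaf, Arabic yeh, and alef maksura). The function name `guess_script_language` is made up, and the check will not separate Persian from Dari, Urdu, Pashto, Kurdish, or Azeri, which share most of the Persian letters. Tweets typed on an Arabic keyboard layout can also fool it.

```python
# Letters used in Persian but not in standard Arabic:
# U+067E peh, U+0686 tcheh, U+0698 jeh, U+06AF gaf, U+06A9 keheh, U+06CC Farsi yeh
PERSIAN_MARKERS = set("\u067e\u0686\u0698\u06af\u06a9\u06cc")

# Letters typical of Arabic text:
# U+0629 teh marbuta, U+0643 kaf, U+064A yeh, U+0649 alef maksura
ARABIC_MARKERS = set("\u0629\u0643\u064a\u0649")


def guess_script_language(text):
    """Guess 'fa' or 'ar' from marker-letter counts; 'unknown' on a tie."""
    persian = sum(ch in PERSIAN_MARKERS for ch in text)
    arabic = sum(ch in ARABIC_MARKERS for ch in text)
    if persian > arabic:
        return "fa"
    if arabic > persian:
        return "ar"
    return "unknown"
```

Tweets where this guess and the library's result disagree are probably the ones worth checking by hand.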
