I have a dataset of tweet texts. Most of the tweets are in Persian and some are in Arabic.
I want to find Arabic tweets.
Is there an API or a tool that can do it for me?
To be more specific, I want a language detector that classifies each tweet as either Persian or Arabic.
Thanks.
The official Twitter API documentation can be found here.
2 Answers
You can try the langdetect library.
You can then write a small function that returns the detected language of a tweet.
Then store the results in another column, so you can see the language of each tweet.
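The original snippet is not shown here, so the following is only a minimal sketch of what such a function could look like, assuming langdetect is installed (pip install langdetect). The names detect_language, label_tweets and arabic_only are hypothetical helpers made up for illustration; only detect, DetectorFactory and LangDetectException come from langdetect itself.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def detect_language(text: str) -> Optional[str]:
    """Return a language code such as 'fa' or 'ar', or None if detection fails."""
    # Imported lazily so the helpers below can also be used with another detector.
    from langdetect import DetectorFactory, detect
    from langdetect.lang_detect_exception import LangDetectException

    # langdetect is non-deterministic unless a seed is set.
    DetectorFactory.seed = 0
    try:
        return detect(text)
    except LangDetectException:
        # Raised when the text has no usable features (empty, only URLs, emoji...).
        return None


def label_tweets(
    tweets: Iterable[str],
    detector: Callable[[str], Optional[str]] = detect_language,
) -> List[Tuple[str, Optional[str]]]:
    """Pair every tweet with its detected language code."""
    return [(tweet, detector(tweet)) for tweet in tweets]


def arabic_only(
    tweets: Iterable[str],
    detector: Callable[[str], Optional[str]] = detect_language,
) -> List[str]:
    """Keep only the tweets whose detected language is Arabic ('ar')."""
    return [tweet for tweet, lang in label_tweets(tweets, detector) if lang == "ar"]
```

If the tweets are in a pandas DataFrame, the same function could fill a new column, for example df["lang"] = df["text"].apply(detect_language), where "text" and "lang" are assumed column names. Short tweets give the detector little to work with, so it is worth checking a sample of the results by hand.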
Hope this helps!
There are several options that you can see in this post:
https://stackoverflow.com/a/47106810/9204500
If you are looking for Persian tweets, in my experience you will also end up with some Dari, Pashto, Urdu, Arabic, Kurdish, and Azeri tweets. None of these tools identifies Persian reliably, especially when it comes to telling it apart from Dari, Azeri, and Kurdish tweets.