I’m using text classification to classify dialects. First I need a large set of manually annotated tweets, and I have read a research paper that says:
We have collected tweets that were published during June 2015. Arabic
linguists manually annotated a small part of these tweets, so we got
51,589 tweets with correct dialectal labels. These tweets were
manually found in Twitter and annotated by the linguists.
So these researchers were able to extract those tweets. I wanted to contact them, but their email addresses weren’t valid. The paper says the tweets were published during June 2015. How can I extract those tweets?
2 Answers
I would have to assume that the researchers collected those tweets in real time during June 2015.
Today, the only way to do that would be to use the Full Archive Search API (a premium, paid offering from Twitter) to search for those Tweets. In terms of the annotations, those would have been part of their research; Twitter does not annotate Tweets with dialectal labels.
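To make that concrete, here is a minimal sketch of how a Full Archive Search request for June 2015 might be put together. It only builds the URL and query parameters and sends nothing. The endpoint layout, the parameter names (`query`, `fromDate`, `toDate`, `maxResults`) and the `lang:ar` operator are written from memory of the premium search documentation, so treat them as assumptions and check the current Twitter documentation before relying on them. The environment label `dev` and the helper name `build_full_archive_request` are hypothetical.

```python
from datetime import datetime

# Assumed URL layout of the premium Full Archive Search endpoint;
# {label} is the environment label configured in the developer account.
BASE_URL = "https://api.twitter.com/1.1/tweets/search/fullarchive/{label}.json"


def build_full_archive_request(label, query, start, end, max_results=100):
    """Return (url, params) for one search request; nothing is sent."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    # Assumed timestamp format for fromDate/toDate: YYYYMMDDHHmm (UTC).
    fmt = "%Y%m%d%H%M"
    params = {
        "query": query,
        "fromDate": start.strftime(fmt),
        "toDate": end.strftime(fmt),
        "maxResults": max_results,
    }
    return BASE_URL.format(label=label), params


# Hypothetical example: Arabic-language tweets published during June 2015.
url, params = build_full_archive_request(
    "dev", "lang:ar", datetime(2015, 6, 1), datetime(2015, 7, 1)
)
print(url)
print(params)
```

The returned pair could then be passed to an authenticated HTTP client. Whatever comes back would still be raw tweets without dialect labels, so the manual annotation step described in the paper would have to be repeated.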
As far as I know, researchers didn’t have permission to publish the tweets that they collected with the Twitter APIs.