Given page A and page B on Twitter, I would like to find all users that follow both pages A and B.
Twitter does provide a method to find the followers: GET followers/ids
However, it returns no more than 5,000 IDs per request, and you can send at most 15 requests per 15 minutes, which averages out to 5,000 users per minute. That clearly won’t work for larger accounts with millions of followers.
Does anyone know of a better way to get such data, preferably using the dev API? Technically I could try to emulate browser scrolling, but that would be extremely slow and messy, and chances are the web client uses the same API.
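To put the limits above in numbers, here is a rough back-of-the-envelope sketch. It only uses the figures quoted in this question (5,000 IDs per request, 15 requests per 15 minutes); the function name `estimate_minutes` is made up for illustration.

```python
import math

IDS_PER_REQUEST = 5000      # maximum IDs returned by one followers/ids call
REQUESTS_PER_WINDOW = 15    # requests allowed per rate-limit window
WINDOW_MINUTES = 15         # length of the rate-limit window in minutes


def estimate_minutes(follower_count: int) -> float:
    """Estimate the average time (in minutes) to download all follower IDs."""
    requests_needed = math.ceil(follower_count / IDS_PER_REQUEST)
    # On average one request is allowed every WINDOW_MINUTES / REQUESTS_PER_WINDOW minutes.
    return requests_needed * WINDOW_MINUTES / REQUESTS_PER_WINDOW


if __name__ == "__main__":
    for count in (1_000_000, 10_000_000):
        minutes = estimate_minutes(count)
        print(f"{count:>10} followers: ~{minutes:.0f} min (~{minutes / 60:.1f} h)")
```

So a single account with 10 million followers would take well over a day at this rate, and both accounts have to be downloaded before they can be compared.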
UPD:
Also, I think we can narrow down the amount of data we need to download. For example, the overlapping data will need to be filtered by age, gender or location, so if there’s a way to pass this information as parameters and therefore download less data, that would work just fine.
An example of such an API is the users.search method from VK. You can specify group_id (the equivalent of the Twitter account being followed) and search the followers of this group, filtering by other parameters.
3 Answers
I don’t think that would be easy to achieve, since the number of followers is bounded only by a 64-bit integer, whose maximum signed value is 9,223,372,036,854,775,807. I don’t think any API would be able to return that amount of data without streaming or batching it. That is why it comes in pieces and with limitations.
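To illustrate what "comes in pieces" means in practice, here is a minimal sketch of cursor-based paging over GET followers/ids, followed by an intersection of the two ID sets. It assumes the endpoint's documented response shape (an `ids` list plus a `next_cursor` that is 0 on the last page, with -1 requesting the first page). `fetch_page` is a hypothetical callable standing in for the actual HTTP request, and rate-limit handling is deliberately left out.

```python
from typing import Callable, Dict, Iterator, Set

# Hypothetical stand-in for one GET followers/ids request:
# takes (screen_name, cursor) and returns {"ids": [...], "next_cursor": int}.
FetchPage = Callable[[str, int], Dict]


def iter_follower_ids(screen_name: str, fetch_page: FetchPage) -> Iterator[int]:
    """Yield every follower ID of an account, one page at a time."""
    cursor = -1  # -1 asks for the first page
    while cursor != 0:  # a next_cursor of 0 marks the last page
        page = fetch_page(screen_name, cursor)
        yield from page["ids"]
        cursor = page["next_cursor"]


def common_followers(account_a: str, account_b: str, fetch_page: FetchPage) -> Set[int]:
    """Return the IDs that follow both accounts."""
    followers_a = set(iter_follower_ids(account_a, fetch_page))
    # Stream the second account so only one full ID set is held in memory.
    return {uid for uid in iter_follower_ids(account_b, fetch_page) if uid in followers_a}
```

Holding only the smaller account's followers in the set keeps memory use down, but the total number of requests is still the sum of both accounts' pages.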
I would suggest using a streaming platform (e.g. Kafka, Amazon Kinesis or Azure Event Hubs).
The Twitter API supports streaming (Twitter API stream), which means you can request the needed information as a stream in the producer (the app that gets data from the source). The producer then sends the data to a topic, and from there you can consume it in batches and display it.
Of course, there are two scenarios: either you store the followers in your database and update them on changes, or you read the stream from the beginning every time, which leads to a noticeable delay.
I would suggest saving the stream data in a DB and updating it on change. (NoSQL would be a perfect solution for this.)
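As a rough sketch of the "store and update on change" idea, the snippet below keeps follower IDs in a table and applies only the difference whenever a fresh list arrives. It uses `sqlite3` from the Python standard library purely to stay self-contained (the NoSQL store suggested above would follow the same pattern), and the table layout and function names are assumptions for illustration.

```python
import sqlite3
from typing import Iterable, Set, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the (assumed) followers table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS followers ("
        "account TEXT NOT NULL, user_id INTEGER NOT NULL, "
        "PRIMARY KEY (account, user_id))"
    )
    return conn


def sync_followers(
    conn: sqlite3.Connection, account: str, current_ids: Iterable[int]
) -> Tuple[Set[int], Set[int]]:
    """Bring the stored followers of `account` in line with `current_ids`.

    Returns the sets of (added, removed) IDs.
    """
    current = set(current_ids)
    stored = {
        row[0]
        for row in conn.execute(
            "SELECT user_id FROM followers WHERE account = ?", (account,)
        )
    }
    added, removed = current - stored, stored - current
    with conn:  # commit both changes as one transaction
        conn.executemany(
            "INSERT INTO followers (account, user_id) VALUES (?, ?)",
            [(account, uid) for uid in added],
        )
        conn.executemany(
            "DELETE FROM followers WHERE account = ? AND user_id = ?",
            [(account, uid) for uid in removed],
        )
    return added, removed


def stored_common_followers(conn: sqlite3.Connection, a: str, b: str) -> Set[int]:
    """Return the IDs stored as followers of both accounts."""
    rows = conn.execute(
        "SELECT user_id FROM followers WHERE account = ? "
        "INTERSECT SELECT user_id FROM followers WHERE account = ?",
        (a, b),
    )
    return {row[0] for row in rows}
```

Once both accounts are stored, the overlap becomes a local query, so the slow download is only paid for new or removed followers.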
My guess is that this is an intentional limit. Twitter isn’t really interested in letting you farm all their user data, and an interface like that would allow you to do so very quickly. Pulling tons and tons of follower data would be a heavy load on their servers, and you having all of it isn’t in their business interest unless you’re paying them quite a bit.
Your best bet without their help might be to get multiple API keys and pull from servers behind VPNs, but they’ll probably figure you out eventually.
If you have a valid business reason for wanting so much data, ideally one that also benefits them, I would recommend contacting them and asking whether you can get a direct JSON / API export for download. It’s probably a pretty heavy request to fulfil, though.
You are right: finding common users between accounts with millions of followers is a time-consuming task.
You can use pre-fetched users to check their connections; for instance, you can check whether the followers of user A have a connection with user B
with this API call
Another good thing I found on the web is http://tweepdiff.com, which gives some of the common followers between accounts, but not all.
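The pre-fetched check described in this answer could look roughly like the sketch below. The answer does not name the API call, so `is_following` is a purely hypothetical callable standing in for whatever relationship lookup is used; only the filtering loop is shown.

```python
from typing import Callable, Iterable, List


def filter_also_following(
    prefetched_ids: Iterable[int],
    target: str,
    is_following: Callable[[int, str], bool],
) -> List[int]:
    """Keep the pre-fetched user IDs that also follow `target`.

    `is_following(user_id, target)` is a hypothetical lookup; replace it
    with the real relationship check.
    """
    return [uid for uid in prefetched_ids if is_following(uid, target)]
```

Note that this costs one lookup per pre-fetched user, so it only pays off when the pre-fetched list is much smaller than the follower list of the other account.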