import time

import tweepy

# consumer_token, consumer_secret, access_token and access_secret
# are the application's Twitter API credentials (defined elsewhere).
auth = tweepy.OAuthHandler(consumer_token, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)
user_objs = []
name = "phungsuk wangdu"
id_strs = {}  # maps id_str -> page number where it was first seen
page_no = 0
try:
    for page in tweepy.Cursor(api.search_users, name).pages(3):
        dup_count = 0
        print("******* Page", str(page_no))
        print("Length of page", len(page))
        user_objs.extend(page)
        for user_obj in page:
            id_str = user_obj._json['id_str']
            if id_str in id_strs:
                # print("Duplicate for:", id_str, "from page number:", id_strs[id_str])
                dup_count += 1
            else:
                # print(id_str)
                id_strs[id_str] = page_no
        time.sleep(1)  # pause between page requests
        print("Duplicates in page", str(page_no), str(dup_count))
        page_no += 1
except Exception as ex:
    print(ex)
With the above code, I am trying to get user search results using a tweepy Cursor (Python 3.5.2, tweepy 3.5.0). The results are duplicated across pages when I request several pages with pages(). Is this the right way to query search_users with the tweepy Cursor? The code above produces results with the following pattern:
1. For a query with few results (name = "phungsuk wangdu"; a manual search on the Twitter website actually returns 9 results):
******* Page 0
Length of page 2
Duplicates in page 0 0
******* Page 1
Length of page 2
Duplicates in page 1 2
******* Page 2
Length of page 2
Duplicates in page 2 2
******* Page 3
Length of page 2
Duplicates in page 3 2
2. For a query with many results (name = "jon snow"):
******* Page 0
Length of page 20
Duplicates in page 0 0
******* Page 1
Length of page 20
Duplicates in page 1 20
******* Page 2
Length of page 20
Duplicates in page 2 0
******* Page 3
Length of page 20
Duplicates in page 3 0
2 Answers
There are two issues here.
I made a pull request to tweepy with both the fixes.
Try adding this attribute to the Cursor; it should reduce the duplicates.
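Independently of any fix inside tweepy, the duplicates can also be filtered on the client side. The following is only a sketch of that workaround, not the change from the pull request: collect_unique_users, max_stale_pages and fake_page are hypothetical names introduced here, and the code assumes each user object exposes an id_str attribute (the question reads the same value from user_obj._json['id_str']). It keeps the first occurrence of every id_str and stops once several consecutive pages contribute nothing new.

```python
from types import SimpleNamespace


def collect_unique_users(pages, max_stale_pages=2):
    """Collect users from an iterable of pages, skipping repeated id_str values.

    Iteration stops once `max_stale_pages` consecutive pages contain only
    users that were already seen (hypothetical stopping rule, an assumption).
    """
    seen = {}            # id_str -> index of the page where it first appeared
    unique_users = []
    stale_pages = 0
    for page_no, page in enumerate(pages):
        new_in_page = 0
        for user in page:
            if user.id_str not in seen:
                seen[user.id_str] = page_no
                unique_users.append(user)
                new_in_page += 1
        # Count consecutive pages that brought no new users.
        stale_pages = stale_pages + 1 if new_in_page == 0 else 0
        if stale_pages >= max_stale_pages:
            break
    return unique_users


def fake_page(*ids):
    """Build a stand-in page of user objects for a quick offline check."""
    return [SimpleNamespace(id_str=i) for i in ids]


# Mimics the "jon snow" pattern: the second page repeats the first one.
demo_pages = [fake_page("1", "2"), fake_page("1", "2"), fake_page("3", "4")]
unique = collect_unique_users(demo_pages)
print([u.id_str for u in unique])  # ['1', '2', '3', '4']
```

With the code from the question, the same function could presumably be fed the cursor directly, for example collect_unique_users(tweepy.Cursor(api.search_users, name).pages(3)). Tolerating one all-duplicate page before stopping matters here, because in the "jon snow" output above page 1 is entirely duplicated while pages 2 and 3 still bring new users.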