Tweepy cursor .pages() with api.search_users returning same page again and again - Twitter API

manji369
October 26, 2017
221 views
0 votes
2 Answers

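    import time
    import tweepy

    # consumer_token, consumer_secret, access_token and access_secret
    # are the app's credentials, defined elsewhere.
    auth = tweepy.OAuthHandler(consumer_token, consumer_secret)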
    auth.set_access_token(access_token, access_secret)
    api = tweepy.API(auth)
    user_objs = []
    name = "phungsuk wangdu"
    id_strs = {}
    page_no = 0
    try:
        for page in tweepy.Cursor(api.search_users, name).pages(3):
            dup_count = 0
            print("*******  Page", str(page_no))
            print("Length of page", len(page))
            user_objs.extend(page)
            for user_obj in page:
                id_str = user_obj._json['id_str']
                if id_str in id_strs:
                    # print("Duplicate for:", id_str, "from page number:", id_strs[id_str])
                    dup_count += 1
                else:
                    # print(id_str)
                    id_strs[id_str] = page_no
            time.sleep(1)
            print("Duplicates in page", str(page_no), str(dup_count))
            page_no += 1
    except Exception as ex:
        print(ex)

With the above code, I am trying to get user search results using a tweepy cursor (Python 3.5.2, tweepy 3.5.0). The results are duplicated across pages when I pass the pages parameter. Is this the right way to query search_users with the tweepy cursor? The code above produces output with the following patterns:

1. For a query with few results (name = "phungsuk wangdu"; a manual search on the Twitter website returns 9 results):

    *******  Page 0
    Length of page 2
    Duplicates in page 0 0
    *******  Page 1
    Length of page 2
    Duplicates in page 1 2
    *******  Page 2
    Length of page 2
    Duplicates in page 2 2
    *******  Page 3
    Length of page 2
    Duplicates in page 3 2

2. For a query with many results (name = "jon snow"):

    *******  Page 0
    Length of page 20
    Duplicates in page 0 0
    *******  Page 1
    Length of page 20
    Duplicates in page 1 20
    *******  Page 2
    Length of page 20
    Duplicates in page 2 0
    *******  Page 3
    Length of page 20
    Duplicates in page 3 0

Tags: python-3.x tweepy

Answers

Chosen as BEST ANSWER
- manji369
- October 28, 2017 at 10:23 pm
- 0 votes
There are two issues here.
1. Tweepy's page iterator for the cursor starts page numbers at 0, while Twitter's page numbers start at 1.
2. Twitter returns results from the last available page for page numbers beyond the available results.
I made a pull request to tweepy with fixes for both.
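
Until such a fix is available, one possible workaround is to page manually instead of using the cursor: start at page 1 and stop at the first page that contributes no unseen users. The sketch below is only an illustration under stated assumptions, not the change made in the pull request. The helper `collect_unique_users` and the stand-in `fake_fetch` are hypothetical names; `fake_fetch` imitates the behaviour described in the two points above (requests past the end repeat the last page) so the example runs without credentials.

```python
from types import SimpleNamespace


def collect_unique_users(fetch_page, max_pages=10):
    """Fetch pages 1..max_pages and stop early once a page adds no new users.

    fetch_page is any callable that takes a 1-based page number and returns
    a list of objects exposing an id_str attribute.
    """
    seen_ids = set()
    users = []
    for page_number in range(1, max_pages + 1):  # start at 1, not 0
        added = 0
        for user in fetch_page(page_number):
            if user.id_str not in seen_ids:
                seen_ids.add(user.id_str)
                users.append(user)
                added += 1
        if added == 0:
            # Nothing new: either an empty page or a repeat of the last page.
            break
    return users


def fake_fetch(page_number):
    """Hypothetical stand-in for the search endpoint: three real pages,
    and any page number past the end repeats the last page."""
    pages = [["1", "2"], ["3", "4"], ["5"]]
    index = min(page_number, len(pages)) - 1
    return [SimpleNamespace(id_str=i) for i in pages[index]]


if __name__ == "__main__":
    result = collect_unique_users(fake_fetch)
    print([user.id_str for user in result])
```

With a real client, `fetch_page` could be something like `lambda page: api.search_users(q=name, page=page)`, assuming the installed tweepy version accepts `q` and `page` for `search_users`; a `time.sleep` between calls, as in the question, may still be needed for rate limits.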

- Sssssuppp
- July 8, 2018 at 5:59 am
- 0 votes
Try adding this attribute to the Cursor; it should reduce the duplicates.
```
q= <your query> +" -filter:retweets"
```