    import time
    import tweepy

    auth = tweepy.OAuthHandler(consumer_token, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    api = tweepy.API(auth)
    user_objs = []
    name = "phungsuk wangdu"
    id_strs = {}
    page_no = 0
    try:
        for page in tweepy.Cursor(api.search_users, name).pages(3):
            dup_count = 0
            print("*******  Page", str(page_no))
            print("Length of page", len(page))
            user_objs.extend(page)
            for user_obj in page:
                id_str = user_obj._json['id_str']
                if id_str in id_strs:
                    # print("Duplicate for:", id_str, "from page number:", id_strs[id_str])
                    dup_count += 1
                else:
                    # print(id_str)
                    id_strs[id_str] = page_no
            time.sleep(1)
            print("Duplicates in page", str(page_no), str(dup_count))
            page_no += 1
    except Exception as ex:
        print(ex)

With the code above, I am trying to fetch user search results with a tweepy Cursor (Python 3.5.2, tweepy 3.5.0). When I pass the pages parameter, the results are duplicated across pages. Is this the right way to query search_users with the tweepy Cursor? The code above produces results with the following patterns:

1. For a query with few results (name = "phungsuk wangdu"; a manual search on the Twitter website returns 9 results):

    *******  Page 0
    Length of page 2
    Duplicates in page 0 0
    *******  Page 1
    Length of page 2
    Duplicates in page 1 2
    *******  Page 2
    Length of page 2
    Duplicates in page 2 2
    *******  Page 3
    Length of page 2
    Duplicates in page 3 2

2. For a query with many results (name = "jon snow"):

    *******  Page 0
    Length of page 20
    Duplicates in page 0 0
    *******  Page 1
    Length of page 20
    Duplicates in page 1 20
    *******  Page 2
    Length of page 20
    Duplicates in page 2 0
    *******  Page 3
    Length of page 20
    Duplicates in page 3 0

2 Answers


  1. Chosen as BEST ANSWER

    There are two issues here.

    1. Tweepy's page iterator for the cursor starts page numbers at 0, while Twitter's page numbers start at 1, so pages 0 and 1 return the same results.
    2. Twitter returns the results of the last available page for any page number beyond the available results.

    I made a pull request to tweepy with both fixes.


  2. Try adding this attribute to the Cursor; it should reduce the duplicates.

    q = <your query> + " -filter:retweets"
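
Until a fix like the one described in the accepted answer is available, the two issues can be worked around by paging manually. The following is only a minimal sketch, not tweepy's own implementation: `collect_unique_users` and its `fetch_page` argument are hypothetical names introduced here, and it assumes each returned user object exposes an `id_str` attribute. It starts at page 1 and stops as soon as a page contributes no unseen users, which is what happens once the last available page starts repeating.

```python
def collect_unique_users(fetch_page, max_pages):
    """Collect users page by page, starting at page 1, skipping duplicates.

    fetch_page is a hypothetical callable: fetch_page(page_number) returns a
    list of user objects, each assumed to have an ``id_str`` attribute.
    """
    seen_ids = set()
    users = []
    # Start at 1: per the accepted answer, page 0 and page 1 are the same page.
    for page_no in range(1, max_pages + 1):
        page = fetch_page(page_no)
        found_new = False
        for user in page:
            if user.id_str not in seen_ids:
                seen_ids.add(user.id_str)
                users.append(user)
                found_new = True
        if not found_new:
            # Empty page, or only repeats of earlier users: we are past the
            # last available page, so requesting more would be wasted calls.
            break
    return users
```

With tweepy 3.5.0 this could be called as `collect_unique_users(lambda n: api.search_users(q=name, page=n), max_pages=10)`, assuming `search_users` accepts the `page` keyword in your version. The early exit also saves requests against the rate limit.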
    