I'm working with Twitter IDs, which are strings because they are so large.
Twitter's API has a since_id parameter, and I want to search for tweets since the earliest tweet in a list.
For example:
tweet_ids = [u'1003659997241401843', u'1003659997241401234234', u'100365999724140136236'] # etc
since_id = min(tweet_ids)
So far min(tweet_ids) works, but I want to understand why it works. Is it just by chance that it worked on the few samples I gave it, or is it guaranteed to always work?
Edit: To clarify, I need to get the lowest tweet ID. How do I get the lowest tweet ID if the IDs are strings that are > 2^32-1 and therefore can't be represented as integers in Python 2.7 on a 32-bit machine?
I am using Python 2.7, if that matters.
2 Answers
Python will compare these strings exactly as it compares any other strings; that is, it will compare them lexicographically.
Thus, it will put "12" before "2", which may be undesirable for you. Here's a function that will compute the numerical minimum of strings representing integers for you.
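A minimal sketch of such a function, assuming every element is a plain decimal integer string. The name numeric_min is only illustrative. It relies on the key argument of the built-in min, so the values are compared as integers while the original string is what gets returned:

```python
def numeric_min(id_strings):
    """Return the element whose integer value is smallest.

    numeric_min is a hypothetical helper name; it assumes every
    element is a decimal integer string that int() can parse.
    """
    # key=int makes min() compare by numeric value,
    # but the value returned is still the original string.
    return min(id_strings, key=int)


tweet_ids = [u'1003659997241401843',
             u'1003659997241401234234',
             u'100365999724140136236']

since_id = numeric_min(tweet_ids)  # the 19-digit ID, numerically the smallest
plain_min = min(tweet_ids)         # the 22-digit ID, smallest only as text
```

On this sample list the plain min(tweet_ids) picks the longest ID, because its 17th character is "2" while the others have "8" and "3" in that position.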
According to the Python documentation, all strings, including strings that are long sequences of digits as in your case, are compared lexicographically. This has a few consequences:
"2" is greater than the "greater integer" string "100" in this case, because the comparison is decided by the first characters and "2" sorts after "1".
"-1" is less than "99" when compared this way because the hyphen-minus sorts before all digits. That matches numeric order only by accident: "-1" is also less than "-2" as a string.
"2" and "02" aren't equal in terms of string comparison. "02" is less than "2" string-wise because of the leading zero.
It is better to convert the str into a long int, and then compare it. As in
tweet_ids = [long('1003659997241401843'), long('1003659997241401234234'), long('100365999724140136236')]
since_id = min(tweet_ids)
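Many JSON consumers cannot represent integers this large (the example IDs need up to 70 bits), so the smallest ID should end up as a str again. The simplest way is to keep the original list of strings and compare by integer value, which returns the original string rather than a number. Replace the since_id line with
since_id = min(tweet_ids, key=int)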
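As for why plain min() can appear to work on a few samples: text order and numeric order agree whenever all the digit strings have the same length. The sketch below illustrates this. The same_length list is made-up sample data, and the length-based key is an alternative that assumes non-negative IDs without leading zeros:

```python
# Hypothetical sample data: all IDs have 19 digits,
# so text order and numeric order give the same minimum.
same_length = [u'1003659997241401843',
               u'1003659997241401234',
               u'1003659997241401362']
agrees = min(same_length) == min(same_length, key=int)

# The list from the question mixes 19, 22 and 21 digits.
tweet_ids = [u'1003659997241401843',
             u'1003659997241401234234',
             u'100365999724140136236']

text_min = min(tweet_ids)            # character-by-character comparison
value_min = min(tweet_ids, key=int)  # integer comparison, returns a string

# Alternative without any conversion (assumes non-negative IDs and no
# leading zeros): a shorter digit string is a smaller number, and
# equal-length strings are ordered correctly by text comparison.
length_min = min(tweet_ids, key=lambda s: (len(s), s))
```

Both value_min and length_min are elements of the original list, so they can be passed on as since_id without converting back from a number.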