I am trying to understand the lambda and map functions in Python, specifically with regard to the code below, which I have been following using the Tweepy API. I have googled lambda and map, but I’m struggling to understand them in the context of this script. As I understand it, a lambda takes an argument and an expression, thereby becoming a shortened function? Could you kindly take a look at the code below and indicate what map and lambda are doing in each line?
import json
import pandas as pd

#Reading the raw data collected from the Twitter Streaming API using Tweepy
tweets_data = []
tweets_data_path = 'output2.txt'
with open(tweets_data_path, 'r') as tweets_file:
    for line in tweets_file:
        try:
            tweet = json.loads(line)
            tweets_data.append(tweet)
        except ValueError:
            # Skip lines that are not valid JSON (for example, blank lines)
            continue
print('The total number of Tweets is:', len(tweets_data))
#Create a function to see if the tweet is a retweet
def is_RT(tweet):
    if 'retweeted_status' not in tweet:
        return False
    else:
        return True
#Create a function to see if the tweet is a reply to a tweet of another user, if so return that user.
def is_Reply_to(tweet):
    if 'in_reply_to_screen_name' not in tweet:
        return False
    else:
        return tweet['in_reply_to_screen_name']
#Convert the Tweet JSON data to pandas Dataframe, and take the desired fields from the JSON.
tweets = pd.DataFrame()
tweets['text'] = list(map(lambda tweet: tweet['text'] if 'extended_tweet' not in tweet else tweet ['extended_tweet']['full_text'], tweets_data))
tweets['Username'] = list(map(lambda tweet: tweet['user']['screen_name'], tweets_data))
tweets['Timestamp'] = list(map(lambda tweet: tweet['created_at'], tweets_data))
tweets['length'] = list(map(lambda tweet: len(tweet['text']) if 'extended_tweet' not in tweet else len(tweet['extended_tweet']['full_text']), tweets_data))
tweets['location'] = list(map(lambda tweet: tweet['user']['location'], tweets_data))
tweets['device'] = list(map(reckondevice, tweets_data))
tweets['RT'] = list(map(is_RT, tweets_data))
tweets['Reply'] = list(map(is_Reply_to, tweets_data))
I was following the guide fine, but this threw me, as I have never seen map or lambda before. I understand we are building a DataFrame in pandas; I’m just not sure how it is happening.
Thanks!!
2 Answers
Syntactically, a call to map looks like map(callable, iterable).
In simple words, it iterates over the collection, executes the callable on each item, and replaces the item with the callable’s return value. Well, technically it neither modifies the list nor creates a new list, but you get the idea. You pass an iterable, and map returns a new iterable in which each item has been transformed using callable.
Now, lambda is shorthand for creating an unnamed function. A lambda expression is similar to a function defined with def whose body is a single return statement.
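To make the map(callable, iterable) shape and the lambda/def equivalence concrete, here is a minimal sketch. The names numbers and square are made up for illustration and are not part of the original script.

```python
# map(callable, iterable): apply the callable to each item of the iterable.
numbers = [1, 2, 3]

# A lambda is an unnamed function whose body is a single expression...
squares_via_lambda = list(map(lambda x: x * x, numbers))

# ...and it behaves like this named function with a single return statement.
def square(x):
    return x * x

squares_via_def = list(map(square, numbers))

print(squares_via_lambda)  # [1, 4, 9]
print(squares_via_def)     # [1, 4, 9]
print(numbers)             # [1, 2, 3] (the original list is not modified)
```

Both calls produce the same result; the lambda simply saves you from naming a function you only use once.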
Now, given this code:
Let’s split that up:
Let’s convert callable to a normal function:
So, what your code does is: it takes tweets_data (the iterable) and, for each tweet in tweets_data, calls the callable (the lambda), which takes a single argument.
The list() function converts the lazy iterator that map returns into a list, thus forcing all tweets to be transformed at once.
Now, you can try to understand the other lambdas. It is probably worth going through the documentation as well, which is quite thorough.
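As a rough sketch of the steps this answer walks through, assuming the line in question is the one that fills tweets['text']: the function name get_text and the two sample tweets are hypothetical stand-ins, not data from the original script.

```python
# Hypothetical sample data standing in for tweets_data.
tweets_data = [
    {"text": "short tweet"},
    {"text": "truncated", "extended_tweet": {"full_text": "the full, longer tweet"}},
]

# The lambda from the tweets['text'] line, rewritten as a named function.
def get_text(tweet):
    if "extended_tweet" not in tweet:
        return tweet["text"]
    return tweet["extended_tweet"]["full_text"]

lazy_result = map(get_text, tweets_data)  # nothing has been computed yet
texts = list(lazy_result)                 # list() pulls every tweet through get_text

print(texts)              # ['short tweet', 'the full, longer tweet']
print(list(lazy_result))  # [] (the map object is exhausted after one pass)
```

The second print shows why the script wraps each map call in list(): a map object can only be consumed once, whereas a list can be stored in a DataFrame column and reused.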
A simple way to understand lambda: it takes the argument named before the : and returns whatever comes after the :. For example, in your code above:
tweets['text'] = list(map(lambda tweet: tweet['text'] if 'extended_tweet' not in tweet else tweet['extended_tweet']['full_text'], tweets_data))
lambda tweet: tweet['text'] simply takes a dictionary tweet and returns the value of the key text.
And map is a function that simply applies a given function to each item of an iterable (list, tuple, etc.) and returns an iterable.
Note: An iterable is something you can loop over with a for loop.
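A small sketch of these points, using a made-up tuple of dictionaries (tweets_tuple) rather than real tweet data:

```python
# The argument comes before the ":", the returned expression after it.
get_text = lambda tweet: tweet["text"]

print(get_text({"text": "hello"}))  # hello

# Any iterable works as the second argument to map; here it is a tuple of dicts.
tweets_tuple = ({"text": "first"}, {"text": "second"})
mapped = map(get_text, tweets_tuple)

# The result of map is itself iterable, so a for loop can consume it.
collected = []
for text in mapped:
    collected.append(text)

print(collected)  # ['first', 'second']
```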
So, if we make a small function for your lambda expression
lambda tweet: tweet['text'] if 'extended_tweet' not in tweet else tweet['extended_tweet']['full_text']
it would be a function foo(tweet) that returns tweet['text'] if 'extended_tweet' is not in tweet, and tweet['extended_tweet']['full_text'] otherwise.
Applying it with map means passing foo in place of the lambda: list(map(foo, tweets_data)).
So, here, the function foo() is being applied to each and every element of tweets_data.
And the list function takes the values returned by map one by one and collects them into a list.
Hope you find the explanation helpful.
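For illustration, foo() and its use with map might look like the sketch below. The two sample tweets are hypothetical, and the explicit loop is only there to show what list(map(...)) does step by step.

```python
# Hypothetical sample data standing in for tweets_data.
tweets_data = [
    {"text": "plain"},
    {"text": "cut off", "extended_tweet": {"full_text": "not cut off"}},
]

def foo(tweet):
    # Same conditional expression as the body of the lambda.
    return tweet["text"] if "extended_tweet" not in tweet else tweet["extended_tweet"]["full_text"]

via_map = list(map(foo, tweets_data))

# What list(map(foo, tweets_data)) does, spelled out one item at a time.
via_loop = []
for tweet in tweets_data:
    via_loop.append(foo(tweet))

# A list comprehension is another common way to write the same thing.
via_comprehension = [foo(tweet) for tweet in tweets_data]

print(via_map)  # ['plain', 'not cut off']
```

All three variables hold the same list, so which form to use is mostly a matter of readability.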