
I am trying to understand lambda and map in Python, specifically in the code below, which I have been following while using the Tweepy API. I have googled lambda and map, but I'm struggling to understand them in the context of this script. As I understand it, a lambda takes an argument and an expression, which makes it a shortened function? Could you take a look at the code below and explain what map and lambda are doing in each line?

import json

import pandas as pd

#Reading the raw data collected from the Twitter Streaming API using Tweepy
tweets_data = []
tweets_data_path = 'output2.txt'
with open(tweets_data_path, 'r') as tweets_file:  # the file is closed automatically
    for line in tweets_file:
        try:
            tweet = json.loads(line)
            tweets_data.append(tweet)
        except json.JSONDecodeError:  # skip blank or malformed lines
            continue
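
To try the reading loop without the original output2.txt, here is a minimal sketch. It assumes the file holds one JSON object per line (the usual way tweets from the Streaming API are saved); io.StringIO stands in for the file, and the sample records are invented for illustration.

```python
import io
import json

# Hypothetical stand-in for output2.txt: one JSON object per line,
# plus a blank line and a truncated line that should be skipped.
raw = io.StringIO(
    '{"text": "hello", "user": {"screen_name": "alice"}}\n'
    '\n'
    '{"text": "broken"\n'
    '{"text": "world", "user": {"screen_name": "bob"}}\n'
)

tweets_data = []
for line in raw:
    try:
        tweets_data.append(json.loads(line))
    except json.JSONDecodeError:
        # Blank or malformed lines are skipped, as in the loop above.
        continue

print('The total number of Tweets is:', len(tweets_data))  # 2
```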

print('The total number of Tweets is:', len(tweets_data))

#Create a function to see if the tweet is a retweet
def is_RT(tweet):
    if 'retweeted_status' not in tweet:
        return False
    else:
        return True

#Create a function to see if the tweet is a reply to a tweet of another user, if so return that user.
def is_Reply_to(tweet):
    if 'in_reply_to_screen_name' not in tweet:
        return False
    else:
        return tweet['in_reply_to_screen_name']
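
Both helper functions only check whether a key is present in the tweet dictionary. A quick check with made-up dictionaries (not real Twitter payloads) shows what they return; note that is_RT can equally be written as the single expression 'retweeted_status' in tweet, since the in operator already evaluates to True or False.

```python
def is_RT(tweet):
    # Equivalent to the if/else version: `in` already yields True or False.
    return 'retweeted_status' in tweet

def is_Reply_to(tweet):
    # Same logic as above: False if the key is absent, else the screen name.
    if 'in_reply_to_screen_name' not in tweet:
        return False
    return tweet['in_reply_to_screen_name']

# Hypothetical minimal tweet dictionaries, for illustration only.
original = {'text': 'hi'}
retweet = {'text': 'RT hi', 'retweeted_status': {'text': 'hi'}}
reply = {'text': '@alice hi', 'in_reply_to_screen_name': 'alice'}

print(is_RT(original), is_RT(retweet))            # False True
print(is_Reply_to(original), is_Reply_to(reply))  # False alice
```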

#Convert the Tweet JSON data to pandas Dataframe, and take the desired fields from the JSON.

tweets = pd.DataFrame()
tweets['text'] = list(map(lambda tweet: tweet['text'] if 'extended_tweet' not in tweet else tweet['extended_tweet']['full_text'], tweets_data))

tweets['Username'] = list(map(lambda tweet: tweet['user']['screen_name'], tweets_data))

tweets['Timestamp'] = list(map(lambda tweet: tweet['created_at'], tweets_data))

tweets['length'] = list(map(lambda tweet: len(tweet['text']) if 'extended_tweet' not in tweet else len(tweet['extended_tweet']['full_text']), tweets_data))

tweets['location'] = list(map(lambda tweet: tweet['user']['location'], tweets_data))

tweets['device'] = list(map(reckondevice, tweets_data))  # reckondevice is not defined in this snippet; presumably it comes from elsewhere in the guide

tweets['RT'] = list(map(is_RT, tweets_data))

tweets['Reply'] = list(map(is_Reply_to, tweets_data))
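
To see what each list(map(...)) line hands to pandas, the sketch below runs the text, length and Username lambdas over two invented tweet dictionaries (one with an extended_tweet field, one without). Each call produces a plain list with one entry per tweet, in the same order as tweets_data; assigning such a list to tweets['text'] is what turns it into a DataFrame column. pandas itself is left out so the example runs on its own.

```python
# Two hypothetical tweets: a short one, and a long one whose full text
# lives under extended_tweet (mirroring the structure the code expects).
tweets_data = [
    {'text': 'short tweet', 'user': {'screen_name': 'alice'}},
    {'text': 'long tweet, truncated...',
     'extended_tweet': {'full_text': 'long tweet, shown in full'},
     'user': {'screen_name': 'bob'}},
]

text_column = list(map(
    lambda tweet: tweet['text'] if 'extended_tweet' not in tweet
    else tweet['extended_tweet']['full_text'],
    tweets_data))

length_column = list(map(
    lambda tweet: len(tweet['text']) if 'extended_tweet' not in tweet
    else len(tweet['extended_tweet']['full_text']),
    tweets_data))

username_column = list(map(lambda tweet: tweet['user']['screen_name'], tweets_data))

print(text_column)      # ['short tweet', 'long tweet, shown in full']
print(length_column)    # [11, 25]
print(username_column)  # ['alice', 'bob']
```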

I was following the guide fine, but this threw me, as I have never seen map or lambda before. I understand we are building a DataFrame in pandas; I'm just not sure how it is happening.

Thanks!!


Answers


  1. Syntactically, a call to map looks like this:

    map(callable, <collection>)
    

    In simple words, map iterates over the collection, calls the callable on each item, and gives back the callable's return value in place of that item. Technically, it neither modifies the original list nor builds a new list: you pass in an iterable, and map returns an iterator (a map object) that yields each item transformed by the callable, computing the values lazily as they are requested.
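
The laziness described above is easy to observe. A small sketch with an invented list of numbers: map hands back a map object, values appear only when requested, and the input list is untouched.

```python
numbers = [1, 2, 3]

# map() does not run anything yet; it returns a lazy map iterator.
doubled = map(lambda n: n * 2, numbers)
print(type(doubled).__name__)  # map

# Values are produced one at a time as they are requested.
print(next(doubled))  # 2
print(list(doubled))  # [4, 6]  (only the remaining items)

# The input list itself is left unchanged.
print(numbers)        # [1, 2, 3]
```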

    Now, lambda is a shorthand for creating an unnamed (anonymous) function.

    lambda x: str(x)
    

    is similar to:

    def transform_to_str(x):
        return str(x)
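
The two forms above behave the same when called; the main visible difference is that the lambda has no name of its own. A small check (the lambda is bound to a variable here only for the comparison; PEP 8 recommends using def instead of assigning a lambda to a name):

```python
to_str_lambda = lambda x: str(x)

def transform_to_str(x):
    return str(x)

# Both callables give the same results...
print(to_str_lambda(42), transform_to_str(42))            # 42 42
# ...but only the def version carries a meaningful name.
print(to_str_lambda.__name__, transform_to_str.__name__)  # <lambda> transform_to_str
```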
    

    Now, given this code:

    tweets['text'] = list(map(lambda tweet: tweet['text'] if 'extended_tweet' not in tweet else tweet['extended_tweet']['full_text'], tweets_data))
    

    Let’s split that up:

    callable = lambda tweet: tweet['text'] if 'extended_tweet' not in tweet else tweet['extended_tweet']['full_text']
    
    iterable = tweets_data
    
    tweets['text'] = list(map(callable, iterable))
    

    Let’s convert callable to a normal function:

    def callable(tweet):
        return tweet['text'] if 'extended_tweet' not in tweet else tweet['extended_tweet']['full_text']
    

    So, what your code does is:

    • It iterates over tweets_data (the iterable).
    • For each tweet in tweets_data, it calls callable (the lambda), which takes a single argument.
    • It takes that return value and yields it as the next item of the map iterator.

    The list() function consumes that iterator and builds a list from it, thus forcing all tweets to be transformed at once.
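
Because map returns an iterator rather than a list, it can be consumed only once; wrapping it in list() stores all the results so they can be reused. A sketch with invented data:

```python
tweets_data = [{'text': 'a'}, {'text': 'bb'}]

lengths = map(lambda tweet: len(tweet['text']), tweets_data)

first_pass = list(lengths)   # computes every item: [1, 2]
second_pass = list(lengths)  # the iterator is already exhausted: []

print(first_pass, second_pass)  # [1, 2] []
```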

    You can now apply the same reasoning to the other lambdas. The official Python documentation for map() and lambda expressions is also worth reading; it covers both in detail.

  2. A simple way to understand lambda: the argument goes before the colon, and whatever expression comes after the colon is evaluated and returned. For example, in your code above:

    tweets['text'] = list(map(lambda tweet: tweet['text'] if 'extended_tweet' not in tweet else tweet['extended_tweet']['full_text'], tweets_data))

    lambda tweet: tweet['text'] simply takes a dictionary tweet and returns the value of its 'text' key.

    And map is a function that applies a given function to every item of an iterable (list, tuple, etc.) and returns an iterator over the results.

    Note: an iterable is anything you can loop over with a for loop.
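
Since map accepts any iterable, not only lists, a few quick examples with built-in functions (the inputs are arbitrary):

```python
# map() accepts any iterable, not just lists.
print(list(map(str.upper, ('a', 'b'))))   # tuple     -> ['A', 'B']
print(list(map(str.upper, 'hi')))         # string    -> ['H', 'I']
print(list(map(len, {'x': 1, 'yy': 2})))  # dict keys -> [1, 2]
print(list(map(abs, range(-2, 1))))       # range     -> [2, 1, 0]
```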

    So, if we make a small function for your lambda expression lambda tweet: tweet['text'] if 'extended_tweet' not in tweet else tweet['extended_tweet']['full_text'], it would look like:

    def foo(tweet):
        if 'extended_tweet' not in tweet:
            return tweet['text']
        else:
            return tweet['extended_tweet']['full_text']
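
To confirm that foo and the lambda really are interchangeable, this sketch maps both over two invented tweets (one with an extended_tweet field) and compares the results:

```python
def foo(tweet):
    if 'extended_tweet' not in tweet:
        return tweet['text']
    else:
        return tweet['extended_tweet']['full_text']

# Invented sample tweets for illustration.
tweets_data = [
    {'text': 'plain'},
    {'text': 'cut...', 'extended_tweet': {'full_text': 'cut, now complete'}},
]

with_def = list(map(foo, tweets_data))
with_lambda = list(map(
    lambda tweet: tweet['text'] if 'extended_tweet' not in tweet
    else tweet['extended_tweet']['full_text'],
    tweets_data))

print(with_def)                 # ['plain', 'cut, now complete']
print(with_def == with_lambda)  # True
```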
    

    Let us apply this to our map:

    map(foo, tweets_data)
    

    So here, the function foo() is applied to each and every element of tweets_data.

    And the list() function consumes the values produced by map one by one and collects them into a list.
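
Written out by hand, list(map(foo, tweets_data)) builds the same list as an explicit loop; a list comprehension is another common way to express it. In this sketch foo is a simplified, hypothetical stand-in that just returns the text:

```python
def foo(tweet):
    # Simplified stand-in for foo(): just return the text.
    return tweet['text']

tweets_data = [{'text': 'one'}, {'text': 'two'}]

# What list(map(foo, tweets_data)) does, step by step:
result = []
for tweet in tweets_data:
    result.append(foo(tweet))

print(result == list(map(foo, tweets_data)))            # True
print(result == [foo(tweet) for tweet in tweets_data])  # True
```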

    Hope you find the explanation helpful
