
I collected some Twitter data by doing this:

library(twitteR)

# connect to the Twitter API
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

# set radius and number of tweets per request
N = 200  # tweets to request from each query
S = 200  # radius in miles

lats = c(38.9, 40.7)
lons = c(-77, -74)

# run one search per coordinate pair and combine the results
roger = do.call(rbind, lapply(1:length(lats), function(i)
  searchTwitter('Roger+Federer', lang = "en", n = N, resultType = "recent",
                geocode = paste(lats[i], lons[i], paste0(S, "mi"), sep = ","))))

After this, I did the following:

# extract the latitude/longitude of each status; empty results become NA
rogerlat = sapply(roger, function(x) as.numeric(x$getLatitude()))
rogerlat = sapply(rogerlat, function(z) ifelse(length(z) == 0, NA, z))

rogerlon = sapply(roger, function(x) as.numeric(x$getLongitude()))
rogerlon = sapply(rogerlon, function(z) ifelse(length(z) == 0, NA, z))

data=as.data.frame(cbind(lat=rogerlat,lon=rogerlon))

And now I would like to get all the tweets that have long and lat values:

library(dplyr)

data = filter(data, !is.na(lat), !is.na(lon))
lonlat = select(data, lon, lat)

But now I only get NA values. Any thoughts on what is going wrong here?

3 Answers


  1. Assuming that some tweets were downloaded, and that the data contain both geo-referenced tweets and tweets without geographical coordinates:

    prod(dim(data)) > 1 & prod(dim(data)) != sum(is.na(data)) & any(is.na(data))
    # TRUE
    

    Let’s simulate data between your longitude/latitude points for simplicity.

    set.seed(123)
    data <- data.frame(lon=runif(200, -77, -74), lat=runif(200, 38.9, 40.7))
    data[sample(1:200, 10),] <- NA
    

    Rows with longitude/latitude data can be selected by removing the 10 rows with missing data.

    data2 <- data[-which(is.na(data[, 1])), c("lon", "lat")]
    nrow(data) - nrow(data2)
    # 10
    

    The data2 line replaces the last two lines of your code. However, note that this only works if the missing geographical coordinates are stored as NA.
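
    If you'd rather handle NA in either column in one step, complete.cases() from base R works as well; here is a minimal sketch against the simulated data above:

    data3 <- data[complete.cases(data), c("lon", "lat")]
    nrow(data3)
    # 190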

  2. Not necessarily an answer, but an observation too long for a comment:

    First, you should look at the documentation for how geocode data should be input. Using twitteR:

    setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
    
    #set radius and amount of requests
    N=200  # tweets to request from each query
    S=200  # radius in miles
    

    Geodata should be structured like this (lat, lon, radius):

    geo <- '40,-75,200km'
    

    And then called using:

    roger <- searchTwitter('Roger+Federer', lang = "en", n = N, resultType = "recent", geocode = geo)
    

    Then, I would instead use twListToDF to filter:

    roger <- twListToDF(roger)
    

    This now gives you a data.frame with 16 columns and 200 observations (the n set above).
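
    If you want to confirm that structure, a quick check (the exact columns may vary slightly between twitteR versions):

    dim(roger)
    # [1] 200  16
    names(roger)  # includes "screenName", "longitude", and "latitude"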

    You could then filter using:

    library(data.table)

    setDT(roger)  # convert the data.frame to a data.table by reference
    roger[latitude > 38.9 & latitude < 40.7 & longitude > -77 & longitude < -74]
    

    That said (and this is why it is an observation rather than an answer), it looks as though twitteR does not return lat and lon (they are all NA in the data I retrieved); I think this is to protect individual users' locations.

    However, adjusting the radius does affect the number of results, so the search does have access to the geo data somehow.
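
    One way to see that effect yourself is to run the same query with two different radii and compare the counts. A quick sketch (the radii are arbitrary, and an authenticated session is assumed):

    n_small <- length(searchTwitter('Roger+Federer', lang = "en", n = N,
                                    resultType = "recent", geocode = '40,-75,50km'))
    n_large <- length(searchTwitter('Roger+Federer', lang = "en", n = N,
                                    resultType = "recent", geocode = '40,-75,500km'))
    c(small = n_small, large = n_large)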

  3. As Chris mentioned, searchTwitter does not return the lat-long of a tweet. You can see this by going to the twitteR documentation, which tells us that it returns a status object.

    Status Objects

    Scrolling down to the status object, you can see that 11 pieces of information are included, but lat-long is not one of them. However, we are not completely lost, because the user’s screen name is returned.

    If we look at the user object, we see that a user object at least includes a location field.

    So I can think of at least two possible solutions, depending on what your use case is.

    Solution 1: Extracting a User’s Location

    # Search for recent Trump tweets #
    tweets <- searchTwitter('Trump', lang = "en", n = N, resultType = "recent",
                            geocode = '38.9,-77,50mi')
    
    # If you want, convert tweets to a data frame #
    tweets.df <- twListToDF(tweets)
    
    # Look up the users #
    users <- lookupUsers(tweets.df$screenName)
    
    # Convert users to a dataframe, look at their location#
    users_df <- twListToDF(users)
    
    table(users_df[1:10, 'location'])
    
                                           ❤ Texas  ❤ ALT.SEATTLE.INTERNET.UR.FACE 
                       2                            1                            1 
                   Japan             Land of the Free                  New Orleans 
                       1                            1                            1 
      Springfield OR USA                United States                          USA 
                       1                            1                            1 
    
    # Note that these will be the users' self-reported locations,
    # so potentially they are not that useful
    

    Solution 2: Multiple searches with limited radius

    The other solution would be to conduct a series of repeated searches, incrementing your latitude and longitude and using a small radius for each search. That way you can be relatively sure that the user is close to your specified location. A sketch of this approach follows.
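
    A minimal sketch of that idea (the grid spacing, radius, and query are illustrative assumptions, and an authenticated twitteR session is assumed):

    library(twitteR)

    # hypothetical grid over the question's bounding box, in 0.5-degree steps
    grid <- expand.grid(lat = seq(38.9, 40.7, by = 0.5),
                        lon = seq(-77, -74, by = 0.5))

    # query each grid point with a small radius; the grid point then serves
    # as an approximate location for every tweet returned by that query
    results <- do.call(rbind, lapply(seq_len(nrow(grid)), function(i) {
      hits <- searchTwitter('Roger+Federer', lang = "en", n = 100,
                            resultType = "recent",
                            geocode = paste(grid$lat[i], grid$lon[i], "10mi", sep = ","))
      if (length(hits) == 0) return(NULL)
      df <- twListToDF(hits)
      df$search_lat <- grid$lat[i]
      df$search_lon <- grid$lon[i]
      df
    }))

    Keep in mind that the standard search API is rate-limited, so a fine grid may need pauses (or retryOnRateLimit) between queries.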
