
Edited to reflect additional work:

Situation

I have two pandas dataframes built from Twitter search tweets API data, and they share a common key, author_id.

I’m using the join method.

Code is:

dfTW08 = dfTW07.join(dfTW04uf, on='author_id', how='left', lsuffix='', rsuffix='4')

Results

When I run that, everything comes out as expected, except that all the values from the other dataframe (dfTW04uf) come in as NaN, including the values for the other dataframe’s author_id column.

Assessment

I’m not getting any error messages, but I have to think it’s something about the datatypes. The other dataframe is a mix of int64, object, bool, and datetime datatypes, so it seems odd that they’d all be unrecognized.

Any suggestions on how to troubleshoot this would be greatly appreciated.
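
One way to test the datatype hypothesis is to compare the key dtypes directly and to check where the keys actually match. The sketch below uses small hypothetical stand-ins for dfTW07 and dfTW04uf (the `text` and `followers` columns are invented for illustration). It relies on the documented behaviour that `DataFrame.join` with `on=` matches the caller's column against the other frame's index, not against a column of the same name, which may be what produces the NaN values here.

```python
import pandas as pd

# Hypothetical stand-ins for dfTW07 and dfTW04uf; the real frames have more columns.
left = pd.DataFrame({"author_id": [1001, 1002, 1003], "text": ["a", "b", "c"]})
right = pd.DataFrame({"author_id": [1001, 1002], "followers": [10, 20]})

# 1. Compare key dtypes: an int64 vs. object mismatch would prevent matches.
same_dtype = left["author_id"].dtype == right["author_id"].dtype

# 2. Count left keys found in the right frame's column vs. its index.
in_column = left["author_id"].isin(right["author_id"]).sum()
in_index = left["author_id"].isin(right.index).sum()

# 3. Reproduce the call from the question on the toy data.
#    join(on=...) looks the keys up in right.index (0, 1 here), so nothing matches.
joined = left.join(right, on="author_id", how="left", lsuffix="", rsuffix="4")

print("same dtype:", same_dtype)
print("keys found in column:", in_column, "| keys found in index:", in_index)
print(joined)
```

If the dtypes agree and the keys are found in the column but not in the index, the datatypes are probably not the cause, and the index lookup is the more likely one.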

2 Answers


  1. Chosen as BEST ANSWER

    Couldn't figure out the NaN issue using join, but was able to merge the dataframes with this:

    callingdf.merge(otherdf, on='author_id', how='left', indicator=True)

    Then did sort_values and drop_duplicates to get the final list I wanted.


  2. You can use merge instead of join, since merge has everything join does but with more "power" (anything you can do with join you can do with merge).

    I am assuming the NaN values are coming up because the results aren’t being discarded when you asked the first join to use on author_id and then include suffixes for x and y. When you left join with merge, you are discarding the non-matches without any x and y suffixes.
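
As a rough illustration of the two answers above, the sketch below runs the merge call from the accepted answer next to a join that first moves author_id into the other frame's index, then de-duplicates. The data, the `followers` and `fetched` columns, and the choice to keep the most recent row per author are all assumptions; the accepted answer does not say which columns were used for sort_values and drop_duplicates.

```python
import pandas as pd

# Hypothetical stand-ins; 'text', 'followers' and 'fetched' are illustrative columns.
callingdf = pd.DataFrame({"author_id": [1001, 1002, 1003], "text": ["a", "b", "c"]})
otherdf = pd.DataFrame({
    "author_id": [1001, 1001, 1002],
    "followers": [10, 12, 20],
    "fetched": pd.to_datetime(["2021-01-01", "2021-02-01", "2021-01-15"]),
})

# merge matches the author_id column of one frame to the author_id column of the other.
merged = callingdf.merge(otherdf, on="author_id", how="left", indicator=True)

# join matches the caller's column against the other frame's index,
# so the key is moved into the index before joining.
joined = callingdf.join(otherdf.set_index("author_id"), on="author_id", how="left")

# Assumed clean-up: keep the most recently fetched row for each author.
final = (
    merged.sort_values("fetched")
          .drop_duplicates(subset="author_id", keep="last")
          .sort_values("author_id")
          .reset_index(drop=True)
)

print(merged)
print(joined)
print(final)
```

The `_merge` column added by `indicator=True` marks each row as `both` or `left_only`, which makes it easy to see which authors had no match in the other frame.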
