`join` method importing `other` dataframe values as `NaN` - Twitter API

dsx
April 18, 2022
185 views
0 votes
2 Answers

Editing this to reflect additional work:

Situation

I have two pandas dataframes of Twitter search tweets API data that share a common key, author_id.

I’m using the join method.

The code is:

dfTW08 = dfTW07.join(dfTW04uf, on='author_id', how='left', lsuffix='', rsuffix='4')

Results

When I run that, everything comes out as expected, except that all of the values from the other dataframe (dfTW04uf) come in as NaN, including the values in that dataframe's author_id column.

Assessment

I'm not getting any error messages, but I suspect it's something to do with the datatypes. The other dataframe has a mix of int64, object, bool, and datetime columns, so it seems odd that none of them would be recognized.
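One way to test the datatype theory is a minimal reproduction. The sketch below uses two small made-up frames (the `text` and `username` columns and all values are hypothetical stand-ins for the real Twitter data) with the same `join` call. Per the pandas documentation, `DataFrame.join(on=...)` matches the caller's column against the index of `other`, not against a column of the same name. With `dfTW04uf` still on its default integer index, every lookup misses and all of its columns, including the suffixed `author_id4`, come back as NaN with no error raised, whatever the column dtypes are.

```python
import pandas as pd

# Hypothetical stand-ins for the real frames: dfTW07 is the caller,
# dfTW04uf is the "other" frame and still has its default RangeIndex (0, 1).
dfTW07 = pd.DataFrame({"author_id": [101, 102, 103], "text": ["a", "b", "c"]})
dfTW04uf = pd.DataFrame({"author_id": [101, 102], "username": ["alice", "bob"]})

# Same call as in the question. on="author_id" looks up dfTW07["author_id"]
# in dfTW04uf's index, not in its author_id column, so nothing matches.
dfTW08 = dfTW07.join(dfTW04uf, on="author_id", how="left", lsuffix="", rsuffix="4")

print(dfTW08)          # author_id4 and username are all NaN
print(dfTW04uf.index)  # the default integer index, not author_id values
```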

Any suggestions on how to troubleshoot this would be greatly appreciated.

Answers

Chosen as BEST ANSWER
- dsx
- April 19, 2022 at 5:38 am
- 0 votes
I couldn't figure out the NaN issue with join, but I was able to merge the dataframes with this:

callingdf.merge(otherdf, on='author_id', how='left', indicator=True)

Then I used sort_values and drop_duplicates to get the final list I wanted.
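The merge-then-deduplicate approach might look something like the sketch below. The answer doesn't say which columns were sorted or deduplicated, so the `followers` and `fetched_at` columns, the sample values, and the keep-the-newest-row rule are all assumptions for illustration. `indicator=True` adds a `_merge` column marking each row as `both` or `left_only`, which makes unmatched authors easy to spot.

```python
import pandas as pd

# Hypothetical data: otherdf has two rows for author 101.
callingdf = pd.DataFrame({"author_id": [101, 102, 103], "text": ["a", "b", "c"]})
otherdf = pd.DataFrame({
    "author_id": [101, 101, 102],
    "followers": [10, 12, 5],
    "fetched_at": pd.to_datetime(["2022-04-01", "2022-04-10", "2022-04-05"]),
})

# Left merge on the author_id column; indicator=True adds "_merge"
# ("both" for matched rows, "left_only" for unmatched ones).
merged = callingdf.merge(otherdf, on="author_id", how="left", indicator=True)

# Duplicate keys in otherdf fan out into several rows per author_id.
# Sort by fetched_at (NaT goes last), keep the last, i.e. newest, row
# per author_id, then restore author order.
final = (
    merged.sort_values("fetched_at")
    .drop_duplicates(subset="author_id", keep="last")
    .sort_values("author_id")
    .reset_index(drop=True)
)
print(final)
```

If the other frame has several rows per `author_id`, a left merge produces one row per match, which is why a deduplication step can be needed afterwards.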


- tylerjames
- April 19, 2022 at 6:08 am
- 0 votes
You can use merge instead of join, since merge has everything join does but with more "power" (anything you can do with join, you can do with merge).

I'm assuming the NaN values show up because the non-matching results aren't being discarded when you ask join to match on author_id and then add suffixes for x and y. When you do a left join with merge, you are discarding the non-matches without any x and y suffixes.
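For comparison, the pandas documentation describes `join(on=...)` as matching the caller's column against the index of `other`, which would also explain every value arriving as NaN. A minimal sketch, assuming small hypothetical frames in place of `dfTW07` and `dfTW04uf`, showing that moving `author_id` into the other frame's index makes `join` return the same result as a left `merge` on the column:

```python
import pandas as pd

# Hypothetical stand-ins for dfTW07 (caller) and dfTW04uf (other).
left = pd.DataFrame({"author_id": [101, 102, 103], "text": ["a", "b", "c"]})
right = pd.DataFrame({"author_id": [101, 102], "username": ["alice", "bob"]})

# join(on=...) matches left["author_id"] against right's index,
# so move author_id into the index of the other frame first.
joined = left.join(right.set_index("author_id"), on="author_id", how="left")

# merge matches column to column, so no set_index is needed.
merged = left.merge(right, on="author_id", how="left")

print(joined)  # username: alice, bob, NaN
print(merged)  # same rows and columns as joined
```

Because `set_index` removes `author_id` from the other frame's columns, no overlapping column remains, so the `lsuffix`/`rsuffix` arguments are no longer needed.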
