I have a pandas DataFrame in which the Job_Code column is mostly unique text, but some records are missing and recorded as nulls (i.e. NaN). To push this into a database with Job_Code as a primary key, I want to replace these missing values with new unique values. My current df is 265 rows × 41 columns.
For simplicity:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Client': ['Microsoft', 'Google', 'Apple', 'StackOverflow', 'PostGres'],
                   'Job_Code': ['J001', 'J002', np.nan, 'J003', np.nan]})
My desired outcome is:
df = pd.DataFrame({'Client': ['Microsoft', 'Google', 'Apple', 'StackOverflow', 'PostGres'],
                   'Job_Code': ['J001', 'J002', 'tempkey1', 'J003', 'tempkey2']})
My initial attempt executed without raising an error, but didn't change anything:

c = 1
for e in enumerate(df.Job_Code):
    if pd.isnull(e):
        e == "tempkey" + str(c)
        c + 1
I looked around and couldn't find a solution to my problem, though I did see more than one warning of "Don't use iterrows!".
Feel free to comment on any methods I’ve used.
3 Answers
In the end I used this and it worked, but I'd accept a better answer, one that isn't iterrows reliant.
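The code for this accepted answer did not survive extraction; below is a plausible iterrows-based reconstruction consistent with the description above (the counter name `c` and the `"tempkey"` prefix follow the question; the exact original may have differed).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Client': ['Microsoft', 'Google', 'Apple', 'StackOverflow', 'PostGres'],
                   'Job_Code': ['J001', 'J002', np.nan, 'J003', np.nan]})

# Walk the rows, assigning a fresh key wherever Job_Code is null
c = 1
for idx, row in df.iterrows():
    if pd.isnull(row['Job_Code']):
        df.at[idx, 'Job_Code'] = 'tempkey' + str(c)
        c += 1
```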
You can create a mask by testing for missing values with Series.isna, then use DataFrame.loc to assign a range of new keys, sized by counting the True values with sum:
try this:
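The code for this answer was also lost; a sketch of one common alternative, using the cumulative sum of the null mask to number the missing rows (my assumption, not necessarily the original poster's code):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Client': ['Microsoft', 'Google', 'Apple', 'StackOverflow', 'PostGres'],
                   'Job_Code': ['J001', 'J002', np.nan, 'J003', np.nan]})

m = df['Job_Code'].isna()
# cumsum over the boolean mask numbers the nulls 1, 2, ...;
# selecting with [m] keeps only the rows being filled
df.loc[m, 'Job_Code'] = 'tempkey' + m.cumsum()[m].astype(str)
```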