I am trying to add columns to a DataFrame by splitting one existing column. With two rows, everything works and the empty column takes the value ‘None’. The problem arises when I have only one row: the split result is not expanded to two columns, whereas I would expect the missing column to be assigned the value ‘None’ as well.
Working example:
>>> import pandas as pd
>>> df = pd.DataFrame({'auth':['dbname_user','dbname']})
>>> df
auth
0 dbname_user
1 dbname
>>> df[['db','login']] = df['auth'].str.split('_', n=1, expand=True)
>>> df
auth db login
0 dbname_user dbname user
1 dbname dbname None <--- as expected, 'None' value is assigned
Problematic example:
>>> import pandas as pd
>>> df = pd.DataFrame({'auth':['dbname']})
>>> df
auth
0 dbname
>>> df[['db','login']] = df['auth'].str.split('_', n=1, expand=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 3643, in __setitem__
self._setitem_array(key, value)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 3685, in _setitem_array
check_key_length(self.columns, key, value)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pandas/core/indexers/utils.py", line 428, in check_key_length
raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
I would expect the same as in the working example, where the value for the second column is ‘None’. Unfortunately I cannot dynamically expand the number of columns using list comprehension. The number of columns must be fixed.
2 Answers
You can try this piece of code, which should work:
This should do the trick:
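The code originally posted with these answers is not preserved in this text. As a minimal sketch of one way to keep the number of columns fixed, the split result can be reindexed to exactly two columns before it is assigned. The use of reindex and the intermediate name parts are assumptions for illustration, not necessarily what either answer proposed.

```python
import pandas as pd

df = pd.DataFrame({'auth': ['dbname']})

# With no '_' in any row, the split yields a single column (label 0).
parts = df['auth'].str.split('_', n=1, expand=True)

# Force exactly two columns (labels 0 and 1); a missing column is
# created and filled with a missing value.
parts = parts.reindex(columns=[0, 1])

df[['db', 'login']] = parts
print(df)
```

Note that the added column holds a missing value (NaN) rather than the Python object None; pd.isna() treats both as missing.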
Explanation
The error you’re getting (ValueError: Columns must be same length as key) happens because, after splitting the column "auth", every split result has length 1, so the expanded result has only one column while the key ['db','login'] names two. expand=True won’t help you here, because all the values from the split have length 1. Your first example works because, when pandas split the first value, dbname_user, the result had length 2, so the remaining values were padded to that same length. In other words, expand=True makes all the returned rows as long as the longest split result, not as long as the number of columns you assign to.
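The behaviour described above can be checked by comparing the shapes of the two split results. This is a small illustrative sketch; the variable names two_rows and one_row are hypothetical.

```python
import pandas as pd

# The longest split result has 2 parts, so the output has 2 columns
# and the shorter row is padded with a missing value.
two_rows = pd.Series(['dbname_user', 'dbname']).str.split('_', n=1, expand=True)

# Every split result has 1 part, so the output has only 1 column.
one_row = pd.Series(['dbname']).str.split('_', n=1, expand=True)

print(two_rows.shape)  # (2, 2)
print(one_row.shape)   # (1, 1)
```

Assigning the one-column result to the two-column key ['db','login'] is what triggers the length mismatch.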