skip to Main Content

I am trying to add columns to the DataFrame based on the splitting of one column. With two rows, everything works and the empty column takes the value ‘None’. The problem when I only have one row and the DataFrame cannot be expanded, and I would expect it to also be assigned the value ‘None’.

Working example:

>>> import pandas as pd
>>> df = pd.DataFrame({'auth':['dbname_user','dbname']})
>>> df
          auth
0  dbname_user
1       dbname
>>> df[['db','login']] = df['auth'].str.split('_', n=1, expand=True)
>>> df
          auth      db login
0  dbname_user  dbname  user
1       dbname  dbname  None   <--- as expected, 'None' value is assigned

Problematic example:

>>> import pandas as pd
>>> df = pd.DataFrame({'auth':['dbname']})
>>> df
     auth
0  dbname
>>> df[['db','login']] = df['auth'].str.split('_', n=1, expand=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 3643, in __setitem__
    self._setitem_array(key, value)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 3685, in _setitem_array
    check_key_length(self.columns, key, value)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pandas/core/indexers/utils.py", line 428, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key

I would expect the same as in the working example, where the value for the second column is ‘None’. Unfortunately I cannot dynamically expand the number of columns using list comprehension. The number of columns must be fixed.

2

Answers


  1. You can try this piece of code which should work :

    def underscore_split(auth):
        if "_" in auth:
            return auth.split("_", 1)
        else:
            return [auth, None]
    
    
    df[['db', 'login']] = [underscore_split(x) for x in df["auth"]]
    
    Login or Signup to reply.
  2. This should do the trick:

    import pandas as pd
    import numpy as np
    
    
    df_example = pd.DataFrame({'auth': ['dbname']})
    df_example[["db", "login"]] = (
        # This np.where adds a "_" to each string that don't have "_".
        pd.Series(
            np.where(
                df_example["auth"].str.contains("_"),
                df_example["auth"],
                df_example["auth"] + "_"
            )
        )
        # Normal `.str.split` like you were doing.
        .str.split("_", n=1, expand=True)
        # `.applymap` converts empty strings ("") into None.
        .applymap(lambda value: None if value == "" else value)
    )
    print(df_example)
    # Prints:
    #      auth      db login
    # 0  dbname  dbname  None
    
    

    Explanation

    The error you’re getting (ValueError: Columns must be same length as key), happens because, after splitting the column "auth", you end up with only one value of length = 1. expand=True won’t help you here, because all the values from split have length = 1. Your first example works, because when pandas split the first value dbname_user, it had length = 2, therefore the remaining values got expanded to this same length. In other words, expand=True makes all the returning values have the same length as the value with the greatest length:

    
    # Series with first value containing no "_",
    # second value containing one "_",
    # and third value containing two "_".
    example_2 = pd.Series(['dbname', 'dbname_user', 'dbname_user_2'])
    
    # Applying `.str.split`, without setting parameter `n`
    split = example_2.str.split("_", expand=True)
    # Get the maximum length of the split (max_len = 3)
    max_len = example_2.str.split("_").str.len().max()
    
    print('Max Length:', max_len)
    print('Number of columns:', split.shape[1])
    print('max_len == split.shape[1]:', max_len == split.shape[1])
    print('n', split)
    # Prints:
    # Max Length: 3
    # Number of columns: 3
    # max_len == split.shape[1]: True
    # 
    #         0     1     2
    # 0  dbname  None  None
    # 1  dbname  user  None
    # 2  dbname  user     2
    
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search