
I am a new user of the bigframes package from googleapis. I am trying to manipulate a DataFrame loaded from BigQuery.

I was trying to run some code, but I am facing a problem that I am not able to solve.

I am trying to use the apply function on a DataFrame with the parameter axis=1, but it doesn’t seem to work. I always get an error message.

Can you please help me with this?

Thanks.

Code example

# example
def condition(row):
    print(row)
    if 1 <= row["month"] <= 6:
        return f"{row['year']:02}S1{row['CODPY']}{row['CODDE']}"
    else:
        return f"{row['year']:02}S2{row['CODPY']}{row['CODDE']}"

valodetail_df['IDT'] = valodetail_df.apply(condition, axis=1)

Stack trace

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/miniconda3/envs/qback/lib/python3.11/site-packages/bigframes/core/log_adapter.py", line 44, in wrapper
    return method(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/qback/lib/python3.11/site-packages/bigframes/dataframe.py", line 3118, in apply
    results = {name: func(col, *args, **kwargs) for name, col in self.items()}
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/qback/lib/python3.11/site-packages/bigframes/dataframe.py", line 3118, in <dictcomp>
    results = {name: func(col, *args, **kwargs) for name, col in self.items()}
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<stdin>", line 3, in condition
  File "missing.pyx", line 419, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous
>>> valodetail_df['IDTDCI'] = valodetail_df.apply(condition,axis=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/miniconda3/envs/qback/lib/python3.11/site-packages/bigframes/core/log_adapter.py", line 44, in wrapper
    return method(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/qback/lib/python3.11/site-packages/bigframes/dataframe.py", line 3118, in apply
    results = {name: func(col, *args, **kwargs) for name, col in self.items()}
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/qback/lib/python3.11/site-packages/bigframes/dataframe.py", line 3118, in <dictcomp>
    results = {name: func(col, *args, **kwargs) for name, col in self.items()}
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: condition() got an unexpected keyword argument 'axis'
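
From the traceback, this bigframes version appears to call func(col, ...) once per column from self.items() and to forward axis to the function as a keyword argument, which is why condition() fails. For comparison, in plain pandas axis=1 passes each row to the function as a Series indexed by column name, which is what condition expects. A minimal local sketch, using hypothetical sample data and assuming year is an integer column (so :02 zero-pads it on the left):

```python
import pandas as pd

# Hypothetical sample data; "year" is assumed to be an integer column here.
df = pd.DataFrame({
    "month": [1, 3, 6, 7, 12],
    "year": [8, 9, 10, 11, 12],
    "CODPY": ["PY"] * 5,
    "CODDE": ["DE"] * 5,
})


def condition(row):
    # With axis=1, pandas passes one row at a time as a Series keyed by column name.
    half = "S1" if 1 <= row["month"] <= 6 else "S2"
    return f"{row['year']:02}{half}{row['CODPY']}{row['CODDE']}"


df["IDT"] = df.apply(condition, axis=1)
print(df["IDT"].tolist())
```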


Answers


  1. DataFrame.apply with axis=1 is currently not supported: https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.dataframe.DataFrame#bigframes_dataframe_DataFrame_apply

    There is a feature request for this: https://github.com/googleapis/python-bigquery-dataframes/issues/592

    However, for your particular use case, the same result can be achieved by other means.

    Here is my guess at the kind of DataFrame you are working with:

    import bigframes.pandas as bpd
    
    df = bpd.DataFrame({
            "month": [1,3,6,7,12],
            "year":  ["8", "9", "10", "11", "12"],
            "CODPY": ["PY", "PY", "PY", "PY", "PY"],
            "CODDE": ["DE", "DE", "DE", "DE", "DE"],
         })
    df
    
        month   year    CODPY   CODDE
    0       1      8       PY      DE
    1       3      9       PY      DE
    2       6     10       PY      DE
    3       7     11       PY      DE
    4      12     12       PY      DE
    

    We can use other DataFrame and Series APIs to create the desired column:

    condition = (df["month"] >= 1) & (df["month"] <= 6)
    
    s1 = df["year"].str.pad(fillchar='0', width=2) + "S1" + df["CODPY"] + df["CODDE"]
    
    s2 = df["year"].str.pad(fillchar='0', width=2) + "S2" + df["CODPY"] + df["CODDE"]
    
    df['IDT'] = s1.where(condition, s2)
    df
    
       month    year    CODPY   CODDE        IDT
    0      1       8       PY      DE   08S1PYDE
    1      3       9       PY      DE   09S1PYDE
    2      6      10       PY      DE   10S1PYDE
    3      7      11       PY      DE   11S2PYDE
    4     12      12       PY      DE   12S2PYDE
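
    The same vectorized idea can be checked locally with plain pandas before running it against BigQuery. A minimal sketch, assuming pandas is installed and reusing the sample data above; it computes the S1/S2 marker once instead of building two full strings. The between and map calls are standard pandas; whether your bigframes version supports them the same way is an assumption to check against its API reference.

```python
import pandas as pd

# Same sample data as above; "year" is a string column here.
df = pd.DataFrame({
    "month": [1, 3, 6, 7, 12],
    "year": ["8", "9", "10", "11", "12"],
    "CODPY": ["PY"] * 5,
    "CODDE": ["DE"] * 5,
})

# Boolean mask; between() is inclusive on both ends by default.
in_first_half = df["month"].between(1, 6)
# Map the mask to the half-year marker so only this part depends on the condition.
half = in_first_half.map({True: "S1", False: "S2"})
# str.pad pads on the left by default, turning "8" into "08".
df["IDT"] = df["year"].str.pad(width=2, fillchar="0") + half + df["CODPY"] + df["CODDE"]
print(df)
```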
    

    Hope this helps.

  2. As of BigQuery DataFrames (bigframes) version 1.6.0, there is preview support for apply with axis=1, which makes use of BigQuery Remote Functions.

    If your function does something that can’t be rewritten without axis=1 (as Shobhit’s workaround in https://stackoverflow.com/a/78331896/101923 does for this case), you can now do the following:

    import bigframes
    
    bigframes.__version__
    # 1.6.0
    
    import bigframes.pandas as bpd
    
    df = bpd.DataFrame({
            "month": [1,3,6,7,12],
            "year":  ["8", "9", "10", "11", "12"],
            "CODPY": ["PY", "PY", "PY", "PY", "PY"],
            "CODDE": ["DE", "DE", "DE", "DE", "DE"],
         })
    
    
    # Note: input_types must be a Series.
    # Only scalar output types are currently supported.
    @bpd.remote_function(input_types=bpd.Series, output_type=str)
    def condition(row):
        print(row)
        if 1 <= row["month"] <= 6:
            return f"{row['year']:02}S1{row['CODPY']}{row['CODDE']}"
        else:
            return f"{row['year']:02}S2{row['CODPY']}{row['CODDE']}"
    
    
    df['IDT'] = df.apply(condition, axis=1)
    df
    
       month  year  CODPY  CODDE       IDT
    0      1     8     PY     DE  80S1PYDE
    1      3     9     PY     DE  90S1PYDE
    2      6    10     PY     DE  10S1PYDE
    3      7    11     PY     DE  11S2PYDE
    4     12    12     PY     DE  12S2PYDE

    Note: As of bigframes==1.6.0, the supported data types are limited: your row can only contain INT64, FLOAT64, BOOL, or STRING columns (source).
