skip to Main Content
import io
import pandas as pd

csv_data = '''App_name,pre-prod,prod,stage
matching-image,nginx,nginx,nginx
mismatching-image,nginx,nginx,nginx:1.23.3-alpine'''

df = pd.read_csv(io.StringIO(csv_data), sep=",")
html_table = df.tohtml()

Is there a way to compare the values of columns in dataframe and use it in conditional formatting ? I want compare if the ‘prod’,’pre-prod’ and ‘stage’ values are mismatching, if yes then then its bg-color should be red. I have tired the following methods present in pandas but none of them works.

df.style.apply()

df.style.apply_index()

df.style.applymap()

Current Output:

enter image description here

Desired output:

enter image description here

3

Answers


  1. Chosen as BEST ANSWER
    import io
    import pandas as pd
    
    csv_data = '''
    App_name,pre-prod,prod,stage
    matching-image,nginx,nginx,nginx
    matching-image,nginx,nginx,nginx
    mismatching-image,nginx,nginx,nginx:1.23.3-alpine
    mismatching-image,nginx,nginx,nginx:1.23.3-alpine
    '''
    df = pd.read_csv(io.StringIO(csv_data), sep=",")
    def match_checker(row):
        if row['prod'] == row['pre-prod'] == row['stage']:
            return [''] * len(row)
        else:
            return ['background-color: red'] * len(row)
            
    df = df.style.apply(match_checker, axis=1)
    
    html_table = df.to_html()
    
    with open('testpandas.html','w+') as html_file:
        html_file.write(html_table)
        html_file.close()
    

    Updated @PeterSmith answer.


  2. You can add style conditionally by applying style to a subset of your dataframe like:

    import io
    import pandas as pd
    
    csv_data = '''App_name,pre-prod,prod,stage
    matching-image,nginx,nginx,nginx
    mismatching-image,nginx,nginx,nginx:1.23.3-alpine'''
    
    def add_color(row):
        return ['background-color: red'] * len(row)
    
    df = pd.read_csv(io.StringIO(csv_data), sep=",")
    df.loc[(df["pre-prod"] == df["prod"]) & (df["prod"] == df["stage"])].style.apply(add_color, axis=1)
    
    Login or Signup to reply.
  3. It’s also possible to style the entire DataFrame in one go by passing axis=None to apply.

    We can identify rows which have differing values in the specified columns by comparing the first column (column 0) with the remaining columns (column 1-2) and identifying where there are unequal values using ne on axis=0.

    df[['prod', 'stage']].ne(df['pre-prod'], axis=0)
    
    #     prod  stage
    # 0  False  False
    # 1  False   True
    

    Then we can check across rows for any rows which have any True values (meaning there is something that’s not equal in the row).

    df[['prod', 'stage']].ne(df['pre-prod'], axis=0).any(axis=1)
    
    # 0    False
    # 1     True
    # dtype: bool
    

    We can then simply apply the styles anywhere there’s a True value in the resulting Series.


    Altogether this could look something like:

    def colour_rows_that_dont_match(df_: pd.DataFrame, comparison_cols: List[str]):
        # Sanity check that comparison_cols is what we expect
        assert isinstance(comparison_cols, list) and len(comparison_cols) > 1, 
            'Must be a list and provide at least 2 column to compare'
    
        # Create an Empty DataFrame to hold styles of the same shape as the original df
        styles_df = pd.DataFrame('', index=df_.index, columns=df_.columns)
        # Compare the first column's (col 0) values to the remaining columns. 
        # Find rows where any values are not equal (ne)
        rows_that_dont_match = df[comparison_cols[1:]].ne(df[comparison_cols[0]], axis=0).any(axis=1)
        # Apply styles to rows which meet the above criteria
        styles_df.loc[rows_that_dont_match, :] = 'background-color: red'
        return styles_df
    
    
    df.style.apply(
        colour_rows_that_dont_match,
        # This gets passed to the function
        comparison_cols=['pre-prod', 'prod', 'stage'],
        # Apply to the entire DataFrame at once
        axis=None
    ).to_html(buf='test_df.html')
    

    Which produces the following:
    HTML table with second row highlighted due to differing values in the 'prod' and 'stage' columns


    Setup, version, and imports:

    from typing import List
    
    import pandas as pd  # version 1.5.2
    
    df = pd.DataFrame({
        'App_name': ['matching-image', 'mismatching-image'],
        'pre-prod': ['nginx', 'nginx'],
        'prod': ['nginx', 'nginx'],
        'stage': ['nginx', 'nginx:1.23.3-alpine']
    })
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search