
Description:
I have a list of column names that I need.
I want to check whether all of these column names are present in a dataframe. If only some of the columns are present, then I want to use just those columns, with generic code like:

df1 = df.select(df['column1'], df['column2'])

col_list = ['column1', 'column2', 'column3', 'column4']

I want to check which columns from the list are present in the dataframe, and use whatever columns are present in the select query.

2 Answers


  1. You need to do it in an iterative fashion:

    select_list = ['col1','col2','col3']
    df_columns = sparkDF.columns ### ['col1','col2','col5','col7']
    
    final_select_list = []
    
    for col in select_list:
        if col in df_columns:
            final_select_list += [col]
    
    ### final_select_list --> ['col1','col2']
    
    
    sparkDF.select(*final_select_list).show()
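
    Since the question asks for generic code, the same loop can also be wrapped into a small reusable helper. This is only a sketch; the function name select_existing is an illustration, not a Spark API:

    from pyspark.sql import DataFrame

    def select_existing(sdf: DataFrame, wanted_cols):
        ### keep only the wanted columns that actually exist in the dataframe,
        ### preserving the order of wanted_cols
        existing = [c for c in wanted_cols if c in sdf.columns]
        return sdf.select(*existing)

    ### e.g. select_existing(sparkDF, select_list).show()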
    
  2. The other answer works perfectly, but it can also be written as a one-liner.

    # predefined list of all required columns
    reqd_cols = ['id', 'dt', 'name', 'phone']
    
    data_sdf. \
        select(*[k for k in data_sdf.columns if k in reqd_cols])
    

    The list comprehension inside select() checks, for each column of the data_sdf dataframe, whether it is present in the reqd_cols list, and keeps only the overlapping ones.
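
    For reference, here is a minimal end-to-end sketch of the same one-liner; the sample dataframe and its column names ('id', 'name', 'city') are made up purely for illustration:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # hypothetical sample data with columns 'id', 'name', 'city'
    data_sdf = spark.createDataFrame(
        [(1, 'a', 'x'), (2, 'b', 'y')],
        ['id', 'name', 'city']
    )

    reqd_cols = ['id', 'dt', 'name', 'phone']

    # only 'id' and 'name' exist in data_sdf, so only those two are selected
    data_sdf. \
        select(*[k for k in data_sdf.columns if k in reqd_cols]). \
        show()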
