Here is my code:
import json
import pandas as pd

with open(r'unique_columns.json', 'r') as f:
    config = json.load(f)
unique_col_comb = config['Unique_Column_Combination']['TABLE_NAME']
df = pd.read_csv(f's3://path/to/file.csv', sep='|')
df_unique = df.set_index([unique_col_comb]).index.is_unique
print(df_unique)
My JSON looks like this:
{
    "Unique_Column_Combination": {
        "TABLE_NAME": "COL1, COL2, COL3"
    }
}
I get the error:
KeyError: "None of ['COL1, COL2, COL3'] are in the columns"
But when I write out the column names explicitly in df_unique, the code works:
df_unique = df.set_index(['COL1', 'COL2', 'COL3']).index.is_unique
print(df_unique)  # True
I think I need to add extra quotes (") around each column name in my JSON file, but then it won't be valid JSON. Can I handle this in the Python code instead, or do I need to convert the JSON value to a Python list?
2 Answers
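"COL1, COL2, COL3" is a single string, so set_index looks for one column with that exact name. Split it into three column names with [c.strip() for c in unique_col_comb.split(',')] (a bare unique_col_comb.split() splits on whitespace and leaves the commas attached, e.g. 'COL1,'). Then pass that list directly, df.set_index(unique_col_comb), without wrapping it in another pair of brackets.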
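A minimal sketch of the split-and-strip fix, assuming the comma-plus-space separator shown in the question's JSON. The config is inlined with json.loads and the DataFrame is hypothetical sample data standing in for the S3 CSV, so the snippet runs on its own.

```python
import json

import pandas as pd

# Same structure as the question's unique_columns.json, inlined so the sketch is self-contained
config = json.loads('{"Unique_Column_Combination": {"TABLE_NAME": "COL1, COL2, COL3"}}')
raw = config["Unique_Column_Combination"]["TABLE_NAME"]

# Split on commas and trim spaces: "COL1, COL2, COL3" -> ["COL1", "COL2", "COL3"]
unique_col_comb = [name.strip() for name in raw.split(",")]

# Hypothetical sample data standing in for the pipe-delimited CSV
df = pd.DataFrame({
    "COL1": [1, 1, 2],
    "COL2": ["a", "b", "a"],
    "COL3": [True, True, False],
    "OTHER": [10, 20, 30],
})

# Pass the list itself, not [unique_col_comb]: a nested list would be
# treated as an array of index values rather than as column names
df_unique = df.set_index(unique_col_comb).index.is_unique
print(unique_col_comb)  # ['COL1', 'COL2', 'COL3']
print(df_unique)        # True
```

The same check can be written without building an index: not df.duplicated(subset=unique_col_comb).any().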
The JSON does not decompose into what would be a DataFrame.
Here is an example of JSON that can be read:
I didn't mention pd.read_json() in the example because that part worked; it is the format of the JSON that failed. In your example, this JSON would work if square brackets were added, for example:
to give this:
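If you control the JSON file, another option is to store the combination as a JSON array with each column name quoted separately; json.load then returns a Python list directly and no splitting is needed. The config string and the sample DataFrame below are hypothetical, a sketch rather than the answerer's exact example.

```python
import json

import pandas as pd

# Hypothetical config storing the combination as a JSON array, one quoted name per column
config_text = """
{
    "Unique_Column_Combination": {
        "TABLE_NAME": ["COL1", "COL2", "COL3"]
    }
}
"""
config = json.loads(config_text)

# A JSON array is decoded into a Python list
unique_col_comb = config["Unique_Column_Combination"]["TABLE_NAME"]

# Hypothetical sample data in which the combination (1, "a", 0) appears twice
df = pd.DataFrame({
    "COL1": [1, 1, 2],
    "COL2": ["a", "a", "b"],
    "COL3": [0, 0, 1],
})

df_unique = df.set_index(unique_col_comb).index.is_unique
print(unique_col_comb)  # ['COL1', 'COL2', 'COL3']
print(df_unique)        # False
```

With this layout the question's original code works once the extra brackets around unique_col_comb are dropped.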