I have Python code that behaves differently when I run it on Windows and when I run it on CentOS.
Below is the part of the code relevant to this issue, with comments explaining its purpose. It basically processes a bunch of CSV files (some of them with different columns from each other) and merges them into a single CSV that has all the columns:
import csv
import json
from glob import glob

import pandas as pd

#search_column is a helper defined elsewhere in the script (not shown here).

#Get the name of CSV files of the current folder:
local_csv_files = glob("*.csv")
#Define the columns and the order they should appear on the final file:
global_csv_columns = ['Timestamp', 'a_country', 'b_country', 'call_setup_time','quality','latency','throughput','test_type']
#Dataframe list:
lista_de_dataframes=[]
#Loop to be executed for all the CSV files in the current folder.
for ficheiro_csv in local_csv_files:
    df = pd.read_csv(ficheiro_csv)
    #Store the CSV columns on a variable and collect the number of columns:
    colunas_do_csv_aux= df.columns.values
    global_number_of_columns = len(global_csv_columns)
    aux_csv_number_of_columns = len(colunas_do_csv_aux)
    #Normalize each CSV file so that all CSV files have the same columns
    for coluna_ in global_csv_columns:
        if search_column(colunas_do_csv_aux, coluna_)==False:
            #If the column does not exist in the current CSV, add an empty column with the correct header:
            df.insert(0, coluna_, "")
    #Order the dataframe columns according to the order of the global_csv_columns list:
    df = df[global_csv_columns]
    lista_de_dataframes.append(df)
    del df
big_unified_dataframe = pd.concat(lista_de_dataframes, copy=False).drop_duplicates().reset_index(drop=True)
big_unified_dataframe.to_csv('global_file.csv', index=False)
#Create an additional txt file to present with each row of the CSV in a JSON format:
with open('global_file.csv', 'r') as arquivo_csv:
    with open('global_file_c.txt', 'w') as arquivo_txt:
        reader = csv.DictReader(arquivo_csv, global_csv_columns)
        iterreader = iter(reader)
        next(iterreader)
        for row in iterreader:
            out=json.dumps(row)
            arquivo_txt.write(out)
Now, on Windows and on CentOS, this works well for the final CSV since it has all the columns ordered as defined in the list:
global_csv_columns = ['Timestamp', 'a_country', 'b_country', 'call_setup_time','quality','latency','throughput','test_type']
This ordering is achieved by this code line:
#Order the dataframe columns according to the order of the global_csv_columns list:
df = df[global_csv_columns]
But the final ‘txt’ file is different on CentOS: there, the order of the keys is changed. Below is the output of the txt file on both platforms (Windows and CentOS).
Windows:
{"Timestamp": "06/09/2022 10:33", "a_country": "UAE", "b_country": "UAE", "call_setup_time": "7.847", "quality": "", "latency": "", "throughput": "", "test_type": "voice_call"}
{"Timestamp": "06/09/2022 10:30", "a_country": "Saudi_Arabia", "b_country": "Saudi_Arabia", "call_setup_time": "10.038", "quality": "", "latency": "", "throughput": "", "test_type": "voice_call"}
...
CentOS:
{"latency": "", "call_setup_time": "7.847", "Timestamp": "06/09/2022 10:33", "test_type": "voice_call", "throughput": "", "b_country": "UAE", "a_country": "UAE", "quality": ""}
{"latency": "", "call_setup_time": "10.038", "Timestamp": "06/09/2022 10:30", "test_type": "voice_call", "throughput": "", "b_country": "Saudi_Arabia", "a_country": "Saudi_Arabia", "quality": ""}
...
Is there any way to ensure the column order on CentOS?
3 Answers
Try the pd.DataFrame.to_json function, which lets you write a dataframe to a JSON file directly, without reading it back from a CSV file. I suspect this function will let you write the rows without changing the order of the columns.
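A minimal sketch of this suggestion follows. The small dataframe is only a stand-in for big_unified_dataframe, built from the sample rows in the question, and the helper name dataframe_to_json_lines is made up for illustration. With orient="records" and lines=True, to_json emits one JSON object per row, and the keys should follow the dataframe's column order.

```python
import json

import pandas as pd

global_csv_columns = ['Timestamp', 'a_country', 'b_country', 'call_setup_time',
                      'quality', 'latency', 'throughput', 'test_type']

# Stand-in for big_unified_dataframe; values copied from the question's sample output.
big_unified_dataframe = pd.DataFrame(
    [["06/09/2022 10:33", "UAE", "UAE", "7.847", "", "", "", "voice_call"],
     ["06/09/2022 10:30", "Saudi_Arabia", "Saudi_Arabia", "10.038", "", "", "", "voice_call"]],
    columns=global_csv_columns,
)


def dataframe_to_json_lines(df):
    """Return the dataframe as text with one JSON object per row (hypothetical helper)."""
    # orient="records" serializes each row as an object keyed by column name;
    # lines=True puts each of those objects on its own line.
    return df.to_json(orient="records", lines=True)


json_lines = dataframe_to_json_lines(big_unified_dataframe)

# Parse the first line back to check the key order.
first_row = json.loads(json_lines.splitlines()[0])
print(list(first_row.keys()))
```

To write the file directly, a path can be passed as the first argument, e.g. big_unified_dataframe.to_json('global_file_c.txt', orient="records", lines=True). Note that pandas may escape forward slashes inside strings (such as the Timestamp values); that is still valid JSON and decodes to the same text.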
Your output JSON dictionaries aren’t sorted, so the order in which the keys (tags) appear can vary. In practice the keys usually appear in the order in which they were inserted into each dictionary, but you can have the dictionaries sorted by key when you serialize them.
This will at least make the output consistent across platforms, although you may place more meaning on some keys than on others and prefer a specific order.
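The snippet that belonged to this answer did not survive, so the following is only a sketch of one way to get sorted keys, assuming json.dumps is used as in the question: its sort_keys parameter. The row below is a shortened, made-up example in the scrambled order seen on CentOS.

```python
import json

# A row whose keys arrive in an arbitrary order, as on the CentOS machine.
row = {"latency": "", "call_setup_time": "7.847",
       "Timestamp": "06/09/2022 10:33", "test_type": "voice_call"}

# sort_keys=True makes json.dumps emit the keys in sorted order,
# regardless of the order in which the dictionary stores them.
out = json.dumps(row, sort_keys=True)
print(out)
# {"Timestamp": "06/09/2022 10:33", "call_setup_time": "7.847", "latency": "", "test_type": "voice_call"}
```

The result is the same on every platform, but the order is alphabetical (uppercase letters first), not the order of global_csv_columns.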
Now the reason is clear: insertion order in ordinary dicts was added in Python 3.6 as an implementation detail and has been guaranteed by the language since Python 3.7. Read "Are dictionaries ordered in Python 3.6+?" if you want to know more.
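If upgrading is not possible, one version-independent workaround is to rebuild each row in the desired column order before serializing it. As far as I know, csv.DictReader returned plain dicts before Python 3.6, which would explain the scrambled keys. The sketch below assumes the column list from the question; the function name rows_as_json and the in-memory sample CSV are made up for illustration.

```python
import csv
import io
import json
from collections import OrderedDict

global_csv_columns = ['Timestamp', 'a_country', 'b_country', 'call_setup_time',
                      'quality', 'latency', 'throughput', 'test_type']


def rows_as_json(csv_file, columns):
    """Yield one JSON string per CSV data row, with keys in the order of `columns`."""
    reader = csv.DictReader(csv_file)  # the header row supplies the field names
    for row in reader:
        # OrderedDict keeps the insertion order on every Python version,
        # and json.dumps writes the keys in that order.
        ordered = OrderedDict((name, row[name]) for name in columns)
        yield json.dumps(ordered)


# In-memory stand-in for global_file.csv.
sample_csv = io.StringIO(
    "Timestamp,a_country,b_country,call_setup_time,quality,latency,throughput,test_type\n"
    "06/09/2022 10:33,UAE,UAE,7.847,,,,voice_call\n"
)

json_rows = list(rows_as_json(sample_csv, global_csv_columns))
print(json_rows[0])
```

Because the reader takes the field names from the header row here, there is no need to skip the first row manually as the question's code does.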
The best solution would be to have the same Python version, down to the minor version, on both machines; that is, if you have 3.9.6 on your Windows machine, then install Python 3.9 on CentOS. If you are unable to install it, Python 3.7 or 3.8 should do. Be warned, however, that if you have both Python 2 and Python 3 installed on a single machine, you should invoke python3 explicitly to get the newer version, e.g. python3 helloworld.py, where helloworld.py is the file with your Python code.
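To find out which interpreter actually runs the script on each machine, a small guard at the top of the script can help. This is only a sketch; the function name dict_order_guaranteed is made up for illustration.

```python
import sys


def dict_order_guaranteed(version_info=sys.version_info):
    """Return True if this Python version guarantees dict insertion order (3.7+)."""
    return tuple(version_info[:2]) >= (3, 7)


if not dict_order_guaranteed():
    # Written to stderr so it does not end up in redirected program output.
    sys.stderr.write(
        "Warning: running on Python %d.%d; dict key order is not guaranteed.\n"
        % (sys.version_info[0], sys.version_info[1])
    )
```

Running the script with python and with python3 on the CentOS machine should then show whether the two commands point to different interpreters.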