I have Python code that behaves differently when I run it on Windows and when I run it on CentOS.
Below is the relevant part of the code, with comments explaining its purpose. It processes a set of CSV files (some of which have different columns from the others) and merges them into a single CSV that has all the columns:

import csv
import json
from glob import glob

import pandas as pd

# Get the names of the CSV files in the current folder:
local_csv_files = glob("*.csv")
# Define the columns and the order in which they should appear in the final file:
global_csv_columns = ['Timestamp', 'a_country', 'b_country', 'call_setup_time', 'quality', 'latency', 'throughput', 'test_type']
# List of dataframes:
lista_de_dataframes = []

# Loop over all the CSV files in the current folder.
for ficheiro_csv in local_csv_files:
    df = pd.read_csv(ficheiro_csv)
    # Store the CSV columns in a variable and collect the number of columns:
    colunas_do_csv_aux = df.columns.values
    global_number_of_columns = len(global_csv_columns)
    aux_csv_number_of_columns = len(colunas_do_csv_aux)
    # Normalize each CSV file so that all CSV files have the same columns.
    for coluna_ in global_csv_columns:
        # The original called a helper, search_column(), that is not shown here;
        # a plain membership test is assumed to be equivalent.
        if coluna_ not in df.columns:
            # If the column does not exist in the current CSV, add an empty column with the correct header:
            df.insert(0, coluna_, "")
    # Order the dataframe columns according to the order of the global_csv_columns list:
    df = df[global_csv_columns]
    lista_de_dataframes.append(df)
    del df
big_unified_dataframe = pd.concat(lista_de_dataframes, copy=False).drop_duplicates().reset_index(drop=True)
big_unified_dataframe.to_csv('global_file.csv', index=False)

# Create an additional txt file that contains each row of the CSV in JSON format:
with open('global_file.csv', 'r') as arquivo_csv:
   with open('global_file_c.txt', 'w') as arquivo_txt:
      reader = csv.DictReader(arquivo_csv, global_csv_columns)
      iterreader = iter(reader)
      next(iterreader)
      for row in iterreader:
         out=json.dumps(row)
         arquivo_txt.write(out)

On both Windows and CentOS, this works well for the final CSV: it has all the columns in the order defined in the list:

global_csv_columns = ['Timestamp', 'a_country', 'b_country', 'call_setup_time','quality','latency','throughput','test_type']

This ordering is achieved by this line of code:

# Order the dataframe columns according to the order of the global_csv_columns list:
df = df[global_csv_columns]

But the final txt file is different on CentOS, where the order of the keys is changed. Below is the output of the txt file on both platforms (Windows and CentOS).

Windows:

{"Timestamp": "06/09/2022 10:33", "a_country": "UAE", "b_country": "UAE", "call_setup_time": "7.847", "quality": "", "latency": "", "throughput": "", "test_type": "voice_call"}
{"Timestamp": "06/09/2022 10:30", "a_country": "Saudi_Arabia", "b_country": "Saudi_Arabia", "call_setup_time": "10.038", "quality": "", "latency": "", "throughput": "", "test_type": "voice_call"}
...

CentOS:

{"latency": "", "call_setup_time": "7.847", "Timestamp": "06/09/2022 10:33", "test_type": "voice_call", "throughput": "", "b_country": "UAE", "a_country": "UAE", "quality": ""}
{"latency": "", "call_setup_time": "10.038", "Timestamp": "06/09/2022 10:30", "test_type": "voice_call", "throughput": "", "b_country": "Saudi_Arabia", "a_country": "Saudi_Arabia", "quality": ""}
...

Is there any way to guarantee the column order on CentOS?

3 Answers


  1. Try the pd.DataFrame.to_json function, which writes a dataframe to a JSON file directly, so you do not need to read the data back from the CSV file first. I suspect this function will write the columns without changing their order.
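
A minimal sketch of this suggestion, assuming a pandas version whose `to_json` accepts `orient="records"` together with `lines=True`. The single sample row is copied from the question's output and stands in for the real `big_unified_dataframe`. Whether the installed pandas on CentOS behaves the same way is something to verify rather than assume.

```python
import json

import pandas as pd

# Column order taken from the question.
global_csv_columns = ['Timestamp', 'a_country', 'b_country', 'call_setup_time',
                      'quality', 'latency', 'throughput', 'test_type']

# Sample data standing in for big_unified_dataframe (one row from the question).
big_unified_dataframe = pd.DataFrame(
    [["06/09/2022 10:33", "UAE", "UAE", "7.847", "", "", "", "voice_call"]],
    columns=global_csv_columns,
)

# One JSON object per line; with no path given, to_json returns a string.
json_lines = big_unified_dataframe.to_json(orient="records", lines=True)

# Parse the first line back to inspect the key order that was written.
first_row_keys = list(json.loads(json_lines.splitlines()[0]).keys())
print(first_row_keys)
```

Passing a file name as the first argument of `to_json` writes the lines to that file instead of returning them. Note that pandas may escape `/` as `\/` in the dates, which is still valid JSON.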

  2. Your output JSON dictionaries aren't sorted, so the order in which the keys appear is not guaranteed. In practice the keys usually appear in the order in which they were inserted into each dictionary, but you can have the dictionaries sorted by key:

    out=json.dumps(row, sort_keys=True)
    

    This will at least make the output consistent across platforms, although the alphabetical order may not match the order that is meaningful to you.
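
As a small illustration of what `sort_keys=True` does, here is a sketch using three of the question's fields in a deliberately scrambled order:

```python
import json

# A row whose keys arrive in arbitrary order (values taken from the question).
row = {"latency": "", "call_setup_time": "7.847", "Timestamp": "06/09/2022 10:33"}

# sort_keys=True orders the keys by plain string comparison,
# so the uppercase "Timestamp" sorts before the lowercase names.
out = json.dumps(row, sort_keys=True)
print(out)
# -> {"Timestamp": "06/09/2022 10:33", "call_setup_time": "7.847", "latency": ""}
```

With the full column list from the question, the sorted order would place `latency` before `quality` and `test_type` before `throughput`, so it is stable but not identical to the order in `global_csv_columns`.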

  3. "On CentOS I'm running: Python 2.7.18. On Windows I'm running: Python 3.9.6."

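    Now the reason is clear: insertion order in plain dicts was introduced in Python 3.6 as an implementation detail of CPython and has been guaranteed by the language since Python 3.7. Python 2.7 makes no such guarantee, which is why the keys come out in a different order on CentOS.

    Read "Are dictionaries ordered in Python 3.6+?" if you want to know more.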
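
If upgrading is not possible right away, one way to get the same key order on both interpreters is to rebuild each row as a `collections.OrderedDict` before serializing it. This is a sketch rather than part of the original answer, and `row_to_json` is a hypothetical helper name. It relies on `json.dumps` writing keys in the mapping's iteration order when `sort_keys` is not set.

```python
import json
from collections import OrderedDict

# Column order taken from the question.
global_csv_columns = ['Timestamp', 'a_country', 'b_country', 'call_setup_time',
                      'quality', 'latency', 'throughput', 'test_type']


def row_to_json(row, columns):
    """Serialize row with its keys in the order given by columns (hypothetical helper)."""
    ordered = OrderedDict((col, row[col]) for col in columns)
    return json.dumps(ordered)


# A row in the scrambled order shown in the question's CentOS output.
row = {"latency": "", "call_setup_time": "7.847", "Timestamp": "06/09/2022 10:33",
       "test_type": "voice_call", "throughput": "", "b_country": "UAE",
       "a_country": "UAE", "quality": ""}

out = row_to_json(row, global_csv_columns)
print(out)
```

In the question's loop this would replace `json.dumps(row)` with `row_to_json(row, global_csv_columns)`.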

    "If you know which command/version/repository I should use to install a similar version on CentOS please let me know."

    The best solution is to run the same Python version, down to the minor version, on both machines: if you have 3.9.6 on Windows, install Python 3.9 on CentOS. If you cannot install it, Python 3.7 or 3.8 should do. Be aware that if both Python 2 and Python 3 are installed on the same machine, you must invoke python3 explicitly to use the newer version, e.g.

    python3 helloworld.py
    

    where helloworld.py is the file containing your Python code.
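
To catch the wrong interpreter early, the script itself can check which version it is running under. The sketch below is an illustration only; `dicts_preserve_order` is a hypothetical helper name, and the 3.7 threshold is the version from which dict ordering is guaranteed.

```python
import sys


def dicts_preserve_order(version_info=sys.version_info):
    """Return True if plain dicts are guaranteed to keep insertion order."""
    return tuple(version_info[:2]) >= (3, 7)


if not dicts_preserve_order():
    # Warn instead of failing, so the script still runs on older interpreters.
    sys.stderr.write(
        "Warning: running on Python %d.%d; dict key order is not guaranteed.\n"
        % (sys.version_info[0], sys.version_info[1])
    )
```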
