
I seem to be a little confused about what to do. I have 3 CSV files, each sharing the common columns of name, age and sex. The first CSV has the common data and then card details belonging to each user, the second has the same common details plus vehicle details, and the third is the same again but with extra details such as marital status. I want to load all three into a single table in MySQL, to be queried with Python for a project. Would it be wiser to upload all three as separate tables and use a JOIN query to combine them, or to find some Python code that merges all three files first and then upload the result to the database? I'm pretty new to this, so a simple explanation would be appreciated.

If I'm going to use Python code to merge them, any ideas what would work?

first_Name,Last_Name,Age,Sex,Vehicle_make,Vehicle_model,Vehicle_year,Vehicle_type
firstName,lastName,age,iban,credit_card_number,credit_card_security_code,credit_card_start_date,credit_card_end_date,address_main,address_city,address_postcode
firstName,lastName,age,sex,retired,dependants,marital_status,salary,pension,company,commute_distance,address_postcode

Here is the header of each file; I need to merge them all. I'd also need to work out which column I could use as a primary key — around 3-4 rows will have identical details, so I can't really say which column would be unique.

any ideas would be great.
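(For comparison, the join-in-the-database route asked about above can be sketched like this. `sqlite3` stands in for MySQL here, since the JOIN syntax is the same; the table names, columns and sample values are all illustrative:)

```python
import sqlite3

# sqlite3 stands in for MySQL; with MySQL you would run the same SQL
# through a driver such as mysql-connector-python.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cards (firstName TEXT, lastName TEXT, age INTEGER, iban TEXT)")
conn.execute("CREATE TABLE vehicles (firstName TEXT, lastName TEXT, age INTEGER, vehicle_make TEXT)")

conn.execute("INSERT INTO cards VALUES ('Ann', 'Lee', 34, 'GB00XXXX')")
conn.execute("INSERT INTO vehicles VALUES ('Ann', 'Lee', 34, 'Ford')")

# join the per-user tables on the shared identifying columns
rows = conn.execute("""
    SELECT c.firstName, c.lastName, c.age, c.iban, v.vehicle_make
    FROM cards AS c
    JOIN vehicles AS v
      ON c.firstName = v.firstName
     AND c.lastName = v.lastName
     AND c.age = v.age
""").fetchall()
print(rows)  # [('Ann', 'Lee', 34, 'GB00XXXX', 'Ford')]
```

The catch, as the question itself notes, is that two people with identical name and age would join incorrectly — which is why the primary-key question matters either way.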

2 Answers


  1. Try using pandas to merge your data, like this:

    import pandas as pd
    
    # read each file into a pandas DataFrame
    df1 = pd.read_csv('file1.csv')
    df2 = pd.read_csv('file2.csv')
    df3 = pd.read_csv('file3.csv')
    
    # note: your real files use inconsistent header casing
    # (first_Name vs firstName), so rename columns to match first
    
    # merge the frames on the columns they share, so each person ends
    # up on a single row; an outer merge keeps people missing from one file
    df = df1.merge(df3, how='outer', on=['firstName', 'lastName', 'age', 'sex'])
    df_merged = df.merge(df2, how='outer', on=['firstName', 'lastName', 'age'])
    
    # index=False stops pandas writing the row index as an extra column
    df_merged.to_csv('merged.csv', index=False)
    
    

    Instead of saving the merged frame to CSV, you can also write it straight to the database with the DataFrame.to_sql() method, and you can manipulate the data however you wish before uploading.
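A minimal sketch of that `to_sql()` step, using an in-memory `sqlite3` connection as a stand-in (for MySQL you would instead pass an SQLAlchemy engine, e.g. one built from a `mysql+pymysql://user:pass@host/dbname` URL; the table name and sample data below are illustrative):

```python
import sqlite3

import pandas as pd

# a tiny stand-in for the merged frame
df_merged = pd.DataFrame({
    "firstName": ["Ann"],
    "lastName": ["Lee"],
    "age": [34],
})

# pandas can write to a plain sqlite3 connection or an SQLAlchemy engine;
# for MySQL, an engine from create_engine("mysql+pymysql://...") would
# take the place of this sqlite connection.
conn = sqlite3.connect(":memory:")
df_merged.to_sql("users", conn, index=False, if_exists="replace")

# verify the upload with a plain SQL query
print(conn.execute("SELECT * FROM users").fetchall())  # [('Ann', 'Lee', 34)]
```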

  2. Try solving your problem with standard Python tools:

    import csv
    
    files = ['file1.csv', 'file2.csv', 'file3.csv']
    
    rows = []
    column_names = []
    
    # Read every row of every file as a dictionary, e.g.
    # [{'firstName': 'Ann', ...}, ...], and collect the union of the
    # column names from each file's header.
    # Note: this stacks the rows from all three files (one row per file
    # per person); it does not combine a person's details onto one row.
    for f in files:
        with open(f, newline='', encoding='utf8') as csvfile:
            reader = csv.DictReader(csvfile)
            for row in reader:
                rows.append(row)
            for item in reader.fieldnames:
                if item not in column_names:
                    column_names.append(item)
    
    # Write all rows out under the combined header; DictWriter fills
    # columns missing from a row with empty strings.
    with open('merged.csv', 'w', newline='', encoding='utf8') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=column_names)
        writer.writeheader()
        writer.writerows(rows)
    
    

    It's not so elegant, but it should work.
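On the primary-key part of the question: when several people share identical details, no combination of the data columns is safe to use as a key. The usual fix is a surrogate key — either let MySQL generate one (`id INT AUTO_INCREMENT PRIMARY KEY` in the CREATE TABLE), or add one in Python before uploading. A minimal pandas sketch, with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({
    "firstName": ["Ann", "Ann"],   # identical people are fine:
    "lastName": ["Lee", "Lee"],    # the key does not depend on the data
    "age": [34, 34],
})

# insert a 1-based surrogate id as the first column
df.insert(0, "id", range(1, len(df) + 1))
print(df["id"].tolist())  # [1, 2]
```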
