skip to Main Content

I have a big script, the result is that the data is stored in a dataframe and then in csv. Then csv is opened and written to PostgreSQL. But there is a problem that the data type of one column is int4, and after opening csv the column format is ‘text’. I cannot change the data type in the database, they must be there exactly as int. Tell me pls how to do it.

total.to_csv("C:/Users/.../total_type19.csv", index = False, sep =';')

conn5 = psycopg2.connect(dbname='', user='',
                       password='', host='', port = '')
cursor5 = conn5.cursor()


with open("C:/Users/.../total_type19.csv", "r",encoding = 'utf-8') as file:
    reader = csv.reader(file, delimiter = ";")
    for row in reader:
        # print(row)
        cursor5.execute(
            'INSERT INTO interaction_fillword (test_id,data_size,data_matrix, data_words_selection, data_colors, data_answers) VALUES(%s,%s, %s, %s, %s, %s)',
            (row[0], row[1], row[2], row[3], row[4], row[5]))

conn5.commit()
   

The test_id column must be in int4 format

['312229', "['[{from:[3,4],to:[7,4],color:2},{from:[3,6],to:[10,6],color:3},{from:[5,8],to:[9,8],color:5},{from:[5,11],to:[10,11],color:6},{from:[1,0],to:[1,11],color:0},{from:[10,1],to:[10,6],color:4},{from:[3,0],to:[8,0],color:1}],']", '['["v","b","c","c","a","h","i","e","r","s","f","j"],["d","i","w","s","s","r","i","f","y","y","f","c"],["j","b","m","w","d","q","s","q","t","w","e","m"],["x","l","m","m","l","s","o","x","d","q","u","t"],["l","i","f","p","l","a","c","e","t","u","t","o"],["m","o","s","b","r","t","c","y","z","v","r","r"],["j","t","x","c","a","r","t","a","b","l","e","o"],["b","h","k","m","d","b","r","y","q","u","i","y"],["y","è","s","r","h","g","o","m","m","e","w","h"],["u","q","p","c","s","c","x","b","k","e","d","o"],["u","u","o","l","q","v","y","y","b","y","e","h"],["r","e","o","u","j","b","u","r","e","a","u","k"]],']', '['"#ff0000","#00fe00","#0000ff","#d2ea9a","#407f76","#211f95","#e1f233"']', '['"place","cartable","gomme","bureau","bibliothèque","feutre","cahier"']']

This is an example of one line from csv. Looks bad but that’s the way it should be

2

Answers


  1. Can you change your data to int or is it something like "m22" non-integer?

    # to remove non-numeric digits from string
    with open("C:/Users/.../total_type19.csv", "r",encoding = 'utf-8') as file:
        reader = csv.reader(file, delimiter = ";")
        header = next(reader )
        print(f"HEADER {header}")
        counter = 1 #or whatever number you want to start with
        for row in reader:
            print(row)
            test_id =row[0]
            test_id = ''.join([i for i in test_id if i.isdigit()])
            if test_id == '':
                counter +=1
                test_id = counter
            else:
                test_id = int(test_id)
            print(test_id)
            cursor5.execute(
                'INSERT INTO interaction_fillword (test_id,data_size,data_matrix, data_words_selection, data_colors, data_answers) VALUES(%s,%s, %s, %s, %s, %s)',
                (test_id, row[1], row[2], row[3], row[4], row[5]))
    
    Login or Signup to reply.
  2. Use copy_expert from `psycopg2.

    import psycopg2
    
    conn5 = psycopg2.connect(dbname='', user='',
                           password='', host='', port = '')
    cursor5 = conn5.cursor()
    
    with open("C:/Users/.../total_type19.csv", "r") as csv_file:
       cursor5.copy_expert("COPY interaction_fillword FROM STDIN WITH CSV HEADER", csv_file)
    
    

    The CSV HEADER will do a couple of things:

    1. Skip the header line automatically.
    2. Take empty non-quoted strings as NULL.

    copy_expert uses the Postgres COPY to do bulk data import(or export) a lot quicker then inserting. The down side is that COPY is all or nothing, either the entire import/export succeeds or a single error will rollback the entire thing.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search