
I need to populate each row with the CSV filename and the time the data was inserted into the database.
I can use the code below to insert data from a CSV file into the database:

with open('my.csv', 'r') as f:
    next(f)
    cur.copy_from(f, 'csv_import', sep=',')

But along with the CSV data, there are two more columns that need to be populated:

  1. the filename of the CSV file
  2. the timestamp when the data was loaded

How can we achieve this?

3 Answers


  1. The timestamp value can be set with a trigger, and the CSV filename would need to be supplied from the operating system; on Linux it could be added to the batch file or cron schedule that runs the load.
    Trigger:

    CREATE OR REPLACE FUNCTION csvvalues()
    RETURNS TRIGGER AS
    $body$
    BEGIN
        NEW.timestamp := NOW();
        RETURN NEW;
    END;
    $body$
    LANGUAGE plpgsql;

    CREATE TRIGGER csvvalues BEFORE INSERT ON csv_import
    FOR EACH ROW EXECUTE FUNCTION csvvalues();
    

    and the command line to be executed from the directory containing the file (psql will prompt for the password):

        psql -U postgres -h 192.168.1.100 -p 5432 -d database -c "\copy csv_import from 'my.csv' with (format csv, header true)"
        mv file1.csv file2.csv
    

    If the file is not on the database server itself, you can install the psql client for your current PostgreSQL version and run the command line from whatever operating system is being used.
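
    If the load is driven from Python instead of the shell, a minimal sketch of the same idea could look like the following. It assumes csv_import also has a file_name column (the column names id, mail and name are placeholders), and relies on the trigger above to fill the timestamp; the COPY itself comes straight from the question.

    import psycopg2

    con = psycopg2.connect(dbname="database", host="192.168.1.100", user="postgres", port=5432)
    cur = con.cursor()

    csv_path = 'my.csv'
    with open(csv_path, 'r') as f:
        next(f)  # skip the header row
        # copy only the CSV columns; file_name stays NULL and the trigger sets the timestamp
        cur.copy_from(f, 'csv_import', sep=',', columns=('id', 'mail', 'name'))

    # fill in the filename for the rows that were just loaded
    cur.execute("update csv_import set file_name = %s where file_name is null", (csv_path,))
    con.commit()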

  2. Setup:

    cat test.csv
    1,[email protected],"John Stokes"
    2,[email protected],"Emily Ray"
    
    create table csv_add(id integer, mail varchar, name varchar, file_name varchar, ts_added timestamptz);
    
    
    

    Code to add file name and timestamp:

    import csv
    from datetime import datetime
    from io import StringIO
    import psycopg2

    con = psycopg2.connect(dbname="test", host='localhost', user='postgres', port=5432)

    cur = con.cursor()

    with open('test.csv') as f:
        reader = csv.reader(f)
        file_name = f.name
        out = StringIO()
        for row in reader:
            out.write(','.join(row + [file_name, datetime.now().isoformat()]) + '\n')
        out.seek(0)
        cur.copy_from(out, 'csv_add', sep=',')
        con.commit()
    
    
    select * from csv_add ;
     id |        mail        |    name     | file_name |            ts_added            
    ----+--------------------+-------------+-----------+--------------------------------
      1 | [email protected]  | John Stokes | test.csv  | 12/18/2022 09:58:20.475244 PST
      2 | [email protected] | Emily Ray   | test.csv  | 12/18/2022 09:58:20.475256 PST
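
    One caveat with re-joining the rows on ',': a field that itself contains a comma or quotes would break the text-format COPY. A minimal variant of the same idea (same csv_add table as above), writing proper CSV back out and loading it with copy_expert, could look like this:

    import csv
    from datetime import datetime
    from io import StringIO
    import psycopg2

    con = psycopg2.connect(dbname="test", host='localhost', user='postgres', port=5432)
    cur = con.cursor()

    with open('test.csv') as f:
        reader = csv.reader(f)
        out = StringIO()
        writer = csv.writer(out)  # re-quotes any field that needs it
        for row in reader:
            writer.writerow(row + [f.name, datetime.now().isoformat()])
        out.seek(0)
        # CSV-format COPY understands quoted fields containing commas
        cur.copy_expert("copy csv_add from stdin with (format csv)", out)
        con.commit()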
    
    
    
    
  3. Sorry, gonna stick to pseudo-code (hopefully got the right raw sql syntax, don’t usually write sql raw).

    What I have often seen done is to use 2 tables, one a staging, import table, one the real table.

    So…

    from datetime import datetime

    with open('my.csv', 'r') as f:
        next(f)
        cur.copy_from(f, 'csv_import', sep=',')

    cur.execute("""
        insert into actual (f1, f2, ts, filename)
        select f1, f2, %(ts)s, %(filename)s
        from csv_import""",
        dict(ts=datetime.now(), filename="my.csv")
    )

    cur.execute("delete from csv_import")
    con.commit()
    
    

    Don’t try to get fancy with * on the right side to skip the field list – that depends on the column order in the db, which may change with later ALTER TABLEs. Been there, done that – picking up after someone did just that.

    This supports higher volumes easily and does not clutter your db with triggers. You can also reuse the staging table to get feeds from elsewhere, say an email gateway.
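
    Wrapped up as a reusable helper, the same staging-table flow might look like the sketch below, so the COPY, the insert into the real table, and the cleanup all commit together. The table and column names (csv_import, actual, f1, f2, ts, filename) are the placeholders from the pseudo-code above, not your real schema.

    from datetime import datetime
    import psycopg2

    def load_csv(con, path):
        with con.cursor() as cur, open(path, 'r') as f:
            next(f)  # skip the header row
            cur.copy_from(f, 'csv_import', sep=',')
            cur.execute("""
                insert into actual (f1, f2, ts, filename)
                select f1, f2, %(ts)s, %(filename)s
                from csv_import""",
                dict(ts=datetime.now(), filename=path))
            cur.execute("truncate csv_import")
        con.commit()  # staging load, insert, and cleanup commit as one transaction

    con = psycopg2.connect(dbname="test", host="localhost", user="postgres", port=5432)
    load_csv(con, 'my.csv')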
