
I need to populate each row with the CSV filename and the time the data was inserted into the database.
I can use the code below to insert data from a CSV file into the database:

with open('my.csv', 'r') as f:
    next(f)
    cur.copy_from(f, 'csv_import', sep=',')

But along with the CSV data, there are two more columns that need to be populated:

  1. the filename of the CSV file
  2. the timestamp when the data was loaded

How can we achieve this?

3 Answers


  1. The timestamp value can be set with a trigger, and the CSV filename would need to be supplied from the operating system; on Linux it could be added to the batch file or cron schedule that runs the load.
    Trigger:

    CREATE OR REPLACE FUNCTION csvvalues()
    RETURNS TRIGGER AS
    $body$
    BEGIN
        NEW.timestamp := NOW();
        RETURN NEW;
    END;
    $body$
    LANGUAGE plpgsql;

    CREATE TRIGGER csvvalues BEFORE INSERT ON csv_import
    FOR EACH ROW EXECUTE FUNCTION csvvalues();
    

    and the command line to be executed from the directory containing the file (psql will prompt for the password):

        psql -U postgres -h 192.168.1.100 -p 5432 -d database -c "\copy csv_import from 'my.csv' with (format csv, header true)"
        mv file1.csv file2.csv
    

    If the file is not on the database server itself, you can install the psql client for your current PostgreSQL version and run the command line from whatever operating system is being used.
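
    If the load is driven from Python instead of the shell, a minimal sketch of the same idea could look like the following. It assumes csv_import also has a file_name column (the column names id, mail and name are placeholders), and relies on the trigger above to fill the timestamp; the COPY itself comes straight from the question.

    import psycopg2

    con = psycopg2.connect(dbname="database", host="192.168.1.100", user="postgres", port=5432)
    cur = con.cursor()

    csv_path = 'my.csv'
    with open(csv_path, 'r') as f:
        next(f)  # skip the header row
        # copy only the CSV columns; file_name stays NULL and the trigger sets the timestamp
        cur.copy_from(f, 'csv_import', sep=',', columns=('id', 'mail', 'name'))

    # fill in the filename for the rows that were just loaded
    cur.execute("update csv_import set file_name = %s where file_name is null", (csv_path,))
    con.commit()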

  2. Setup:

    cat test.csv
    1,[email protected],"John Stokes"
    2,[email protected],"Emily Ray"
    
    create table csv_add(id integer, mail varchar, name varchar, file_name varchar, ts_added timestamptz);
    
    
    

    Code to add file name and timestamp:

    import csv
    from datetime import datetime
    from io import StringIO
    import psycopg2

    con = psycopg2.connect(dbname="test", host='localhost', user='postgres', port=5432)

    cur = con.cursor()

    with open('test.csv') as f:
        reader = csv.reader(f)
        file_name = f.name
        out = StringIO()
        for row in reader:
            out.write(','.join(row + [file_name, datetime.now().isoformat()]) + '\n')
        out.seek(0)
        cur.copy_from(out, 'csv_add', sep=',')
        con.commit()
    
    
    select * from csv_add ;
     id |        mail        |    name     | file_name |            ts_added            
    ----+--------------------+-------------+-----------+--------------------------------
      1 | [email protected]  | John Stokes | test.csv  | 12/18/2022 09:58:20.475244 PST
      2 | [email protected] | Emily Ray   | test.csv  | 12/18/2022 09:58:20.475256 PST
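
    One caveat with re-joining the rows on ',': a field that itself contains a comma or quotes would break the text-format COPY. A minimal variant of the same idea (same csv_add table as above), writing proper CSV back out and loading it with copy_expert, could look like this:

    import csv
    from datetime import datetime
    from io import StringIO
    import psycopg2

    con = psycopg2.connect(dbname="test", host='localhost', user='postgres', port=5432)
    cur = con.cursor()

    with open('test.csv') as f:
        reader = csv.reader(f)
        out = StringIO()
        writer = csv.writer(out)  # re-quotes any field that needs it
        for row in reader:
            writer.writerow(row + [f.name, datetime.now().isoformat()])
        out.seek(0)
        # CSV-format COPY understands quoted fields containing commas
        cur.copy_expert("copy csv_add from stdin with (format csv)", out)
        con.commit()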
    
    
    
    
  3. Sorry, gonna stick to pseudo-code (hopefully got the right raw sql syntax, don’t usually write sql raw).

    What I have often seen done is to use 2 tables, one a staging, import table, one the real table.

    So…

    from datetime import datetime

    with open('my.csv', 'r') as f:
        next(f)
        cur.copy_from(f, 'csv_import', sep=',')

    cur.execute("""
        insert into actual (f1, f2, ts, filename)
        select f1, f2, %(ts)s, %(filename)s
        from csv_import""",
        dict(ts=datetime.now(), filename="my.csv")
    )

    cur.execute("delete from csv_import")
    con.commit()
    
    

    Don’t try to get fancy with * on the right side to skip the field list – that depends on the column order in the db, which may change with later ALTER TABLEs. Been there, done that – picking up after someone did just that.

    This supports higher volumes easily and does not clutter your db with triggers. You can also reuse the staging table to get feeds from elsewhere, say an email gateway.
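
    Wrapped up as a reusable helper, the same staging-table flow might look like the sketch below, so the COPY, the insert into the real table, and the cleanup all commit together. The table and column names (csv_import, actual, f1, f2, ts, filename) are the placeholders from the pseudo-code above, not your real schema.

    from datetime import datetime
    import psycopg2

    def load_csv(con, path):
        with con.cursor() as cur, open(path, 'r') as f:
            next(f)  # skip the header row
            cur.copy_from(f, 'csv_import', sep=',')
            cur.execute("""
                insert into actual (f1, f2, ts, filename)
                select f1, f2, %(ts)s, %(filename)s
                from csv_import""",
                dict(ts=datetime.now(), filename=path))
            cur.execute("truncate csv_import")
        con.commit()  # staging load, insert, and cleanup commit as one transaction

    con = psycopg2.connect(dbname="test", host="localhost", user="postgres", port=5432)
    load_csv(con, 'my.csv')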
