
Using psycopg2, I could write large results as CSV using copy_expert and a BytesIO buffer like this with pandas:

copy_sql = "COPY (SELECT * FROM big_table) TO STDOUT CSV"

buffer = BytesIO()
cursor.copy_expert(copy_sql, buffer, size=8192)
buffer.seek(0)
pd.read_csv(buffer, engine="c").to_excel(self.output_file)

However, I can’t figure out how to replace the buffer in copy_expert with psycopg3’s new copy command. Has anyone figured out a way to do this?

2 Answers


  1. Chosen as BEST ANSWER

    The key to writing a large query result to a file through psycopg3 in this fashion is to use a SpooledTemporaryFile, which limits memory usage in Python (see max_size): data is buffered in memory until it grows past max_size, then spills to disk. Once the CSV is written, convert it with pandas.

    from tempfile import SpooledTemporaryFile
    from pandas import read_csv
    from psycopg import connect
    
    cursor = connect([connection]).cursor()
    copy_sql = "COPY (SELECT * FROM stocks WHERE price > %s) TO STDOUT WITH (FORMAT CSV)"
    price = 100
    
    with SpooledTemporaryFile(
        mode="wb",
        max_size=65536,  # ~64 KiB kept in memory before spilling to disk
        buffering=8192,
    ) as tmpfile:
        with cursor.copy(copy_sql, (price,)) as copy:
            for data in copy:
                tmpfile.write(data)
        tmpfile.seek(0)
    
        read_csv(tmpfile, engine="c").to_excel("my_spreadsheet.xlsx")
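
    A quick way to see the spooling behaviour described above, using only the standard library (the tiny max_size here is an illustrative value, not the 64 KiB used in the answer; `_rolled` is a CPython implementation detail):

    ```python
    from tempfile import SpooledTemporaryFile

    with SpooledTemporaryFile(mode="w+b", max_size=10) as tmp:
        tmp.write(b"12345")          # 5 bytes <= max_size: still in memory
        in_memory = not tmp._rolled
        tmp.write(b"678901")         # 11 bytes total > max_size: rolls over to disk
        on_disk = tmp._rolled

    print(in_memory, on_disk)  # True True
    ```

    The same rollover happens transparently while the COPY loop writes chunks, so Python never holds more than max_size of the result in memory.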
    

  2. Data and table setup:

    cat test.csv                                                                                                                                                            
    1,[email protected],"John Stokes"
    2,[email protected],"Emily Ray"
    
    create table test_csv (id integer, mail varchar, name varchar);
    
    import psycopg

    # Connection parameters are placeholders; adjust for your database.
    con = psycopg.connect("dbname=test")
    cur = con.cursor()

    with open('test.csv') as f:
        with cur.copy("COPY test_csv FROM STDIN WITH CSV") as copy:
            while data := f.read(1000):
                copy.write(data)
    con.commit()
    

    Using a buffer:

    from io import BytesIO

    import pandas as pd

    buffer = BytesIO()
    with cur.copy('copy test_csv to stdout with csv') as copy:
        for data in copy:
            buffer.write(data)
    buffer.seek(0)
    pd.read_csv(buffer, engine="c").to_excel('test_csv.xlsx')
    

    Since you are using Pandas there is also:

    
    from sqlalchemy import create_engine
    engine = create_engine('postgresql://postgres:@localhost:5432/test')
    pd.read_sql('select * from test_csv', engine).to_excel('test_csv.xlsx')
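
    For results too large to load at once, read_sql also accepts a chunksize and returns an iterator of DataFrames instead of one big frame. A minimal sketch, using the standard library's sqlite3 in place of the Postgres engine so it runs self-contained (table contents are made up):

    ```python
    import sqlite3

    import pandas as pd

    # In-memory SQLite stands in for the Postgres connection above.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test_csv (id INTEGER, mail TEXT, name TEXT)")
    con.executemany(
        "INSERT INTO test_csv VALUES (?, ?, ?)",
        [(i, f"user{i}@example.com", f"User {i}") for i in range(10)],
    )

    # chunksize=4 yields DataFrames of at most 4 rows each.
    sizes = [len(chunk) for chunk in pd.read_sql("select * from test_csv", con, chunksize=4)]
    print(sizes)  # [4, 4, 2]
    ```

    Each chunk can be processed and discarded in turn, which keeps memory flat much like the SpooledTemporaryFile approach.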
    
    