skip to Main Content

I would like to write CSV data directly from a bytes (or string) object in memory to duckdb database file (i.e. I want to avoid having to write and read the temporary .csv files). This is what I’ve got so far:

import io 
import duckdb 

data = b'a,b,cn0,1,2n3,4,5'
rawtbl = duckdb.read_csv(
    io.BytesIO(data), header=True, sep=","
)

con = duckdb.connect('some.db')
con.sql('CREATE TABLE foo AS SELECT * FROM rawtbl')

which throws following exception:

---------------------------------------------------------------------------
IOException                               Traceback (most recent call last)
Cell In[1], line 10
      5 rawtbl = duckdb.read_csv(
      6     io.BytesIO(data), header=True, sep=","
      7 )
      9 con = duckdb.connect('some.db')
---> 10 con.sql('CREATE TABLE foo AS SELECT * FROM rawtbl')

IOException: IO Error: No files found that match the pattern "DUCKDB_INTERNAL_OBJECTSTORE://2843be5a66472f9c"

However, it is possible to do:

>>> duckdb.sql('CREATE TABLE foo AS SELECT * FROM rawtbl')
>>> duckdb.sql('show tables')
┌─────────┐
│  name   │
│ varchar │
├─────────┤
│ foo     │
└─────────┘
>>> duckdb.sql('SELECT * from foo')
┌───────┬───────┬───────┐
│   a   │   b   │   c   │
│ int64 │ int64 │ int64 │
├───────┼───────┼───────┤
│     0 │     1 │     2 │
│     3 │     4 │     5 │
└───────┴───────┴───────┘

since rawtbl is a duckdb.duckdb.DuckDBPyRelation object. But that is the in-memory duckdb database, not the ‘some.db’ file.

Question

How to read csv data directly from bytes (or a string) to duckdb database file, without using intermediate CSV files?

Versions

duckdb 0.10.2 on Python 3.12.2 on Ubuntu

2

Answers


  1. I’m not entirely sure on best practices, but I did manage to get it to work with ATTACH.

    duckdb.sql("attach 'some.db'")
    

    duckdb_databases() gave me the names.

    duckdb.sql("from duckdb_databases()")
    
    ┌───────────────┬──────────────┬─────────┬───┬──────────┬─────────┬──────────┐
    │ database_name │ database_oid │  path   │ … │ internal │  type   │ readonly │
    │    varchar    │    int64     │ varchar │   │ boolean  │ varchar │ boolean  │
    ├───────────────┼──────────────┼─────────┼───┼──────────┼─────────┼──────────┤
    │ memory        │         1080 │ NULL    │ … │ false    │ duckdb  │ false    │
    │ some          │         1489 │ some.db │ … │ false    │ duckdb  │ false    │
    │ system        │            0 │ NULL    │ … │ true     │ duckdb  │ false    │
    │ temp          │         1479 │ NULL    │ … │ true     │ duckdb  │ false    │
    ├───────────────┴──────────────┴─────────┴───┴──────────┴─────────┴──────────┤
    │ 4 rows                                                 7 columns (6 shown) │
    └────────────────────────────────────────────────────────────────────────────┘
    

    We can then USE the file-backed db and select the data from rawtbl.

    duckdb.sql("""
    use "some";
    create table foo as (from rawtbl)
    """)
    

    Check the result:

    con = duckdb.connect("some.db")
    con.sql("from foo")
    
    ┌───────┬───────┬───────┐
    │   a   │   b   │   c   │
    │ int64 │ int64 │ int64 │
    ├───────┼───────┼───────┤
    │     0 │     1 │     2 │
    │     3 │     4 │     5 │
    └───────┴───────┴───────┘
    
    Login or Signup to reply.
  2. You can create the connection before you use read_csv and pass the connection into it.

    import io
    import duckdb
    from pathlib import Path
    
    
    data = b'a,b,cn0,1,2n3,4,5'
    db_path = 'some.db'
    Path(db_path).unlink(missing_ok=True)
    
    with duckdb.connect(db_path) as con:
        rawtbl = duckdb.read_csv(
            io.BytesIO(data), header=True, sep=",", connection=con,
        )
    
        con.execute('''
            CREATE TABLE foo as select * from rawtbl
        ''')
    
    with duckdb.connect(db_path) as con:
        res = con.sql('select * from foo')
        print(res)
    
    # ┌───────┬───────┬───────┐
    # │   a   │   b   │   c   │
    # │ int64 │ int64 │ int64 │
    # ├───────┼───────┼───────┤
    # │     0 │     1 │     2 │
    # │     3 │     4 │     5 │
    # └───────┴───────┴───────┘
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search