
Let's assume I have two Parquet files, main and updates, with some data as shown below. I need to implement an upsert-style operation in DuckDB.

parquet file main: id, name, city

data:
   id     name     city 
    1      a         p
    2      b         q
    3      c         r
    

parquet file updates: id, name, city

data:
   id     name     city
    1      a         m
    4      b         q

Desired output:

   id     name     city
    1      a         m  <----update city
    2      b         q
    3      c         r
    4      b         q  <----insert

Currently I am doing this using the queries below:

CREATE TABLE main AS SELECT * FROM '/tmp/main.parquet';
CREATE TABLE stage AS SELECT * FROM '/tmp/updates.parquet';
DELETE FROM main USING stage WHERE main.id = stage.id;
INSERT INTO main SELECT * FROM stage;
COPY main TO '/tmp/final.parquet' (FORMAT 'PARQUET', CODEC 'ZSTD');

The only problem is that CREATE TABLE loads and keeps all the data in memory, which I don't want, as the main file may contain 8-10 million records. Is there any way I can achieve the same result using joins only and avoid creating the main and stage tables?
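
As a side note, if the table-based approach is otherwise acceptable: DuckDB can cap its memory use and spill larger-than-memory operators to disk. A minimal sketch, assuming a recent DuckDB version (the 2GB limit and spill path are placeholder values, not recommendations):

SET memory_limit = '2GB';                   -- cap how much memory DuckDB may use
SET temp_directory = '/tmp/duckdb_spill';   -- where larger-than-memory operators spill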


Answers


  1. Chosen as BEST ANSWER

    Here is the query. One point here is that it consumes less memory and runs faster than creating DuckDB tables as in the original question, but it uses more CPU cores.

    COPY (
        SELECT
            CASE WHEN a.id IS NULL THEN b.id ELSE a.id END AS id,
            CASE WHEN b.id IS NOT NULL THEN b.name ELSE a.name END AS name,
            CASE WHEN b.id IS NOT NULL THEN b.city ELSE a.city END AS city
        FROM (SELECT * FROM 'main.parquet') a
        FULL OUTER JOIN (SELECT * FROM 'updates.parquet') b ON a.id = b.id
    )
    TO '/tmp/output.parquet' (FORMAT 'PARQUET', CODEC 'ZSTD');
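
    A possible simplification (a sketch under the same assumptions, not benchmarked): since a row from updates should win whenever the ids match, the full outer join can be replaced with the updates rows plus the main rows that have no match:

    COPY (
        SELECT * FROM 'updates.parquet'
        UNION ALL
        SELECT m.*
        FROM 'main.parquet' m
        WHERE NOT EXISTS (SELECT 1 FROM 'updates.parquet' u WHERE u.id = m.id)
    )
    TO '/tmp/output.parquet' (FORMAT 'PARQUET', CODEC 'ZSTD');

    Both files must have the same column order for SELECT * to line up here.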
    
