I demonstrate the issue with the following simple table. (The actual table and JSON document have a lot more fields.)
CREATE TABLE contact (
   id        bigint PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
   firstname VARCHAR,
   lastname  VARCHAR,
   birthday  timestamp with time zone NOT NULL
);
I am trying to insert into this table from a JSONB document using jsonb_populate_record:
INSERT INTO contact (firstname, lastname, birthday)
SELECT (jsonb_populate_record(NULL::contact,
          '{
             "firstname": "John",
             "lastname": "Doe",
             "birthday": "2023-09-28"
           }'
       )).*;
This fails with the error:
ERROR: INSERT has more expressions than target columns
LINE 1: …O contact (firstname, lastname, birthday) SELECT jsonb_popu…
I understand that the error comes from the JSONB not containing an id key and value: jsonb_populate_record creates a record that includes the id column, but since id is not listed in the INSERT's column list, the statement fails.
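For illustration, running the SELECT on its own shows that the decomposed record has four output columns (id is NULL), while the INSERT above lists only three:
SELECT (jsonb_populate_record(NULL::contact,
        '{"firstname": "John", "lastname": "Doe", "birthday": "2023-09-28"}')).*;
 id | firstname | lastname | birthday
----+-----------+----------+------------------------
    | John      | Doe      | 2023-09-28 00:00:00-04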
I have tried adding id to the column list, as in INSERT INTO contact (id, ..., but that fails with an error saying that id is null (because my JSONB doesn't have any id in it).
How can I make it work so that the id is GENERATED BY DEFAULT AS IDENTITY, as specified in the table?
EDIT:
I was able to find a solution:
INSERT INTO contact (id, firstname, lastname, birthday)
SELECT (jsonb_populate_record(NULL::contact,
          jsonb_set('{
                       "firstname": "John",
                       "lastname": "Doe",
                       "birthday": "2023-09-28"
                     }',
                    '{id}',
                    to_jsonb(nextval(pg_get_serial_sequence('contact', 'id')))))).*;
However, the generated id values advance by 4 with every insert:
select * from contact;
id | firstname | lastname | birthday
----+-----------+----------+------------------------
10 | John | Doe | 2023-09-28 00:00:00-04
14 | John | Doe | 2023-09-28 00:00:00-04
18 | John | Doe | 2023-09-28 00:00:00-04
22 | John | Doe | 2023-09-28 00:00:00-04
Any idea why and how to avoid this?
2 Answers
Decomposing a record returned from a function directly in the SELECT list (SELECT (func()).*) results in a query plan where the function is evaluated once for every output column. The effect gets much worse for your actual JSON document with many columns. This shortcoming of the Postgres query planner wastes performance and can have unfortunate side effects like the one you observed: the next serial number is fetched from the sequence with every function call.
To avoid that, call the function once in a subquery and only decompose on the next query level.
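A sketch of that form, keeping the jsonb_set() call from your edit (the alias names rec and sub are my own):
INSERT INTO contact (id, firstname, lastname, birthday)
SELECT (rec).*   -- decompose here; the function already ran exactly once per row
FROM (
   SELECT jsonb_populate_record(NULL::contact,
             jsonb_set('{
                          "firstname": "John",
                          "lastname": "Doe",
                          "birthday": "2023-09-28"
                        }',
                       '{id}',
                       to_jsonb(nextval(pg_get_serial_sequence('contact', 'id'))))) AS rec
   ) sub;
With this, nextval() is called once per inserted row instead of once per column, so the generated ids are consecutive again.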
Note: this addresses the particular issue you asked about. Omitting the id column from the INSERT, so that it is filled with its default value, is the more efficient solution if you are at liberty to register a row type for the purpose. See Frank's answer. Actually, you should do both: evaluating the function multiple times would still be a pointless waste, even after eliminating the visible side effect.
There is another solution, one that doesn't touch the sequence itself: create a custom data type without the id column and use that type.
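A sketch of what that can look like; the type name contact_input is just an example:
-- a row type with exactly the columns the INSERT provides, no id
CREATE TYPE contact_input AS (
   firstname VARCHAR,
   lastname  VARCHAR,
   birthday  timestamp with time zone
);

INSERT INTO contact (firstname, lastname, birthday)
SELECT (jsonb_populate_record(NULL::contact_input,
        '{
           "firstname": "John",
           "lastname": "Doe",
           "birthday": "2023-09-28"
         }')).*;
Because the record now has exactly the three columns named in the INSERT, id is simply filled with its identity default and no sequence value is skipped by the statement itself. Combine this with the subquery form from the other answer to also avoid evaluating the function once per column.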
By the way, why do you use a timestamp for the date of birth?