skip to Main Content

With this schema:

CREATE TABLE tag (
  tag_id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
  tag_slug text UNIQUE NOT NULL,
  tag_name text NOT NULL
);

I’m currently using Javascript to generate a bulk UPSERT command, like this (it uses knex.raw):

knex.raw(
  `
    INSERT INTO tag (tag_slug, tag_name)
    VALUES ${tags.map(() => `(?, ?)`).join(", ")}
    ON CONFLICT (tag_slug)
    DO UPDATE SET tag_name = excluded.tag_name
    RETURNING *
  `,
  tags.map((t) => [t.slug, t.text]).flat()
)

But I’d like to convert that to a stored procedure called upsert_tags that can be called like this (feel free to adjust the function signature to something equally – or more – ergonomic):

call upsert_tags(
  [
    ('first-tag', 'First Tag'),
    ('second-tag', 'Second Tag')
  ]
);

How can I do this?

This is my best attempt so far but it’s definitely not working!

CREATE PROCEDURE upsert_tags(tags array)
LANGUAGE SQL
AS $$
  INSERT INTO tag (tag_slug, tag_name) 
  VALUES (unnest(tags))
  ON CONFLICT (tag_slug)
  DO UPDATE SET tag_name = excluded.tag_name
  RETURNING *
$$

2

Answers


  1. As you have only 2 values

    CREATE TABLE tag (
      tag_id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
      tag_slug text UNIQUE NOT NULL,
      tag_name text NOT NULL
    );
    
    
    CREATE TABLE
    
    CREATE PROCEDURE upsert_tags(tags Text[])
    LANGUAGE SQL
    AS $$
      INSERT INTO tag (tag_slug, tag_name) 
      VALUES (tags[1],tags[2])
      ON CONFLICT (tag_slug)
      DO UPDATE SET tag_name = excluded.tag_name
      RETURNING *
    $$
    
    CREATE PROCEDURE
    
    call upsert_tags(Array ['A','B'])
    
    CALL
    
    SELECT * FROM tag
    
    tag_id tag_slug tag_name
    1 A B
    SELECT 1
    

    fiddle

    Login or Signup to reply.
  2. FUNCTION vs. PROCEDURE

    RETURNING * indicates You want to return a set of rows – a SETOF tag to be precise. The manual:

    Procedures do not return a function value; hence CREATE PROCEDURE lacks a RETURNS clause. However, procedures can instead return data to their callers via output parameters.

    Output parameters won’t cut it. A PROCEDURE is the wrong choice to begin with. Use a FUNCTION instead. See:

    Implementation

    There are many possible ways how to format input data.
    There are many possible ways how to treat conflicts exactly.
    There are many possible ways how to return data.

    We can make almost anything work. We can even make much of it dynamic and/or generic to work with varying tables / columns. (Think of possible future changes to the table …) See:

    The best solution depends on what you need exactly. Demonstrating two implementations.

    Implementation 1: Passing two separate Postgres arrays

    For just two columns it may be convenient to pass two separate arrays. Postgres has a dedicated variant of unnest() to unnest arrays in parallel. See:

    So:

    CREATE OR REPLACE FUNCTION public.f_upsert_tags1(_tag_slugs text[], _tag_names text[])
      RETURNS SETOF public.tag
      LANGUAGE sql AS
    $func$
    INSERT INTO public.tag AS t (tag_slug, tag_name) 
    SELECT *
    FROM   unnest($1, $2)
    ON     CONFLICT (tag_slug) DO UPDATE
    SET    tag_name = EXCLUDED.tag_name
    -- WHERE  t.tag_name <> EXCLUDED.tag_name  -- see below
    RETURNING *
    $func$;
    

    Call:

    SELECT *
    FROM   public.f_upsert_tags1('{first-tag, second-tag}'
                               , '{First Tag, Second Tag}');
    

    Returning SETOF public.tag returns complete resulting rows like your original. It introduces a dependency on the row type of the table, which has pros and cons …

    About passing arrays to a function:

    Implementation 2: Passing a JSON array of objects

    You mentioned Javascript, so it may be convenient to pass a JSON array of objects, which we then decompose with json_populate_recordset(). (There are other options like json_to_recordset() …)

    CREATE OR REPLACE FUNCTION public.f_upsert_tags2(_tags json)
      RETURNS SETOF public.tag
      LANGUAGE sql AS
    $func$
    INSERT INTO public.tag AS t (tag_slug, tag_name) 
    SELECT i.tag_slug, i.tag_name
    FROM   json_populate_recordset(null::public.tag, $1) i
    ON     CONFLICT (tag_slug) DO UPDATE
    SET    tag_name = EXCLUDED.tag_name
    -- WHERE  t.tag_name <> EXCLUDED.tag_name  -- see below
    RETURNING *
    $func$;
    

    Call:

    SELECT *
    FROM   public.f_upsert_tags2('[{"tag_slug":"first-tag", "tag_name":"First Tag2"}
                                 , {"tag_slug":"second-tag", "tag_name":"Second Tag2"}]');
    

    fiddle

    Concurrency? Performance?

    If the function may be called from multiple transactions concurrently (or concurrent, competing writes on the same table in any way), there are intricate race conditions.

    Either of these solutions (including your original) overwrites conflicting rows, even if tag_name does not change, which adds cost doing nothing useful. This matters if it happens a lot. We can skip that, but you still may want to get a complete set of output rows matching the input?

    For either of these issues see:

    If you also want to know whether each row was inserted or updated, see:

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search