
I have the following table:

create table foo (id serial primary key, ts date, value int);

It’s already populated with 10k records. Now I’m trying to find a way to efficiently update multiple rows in one go, so I wrote this code to compare two approaches (references: here for the first method, here for the second):

import asyncpg
import asyncio
import time

async def main():
    c = await asyncpg.connect(user='foo', password='bar', database='foodb', host='192.168.100.1')

    # prepare data: arr holds (id, value) tuples for the VALUES approach,
    # arr2 holds the same data as two parallel lists for the UNNEST approach
    arr = []
    arr2 = [[], []]
    for i in range(10000):  # one pair per existing row (ids 1..10000)
        row = (i + 1, i)
        arr.append(row)
        arr2[0].append(row[0])
        arr2[1].append(row[1])


    # using FROM VALUES
    values = ",".join(f"{i}" for i in arr)
    q = f"UPDATE foo SET value = v.value FROM (VALUES {values}) AS v(id, value) WHERE v.id = foo.id;"
    start_time = time.time()
    await c.execute(q)
    print(f"from values: {time.time() - start_time}")

    # using UNNEST
    q = "UPDATE foo SET value = v.value FROM (SELECT * FROM UNNEST($1::int[], $2::int[])) AS v(id, value) WHERE foo.id = v.id;"
    start_time = time.time()
    await c.execute(q, *arr2)
    print(f"unnest: {time.time() - start_time}")

if __name__ == '__main__':
    asyncio.run(main())

The first method is consistently much faster than the second.
What I don’t like about the first method is that, as far as I can tell (corrections welcome), the VALUES part cannot be parameterized with $x placeholders, so one is forced to build the query string manually, which I believe also makes it unsafe from an SQL injection point of view.

So my questions:

  • can VALUES be built dynamically and used with placeholders?
  • is UNNEST really slower or am I doing something wrong?

3 Answers

  1. I didn’t profile UNNEST performance, but as for your first question (regarding $x placeholders), executemany is the proper choice for multiple updates:

    q = "UPDATE foo SET value = $2 WHERE id = $1;"
    await c.executemany(q, arr)
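
    For comparison with the two timings in the question, the call can be measured the same way (a sketch reusing c, arr, and time from the question’s script):

    start_time = time.time()
    await c.executemany(q, arr)
    print(f"executemany: {time.time() - start_time}")

    executemany prepares the statement once and reuses it for every argument tuple, and in recent asyncpg versions the whole batch executes atomically.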
    
  2. When you are unable to bind arrays, my workaround is to dynamically build the bind tokens, not the whole substitution.

    A rough-and-ready example:

    arr = []
    for i in range(10000):
        arr.append(i + 1)  # id
        arr.append(i)      # value

    # asyncpg uses numbered $n placeholders; the ::int casts let
    # PostgreSQL infer the column types of the VALUES list
    tokens = ",".join(f"(${2*i + 1}::int,${2*i + 2}::int)" for i in range(10000))
    q = f"UPDATE foo SET value = v.value FROM (VALUES {tokens}) AS v(id, value) WHERE v.id = foo.id;"

    await c.execute(q, *arr)
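
    To generalize this to other row shapes, a small helper along these lines can build the token string (bind_tokens is a hypothetical name, not part of asyncpg). Keep in mind that PostgreSQL caps a single statement at 65535 bind parameters, so very large batches need to be sent in chunks:

    def bind_tokens(n_rows, n_cols, cast="int"):
        """Build "($1::int,$2::int),($3::int,$4::int),..." bind tokens."""
        return ",".join(
            "(" + ",".join(f"${r * n_cols + c + 1}::{cast}" for c in range(n_cols)) + ")"
            for r in range(n_rows)
        )

    With this, the tokens string above is just bind_tokens(10000, 2).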
    
  3. I can reproduce this in v14, but only when JIT is enabled, and only on an older compilation. I assume that is because newer compilations use a newer version of some llvm-dependent library (but both report the same version of llvm itself). The slow query is not the UNNEST query itself, but rather some introspection query which asyncpg uses, apparently to determine the types of the arguments to UNNEST. Whatever the change was, I don’t know how to undo it for further testing.

    Since I can’t get the slowness to reproduce in new compilations, the fact that v15 is faster is ambiguous: either something was fixed in v15, or it is faster simply because it was compiled more recently.

    Since I think having JIT on by default was a mistake by the PostgreSQL developers in the first place, I would say you should just turn JIT off and be done with it.
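
    If you’d rather not change the server-wide setting, here is a minimal sketch of turning JIT off per connection (connection parameters as in the question; server_settings is asyncpg’s mechanism for session-level settings):

    c = await asyncpg.connect(
        user='foo', password='bar', database='foodb', host='192.168.100.1',
        server_settings={'jit': 'off'},  # applies only to this session
    )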
