skip to Main Content

I’m trying to update rows in a single table by splitting them into two "sets" of rows.

The top part of the set should have a status set to X and the bottom one should have a status set to status Y.

I’ve tried putting together a query that looks like this

WITH x_status AS (
        SELECT id
        FROM people
        WHERE surname = 'foo'
        ORDER BY date_registered DESC
        LIMIT 5
), y_status AS (
        SELECT id
        FROM people
        WHERE surname = 'foo'
        ORDER BY date_registered DESC
        OFFSET 5
)
UPDATE people
SET status = folks.status
FROM (values
        ((SELECT id from x_status), 'X'),
        ((SELECT id from y_status), 'Y')
) as folks (ids, status)
WHERE id IN (folks.ids);

When I run this query I get the following error:

pq: more than one row returned by a subquery used as an expression

This makes sense, folks.ids is expected to return a list of IDs, hence the IN clause in the UPDATE statement, but I suspect the problem is I can not return the list in the values statement in the FROM clause as it turns into something like this:

(1, 2, 3, 4, 5, 5)
(6, 7, 8, 9, 1)

Is there a way how this UPDATE can be done using a CTE query at all? I could split this into two separate UPDATE queries, but CTE query would be better and in theory faster.

2

Answers


  1. Chosen as BEST ANSWER

    So after more tinkering, I've come up with a solution.

    The problem with why the previous query fails is we are not grouping the IDs in the subqueries into arrays so the result expands into a huge list as I suspected.

    The solution is grouping the IDs in the subqueries into ARRAY -- that way they get returned as a single result (tuple) in ids value.

    This is the query that does the job. Note that we must unnest the IDs in the WHERE clause:

    WITH x_status AS (
            SELECT id
            FROM people
            WHERE surname = 'foo'
            ORDER BY date_registered DESC
            LIMIT 5
    ), y_status AS (
            SELECT id
            FROM people
            WHERE surname = 'foo'
            ORDER BY date_registered DESC
            OFFSET 5
    )
    UPDATE people
    SET status = folks.status
    FROM (values
            (ARRAY(SELECT id from x_status), 'X'),
            (ARRAY(SELECT id from y_status), 'Y')
    ) as folks (ids, status)
    WHERE id IN (SELECT * from unnest(folks.ids));
    

  2. I think I understand now… if I get your problem, you want to set the status to ‘X’ for the oldest five records and ‘Y’ for everything else?

    In that case I think the row_number() analytic would work — and it should do it in a single pass, two scans, and eliminating one order by. Let me know if something like this does what you seek.

    with ranked as (
      select
        id, row_number() over (order by date_registered desc) as rn
      from people
    )
    update people p
    set
      status = case when r.rn <= 5 then 'X' else 'Y' end
    from ranked r
    where
      p.id = r.id
    

    Any time you do an update from another data set, it’s helpful to have a where clause that defines the relationship between the two datasets (the non-ANSI join syntax). This makes it iron-clad what you are updating.

    Also I believe this code is pretty readable so it will be easier to build on if you need to make tweaks.

    Let me know if I missed the boat.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search