I’m trying to update rows in a single table by splitting them into two "sets" of rows.
The top part of the set should have a status set to X
and the bottom one should have a status set to status Y
.
I’ve tried putting together a query that looks like this
WITH x_status AS (
SELECT id
FROM people
WHERE surname = 'foo'
ORDER BY date_registered DESC
LIMIT 5
), y_status AS (
SELECT id
FROM people
WHERE surname = 'foo'
ORDER BY date_registered DESC
OFFSET 5
)
UPDATE people
SET status = folks.status
FROM (values
((SELECT id from x_status), 'X'),
((SELECT id from y_status), 'Y')
) as folks (ids, status)
WHERE id IN (folks.ids);
When I run this query I get the following error:
pq: more than one row returned by a subquery used as an expression
This makes sense, folks.ids
is expected to return a list of ID
s, hence the IN
clause in the UPDATE
statement, but I suspect the problem is I can not return the list in the values
statement in the FROM
clause as it turns into something like this:
(1, 2, 3, 4, 5, 5)
(6, 7, 8, 9, 1)
Is there a way how this UPDATE
can be done using a CTE query at all? I could split this into two separate UPDATE
queries, but CTE query would be better and in theory faster.
2
Answers
So after more tinkering, I've come up with a solution.
The problem with why the previous query fails is we are not grouping the
ID
s in the subqueries into arrays so the result expands into a huge list as I suspected.The solution is grouping the
ID
s in the subqueries intoARRAY
-- that way they get returned as a single result (tuple) inids
value.This is the query that does the job. Note that we must
unnest
theID
s in theWHERE
clause:I think I understand now… if I get your problem, you want to set the status to ‘X’ for the oldest five records and ‘Y’ for everything else?
In that case I think the
row_number()
analytic would work — and it should do it in a single pass, two scans, and eliminating oneorder by
. Let me know if something like this does what you seek.Any time you do an update from another data set, it’s helpful to have a where clause that defines the relationship between the two datasets (the non-ANSI join syntax). This makes it iron-clad what you are updating.
Also I believe this code is pretty readable so it will be easier to build on if you need to make tweaks.
Let me know if I missed the boat.