Postgresql - More than one row returned by a subquery used as an expression when UPDATE on multiple rows

milosgajdos
September 21, 2022
229 views
0 votes
2 Answers

I’m trying to update rows in a single table by splitting them into two "sets" of rows.

The top part of the set should have a status set to X and the bottom one should have a status set to status Y.

I’ve tried putting together a query that looks like this

WITH x_status AS (
        SELECT id
        FROM people
        WHERE surname = 'foo'
        ORDER BY date_registered DESC
        LIMIT 5
), y_status AS (
        SELECT id
        FROM people
        WHERE surname = 'foo'
        ORDER BY date_registered DESC
        OFFSET 5
)
UPDATE people
SET status = folks.status
FROM (values
        ((SELECT id from x_status), 'X'),
        ((SELECT id from y_status), 'Y')
) as folks (ids, status)
WHERE id IN (folks.ids);

When I run this query I get the following error:

pq: more than one row returned by a subquery used as an expression

This makes sense, folks.ids is expected to return a list of IDs, hence the IN clause in the UPDATE statement, but I suspect the problem is I can not return the list in the values statement in the FROM clause as it turns into something like this:

(1, 2, 3, 4, 5, 5)
(6, 7, 8, 9, 1)

Is there a way how this UPDATE can be done using a CTE query at all? I could split this into two separate UPDATE queries, but CTE query would be better and in theory faster.

Tags: postgresql sql

Answers

Chosen as BEST ANSWER
- milosgajdos
- September 21, 2022 at 1:29 pm
- 0 votes
0
So after more tinkering, I've come up with a solution.

The problem with why the previous query fails is we are not grouping the IDs in the subqueries into arrays so the result expands into a huge list as I suspected.

The solution is grouping the IDs in the subqueries into ARRAY -- that way they get returned as a single result (tuple) in ids value.

This is the query that does the job. Note that we must unnest the IDs in the WHERE clause:
```
WITH x_status AS (
        SELECT id
        FROM people
        WHERE surname = 'foo'
        ORDER BY date_registered DESC
        LIMIT 5
), y_status AS (
        SELECT id
        FROM people
        WHERE surname = 'foo'
        ORDER BY date_registered DESC
        OFFSET 5
)
UPDATE people
SET status = folks.status
FROM (values
        (ARRAY(SELECT id from x_status), 'X'),
        (ARRAY(SELECT id from y_status), 'Y')
) as folks (ids, status)
WHERE id IN (SELECT * from unnest(folks.ids));
```

(Edit)

- Hambone
- September 21, 2022 at 4:31 pm
- 0 votes
0
I think I understand now… if I get your problem, you want to set the status to ‘X’ for the oldest five records and ‘Y’ for everything else?

In that case I think the row_number() analytic would work — and it should do it in a single pass, two scans, and eliminating one order by. Let me know if something like this does what you seek.
```
with ranked as (
  select
    id, row_number() over (order by date_registered desc) as rn
  from people
)
update people p
set
  status = case when r.rn <= 5 then 'X' else 'Y' end
from ranked r
where
  p.id = r.id
```
Any time you do an update from another data set, it’s helpful to have a where clause that defines the relationship between the two datasets (the non-ANSI join syntax). This makes it iron-clad what you are updating.

Also I believe this code is pretty readable so it will be easier to build on if you need to make tweaks.

Let me know if I missed the boat.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – More than one row returned by a subquery used as an expression when UPDATE on multiple rows

Answers