Consider the following example:
There are 3 tables: clients
(1) -> (n) orders
(1) -> (n) order_updates
.
I want to update clients
based on the corresponding order_updates
with JOIN
(which produces duplicates for clients
data):
UPDATE "clients"
SET "clients"."status" = 'active'
FROM "orders"
JOIN "orders_updates" ON "orders.id" = "order_updates"."order_id"
WHERE "orders"."client_id" = "clients"."id" AND
"order_updates"."created_at" > (CURRENT_DATE - INTERVAL '1 day')::date
Is it going to perform the same way (under the hood) as the following statement, which does not produce duplicates for the UPDATE
statement?
UPDATE "clients"
SET "clients"."status" = 'active'
WHERE "clients"."id" IN (
SELECT "orders"."client_id"
JOIN "orders_updates" ON "orders.id" = "order_updates"."order_id"
WHERE "orders"."client_id" = "clients"."id" AND
"order_updates"."created_at" > (CURRENT_DATE - INTERVAL '1 day')::date
)
Which one is preferable?
2
Answers
By default, the effectiveness will depend on the available indexes on the tables.
I think the 2nd option is preferable.
Provide the request execution plan(s). They can be analyzed and compared.
Consider another request option.
Between the two approaches, the second one using the IN clause with a subquery is generally preferable in situations like this. This is because it avoids the possibility of duplicate updates for the same client, which can be both inefficient and potentially lead to incorrect results.