skip to Main Content

Consider the following example:
There are 3 tables: clients (1) -> (n) orders (1) -> (n) order_updates.
I want to update clients based on the corresponding order_updates with JOIN (which produces duplicates for clients data):

UPDATE "clients"
SET "clients"."status" = 'active'
FROM "orders"
JOIN "orders_updates" ON "orders.id" = "order_updates"."order_id"
WHERE "orders"."client_id" = "clients"."id" AND 
      "order_updates"."created_at" > (CURRENT_DATE - INTERVAL '1 day')::date

Is it going to perform the same way (under the hood) as the following statement, which does not produce duplicates for the UPDATE statement?

UPDATE "clients"
SET "clients"."status" = 'active'
WHERE "clients"."id" IN (
  SELECT "orders"."client_id"
  JOIN "orders_updates" ON "orders.id" = "order_updates"."order_id"
  WHERE "orders"."client_id" = "clients"."id" AND 
        "order_updates"."created_at" > (CURRENT_DATE - INTERVAL '1 day')::date
)

Which one is preferable?

2

Answers


  1. By default, the effectiveness will depend on the available indexes on the tables.

    I think the 2nd option is preferable.
    Provide the request execution plan(s). They can be analyzed and compared.

    Consider another request option.

    UPDATE "clients"
    SET "clients"."status" = 'active'
    WHERE EXISTS (
      SELECT "orders"."client_id"
      JOIN "orders_updates" ON "orders.id" = "order_updates"."order_id"
      WHERE "orders"."client_id" = "clients"."id" AND 
            "order_updates"."created_at" > (CURRENT_DATE - INTERVAL '1 day')::date
    )
    
    Login or Signup to reply.
  2. Between the two approaches, the second one using the IN clause with a subquery is generally preferable in situations like this. This is because it avoids the possibility of duplicate updates for the same client, which can be both inefficient and potentially lead to incorrect results.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search