skip to Main Content

I have the following example data structure of customer that can be part of multiple groups using a junction table and data:

CREATE TABLE customer (
    id INT NOT NULL
);

CREATE TABLE groups (
    id INT NOT NULL
);

CREATE TABLE customers_to_groups (
    id serial,
    group_id INT,
    customer_id INT
);

INSERT INTO customer(id) VALUES(0);
INSERT INTO customer(id) VALUES(1);
INSERT INTO customer(id) VALUES(2);
INSERT INTO customer(id) VALUES(3);
INSERT INTO groups(id) VALUES(1);
INSERT INTO groups(id) VALUES(3);
INSERT INTO groups(id) VALUES(5);
INSERT INTO groups(id) VALUES(6);
INSERT INTO customers_to_groups(customer_id, group_id) VALUES(0, 1);
INSERT INTO customers_to_groups(customer_id, group_id) VALUES(0, 5);
INSERT INTO customers_to_groups(customer_id, group_id) VALUES(1, 1);
INSERT INTO customers_to_groups(customer_id, group_id) VALUES(1, 90);
INSERT INTO customers_to_groups(customer_id, group_id) VALUES(2, 1);
INSERT INTO customers_to_groups(customer_id, group_id) VALUES(3, 3);
INSERT INTO customers_to_groups(customer_id, group_id) VALUES(3, 5);
INSERT INTO customers_to_groups(customer_id, group_id) VALUES(3, 90);

I need to get customers that have specific groups they are part of, and I need to get a list of all customers that are part of at least 1 group in multiple lists of group. For example I want to get all customers that are in group [5 OR 6] AND [1 OR 3], so for example a customer in group 5 and 1 wold be a returned, but somebody in group 1 and 90 or just group 1 not. With the provided sample data we would get the customer of id 0 and 3 only as they conform to the given rules above.

Just doing WHERE group_id IN (5,6) AND group_id IN (1,3) does not seem to work, so I am looking for alternative.

I got this so far that works:

SELECT DISTINCT c.id
FROM customer c
INNER JOIN customers_to_groups at1 ON c.id = at1.customer_id
INNER JOIN customers_to_groups at2 ON c.id = at2.customer_id
WHERE at1.group_id IN (5, 6)
  AND at2.group_id IN (1, 3);

Expected Results:

id
0
3

Is there a way to do it that is more performant?

2

Answers


  1. You can achieve the desired result with a more optimized query by using a GROUP BY and HAVING clause. This approach avoids the need for multiple self-joins

    SELECT customer_id
    

    FROM customers_to_groups
    WHERE group_id IN (5, 6, 1, 3)
    GROUP BY customer_id
    HAVING COUNT(DISTINCT CASE WHEN group_id IN (5, 6) THEN 1 END) > 0
    AND COUNT(DISTINCT CASE WHEN group_id IN (1, 3) THEN 1 END) > 0;

    Index Consideration:
    To further improve performance, ensure that you have an index on the customer_id and group_id columns in the customers_to_groups table:CREATE INDEX idx_customer_group ON customers_to_groups (customer_id, group_id);

    Login or Signup to reply.
  2. We can GROUP BY customer’s id and use a HAVING clause. There we can use CASE or FILTER if your RDBMS supports it. Postgres should.

    There your conditions will be set.

    We need to JOIN the customers_to_groups only once, so this will likely be faster.

    The query will be:

    SELECT c.id 
    FROM customer c
    INNER JOIN customers_to_groups ctg
      ON c.id = ctg.customer_id
    GROUP BY c.id
    HAVING 
      COUNT(CASE WHEN ctg.group_id IN (1,3) THEN 1 END) > 0
      AND COUNT(CASE WHEN ctg.group_id IN (5,6) THEN 1 END) > 0;
    

    or

    SELECT c.id 
    FROM customer c
    INNER JOIN customers_to_groups ctg
      ON c.id = ctg.customer_id
    GROUP BY c.id
    HAVING 
      COUNT(*) FILTER(WHERE ctg.group_id IN (1,3)) > 0
      AND COUNT(*) FILTER(WHERE ctg.group_id IN (5,6)) > 0;
    

    See this demo with your sample data.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search