I have the following example data structure of customer that can be part of multiple groups using a junction table and data:
CREATE TABLE customer (
id INT NOT NULL
);
CREATE TABLE groups (
id INT NOT NULL
);
CREATE TABLE customers_to_groups (
id serial,
group_id INT,
customer_id INT
);
INSERT INTO customer(id) VALUES(0);
INSERT INTO customer(id) VALUES(1);
INSERT INTO customer(id) VALUES(2);
INSERT INTO customer(id) VALUES(3);
INSERT INTO groups(id) VALUES(1);
INSERT INTO groups(id) VALUES(3);
INSERT INTO groups(id) VALUES(5);
INSERT INTO groups(id) VALUES(6);
INSERT INTO customers_to_groups(customer_id, group_id) VALUES(0, 1);
INSERT INTO customers_to_groups(customer_id, group_id) VALUES(0, 5);
INSERT INTO customers_to_groups(customer_id, group_id) VALUES(1, 1);
INSERT INTO customers_to_groups(customer_id, group_id) VALUES(1, 90);
INSERT INTO customers_to_groups(customer_id, group_id) VALUES(2, 1);
INSERT INTO customers_to_groups(customer_id, group_id) VALUES(3, 3);
INSERT INTO customers_to_groups(customer_id, group_id) VALUES(3, 5);
INSERT INTO customers_to_groups(customer_id, group_id) VALUES(3, 90);
I need to get customers that have specific groups they are part of, and I need to get a list of all customers that are part of at least 1 group in multiple lists of group. For example I want to get all customers that are in group [5 OR 6] AND [1 OR 3]
, so for example a customer in group 5 and 1 wold be a returned, but somebody in group 1 and 90 or just group 1 not. With the provided sample data we would get the customer of id 0 and 3 only as they conform to the given rules above.
Just doing WHERE group_id IN (5,6) AND group_id IN (1,3)
does not seem to work, so I am looking for alternative.
I got this so far that works:
SELECT DISTINCT c.id
FROM customer c
INNER JOIN customers_to_groups at1 ON c.id = at1.customer_id
INNER JOIN customers_to_groups at2 ON c.id = at2.customer_id
WHERE at1.group_id IN (5, 6)
AND at2.group_id IN (1, 3);
Expected Results:
id |
---|
0 |
3 |
Is there a way to do it that is more performant?
2
Answers
You can achieve the desired result with a more optimized query by using a GROUP BY and HAVING clause. This approach avoids the need for multiple self-joins
FROM customers_to_groups
WHERE group_id IN (5, 6, 1, 3)
GROUP BY customer_id
HAVING COUNT(DISTINCT CASE WHEN group_id IN (5, 6) THEN 1 END) > 0
AND COUNT(DISTINCT CASE WHEN group_id IN (1, 3) THEN 1 END) > 0;
Index Consideration:
To further improve performance, ensure that you have an index on the customer_id and group_id columns in the customers_to_groups table:
CREATE INDEX idx_customer_group ON customers_to_groups (customer_id, group_id);
We can
GROUP BY
customer’s id and use aHAVING
clause. There we can useCASE
orFILTER
if your RDBMS supports it. Postgres should.There your conditions will be set.
We need to JOIN the customers_to_groups only once, so this will likely be faster.
The query will be:
or
See this demo with your sample data.