I have a table, let's say users, that has a one-to-many relationship to another table called, say, users_attributes. The users_attributes table is a simple key-value table with a foreign key column referencing the users table. Something like this:
create table users(
    id integer primary key,
    name varchar
);
create table users_attributes(
    attribute_id integer primary key,
    user_id integer references users(id),
    attribute_name varchar,
    attribute_value varchar
);
Now I need to filter users based on attribute_name and attribute_value in the users_attributes table. I have tried this query (PostgreSQL), which works but takes a long time to execute:
select * from users u
left join users_attributes ua1 on u.id = ua1.user_id and ua1.attribute_name = 'dog_name'
left join users_attributes ua2 on u.id = ua2.user_id and ua2.attribute_name = 'cat_name'
where ua1.attribute_value = 'Spot' and ua2.attribute_value = 'Mittens';
Here, the number of joins goes up for each attribute that I need to filter the users by. This causes the query to slow down (between 4 and 10 seconds, depending on the number of joins), since there are approximately a hundred thousand users. An explain plan on the query seems to support this theory.
Is there a way to query the users in a way that returns faster?
I am using PostgreSQL.
Answers
The mix of LEFT JOIN and WHERE conditions makes no sense, logically: filtering on columns of the left-joined tables in the WHERE clause converts the outer joins into plain inner joins anyway. Basic rewrite:
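A sketch of that rewrite, using plain inner joins and keeping each attribute's predicates together in its join condition:

```sql
select u.*
from users u
join users_attributes ua1 on ua1.user_id = u.id
                         and ua1.attribute_name  = 'dog_name'
                         and ua1.attribute_value = 'Spot'
join users_attributes ua2 on ua2.user_id = u.id
                         and ua2.attribute_name  = 'cat_name'
                         and ua2.attribute_value = 'Mittens';
```

This returns the same rows as the original query, since the WHERE conditions there already forced both left joins to behave as inner joins.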
Basically, it's a case of relational division.
There are many ways to do this. The best query style depends on your cardinalities, your typical filters, and what you are optimizing for. Here is a whole arsenal:
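As one technique from that arsenal, the filter can be expressed with one EXISTS subquery per attribute, which keeps the join count in the plan flat as you add attributes:

```sql
select *
from users u
where exists (
        select 1
        from users_attributes ua
        where ua.user_id = u.id
          and ua.attribute_name  = 'dog_name'
          and ua.attribute_value = 'Spot')
  and exists (
        select 1
        from users_attributes ua
        where ua.user_id = u.id
          and ua.attribute_name  = 'cat_name'
          and ua.attribute_value = 'Mittens');
```

Whichever query style you pick, a multicolumn index on users_attributes, such as (attribute_name, attribute_value, user_id), typically matters more for speed than the query shape itself.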
If you're trying to find distinct users who have both a cat named Mittens and a dog named Spot, you may be interested in intersect. This acts sort of like union, but only keeps values that appear in both sets, removing duplicates by default. I'm not much for PostgreSQL, so whether that would be more performant as a joined subquery, CTE, or in() expression, I couldn't say.
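A sketch of the intersect approach, here wrapped in an in() expression: each branch selects the user ids matching one attribute, and intersect keeps only the ids present in both sets.

```sql
select u.*
from users u
where u.id in (
    select user_id
    from users_attributes
    where attribute_name = 'dog_name' and attribute_value = 'Spot'
    intersect
    select user_id
    from users_attributes
    where attribute_name = 'cat_name' and attribute_value = 'Mittens'
);
```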