skip to Main Content

I have a table, let’s say, users that has a one to many relationship to another table called, say, users_attributes. The users_attributes table is a simple key-value type table with a foreign key column to the users table. Something like this:

create table users(
  id: integer primary key, 
  name: varchar
);

create table users_attributes(
  attribute_id: integer primary key,
  user_id: integer references users(id),
  attribute_name: varchar, 
  attribute_value: varchar
);

Now I need to filter users based on attribute_name and attribute_value in the users_attributes table. I have tried this query (Postgresql) which works but it take a lot longer to execute:

select * from users u
left join users_attributes ua1 on u.id = ua1.user_id and ua1.attribute_name = 'dog_name'
left join users_attributes ua2 on u.id = ua2.user_id and ua2.attribute_name = 'cat_name'
where ua1.attribute_value = 'Spot' and ua2.attribute_value = 'Mittens';

Here, the number of joins go up for each attribute that I need to filter the users by. This is causing the query to slow down (between 4-10 seconds depending on number of joins) since there are appx. a hundred thousand users. An explain plan on the query seems to support this theory.

Is there a way to query the users in a way that returns faster?

I am using Postgresql.

2

Answers


  1. The mix of LEFT JOIN and WHERE conditions makes no sense, logically. See:

    Basic rewrite:

    SELECT *
    FROM   users u
    JOIN   users_attributes ua1 ON u.id = ua1.user_id
    JOIN   users_attributes ua2 ON u.id = ua2.user_id
    WHERE  ua1.attribute_name = 'dog_name'
    AND    ua1.attribute_value = 'Spot'
    AND    ua2.attribute_name = 'cat_name'
    AND    ua2.attribute_value = 'Mittens';
    

    Basically, it’s a case of .

    There are many ways to do this. The best query style depends on your cardinalities, your typical filters, and what you are optimizing for. Here is a whole arsenal:

    Login or Signup to reply.
  2. If you’re trying to find distinct users who have both a cat named Mittens and a dog named Spot, you may be interested in intersect. This acts sort of like union but only keeps values which appear in both sets, removing duplicates by default.

    select
     u.*
    from users as u
    where u.id in
    (
        select
         ua.user_id
        from users_attributes as ua
        where ua.attribute_name = 'dog_name'
        and ua.attribute_value = 'Spot'
        intersect
        select
         ua.user_id
        from users_attributes as ua
        where ua.attribute_name = 'cat_name'
        and ua.attribute_value = 'Mittens'
    )
    

    I’m not much for PostgreSQL; so, whether that would be more performant as a joined subquery, CTE, or in() expression, I couldn’t say.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search