
In mysql, I have a table full of attributes, looking like so:

USER_ID ATTR_NAME ATTR_VALUE
1 Name Jess
1 Age 23
1 Sex m
2 Name Jess
2 Age 23
3 Name Ann
3 Sex f

(Note that not every attribute must be present for every user)

I want to find all USER_IDs where one or multiple attributes do match, for example:

Show me all users where the name is ‘Jess’ and the age is ’23’.

This should return: 1, 2

How would I express that in SQL?

EDIT: As people are asking for attempts, here's my first try:

SELECT DISTINCT USER_ID 
FROM ATTR_TABLE 
WHERE 
  ( ATTR_NAME = 'Name' AND ATTR_VALUE = 'Jess' ) AND 
  ( ATTR_NAME = 'Age' AND ATTR_VALUE = '23' )

This returns nothing, of course, since no single row has both ATTR_NAME = 'Name' and ATTR_NAME = 'Age'.

This might be basic SQL, but there is a learning curve, and I could not come up with a working solution. I am not yet familiar with the SQL jargon, so I can't even search properly for hints.

7

Answers


  1. First, create a temporary table.

    Replace user_attributes in the first SELECT query with your own table name.

    -- Create a temporary table to store the grouped attributes
    CREATE TEMPORARY TABLE temp_grouped_attributes AS
    SELECT
        USER_ID,
        MAX(CASE WHEN ATTR_NAME = 'Name' THEN ATTR_VALUE ELSE NULL END) AS Name,
        MAX(CASE WHEN ATTR_NAME = 'Age' THEN ATTR_VALUE ELSE NULL END) AS Age,
        MAX(CASE WHEN ATTR_NAME = 'Sex' THEN ATTR_VALUE ELSE NULL END) AS Sex
    FROM user_attributes
    GROUP BY USER_ID;
    
    -- Now Select / Search your new table
    SELECT *
    FROM temp_grouped_attributes
    WHERE Name = 'Jess' AND Age = 23;
    
    

    With the sample data from the question, temp_grouped_attributes will look like this:

    USER_ID Name Age Sex
    1 Jess 23 m
    2 Jess 23 NULL
    3 Ann NULL f

    The last SELECT query will return:

    USER_ID Name Age Sex
    1 Jess 23 m
    2 Jess 23 NULL
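
The pivot can also be checked without a temporary table by filtering a derived table. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for MySQL and the sample rows from the question; the variable names are illustrative only.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_attributes (USER_ID INTEGER, ATTR_NAME TEXT, ATTR_VALUE TEXT)"
)
conn.executemany(
    "INSERT INTO user_attributes VALUES (?, ?, ?)",
    [
        (1, "Name", "Jess"), (1, "Age", "23"), (1, "Sex", "m"),
        (2, "Name", "Jess"), (2, "Age", "23"),
        (3, "Name", "Ann"), (3, "Sex", "f"),
    ],
)

# Same conditional aggregation as above: one row per user, one column per attribute.
PIVOT_SQL = """
    SELECT
        USER_ID,
        MAX(CASE WHEN ATTR_NAME = 'Name' THEN ATTR_VALUE END) AS Name,
        MAX(CASE WHEN ATTR_NAME = 'Age'  THEN ATTR_VALUE END) AS Age,
        MAX(CASE WHEN ATTR_NAME = 'Sex'  THEN ATTR_VALUE END) AS Sex
    FROM user_attributes
    GROUP BY USER_ID
"""

# Missing attributes come back as None (SQL NULL).
pivot_rows = conn.execute(PIVOT_SQL + " ORDER BY USER_ID").fetchall()

# Filter the pivoted rows through a derived table instead of a temporary table.
matching_ids = [
    row[0]
    for row in conn.execute(
        f"SELECT USER_ID FROM ({PIVOT_SQL}) AS pivoted "
        "WHERE Name = 'Jess' AND Age = '23' ORDER BY USER_ID"
    )
]

print(pivot_rows)
print(matching_ids)
```

The derived-table form avoids leaving a temporary table behind, at the cost of re-running the pivot for every search.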
  2. Here is a way to do it using a self join:

    SELECT DISTINCT a1.USER_ID 
    FROM ATTR_TABLE a1
    INNER JOIN ATTR_TABLE a2 ON a1.USER_ID = a2.USER_ID
    WHERE 
      a1.ATTR_NAME = 'Name' AND a1.ATTR_VALUE = 'Jess' 
      AND a2.ATTR_NAME = 'Age' AND a2.ATTR_VALUE = '23';
    

    Demo here

  3. Maybe as simple as:

    SELECT 
    DISTINCT(user_id) 
    FROM test_table 
    WHERE 
    (attr_name, attr_value) IN (('Name','Jess'))
    OR 
    (attr_name, attr_value) IN (('Age','23'));
    

    Why is that so hard? Am I missing something?
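
One thing that may be missing: an OR-only filter matches users that have either pair, not necessarily both. The sketch below assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, and adds a hypothetical user 4 who has a Name row but no Age row. The tuple IN conditions are written out as plain AND/OR to keep the SQL portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (user_id INTEGER, attr_name TEXT, attr_value TEXT)")
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?, ?)",
    [
        (1, "Name", "Jess"), (1, "Age", "23"), (1, "Sex", "m"),
        (2, "Name", "Jess"), (2, "Age", "23"),
        (3, "Name", "Ann"), (3, "Sex", "f"),
        (4, "Name", "Jess"),  # hypothetical extra user: right name, no age row
    ],
)

# Equivalent to the two tuple IN conditions joined by OR.
CRITERIA = (
    "(attr_name = 'Name' AND attr_value = 'Jess') "
    "OR (attr_name = 'Age' AND attr_value = '23')"
)

# OR alone: any user with at least one matching row.
either_ids = [
    row[0]
    for row in conn.execute(
        f"SELECT DISTINCT user_id FROM test_table WHERE {CRITERIA} ORDER BY user_id"
    )
]

# OR plus a per-user row count: only users with both matching rows.
both_ids = [
    row[0]
    for row in conn.execute(
        f"SELECT user_id FROM test_table WHERE {CRITERIA} "
        "GROUP BY user_id HAVING COUNT(*) = 2 ORDER BY user_id"
    )
]

print(either_ids)  # user 4 is included
print(both_ids)    # user 4 is filtered out
```

On the question's original seven rows the two queries happen to agree, which is why the OR-only version looks correct at first glance.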

  4. Let's translate your request:

    Show me => SELECT
    all users => * or specifically list the fields you want
    where => WHERE criteria will follow…
    the name is ‘Jess’ => ATTR_NAME = 'Name' AND ATTR_VALUE = 'Jess'
    and => AND
    the age is ’23’ => ATTR_NAME = 'Age' AND ATTR_VALUE = '23'

    What complicates this is that the entity you want to select is split across multiple rows. The first step is to transpose the values (with a dynamic schema there are a few options); the following uses self joins, for something different:

    SELECT * FROM (
        SELECT
            userName.USER_ID,
            userName.ATTR_VALUE AS Name,
            userAge.ATTR_VALUE AS Age,
            userSex.ATTR_VALUE AS Sex
        FROM ATTR_TABLE userName
        LEFT OUTER JOIN ATTR_TABLE userAge ON userName.USER_ID = userAge.USER_ID AND userAge.ATTR_NAME = 'Age'
        LEFT OUTER JOIN ATTR_TABLE userSex ON userName.USER_ID = userSex.USER_ID AND userSex.ATTR_NAME = 'Sex'
        WHERE userName.ATTR_NAME = 'Name'
    ) Users
    WHERE Name = 'Jess' AND Age = '23'
    
  5. As I understand it, you want the USER_IDs of users that match given attributes in the table:

    SELECT t1.USER_ID
    FROM yourTable t1
    JOIN yourTable t2 ON t1.USER_ID = t2.USER_ID
    WHERE ( t1.ATTR_NAME = 'Name' AND t1.ATTR_VALUE = 'Jess' )
      AND ( t2.ATTR_NAME = 'Age' AND t2.ATTR_VALUE = '23' );
    
  6. This is another way, using GROUP BY and HAVING clauses:

    select USER_ID
    from ATTR_TABLE
    group by USER_ID
    having count(case when ATTR_NAME = 'Name' AND ATTR_VALUE = 'Jess' then 1 end ) = 1
           and count(case when ATTR_NAME = 'Age' AND ATTR_VALUE = '23' then 1 end ) = 1
    

    Demo here

  7. Your table schema is "EAV" or Entity-Attribute-Value. This is a common schema for applications to use if the number of attributes per entity is unknown or volatile. If this is a schema you own, and the attributes of a user don’t change often enough to warrant an EAV table, you may want to consider changing it, as the SQL and compute costs can get ugly.

    With a normal user table, this would be as simple as

    SELECT user_id FROM users WHERE name='Jess' and Age='23';
    

    But with EAV, your attribute columns are stored as values, flipping the relational concept of an RDBMS on its head, to a degree. It’s not a "bad" design, it’s just that you are trading flexibility for compute/cost.

    For your very reasonable requirements, there are a few ways to solve this. Likely the most cost-effective route is to gather up all records that match your attribute/value pairings:

    (attr_name = 'Name' AND attr_value = 'Jess') 
    OR (attr_name = 'Age' AND attr_value = '23')
    

    Use an OR clause here, since no single record in your table can hold more than one attribute, then aggregate and filter the aggregation with a HAVING clause.

    Since you are searching for the combination of 2 attributes, HAVING COUNT(*) = 2 will limit your results to only user_ids that contain the two attributes you are after.

    SELECT user_id
    FROM mytable
    WHERE (attr_name = 'Name' AND attr_value = 'Jess') 
      OR (attr_name = 'Age' AND attr_value = '23') 
    GROUP BY user_id 
    HAVING count(*) = 2
    

    dbfiddle here
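
The same pattern extends to any number of attribute/value pairs: OR the pairs together and require the row count to equal the number of pairs. The helper below is a hypothetical sketch, not part of the original answer, assuming SQLite (through Python's sqlite3 module) as a stand-in for MySQL and at most one row per user and attribute name.

```python
import sqlite3


def find_users(conn, criteria):
    """Return sorted user_ids that have every attr_name -> attr_value pair in criteria.

    Hypothetical helper for illustration. Assumes at most one row per
    (user_id, attr_name), so COUNT(*) equals the number of matched criteria.
    """
    if not criteria:
        return []
    # One parameterized "(attr_name = ? AND attr_value = ?)" clause per pair.
    pair_clause = " OR ".join("(attr_name = ? AND attr_value = ?)" for _ in criteria)
    # Flatten {"Name": "Jess", ...} into ["Name", "Jess", ...] for the placeholders.
    params = [part for pair in criteria.items() for part in pair]
    sql = (
        f"SELECT user_id FROM mytable WHERE {pair_clause} "
        "GROUP BY user_id HAVING COUNT(*) = ? ORDER BY user_id"
    )
    return [row[0] for row in conn.execute(sql, params + [len(criteria)])]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (user_id INTEGER, attr_name TEXT, attr_value TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "Name", "Jess"), (1, "Age", "23"), (1, "Sex", "m"),
        (2, "Name", "Jess"), (2, "Age", "23"),
        (3, "Name", "Ann"), (3, "Sex", "f"),
    ],
)

two_criteria = find_users(conn, {"Name": "Jess", "Age": "23"})
three_criteria = find_users(conn, {"Name": "Jess", "Age": "23", "Sex": "m"})
print(two_criteria, three_criteria)
```

Passing the values as bound parameters rather than formatting them into the SQL string keeps the query safe when the search terms come from user input.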

    There are other ways to skin this cat, but they often involve pivoting the data through CASE expressions or multiple joins, and can get very compute heavy as a result. As the Wikipedia article on EAV states:

    The Achilles heel of EAV is the difficulty of working with large
    volumes of EAV data. It is often necessary to transiently or
    permanently inter-convert between columnar and row-or EAV-modeled
    representations of the same data; this can be both error-prone if done
    manually as well as CPU-intensive. […] The conversion operation is
    called pivoting.

    Pivoting gets expensive fast, so any way to limit the need for pivot or multiple table scans is preferred. The method used in this answer is a little risky since it assumes that you won’t have more than one name or age entry for each distinct user_id. You can, and should, implement a primary key/constraint to prevent that scenario.
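
To illustrate that risk and the constraint that removes it, here is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for MySQL. The table names loose and strict and the duplicated user 5 are hypothetical.

```python
import sqlite3

QUERY = """
    SELECT user_id FROM {table}
    WHERE (attr_name = 'Name' AND attr_value = 'Jess')
       OR (attr_name = 'Age' AND attr_value = '23')
    GROUP BY user_id
    HAVING COUNT(*) = 2
"""

conn = sqlite3.connect(":memory:")

# Without a constraint: user 5 has the 'Name' = 'Jess' row twice and no age at all,
# yet two rows match, so COUNT(*) = 2 reports the user anyway.
conn.execute("CREATE TABLE loose (user_id INTEGER, attr_name TEXT, attr_value TEXT)")
conn.executemany(
    "INSERT INTO loose VALUES (?, ?, ?)",
    [(5, "Name", "Jess"), (5, "Name", "Jess")],
)
false_positives = [row[0] for row in conn.execute(QUERY.format(table="loose"))]

# With a composite primary key, a second row for the same user and attribute
# is rejected at insert time, so the count can be trusted.
conn.execute(
    """
    CREATE TABLE strict (
        user_id    INTEGER NOT NULL,
        attr_name  TEXT    NOT NULL,
        attr_value TEXT,
        PRIMARY KEY (user_id, attr_name)
    )
    """
)
conn.execute("INSERT INTO strict VALUES (5, 'Name', 'Jess')")
try:
    conn.execute("INSERT INTO strict VALUES (5, 'Name', 'Jess')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(false_positives, duplicate_rejected)
```

In MySQL the equivalent would be a PRIMARY KEY (or UNIQUE key) on (user_id, attr_name), which also gives the query an index to work with.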
