In MySQL, I have a table full of attributes that looks like this:
USER_ID | ATTR_NAME | ATTR_VALUE |
---|---|---|
1 | Name | Jess |
1 | Age | 23 |
1 | Sex | m |
2 | Name | Jess |
2 | Age | 23 |
3 | Name | Ann |
3 | Sex | f |
(Note that not every attribute must be present for every user)
I want to find all USER_IDs where one or more attributes match, for example:
Show me all users where the name is ‘Jess’ and the age is ‘23’.
This should return: 1, 2
How would I express that in SQL?
EDIT: As people are asking for attempts, here's my first try:
SELECT DISTINCT USER_ID
FROM ATTR_TABLE
WHERE
( ATTR_NAME = 'Name' AND ATTR_VALUE = 'Jess' ) AND
( ATTR_NAME = 'Age' AND ATTR_VALUE = '23' )
This obviously returns nothing, as no single row has both ATTR_NAME = 'Name' and ATTR_NAME = 'Age'
…
This might be basic SQL, but there is a learning curve, and I was not able to come up with a working solution. I am not yet familiar with the SQL jargon, so I don't even know what to google for hints.
7 Answers
First, create a temporary table, temp_grouped_attributes, from a SELECT over your attribute table (replace user_attributes in that first SELECT query to match your table name).
A last SELECT query then reads the matching users from temp_grouped_attributes.
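The two steps described in this answer might look roughly like the sketch below. The answer's original queries are not shown here, so the layout of temp_grouped_attributes (one row per user, with hypothetical NAME and AGE columns) is an assumption. The SQL is wrapped in Python's sqlite3 module with an in-memory SQLite database standing in for MySQL, purely so the example runs on its own.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_attributes (USER_ID INTEGER, ATTR_NAME TEXT, ATTR_VALUE TEXT)"
)
conn.executemany(
    "INSERT INTO user_attributes VALUES (?, ?, ?)",
    [(1, "Name", "Jess"), (1, "Age", "23"), (1, "Sex", "m"),
     (2, "Name", "Jess"), (2, "Age", "23"),
     (3, "Name", "Ann"), (3, "Sex", "f")],
)

# Step 1: temporary table with one row per user.
# NAME and AGE are hypothetical column names chosen for this example.
conn.execute("""
    CREATE TEMPORARY TABLE temp_grouped_attributes AS
    SELECT USER_ID,
           MAX(CASE WHEN ATTR_NAME = 'Name' THEN ATTR_VALUE END) AS NAME,
           MAX(CASE WHEN ATTR_NAME = 'Age'  THEN ATTR_VALUE END) AS AGE
    FROM user_attributes
    GROUP BY USER_ID
""")

# Step 2: the last query filters the temporary table like an ordinary table.
rows = conn.execute("""
    SELECT USER_ID
    FROM temp_grouped_attributes
    WHERE NAME = 'Jess' AND AGE = '23'
    ORDER BY USER_ID
""").fetchall()
matching_ids = [row[0] for row in rows]
print(matching_ids)
```

Users without an Age row end up with a NULL AGE in the temporary table, so they simply fail the filter.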
Here is a way to do it using a self join:
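A minimal sketch of a self-join query for the example in the question, not necessarily the query this answer originally contained. It assumes the table is called ATTR_TABLE as in the question's attempt, and it runs the SQL through Python's sqlite3 module against an in-memory SQLite database as a stand-in for MySQL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ATTR_TABLE (USER_ID INTEGER, ATTR_NAME TEXT, ATTR_VALUE TEXT)"
)
conn.executemany(
    "INSERT INTO ATTR_TABLE VALUES (?, ?, ?)",
    [(1, "Name", "Jess"), (1, "Age", "23"), (1, "Sex", "m"),
     (2, "Name", "Jess"), (2, "Age", "23"),
     (3, "Name", "Ann"), (3, "Sex", "f")],
)

# Join the table to itself on USER_ID: alias n supplies the Name row,
# alias a supplies the Age row of the same user.
rows = conn.execute("""
    SELECT DISTINCT n.USER_ID
    FROM ATTR_TABLE AS n
    JOIN ATTR_TABLE AS a ON a.USER_ID = n.USER_ID
    WHERE n.ATTR_NAME = 'Name' AND n.ATTR_VALUE = 'Jess'
      AND a.ATTR_NAME = 'Age'  AND a.ATTR_VALUE = '23'
    ORDER BY n.USER_ID
""").fetchall()
joined_ids = [row[0] for row in rows]
print(joined_ids)
```

Each additional attribute in the search needs one more join against the same table.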
Maybe as simple as:
Why is that so hard? Am I missing something?
Let's translate your request:
Show me => SELECT
all users => * (or specifically list the fields you want)
where => WHERE (criteria will follow…)
the name is ‘Jess’ => ATTR_NAME = 'Name' AND ATTR_VALUE = 'Jess'
and => AND
the age is ‘23’ => ATTR_NAME = 'Age' AND ATTR_VALUE = '23'
What complicates this result set is that the entity you want to select is split across multiple rows. The first step is to transpose the values into columns (with a dynamic schema there are a few options); the following uses self joins, for something different:
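One way the transposition could be written is sketched below: start from the distinct users and LEFT JOIN the attribute table once per attribute, so each attribute becomes a column. The aliases and the NAME, AGE and SEX column names are chosen for this example, and SQLite (via Python's sqlite3) stands in for MySQL so that the sketch is runnable as-is.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ATTR_TABLE (USER_ID INTEGER, ATTR_NAME TEXT, ATTR_VALUE TEXT)"
)
conn.executemany(
    "INSERT INTO ATTR_TABLE VALUES (?, ?, ?)",
    [(1, "Name", "Jess"), (1, "Age", "23"), (1, "Sex", "m"),
     (2, "Name", "Jess"), (2, "Age", "23"),
     (3, "Name", "Ann"), (3, "Sex", "f")],
)

# One LEFT JOIN per attribute turns rows into columns; a missing attribute
# shows up as NULL. The WHERE clause then reads like a normal-table filter.
transposed = conn.execute("""
    SELECT u.USER_ID,
           n.ATTR_VALUE AS NAME,
           a.ATTR_VALUE AS AGE,
           s.ATTR_VALUE AS SEX
    FROM (SELECT DISTINCT USER_ID FROM ATTR_TABLE) AS u
    LEFT JOIN ATTR_TABLE AS n ON n.USER_ID = u.USER_ID AND n.ATTR_NAME = 'Name'
    LEFT JOIN ATTR_TABLE AS a ON a.USER_ID = u.USER_ID AND a.ATTR_NAME = 'Age'
    LEFT JOIN ATTR_TABLE AS s ON s.USER_ID = u.USER_ID AND s.ATTR_NAME = 'Sex'
    WHERE n.ATTR_VALUE = 'Jess' AND a.ATTR_VALUE = '23'
    ORDER BY u.USER_ID
""").fetchall()
for row in transposed:
    print(row)
```

Unlike the plain inner self join, this form also returns the other attributes of each matching user, with NULL where a user has no such row.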
As I understand it, you want to get the USER_ID of users based on their attributes in the table.
This is another way, using GROUP BY and HAVING clauses:
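A sketch of what a GROUP BY / HAVING query for the question's example could look like. It is one possible form rather than a reconstruction of this answer's query: it counts, per user, the rows that satisfy each condition with a CASE expression. ATTR_TABLE is the table name from the question, and SQLite via Python's sqlite3 stands in for MySQL here.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ATTR_TABLE (USER_ID INTEGER, ATTR_NAME TEXT, ATTR_VALUE TEXT)"
)
conn.executemany(
    "INSERT INTO ATTR_TABLE VALUES (?, ?, ?)",
    [(1, "Name", "Jess"), (1, "Age", "23"), (1, "Sex", "m"),
     (2, "Name", "Jess"), (2, "Age", "23"),
     (3, "Name", "Ann"), (3, "Sex", "f")],
)

# Group all rows of a user together, then keep only the groups that contain
# at least one row for each required attribute/value pair.
rows = conn.execute("""
    SELECT USER_ID
    FROM ATTR_TABLE
    GROUP BY USER_ID
    HAVING SUM(CASE WHEN ATTR_NAME = 'Name' AND ATTR_VALUE = 'Jess' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN ATTR_NAME = 'Age'  AND ATTR_VALUE = '23'   THEN 1 ELSE 0 END) > 0
    ORDER BY USER_ID
""").fetchall()
grouped_ids = [row[0] for row in rows]
print(grouped_ids)
```

Because each condition is checked separately, this form still works if a user accidentally has the same attribute stored twice.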
Your table schema is "EAV" or Entity-Attribute-Value. This is a common schema for applications to use if the number of attributes per entity is unknown or volatile. If this is a schema you own, and the attributes of user_id don't change frequently enough to warrant an EAV table, then you may want to consider changing it, as the SQL and compute costs can get ugly.
With a normal user table, this would be as simple as filtering on the name and age columns in a single WHERE clause.
But with EAV, your attribute columns are stored as values, flipping the relational concept of an RDBMS on its head, to a degree. It's not a "bad" design; it's just that you are paying for flexibility with compute cost.
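For comparison, here is a sketch of the conventional layout mentioned above, with one column per attribute. The users table and its columns are hypothetical (the answer does not give a schema), and SQLite via Python's sqlite3 stands in for MySQL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")

# Hypothetical conventional table: one row per user, one column per attribute.
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT, age INTEGER, sex TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [(1, "Jess", 23, "m"), (2, "Jess", 23, None), (3, "Ann", None, "f")],
)

# The whole search collapses into a single WHERE clause.
rows = conn.execute(
    "SELECT user_id FROM users WHERE name = 'Jess' AND age = 23 ORDER BY user_id"
).fetchall()
plain_ids = [row[0] for row in rows]
print(plain_ids)
```

Attributes a user does not have are simply NULL columns, which is the flexibility the EAV layout gives up less gracefully at query time.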
For your very reasonable requirements, there are a few ways to solve this. Likely the most cost-effective route is to gather up all records that match any of your attribute/value pairings using an OR clause (since no single record in your table can have more than one attribute), and then to aggregate and filter the aggregation with a HAVING clause.
Since you are searching for the combination of 2 attributes, HAVING COUNT(*) = 2 will limit your results to only the user_ids that contain the two attributes you are after.
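The OR-then-HAVING approach generalizes to any number of attribute/value pairs. The sketch below builds the query from a list of pairs; find_users is a hypothetical helper name, ATTR_TABLE is taken from the question, and SQLite via Python's sqlite3 stands in for MySQL. It assumes each attribute name appears at most once in the criteria and at most once per user.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ATTR_TABLE (USER_ID INTEGER, ATTR_NAME TEXT, ATTR_VALUE TEXT)"
)
conn.executemany(
    "INSERT INTO ATTR_TABLE VALUES (?, ?, ?)",
    [(1, "Name", "Jess"), (1, "Age", "23"), (1, "Sex", "m"),
     (2, "Name", "Jess"), (2, "Age", "23"),
     (3, "Name", "Ann"), (3, "Sex", "f")],
)


def find_users(conn, criteria):
    """Return USER_IDs that match every (attr_name, attr_value) pair in criteria."""
    if not criteria:
        return []
    # One OR-ed condition per pair; values are passed as bound parameters.
    pair = "(ATTR_NAME = ? AND ATTR_VALUE = ?)"
    where = " OR ".join([pair] * len(criteria))
    sql = (
        "SELECT USER_ID FROM ATTR_TABLE "
        f"WHERE {where} "
        "GROUP BY USER_ID HAVING COUNT(*) = ? ORDER BY USER_ID"
    )
    # Flatten the pairs, then append the number of pairs for the HAVING clause.
    params = [value for item in criteria for value in item] + [len(criteria)]
    return [row[0] for row in conn.execute(sql, params)]


print(find_users(conn, [("Name", "Jess"), ("Age", "23")]))
print(find_users(conn, [("Sex", "f")]))
```

The COUNT(*) comparison is what turns the OR filter into an "all of these" match: a user passes only if every requested pair contributed a row.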
There are other ways to skin this cat, but they often involve pivoting the data through CASE expressions or multiple joins and can get very compute heavy as a result. As that Wikipedia article states
Pivoting gets expensive fast, so any way to limit the need for a pivot or multiple table scans is preferred. The method used in this answer is a little risky, since it assumes that you won't have more than one name or age entry for each distinct user_id. You can, and should, implement a primary key/constraint to prevent that scenario.
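A sketch of such a constraint, assuming a composite primary key on (USER_ID, ATTR_NAME) is acceptable for your data. The column types are guesses, since the question does not show the table definition, and SQLite via Python's sqlite3 stands in for MySQL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")

# The composite primary key allows each attribute at most once per user.
# VARCHAR lengths are illustrative; MySQL needs a bounded type in a key.
conn.execute("""
    CREATE TABLE ATTR_TABLE (
        USER_ID    INTEGER      NOT NULL,
        ATTR_NAME  VARCHAR(64)  NOT NULL,
        ATTR_VALUE VARCHAR(255),
        PRIMARY KEY (USER_ID, ATTR_NAME)
    )
""")
conn.execute("INSERT INTO ATTR_TABLE VALUES (1, 'Name', 'Jess')")

# A second 'Name' row for the same user violates the key and is rejected.
try:
    conn.execute("INSERT INTO ATTR_TABLE VALUES (1, 'Name', 'Jessica')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM ATTR_TABLE").fetchone()[0]
print(duplicate_rejected, row_count)
```

With the key in place, a COUNT(*) comparison in a HAVING clause can no longer be inflated by duplicate attribute rows.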