MySQL query pivot where all options exist

Theofanis
September 18, 2023
209 views
2 votes
3 Answers

I have a MySQL table entries with a primary key id. Also I have a pivot table entries_tags with foreign keys entry_id and tag. tag is plain text by the way, I know this should be normalized but that is not an option. Right now I have built a query to retrieve all entries by one specific tag.

I want to run a query (as efficient as possible) where I would retrieve entries having all of the tags I specify.

In case it helps this is the current query where I retrieve all entries by one specific tag:

SELECT *
FROM entries
WHERE id IN (
   SELECT entry_id
   FROM entries_tags
   WHERE tag = '...'
)

Example

Example data

entries

id	name
1	entry 1
2	entry 2
3	entry 3

entries_tags

entry_id	tag
1	A
1	B
1	C
2	A
2	B
3	A
3	C

Expected result
In matching entries I expect the complete line, here I note only the ids for ease of reading.

Notes

The order of the tags provided should not be taken into consideration.
The order of the results is not of interest.

input tags	matching entries
`A` `B`	`1` `2`
`A` `C`	`1` `3`
`A` `B` `C`	`1`
`A`	`1` `2` `3` `4`

Thank you for your time

Tags: mysql sql

Answers

- MarkBeras
- September 4, 2023 at 3:25 pm
- 0 votes
0
You can use INTERSECT operator:
```
SELECT *
FROM entries
WHERE id IN (
   SELECT entry_id
   FROM entries_tags
   WHERE tag = 'tag1'
)
INTERSECT
SELECT *
FROM entries
WHERE id IN (
   SELECT entry_id
   FROM entries_tags
   WHERE tag = 'tag2'
)
-- Add more INTERSECT for additional tags if needed
;
```
It gets less sufficient with the tags number growth.
Login or Signup to reply.

- shAt
- September 4, 2023 at 4:02 pm
- 0 votes
0
I assume that your input tags are single-space separated value like A B or A B C and space not allowed in tag names.
in this case you can use a query like this:
```
select e.id
from entries e
  left join entries_tags et
  on e.id = et.entry_id
group by e.id
having sum(case when 'A B' like concat('%',et.tag,'%') then 1 else 0 end) = 
  length('A B')-length(replace('A B',' ',''))+1
```
SQL Fiddle Demo

In having part I find tags by like operator and sum all founded tags.
then I need to calculate count of tags, so i use a little hack by removing single-spaces to find it, the result will be entries.ids that have all similar tags.

Note: using name of tags in queries is a bad-practice, I suggest you to use another table to store tags with ids and use those ids in middle table.
actually in queries like my approach similarity will not work properly for some similar tags, I can improve my query a bit to cover some cases.

An improvement can be like this: concat(' ','A B',' ') like concat('% ',et.tag,' %'); so query should be like:
```
select e.id
from entries e
  left join entries_tags et
  on e.id = et.entry_id
group by e.id
having sum(case when concat(' ','A B',' ') like concat('% ', et.tag, ' %') then 1 else 0 end) = 
  length('A B')-length(replace('A B',' ',''))+1
```
SQL Fiddle Demo

With adding variable to query you can test easily like this:
SQL Fiddle Demo
Login or Signup to reply.

- user1191247
- September 4, 2023 at 4:26 pm
- 0 votes
0
This is probably the simplest method but not necessarily the most performant:
```
SELECT e.*
FROM entries e
JOIN entries_tags et ON e.id = et.entry_id
WHERE et.tag IN ('A', 'B', 'C')
GROUP BY et.entry_id
HAVING COUNT(*) = 3; -- because you are looking for three tags
```
You need a composite index on entries_tags (entry_id, tag), preferably unique. If not unique, you need to change the COUNT(*) to COUNT(DISTINCT et.tag).

If your entries rows are very large, running the aggregation in a subquery may be faster:
```
SELECT e.*
FROM (
    SELECT entry_id
    FROM entries_tags
    WHERE tag IN ('A', 'B', 'C')
    GROUP BY entry_id
    HAVING COUNT(*) = 3
) et
JOIN entries e ON e.id = et.entry_id;
```
Here’s a db<>fiddle.

While playing with a test dataset (~1M entries, ~4M entries_tags and 4K distinct tags) I was very surprised by some of the results I saw. You may not see similar behaviour but I thought it worth noting.

In many test cases the following query was much faster:
```
SELECT *
FROM entries e
WHERE EXISTS (SELECT 1 FROM entries_tags WHERE entry_id = e.id AND tag = 'A')
  AND EXISTS (SELECT 1 FROM entries_tags WHERE entry_id = e.id AND tag = 'B')
  AND EXISTS (SELECT 1 FROM entries_tags WHERE entry_id = e.id AND tag = 'C');
```
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.