I have a MySQL table entries
with a primary key id
. Also I have a pivot table entries_tags
with foreign keys entry_id
and tag
. tag
is plain text by the way, I know this should be normalized but that is not an option. Right now I have built a query to retrieve all entries by one specific tag.
I want to run a query (as efficient as possible) where I would retrieve entries having all of the tags I specify.
In case it helps this is the current query where I retrieve all entries by one specific tag:
SELECT *
FROM entries
WHERE id IN (
SELECT entry_id
FROM entries_tags
WHERE tag = '...'
)
Example
Example data
entries
id | name |
---|---|
1 | entry 1 |
2 | entry 2 |
3 | entry 3 |
entries_tags
entry_id | tag |
---|---|
1 | A |
1 | B |
1 | C |
2 | A |
2 | B |
3 | A |
3 | C |
Expected result
In matching entries I expect the complete line, here I note only the ids for ease of reading.
Notes
- The order of the tags provided should not be taken into consideration.
- The order of the results is not of interest.
input tags | matching entries |
---|---|
A B |
1 2 |
A C |
1 3 |
A B C |
1 |
A |
1 2 3 4 |
Thank you for your time
3
Answers
You can use
INTERSECT
operator:It gets less sufficient with the tags number growth.
I assume that your input tags are single-space separated value like
A B
orA B C
and space not allowed in tag names.in this case you can use a query like this:
SQL Fiddle Demo
In having part I find tags by
like
operator and sum all founded tags.then I need to calculate count of tags, so i use a little hack by removing single-spaces to find it, the result will be
entries.id
s that have all similar tags.An improvement can be like this:
concat(' ','A B',' ') like concat('% ',et.tag,' %')
; so query should be like:SQL Fiddle Demo
With adding variable to query you can test easily like this:
SQL Fiddle Demo
This is probably the simplest method but not necessarily the most performant:
You need a composite index on
entries_tags (entry_id, tag)
, preferably unique. If not unique, you need to change theCOUNT(*)
toCOUNT(DISTINCT et.tag)
.If your
entries
rows are very large, running the aggregation in a subquery may be faster:Here’s a db<>fiddle.
While playing with a test dataset (~1M entries, ~4M entries_tags and 4K distinct tags) I was very surprised by some of the results I saw. You may not see similar behaviour but I thought it worth noting.
In many test cases the following query was much faster: