skip to Main Content

I have a MySQL table entries with a primary key id. Also I have a pivot table entries_tags with foreign keys entry_id and tag. tag is plain text by the way, I know this should be normalized but that is not an option. Right now I have built a query to retrieve all entries by one specific tag.

I want to run a query (as efficient as possible) where I would retrieve entries having all of the tags I specify.

In case it helps this is the current query where I retrieve all entries by one specific tag:

SELECT *
FROM entries
WHERE id IN (
   SELECT entry_id
   FROM entries_tags
   WHERE tag = '...'
)

Example

Example data

entries

id name
1 entry 1
2 entry 2
3 entry 3

entries_tags

entry_id tag
1 A
1 B
1 C
2 A
2 B
3 A
3 C

Expected result
In matching entries I expect the complete line, here I note only the ids for ease of reading.

Notes

  • The order of the tags provided should not be taken into consideration.
  • The order of the results is not of interest.
input tags matching entries
A B 1 2
A C 1 3
A B C 1
A 1 2 3 4

Thank you for your time

3

Answers


  1. You can use INTERSECT operator:

    SELECT *
    FROM entries
    WHERE id IN (
       SELECT entry_id
       FROM entries_tags
       WHERE tag = 'tag1'
    )
    INTERSECT
    SELECT *
    FROM entries
    WHERE id IN (
       SELECT entry_id
       FROM entries_tags
       WHERE tag = 'tag2'
    )
    -- Add more INTERSECT for additional tags if needed
    ;
    

    It gets less sufficient with the tags number growth.

    Login or Signup to reply.
  2. I assume that your input tags are single-space separated value like A B or A B C and space not allowed in tag names.
    in this case you can use a query like this:

    select e.id
    from entries e
      left join entries_tags et
      on e.id = et.entry_id
    group by e.id
    having sum(case when 'A B' like concat('%',et.tag,'%') then 1 else 0 end) = 
      length('A B')-length(replace('A B',' ',''))+1
    

    SQL Fiddle Demo

    In having part I find tags by like operator and sum all founded tags.
    then I need to calculate count of tags, so i use a little hack by removing single-spaces to find it, the result will be entries.ids that have all similar tags.

    Note: using name of tags in queries is a bad-practice, I suggest you to use another table to store tags with ids and use those ids in middle table.
    actually in queries like my approach similarity will not work properly for some similar tags, I can improve my query a bit to cover some cases.

    An improvement can be like this: concat(' ','A B',' ') like concat('% ',et.tag,' %'); so query should be like:

    select e.id
    from entries e
      left join entries_tags et
      on e.id = et.entry_id
    group by e.id
    having sum(case when concat(' ','A B',' ') like concat('% ', et.tag, ' %') then 1 else 0 end) = 
      length('A B')-length(replace('A B',' ',''))+1
    

    SQL Fiddle Demo

    With adding variable to query you can test easily like this:
    SQL Fiddle Demo

    Login or Signup to reply.
  3. This is probably the simplest method but not necessarily the most performant:

    SELECT e.*
    FROM entries e
    JOIN entries_tags et ON e.id = et.entry_id
    WHERE et.tag IN ('A', 'B', 'C')
    GROUP BY et.entry_id
    HAVING COUNT(*) = 3; -- because you are looking for three tags
    

    You need a composite index on entries_tags (entry_id, tag), preferably unique. If not unique, you need to change the COUNT(*) to COUNT(DISTINCT et.tag).

    If your entries rows are very large, running the aggregation in a subquery may be faster:

    SELECT e.*
    FROM (
        SELECT entry_id
        FROM entries_tags
        WHERE tag IN ('A', 'B', 'C')
        GROUP BY entry_id
        HAVING COUNT(*) = 3
    ) et
    JOIN entries e ON e.id = et.entry_id;
    

    Here’s a db<>fiddle.


    While playing with a test dataset (~1M entries, ~4M entries_tags and 4K distinct tags) I was very surprised by some of the results I saw. You may not see similar behaviour but I thought it worth noting.

    In many test cases the following query was much faster:

    SELECT *
    FROM entries e
    WHERE EXISTS (SELECT 1 FROM entries_tags WHERE entry_id = e.id AND tag = 'A')
      AND EXISTS (SELECT 1 FROM entries_tags WHERE entry_id = e.id AND tag = 'B')
      AND EXISTS (SELECT 1 FROM entries_tags WHERE entry_id = e.id AND tag = 'C');
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search