skip to Main Content

I have a large data table containing details by date and across 3 independent criteria with around 12 discreet values for each criteria. That is, each criteria field in the table is defined as a 12 value ENUM. Users pull summary data by date and any filtering across the three criteria, including none at all. To make a single criteria lookup efficient, 3 separate indexes are required (date,CriteriaA), (date,CriteriaB), (date,CriteriaC). 4 indexes if you want to lookup against any of the 3 (date,A,B,C),(date,A,C),(date,B,C),(date,C).

In an attempt to be more efficient in the lookup, I built a SET column containing all 36 values from the 3 criteria. All values across the criteria are unique and none are a subset of any other. I added an index to this set (date, set_col). Queries against this table using a set lookup fails to take advantage of the index, however. Neither FIND_IN_SET('Value',set_col), set_col LIKE '%Value%', nor set_col & [pos. in set] triggers the index (according to explain and overall resultset return speed).

Is there a trick to indexing SET columns?

I tried queries like

Select Date, count(*) 
FROM tbl 
where DATE between [Start] and [End] 
and FIND_IN_SET('Value',set_col) 
group by Date

I would expect it to run nearly as fast as a lookup against the individual criteria column that has an index against it. But instead it runs as fast when just an index against DATE exists. Same number of rows processed according to Explain.

2

Answers


  1. It’s not possible to index SET columns for arbitrary queries.

    A SET type is basically a bitfield, with one bit set for each of the values defined for your set. You could search for a specific bit pattern in such a bitfield, or you could search for a range of specific bit patterns, or an inequality, etc. But searching for rows where one specific bit is set in the bitfield is not going to be indexable.

    FIND_IN_SET() is really searching for a specific bit set in the bitfield. It will not use an index for this predicate. The best you can hope to do for optimization is to have an index that narrows down the examined rows based on the other search term on date. Then among the rows matching the date range, the FIND_IN_SET() will be applied row-by-row.

    It’s the same problem as searching for substrings. The following predicates will not use an index on the column:

    SELECT ... WHERE SUBSTRING(mytext, 5, 8) = 'word'
    
    SELECT ... WHERE LOCATE(mytext, 'word') > 0
    
    SELECT ... WHERE mytext LIKE '%word%'
    

    A conventional index on the data would be alphabetized from the start of the string, not from some arbitrary point in the middle of the string. This is why fulltext indexing was created as an alternative to a simple B-tree index on the whole string value. But there’s no special index type for bitfields.

    I don’t think the SET data type is helping in your case.
    You should use your multi-column indexes with permutations of the columns.

    Login or Signup to reply.
  2. Go back to 3 ENUMs. Then have

    INDEX(A, date),
    INDEX(B, date),
    INDEX(C, date)
    

    Those should significantly help with queries like

    WHERE A = 'foo' AND date BETWEEN...
    

    and somewhat help for

    WHERE A = 'foo' AND date BETWEEN...
      AND B = 'bar'
    

    If you will also have queries without A/B/C, then add

    INDEX(date)
    

    Note: INDEX(date, A) is no better than INDEX(date) when using a "range". That is, I recommend against the indexes you mentioned.

    FIND_IN_SET(), like virtually all other function calls, is not sargable . However enum=const is sargable since it is implemented as a simple integer.

    You did not mention

    WHERE A IN ('x', 'y') AND ...
    

    That is virtually un-indexable. However, my suggestions are better than nothing.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search