skip to Main Content

Since MySQL version 8.0.17 there is Multi-Valued Indexes for JSON array column, is there also a simple possibility to create Multi-Valued Indexes for comma delimited string column (CHAR, VARCHAR, …) to increase the performance of FIND_IN_SET function?

I know that there is an option to create a secondary table customers_zip and use the INNER JOIN or IN function, but this is a complicated option. I use 10k databases whit ~150 various tables -> ~1.5M different comma delimited columns.

Example table:

CREATE TABLE `customers` (
 `id` BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
 `zip_string` VARCHAR(100) NOT NULL,
 `zip_json` JSON,
 INDEX `zip_json` ((CAST(`zip_json` AS UNSIGNED ARRAY)))
);

Insert 1 000 000 rows:

INSERT INTO `customers` (`id`, `zip_string`, `zip_json`) VALUES 
 (1, '10001,10002,10003', '[10001,10002,10003]'),
 (2, '10004,10005', '[10004,10005]'),
 ...;

Select for FIND_IN_SET load 120ms

SELECT * FROM `customers` WHERE FIND_IN_SET('10005', `zip_string`);

Select for JSON_CONTAINS or MEMBER OF load 1ms

SELECT * FROM `customers` WHERE JSON_CONTAINS(`zip_json`, '10005');
SELECT * FROM `customers` WHERE 10005 MEMBER OF(`zip_json`);

2

Answers


  1. Chosen as BEST ANSWER

    I created a little hack :)

    Add a virtual JSON array column zip_index, whose contents will be automatically generated from the zip_string column, and which will be Multi-Valued Indexes

    ALTER TABLE `customers` ADD COLUMN `zip_index` JSON GENERATED ALWAYS AS (CONCAT('[', `zip_string`, ']')) VIRUTAL NOT NULL;
    
    CREATE INDEX `zip_index` ON `customers` ((CAST(`zip_index` AS UNSIGNED ARRAY)));
    

    Select for JSON_CONTAINS load 2ms

    SELECT * FROM `customers` WHERE JSON_CONTAINS(`zip_index`, '10005');
    

    The content in column zip_index is automatically generated and indexed when using UPDATE on column zip_string or INSERT a new row (mabey reduced performance?). But it is not necessary to use multiple tables for each similar column (comma delimiter string). Using the JSON_CONTAINS function is still as easy as the FIND_IN_SET function.


    Example for indexing non-integer lists

    CREATE INDEX `zip_index` ON `customers` ((CAST(`zip_index` AS CHAR(100) ARRAY)));
    

    The content of the original column zip_string must contain strings enclosed in quotes, like "A10001","B10002","C10003"


  2. No, you can’t make any kind of index to optimize searches for arbitrary elements in a comma-separated list.

    The closest options are:

    • Convert the comma-separated list to a JSON array, and make a multi-valued index, however in current versions of MySQL this only works for integers. There is no support for making a multi-valued index of strings.

    • Use a functional index, but this only works for a fixed search criteria. That is, you could add index on the expression (FIND_IN_SET('10005', zip_string)), but the value 10005 is hard-coded in the index definition. It’s not useful to find any other value.

    • Use a fulltext index, but this has limitations too.

    It really is easier to optimize if you refactor to a normalized table design, storing individual values in their own row. Storing comma-separated lists working against the strengths of a relational database.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search