Since MySQL version 8.0.17 there is Multi-Valued Indexes for JSON array column, is there also a simple possibility to create Multi-Valued Indexes for comma delimited string column (CHAR, VARCHAR, …) to increase the performance of FIND_IN_SET
function?
I know that there is an option to create a secondary table customers_zip
and use the INNER JOIN
or IN
function, but this is a complicated option. I use 10k databases whit ~150 various tables -> ~1.5M different comma delimited columns.
Example table:
CREATE TABLE `customers` (
`id` BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
`zip_string` VARCHAR(100) NOT NULL,
`zip_json` JSON,
INDEX `zip_json` ((CAST(`zip_json` AS UNSIGNED ARRAY)))
);
Insert 1 000 000 rows:
INSERT INTO `customers` (`id`, `zip_string`, `zip_json`) VALUES
(1, '10001,10002,10003', '[10001,10002,10003]'),
(2, '10004,10005', '[10004,10005]'),
...;
Select for FIND_IN_SET
load 120ms
SELECT * FROM `customers` WHERE FIND_IN_SET('10005', `zip_string`);
Select for JSON_CONTAINS
or MEMBER OF
load 1ms
SELECT * FROM `customers` WHERE JSON_CONTAINS(`zip_json`, '10005');
SELECT * FROM `customers` WHERE 10005 MEMBER OF(`zip_json`);
2
Answers
I created a little hack :)
Add a virtual JSON array column
zip_index
, whose contents will be automatically generated from thezip_string
column, and which will be Multi-Valued IndexesSelect for
JSON_CONTAINS
load 2msThe content in column
zip_index
is automatically generated and indexed when usingUPDATE
on columnzip_string
orINSERT
a new row (mabey reduced performance?). But it is not necessary to use multiple tables for each similar column (comma delimiter string). Using theJSON_CONTAINS
function is still as easy as theFIND_IN_SET
function.Example for indexing non-integer lists
The content of the original column
zip_string
must contain strings enclosed in quotes, like"A10001","B10002","C10003"
No, you can’t make any kind of index to optimize searches for arbitrary elements in a comma-separated list.
The closest options are:
Convert the comma-separated list to a JSON array, and make a multi-valued index, however in current versions of MySQL this only works for integers. There is no support for making a multi-valued index of strings.
Use a functional index, but this only works for a fixed search criteria. That is, you could add index on the expression
(FIND_IN_SET('10005', zip_string))
, but the value 10005 is hard-coded in the index definition. It’s not useful to find any other value.Use a fulltext index, but this has limitations too.
It really is easier to optimize if you refactor to a normalized table design, storing individual values in their own row. Storing comma-separated lists working against the strengths of a relational database.