I have several millions of files and each file may have several thousands of functions. I have to store the names of these files and functions in a MySQL table to display this in a UI. The file names will be of length up to 100 characters and the function names may range from 300 to 32000 characters. I created a separate table for storing function names as most of them are redundant and it was consuming lot of space. Below are the details of the tables:
create table files (
id int not null auto_increment,
file_name VARCHAR(100),
function_id int,
primary key(id) using btree
CONSTRAINT `unique_functions` FOREIGN KEY (`function_id`) REFERENCES functions(id) ON UPDATE NO ACTION ON DELETE CASCADE
)
create table functions (
id int not null auto_increment,
function_name varchar(32000) not null COLLATE 'latin1_general_cs',
primary key(id) using btree
)
I query for the function IDs as below to insert them in files table:
select ID from functions where function_name= 'some_name';
But the performance of the query is very slow so I tried to create an index on function_name column but it fails with below error:
CREATE UNIQUE INDEX function_name_unique ON functions(function_name);
#1071 - Specified key was too long; max key length is 3072 bytes
I tried looking for a solution but could not find much help. Most of the articles say it is bad practice to create an index on long texts but I am not sure if i can avoid that? Any ideas are really appreciated.
NOTE:
I also found that we can use prefix length for INDEX but most of my function names are same in the beginning and I am not sure of this will be helpful in this case.
2
Answers
LOAD DATA
, it will be very fast.MariaDB has a "hash" index to help with such.
In the table, include
You would have to do things like:
and
(A similar approach would involve
REVERSE()
.)The above could be "hidden" from the users via Stored Routines, "Generated columns", "functional indexes", app subroutines, etc. (This depends on the specific version of MySQL/MariaDB you are using.)
Note: A prefix index does not provide a usable
UNIQUE
test. (One the prefix amount is checked for uniqueness.) See other suggestions for using a Hash.