
I have several million files, and each file may contain several thousand functions. I have to store the names of these files and functions in a MySQL table so they can be displayed in a UI. File names are up to 100 characters long, and function names range from 300 to 32000 characters. I created a separate table for function names because most of them are repeated and storing them with each file was consuming a lot of space. The table definitions are below:

create table files (
    id int not null auto_increment,
    file_name VARCHAR(100),
    function_id int,
    primary key(id) using btree,
    CONSTRAINT `unique_functions` FOREIGN KEY (`function_id`) REFERENCES functions(id) ON UPDATE NO ACTION ON DELETE CASCADE
)

create table functions (
    id int not null auto_increment,
    function_name varchar(32000) not null COLLATE 'latin1_general_cs',
    primary key(id) using btree
)

I look up function IDs with the query below so that I can insert them into the files table:

select ID from functions where function_name= 'some_name';

This query is very slow, so I tried to create an index on the function_name column, but that fails with the error below:

CREATE UNIQUE INDEX function_name_unique ON functions(function_name);

#1071 - Specified key was too long; max key length is 3072 bytes

I looked for a solution but could not find much help. Most articles say it is bad practice to index long text columns, but I am not sure I can avoid that here. Any ideas are really appreciated.

NOTE:
I also found that an index can be created on a prefix of the column, but most of my function names start with the same characters, so I am not sure a prefix index would help in this case.

2 Answers


    • Load all the data into a table without the desired UNIQUE index(es). If you use LOAD DATA, this will be very fast.
    • Add an extra column to hold a hash of the function name.
    • Generate the hashes. Do you need to do this in the app, or can it be done in SQL?
    • Now add the indexes. This is ‘fast enough’.

    MariaDB has a "hash" index that helps in cases like this.
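The steps above could look roughly like the following in MySQL, if the hashes are generated in SQL rather than in the app. This is only a sketch: the column name name_hash and the index name function_name_hash_unique are made up for illustration, SHA-256 is one possible choice of hash, and the lookup assumes the string passed in is encoded with the same bytes as the stored function_name (true for plain ASCII names).

```sql
-- Hypothetical column: 32 raw bytes are enough for a SHA-256 digest.
ALTER TABLE functions
    ADD COLUMN name_hash BINARY(32) NULL;

-- SHA2(..., 256) returns 64 hex digits; UNHEX() packs them into 32 bytes.
UPDATE functions
   SET name_hash = UNHEX(SHA2(function_name, 256));

-- Add the index only after the data and hashes are in place.
-- This step fails if the table still contains duplicate function names.
ALTER TABLE functions
    MODIFY name_hash BINARY(32) NOT NULL,
    ADD UNIQUE INDEX function_name_hash_unique (name_hash);

-- Lookup: the short hash uses the index; comparing the full name as well
-- guards against the (very unlikely) case of two names sharing a hash.
SELECT id
  FROM functions
 WHERE name_hash = UNHEX(SHA2('some_name', 256))
   AND function_name = 'some_name';
```

The index entry is a fixed 32 bytes however long the function name is, so the 3072-byte key limit no longer applies.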

  1. In the table, include

    full_fcn_name TEXT NOT NULL,
    fcn_end VARCHAR(100) NOT NULL,
    INDEX(fcn_end)
    

    You would have to do things like:

    INSERT ... (... fcn_end, full_fcn_name)
        VALUES
        (..., RIGHT(?, 100), ?)
    

    and

    SELECT ...
        WHERE fcn_end = RIGHT(?, 100)
          AND full_fcn_name = ?
    

    (A similar approach would involve REVERSE().)

    The above could be "hidden" from the users via Stored Routines, "Generated columns", "functional indexes", app subroutines, etc. (This depends on the specific version of MySQL/MariaDB you are using.)
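As one illustration of the "generated column" option, the suffix column can be derived by the server so that the application never supplies it. This is a sketch under assumptions: the table name functions_by_suffix is hypothetical, and the syntax shown is the generated-column form available in MySQL 5.7 and later; older versions would need one of the other options.

```sql
-- Hypothetical table: fcn_end is computed from full_fcn_name on every write.
CREATE TABLE functions_by_suffix (
    id            INT NOT NULL AUTO_INCREMENT,
    full_fcn_name TEXT NOT NULL,
    fcn_end       VARCHAR(100) AS (RIGHT(full_fcn_name, 100)) STORED,
    PRIMARY KEY (id),
    INDEX (fcn_end)
);

-- The INSERT only provides the full name; fcn_end is filled in automatically.
INSERT INTO functions_by_suffix (full_fcn_name)
VALUES ('some_name');

-- The lookup is unchanged: the indexed suffix narrows the candidates,
-- and the full comparison picks the exact match.
SELECT id
  FROM functions_by_suffix
 WHERE fcn_end = RIGHT('some_name', 100)
   AND full_fcn_name = 'some_name';
```

As with any suffix or prefix index, INDEX(fcn_end) here only speeds up the search; it does not enforce uniqueness of the full name.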

    Note: A prefix index does not provide a usable UNIQUE test. (Only the prefix is checked for uniqueness, not the whole value.) See the other suggestions for using a hash.
