I have a large MySQL table and when it had a low number of records the searches were fast, however the table now has over 400,000 records and searches take less than a second between 0.60 and 0.75. I’ve tried using indexes to bring this down to a nearer the 0.10 second or at least lower than 0.60 with no success. This estimate was from using microtime
in PHP immediately before and after the SQL query.
This is a snippet of the table structure output from SHOW CREATE TABLE
, there are other fields in the table but these aren’t used in the SQL search statement.
CREATE TABLE `mytable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`url` longtext COLLATE utf8_unicode_ci NOT NULL,
`url_sha512hash` char(128) COLLATE utf8_unicode_ci NOT NULL DEFAULT 'no hash.',
`viewdate` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
UNIQUE KEY `uniq-web_url` (`url_sha512hash`),
KEY `idx-viewdate` (`viewdate`)
) ENGINE=MyISAM AUTO_INCREMENT=404899 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
This is the SQL statement which takes between 60 and 75 seconds that I would like to speed up.
SELECT `id`, `url` FROM `mytable` USE INDEX(`idx-viewdate`)
WHERE (`url` LIKE "https://www.domain.tld/path/to/dir"
AND `viewdate` > "2022-11-20 23:23:00")
OR (`url` LIKE "https://www.domain.tld/path/to/dir/" AND `viewdate` > "INSERT SAME VIEW DATE AS BEFORE")
ORDER BY `id` DESC;
Output from EXPLAIN
+------+-------------+---------+-------+---------------+--------------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+---------+-------+---------------+--------------+---------+------+------+----------------------------------------------------+
| 1 | SIMPLE | mytable | range | idx-viewdate | idx-viewdate | 4 | NULL | 28 | Using index condition; Using where; Using filesort |
+------+-------------+---------+-------+---------------+--------------+---------+------+------+----------------------------------------------------+
3
Answers
Create a single index on
url, viewdate, id
.As per comments, use InnoDB, not MyISAM.
Then use this
WHERE
clause…LIKE
is expensive, and unnecessary if you’re looking for exact matches.IN()
is still more expensive than=
, but this form makes it clear you’re looking for one or two exact matches.Finally, be VERY certain to use parameterises queries / prepared statements. Do NOT manually substitute strings in to your query using string manipulation in PHP.
EDIT
Or, put the hash column in the index, and hash the two forms of the URL parameters when searching.
What version of MySQL are you running? Do you have control of the server or is it shared hosting?
The output from the following queries would be helpful for context –
The output of the following query will help with understanding the distribution your url data (may be slow so be cautious if server under load) –
As suggested by @MatBailie, you should add a composite index on url and viewdate, but do not include id.
When adding an index on TEXT/BLOB columns you must specify the key length (number of chars to be indexed). A longtext column can hold values up to 4GB, which would not be good for an index.
You also need to change the data type for your url column to a varchar. The current maximum length is 237 so bring it down to whatever you are happy with. 256 would give you some headroom but you may be more comfortable with 300.
Please provide updated EXPLAIN output for your query, after applying the changes.
UPDATE
You are currently storing your hash in a multi-byte column which is relatively inefficient. I would suggest changing it to a binary column which will use less space and be more performant for index lookups.
Then, when inserting into the new binary hash you need to remember to –
And to search based on the new binary hash –
Adding viewdate to the index on the binary hash column would be a waste of space as it is going to have to hit the clustered index to retrieve the url (and possibly other columns) anyway, unless you have a high miss-rate (queries returning 0 rows).
If the
id
column is not being used to referencemytable
from elsewhere, you could ditch the surrogate PK and use the binary hash instead, as you already have a unique index on it.Some ideas:
Get rid of
id
; promote the hash to be the PK.Switch from
ORDER BY id
toORDER BY viewdate
Change from
OR
toUNION
url
REGEXP "INSERT SAME URL AS BEFORE/?" instead of OR. Note the "?".I agree that you should switch to InnoDB; the suggestions here depend on that change.
Since
REGEXP
is not optimized as well asLIKE
, toss theOR
and usePutting them together…
Final query. (Note
id
is gone)Plan A
and have
Plan B
Since you are really looking at two possible URLs, not lots, then
together with
Plan C
But even better is to remove the trailing "/" when
INSERTing
(at least for the hash) and whenSELECTing
. (Or require the trailing "/" as suggested by MatBailie.) Either way, there is only one thing to test:together with
Plan C may get it down to 10ms on a spinning HDD or 1ms on SSD. A and B might be half that fast.
If you need the
id
for other purposes, tack onAnd, yes, convert the hex to
BINARY(nn)
withUNHEX
. (I did not incorporate such.)