I have a simple query that is taking way too much time to process. The query is very simple and the WHERE clause has just simple comparators, but it takes between 15 and 20 seconds to run on a 256 GB server.
Here is the query:
SELECT ad.*,
an.*
FROM addresses ad
INNER JOIN announces an ON ad.id = an.address_id
WHERE ( ad.city = 'city' )
ORDER BY an.postedAt DESC LIMIT 0, 100;
A little info about the table structure:
- all fields (including city and inserted_at) are indexed
- each row has a bunch of fields, but no long varchar or blob; mostly int and varchar(255). I would say 30 fields in total
- the table addresses has about 500k rows and the table announces has about 8 million rows
- if I remove the ORDER BY from this query it runs fast as a cheetah
- the total result set for this query (if I remove the LIMIT clause) is 6 million rows
I’ve tried using FORCE INDEX to force it to use the posted_at index, and then the query takes less than 1 second. The problem is that this query is generated dynamically in a part of the code that I can’t touch. I do have access to the database, though, so I can create indexes (is there any way to create an index on 2 columns from 2 tables?), tune MySQL parameters, or give it more memory.
Here is the result of EXPLAIN on this query:
+----+-------------+-------+-------+--------------------------------------------------------------+------------------------+---------+-----------------+--------+--------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+--------------------------------------------------------------+------------------------+---------+-----------------+--------+--------------------------------------------------------+
| 1 | SIMPLE | ad | range | PRIMARY,unique2,idx_city_district,idx_city,index_city_street | idx_city | 93 | NULL | 291413 | Using index condition; Using temporary; Using filesort |
| 1 | SIMPLE | an | ref | fk_address_id_announces,index_address_postedAt | index_address_postedAt | 5 | fisgar_sp.ad.id | 2 | |
+----+-------------+-------+-------+--------------------------------------------------------------+------------------------+---------+-----------------+--------+--------------------------------------------------------+
And here is the result of SHOW PROFILE on this query:
+--------------------------------+-----------+
| Status | Duration |
+--------------------------------+-----------+
| Starting | 0.000184 |
| Waiting for query cache lock | 0.000032 |
| starting | 0.000030 |
| Checking query cache for query | 0.000160 |
| checking permissions | 0.000035 |
| Opening tables | 0.000070 |
| After opening tables | 0.000034 |
| System lock | 0.000034 |
| table lock | 0.000034 |
| Waiting for query cache lock | 0.000064 |
| init | 0.000087 |
| Optimizing | 0.000062 |
| Statistics | 0.000197 |
| Preparing | 0.000114 |
| Creating tmp table | 0.000095 |
| Sorting result | 0.000040 |
| Executing | 0.000031 |
| Sending data | 18.195285 |
| Creating sort index | 1.561521 |
| Removing tmp table | 0.285116 |
| Creating sort index | 0.000127 |
| End of update loop | 0.000022 |
| Query end | 0.000015 |
| Commit | 0.000018 |
| closing tables | 0.000016 |
| Unlocking tables | 0.000013 |
| closing tables | 0.000022 |
| Starting cleanup | 0.000014 |
| Freeing items | 0.000023 |
| Updating status | 0.000093 |
| Reset for next command | 0.000016 |
+--------------------------------+-----------+
31 rows in set (0.001 sec)
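For anyone reproducing this, a profile like the one above can be captured with session profiling. This is a generic sketch, not necessarily the exact commands used here:

SET profiling = 1;
-- ... run the slow query from the top of the question here ...
-- list the profiled queries with their ids, then break one down by stage
SHOW PROFILES;
SHOW PROFILE FOR QUERY 1;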
Looking in phpMyAdmin, the table announces is 9GB and the table addresses is 250MB, so I set innodb_buffer_pool_size=32G and sort_buffer_size=16G to see if it would load the whole table into memory and sort in memory, but filesort keeps showing up in the EXPLAIN and the query is still slow.
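For reference, the runtime equivalents of those settings look like this (a sketch; whether they were applied via my.cnf and a restart or via SET GLOBAL is not shown in the post):

SET GLOBAL innodb_buffer_pool_size = 34359738368;  -- 32G; dynamic in MariaDB 10.2.2+ / MySQL 5.7.5+, otherwise requires a restart
SET GLOBAL sort_buffer_size = 17179869184;         -- 16G; note this buffer is allocated per connection, not shared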
Any tips or anything I can investigate further?
Update
Adding extra info: this query performs slowly in the specific case where 'city' is a city that represents 80% of our database, let’s say New York. If I run it for a smaller city with few announces, the query goes fast.
To me, what happens is: MySQL uses the city index on the addresses table, so it loads a bunch of addresses, then uses their ids to load the announces, which amount to basically 80% of that table. Then, with this huge volume of data, it needs to sort to get the newest rows, and that is where the problem happens.
But how can I fix it without touching the query?
- the total result set for this query (if I remove the LIMIT clause) is 6 million rows
- postedAt refers to the table announces
- Result of SHOW CREATE TABLE (omitting some non-relevant column names)
Table addresses:
CREATE TABLE `addresses`
(
`id` int(11) NOT NULL AUTO_INCREMENT,
`s` varchar(255) DEFAULT NULL,
`n` int(11) DEFAULT NULL,
`e` varchar(30) DEFAULT NULL,
`d` varchar(255) DEFAULT NULL,
`z` varchar(8) DEFAULT NULL,
`c` varchar(90) DEFAULT NULL,
`lo` decimal(11,8) DEFAULT 0.00000000,
`la` double(10,8) DEFAULT 0.00000000,
`lola` point DEFAULT NULL,
`p` varchar(300) DEFAULT NULL,
`city` varchar(30) DEFAULT NULL,
`inserted_at` timestamp NOT NULL DEFAULT current_timestamp(),
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
UNIQUE KEY `unique2` (`c`,`z`,`n`,`e`),
KEY `lola` (`lola`(25)),
KEY `idx_c_d` (`city`,`d`),
KEY `idx_city` (`city`) USING HASH,
KEY `index_city_s` (`city`,`s`)
) ENGINE=InnoDB AUTO_INCREMENT=44177177 DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci ROW_FORMAT=COMPACT
Table announces:
CREATE TABLE `announces` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`v` int(1) NOT NULL DEFAULT 1,
`address_id` int(11) DEFAULT NULL,
`s` varchar(255) DEFAULT NULL,
`a` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`d` blob DEFAULT NULL,
`pt` varchar(50) NOT NULL DEFAULT '0',
`f` int(2) DEFAULT 0,
`dt` int(1) NOT NULL DEFAULT 0,
`bp` int(11) DEFAULT 0,
`rp` int(11) DEFAULT 0,
`cn` varchar(100) DEFAULT NULL,
`cp` int(11) DEFAULT 0,
`iptu` int(11) DEFAULT 0,
`s2` int(11) DEFAULT 0, -- renamed from a duplicate `s` in the anonymized dump
`br` int(11) DEFAULT 0,
`brm` int(11) DEFAULT 0,
`ps` int(11) DEFAULT 0,
`ta` int(11) DEFAULT 0,
`ua` int(11) DEFAULT 0,
`pa` int(11) DEFAULT 0,
`code` varchar(65) DEFAULT NULL,
`extra` varchar(150) DEFAULT NULL,
`pri` tinyint(1) NOT NULL DEFAULT 0,
`per` mediumtext DEFAULT NULL,
`postedAt` timestamp NULL DEFAULT '0000-00-00 00:00:00',
`ti` int(11) DEFAULT NULL,
`inserted_at` timestamp NOT NULL DEFAULT current_timestamp(),
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`tpo` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `source` (`source`),
KEY `fk_address_id_announces` (`address_id`),
KEY `inserted_at` (`inserted_at`),
KEY `bp` (`bp`),
KEY `pt` (`pt`),
KEY `ua` (`ua`),
KEY `index_pt_ta` (`pt`,`ta`),
KEY `pri` (`pri`) USING HASH,
KEY `index_posted_at` (`postedAt`) USING BTREE,
KEY `index_address_postedAt` (`address_id`,`postedAt`),
CONSTRAINT `fk_address_id_announces` FOREIGN KEY (`address_id`) REFERENCES `addresses` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=278835392 DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_general_ci ROW_FORMAT=COMPACT |
====================================
UPDATE
Here is the tmp folder:
SELECT @@tmpdir;
+----------+
| @@tmpdir |
+----------+
| /tmp |
+----------+
1 row in set (0.001 sec)
Remember that /tmp refers to the filesystem of the container; the drive running the OS and Docker is an SSD.
The result of OPTIMIZER_TRACE can be seen at https://pastecode.dev/s/BhuNr863yZ
Besides this JSON, the query returns 0 for both MISSING_BYTES_BEYOND_MAX_MEM_SIZE and INSUFFICIENT_PRIVILEGES.
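For reference, a trace like the one in the paste can be captured as follows (a generic sketch; the exact session used is not shown):

SET optimizer_trace = 'enabled=on';
-- ... run the slow query from the top of the question here ...
-- the TRACE column holds the JSON; the other two columns are the ones quoted above
SELECT TRACE, MISSING_BYTES_BEYOND_MAX_MEM_SIZE, INSUFFICIENT_PRIVILEGES
FROM information_schema.OPTIMIZER_TRACE;
SET optimizer_trace = 'enabled=off';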
====================================
UPDATE
CLARIFICATIONS:
I might be taking a little long to update this question with the newest info, but it is not because I’m not putting effort into it; it’s because all of this needs to be tested in a PRODUCTION ENVIRONMENT with ACTIVE USERS. So for all the extra info you guys ask for, I need to wait for AFTER HOURS so I can test and guarantee that my tests won’t affect current users, and that other load on the system won’t affect my tests. That being said, I can’t really understand why some people are voting to close this question, shame on you!
MariaDB [(none)]> SELECT @@max_heap_table_size, @@tmp_table_size;
+-----------------------+------------------+
| @@max_heap_table_size | @@tmp_table_size |
+-----------------------+------------------+
| 16777216 | 2147483648 |
+-----------------------+------------------+
1 row in set (0.001 sec)
Without the EXPLAIN, the following query takes less than 1 second to run.
MariaDB [fisgar_sp]> EXPLAIN SELECT ad.*, an.* FROM announces an FORCE INDEX (index_posted_at) INNER JOIN addresses ad ON ad.id = an.address_id WHERE ( ad.city = 'New York' ) ORDER BY postedAt DESC LIMIT 0, 100;
+------+-------------+-------+--------+--------------------------------------------------------------+-----------------+---------+-------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+--------+--------------------------------------------------------------+-----------------+---------+-------------------------+------+-------------+
| 1 | SIMPLE | an | index | NULL | index_posted_at | 5 | NULL | 200 | Using where |
| 1 | SIMPLE | ad | eq_ref | PRIMARY,unique2,idx_city_d,idx_city,index_city_s | PRIMARY | 4 | fisgar_sp.an.address_id | 1 | Using where |
+------+-------------+-------+--------+--------------------------------------------------------------+-----------------+---------+-------------------------+------+-------------+
2 rows in set (0.001 sec)
Following @user1191247’s suggestion of SET STATEMENT max_heap_table_size=4294967295, tmp_table_size=4294967295 FOR SELECT … worked great: the query went from 22 seconds to 2 seconds, so I can adjust those specific memory settings to get a huge improvement.
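For the record, applied to the original query the full statement looks like this (the '…' in the suggestion stands for the query itself):

SET STATEMENT max_heap_table_size=4294967295, tmp_table_size=4294967295 FOR
SELECT ad.*, an.*
FROM addresses ad
INNER JOIN announces an ON ad.id = an.address_id
WHERE ( ad.city = 'city' )
ORDER BY an.postedAt DESC LIMIT 0, 100;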
I thank everyone who helped me on this, especially @user1191247; if he is willing to write an answer, I will accept it.
3 Answers
Sorting is always a tricky subject, especially if, as you have mentioned, one particular value represents 80% of the rows. In that case MySQL tends to prefer table scans (which are sequential) over index lookups (which are random; it may make sense to read a subset of rows randomly, but 80% is quite a lot for random reads).
Having said that, you have managed to find the solution (FORCE INDEX on the posted_at index). If you cannot modify the application, you can use a load balancer that allows you to rewrite queries passing through it. One of the solutions that can do this is ProxySQL. With it, you can modify the query on the fly and add the FORCE INDEX hint that speeds up the query.
Of course, introducing another piece of software adds overhead, and it’s not something to decide on lightly, but if your tests deem ProxySQL feasible, you will end up with a tool in your toolbox that gives you full control over the traffic and SQL going to your database, whether or not you can modify the application.
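A minimal sketch of such a rewrite rule, entered through the ProxySQL admin interface (the match pattern is an assumption based on the query in this question, and the replacement mirrors the fast FORCE INDEX variant from the last update; test it carefully before relying on it):

-- rewrite the generated query on the fly to use the fast join order and index hint
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, replace_pattern, apply)
VALUES (1, 1,
        'FROM addresses ad INNER JOIN announces an ON',
        'FROM announces an FORCE INDEX (index_posted_at) INNER JOIN addresses ad ON',
        1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;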
Below are some resources that you may find useful:
- Query rewrite with ProxySQL: use case scenario – ProxySQL
- ProxySQL: How To Rewrite A Query
Lazy loading. SELECT * hauls around lots of stuff; let’s avoid hauling that around for 6M rows when only 100 are being delivered: fetch just the ids of the 100 newest rows first, then join back to pick up the wide columns (a sketch follows below). And have the supporting composite indexes in place. Then run EXPLAIN SELECT ... on the rewrite; I think both inner lookups will say "Using index". This will give a clue of the improved performance.
Some other issues (see the sketch after this list):
- KEY idx_city (city) USING HASH: that will be silently turned into BTREE.
- INDEX(city) is not needed if you also have INDEX(city, xx); similarly for index_address_postedAt in the other table.
- double(10,8) is counterproductive and is removed in a future version; I suggest you either switch to DECIMAL(10,8) or simply DOUBLE.
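A sketch of what that cleanup could look like (names taken from the dumps above; verify on a copy first, and check that the foreign key can fall back to index_address_postedAt before dropping the single-column index):

ALTER TABLE addresses
    DROP INDEX idx_city,                 -- redundant: idx_c_d (city, d) has city as its leftmost column
    MODIFY la DECIMAL(10,8) DEFAULT 0.00000000;
ALTER TABLE announces
    DROP INDEX fk_address_id_announces;  -- redundant: index_address_postedAt (address_id, postedAt) covers it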
The query is facing a performance issue because it needs to perform a sort operation, and you can’t modify the query directly, so the optimization has to be done on the database side. I hope this helps you improve the query’s performance.