skip to Main Content

I’ve a simple query that is taking way to much time to process, the query is very simple and the WHERE clause has just simple comparators but the query takes between 15 to 20 secs to process in a 256gb server.

Here is the query

SELECT ad.*, 
       an.* 
FROM addresses ad 
INNER JOIN announces an ON ad.id = an.address_id 
WHERE ( ad.city = 'city' ) 
ORDER BY an.postedAt DESC LIMIT 0, 100;

A little info about the table structure:

  • all fields [including city and inserted_at] are indexes.
  • each row has a bunch of fields but no long varchar blog mostly int and varchar2 255… i would say 30 fields in total
  • the table addresses has about 500k rows and the table announces has about 8kk rows
  • if I remove the order by from this query it runs fast as a cheetah
  • the total result set for this query (if i remove the limit clause) is 6million rows.

I’ve tried using FORCE INDEX and forcing it to use the posted_at index and the query takes less than 1 second, the problem is that this query is generated dynamically in a part of the code that i can’t touch, i do have access to database and can create indexes (is there any way to create an index on 2 columns from 2 tables?) set up mysql parameters or give it more memory

Here is the result of explain on this query:

+----+-------------+-------+-------+--------------------------------------------------------------+------------------------+---------+-----------------+--------+--------------------------------------------------------+
| id | select_type | table | type  | possible_keys                                                | key                    | key_len | ref             | rows   | Extra                                                  |
+----+-------------+-------+-------+--------------------------------------------------------------+------------------------+---------+-----------------+--------+--------------------------------------------------------+
| 1  | SIMPLE      | ad    | range | PRIMARY,unique2,idx_city_district,idx_city,index_city_street | idx_city               | 93      | NULL            | 291413 | Using index condition; Using temporary; Using filesort |
| 1  | SIMPLE      | an    | ref   | fk_address_id_announces,index_address_postedAt               | index_address_postedAt | 5       | fisgar_sp.ad.id | 2      |                                                        |
+----+-------------+-------+-------+--------------------------------------------------------------+------------------------+---------+-----------------+--------+--------------------------------------------------------+

And here is the result of SHOW PROFILE on this query

+--------------------------------+-----------+
| Status                         | Duration  |
+--------------------------------+-----------+
| Starting                       |  0.000184 |
| Waiting for query cache lock   |  0.000032 |
| starting                       |  0.000030 |
| Checking query cache for query |  0.000160 |
| checking permissions           |  0.000035 |
| Opening tables                 |  0.000070 |
| After opening tables           |  0.000034 |
| System lock                    |  0.000034 |
| table lock                     |  0.000034 |
| Waiting for query cache lock   |  0.000064 |
| init                           |  0.000087 |
| Optimizing                     |  0.000062 |
| Statistics                     |  0.000197 |
| Preparing                      |  0.000114 |
| Creating tmp table             |  0.000095 |
| Sorting result                 |  0.000040 |
| Executing                      |  0.000031 |
| Sending data                   | 18.195285 |
| Creating sort index            |  1.561521 |
| Removing tmp table             |  0.285116 |
| Creating sort index            |  0.000127 |
| End of update loop             |  0.000022 |
| Query end                      |  0.000015 |
| Commit                         |  0.000018 |
| closing tables                 |  0.000016 |
| Unlocking tables               |  0.000013 |
| closing tables                 |  0.000022 |
| Starting cleanup               |  0.000014 |
| Freeing items                  |  0.000023 |
| Updating status                |  0.000093 |
| Reset for next command         |  0.000016 |
+--------------------------------+-----------+
31 rows in set (0.001 sec)

Looking upon phpmyadmin the table announces is 9GB and the table addresses is 250MB so i gave the
innodb_buffer_pool_size=32G and sort_buffer_size=16GB to see if it was going to load the whole table in memory and sort in memory but yet filesort keep showing in the explain and the query is too slow

Any tips or anything I can investigate further?

Update

Adding extra info, this query performs slow in the specific case which 'city' is a specific city that represents 80% of our database, let’s say new york, if i run it for a smaller city with few announces the query goes fast….

To me what happens is: mysql uses the city index for the address table, so load a bunch of addresses, then uses its ids to load the announces which will consist of basically 80% of the table itself… then with this huge volume of data it needs to sort to get the newest… so the problem happens

But how can I fix it without touching the query

  1. the total resultset for this query (if i remove the limit clause) is 6million rows.
  2. postedAt refers to table announce
  3. Result of show create table (ommitting some non relevant column names)

Table addresses:

CREATE TABLE `addresses` 
(
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `s` varchar(255) DEFAULT NULL,
  `n` int(11) DEFAULT NULL,
  `e` varchar(30) DEFAULT NULL,
  `d` varchar(255) DEFAULT NULL,
  `z` varchar(8) DEFAULT NULL,
  `c` varchar(90) DEFAULT NULL,
  `lo` decimal(11,8) DEFAULT 0.00000000,
  `la` double(10,8) DEFAULT 0.00000000,
  `lola` point DEFAULT NULL,
  `p` varchar(300) DEFAULT NULL,
  `city` varchar(30) DEFAULT NULL,
  `inserted_at` timestamp NOT NULL DEFAULT current_timestamp(),
  `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique2` (`c`,`z`,`n`,`e`),
  KEY `lola` (`lola`(25)),
  KEY `idx_c_d` (`city`,`d`),
  KEY `idx_city` (`city`) USING HASH,
  KEY `index_city_s` (`city`,`s`)
) ENGINE=InnoDB AUTO_INCREMENT=44177177 DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci ROW_FORMAT=COMPACT |



 announces | CREATE TABLE `announces` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `v` int(1) NOT NULL DEFAULT 1,
  `address_id` int(11) DEFAULT NULL,
  `s` varchar(255) DEFAULT NULL,
  `a` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `d` blob DEFAULT NULL,
  `pt` varchar(50) NOT NULL DEFAULT '0',
  `f` int(2) DEFAULT 0,
  `dt` int(1) NOT NULL DEFAULT 0,
  `bp` int(11) DEFAULT 0,
  `rp` int(11) DEFAULT 0,
  `cn` varchar(100) DEFAULT NULL,
  `cp` int(11) DEFAULT 0,
  `iptu` int(11) DEFAULT 0,
  `s` int(11) DEFAULT 0,
  `br` int(11) DEFAULT 0,
  `brm` int(11) DEFAULT 0,
  `ps` int(11) DEFAULT 0,
  `ta` int(11) DEFAULT 0,
  `ua` int(11) DEFAULT 0,
  `pa` int(11) DEFAULT 0,
  `code` varchar(65) DEFAULT NULL,
  `extra` varchar(150) DEFAULT NULL,
  `pri` tinyint(1) NOT NULL DEFAULT 0,
  `per` mediumtext DEFAULT NULL,
  `postedAt` timestamp NULL DEFAULT '0000-00-00 00:00:00',
  `ti` int(11) DEFAULT NULL,
  `inserted_at` timestamp NOT NULL DEFAULT current_timestamp(),
  `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `tpo` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `source` (`source`),
  KEY `fk_address_id_announces` (`address_id`),
  KEY `inserted_at` (`inserted_at`),
  KEY `bp` (`bp`),
  KEY `pt` (`pt`),
  KEY `ua` (`ua`),
  KEY `index_pt_ta` (`pt`,`ta`),
  KEY `pri` (`pri`) USING HASH,
  KEY `index_posted_at` (`postedAt`) USING BTREE,
  KEY `index_address_postedAt` (`address_id`,`postedAt`),
  CONSTRAINT `fk_address_id_announces` FOREIGN KEY (`address_id`) REFERENCES `addresses` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=278835392 DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_general_ci ROW_FORMAT=COMPACT |

====================================
UPDATE

Here is the tmp folder

SELECT @@tmpdir;
+----------+
| @@tmpdir |
+----------+
| /tmp     |
+----------+
1 row in set (0.001 sec)

remembering that /tmp refers to the structure of the container
the hd running the so and docker is a ssd

the result of OPTIMIZER_TRACE can be seen in the link https://pastecode.dev/s/BhuNr863yZ
besides this json the query returns 0 and 0 for MISSING_BYTES_BEYOND_MAX_MEM_SIZE | INSUFFICIENT_PRIVILEGES |

====================================
UPDATE

CLARIFICATIONS:
I might be taking a little long to update this question with the newest info, but it is not because I’m not putting effort into it, is because all this needed to be tested in a PRODUCTION ENVIRONMENT with ACTIVE USERS… So all extra info that you guys provide me here i need to wait for AFTER HOURS so i can test and guarantee that my tests wont affect current users nor that other load in the system will affect my tests. That being said I can’t really understand why some people are voting for close this question, shame on you!

MariaDB [(none)]> SELECT @@max_heap_table_size, @@tmp_table_size;
+-----------------------+------------------+
| @@max_heap_table_size | @@tmp_table_size |
+-----------------------+------------------+
|              16777216 |       2147483648 |
+-----------------------+------------------+
1 row in set (0.001 sec)

Without the explain the following query takes less than 1 sec to run.

MariaDB [fisgar_sp]> EXPLAIN SELECT ad.*, ad.* FROM announces an FORCE INDEX (index_posted_at) INNER JOIN addresses ad ON ad.id = an.address_id WHERE ( ad.city = 'New York' ) ORDER BY postedAt DESC LIMIT 0, 100;
+------+-------------+-------+--------+--------------------------------------------------------------+-----------------+---------+-------------------------+------+-------------+
| id   | select_type | table | type   | possible_keys                                                | key             | key_len | ref                     | rows | Extra       |
+------+-------------+-------+--------+--------------------------------------------------------------+-----------------+---------+-------------------------+------+-------------+
|    1 | SIMPLE      | an    | index  | NULL                                                         | index_posted_at | 5       | NULL                    | 200  | Using where |
|    1 | SIMPLE      | ad    | eq_ref | PRIMARY,unique2,idx_city_d,idx_city,index_city_s | PRIMARY         | 4       | fisgar_sp.an.address_id | 1    | Using where |
+------+-------------+-------+--------+--------------------------------------------------------------+-----------------+---------+-------------------------+------+-------------+
2 rows in set (0.001 sec)

Following @user1191247 suggestions of SET STATEMENT max_heap_table_size=4294967295, tmp_table_size=4294967295 FOR SELECT… worked great, the query went from 22 seconds to 2 seconds so i can adjust those specific mem settings to get a huge improvement

I thank all guys that helped me on this, and specially @user1191247, if he is willing to write an answer I can accept the answer

3

Answers


  1. Sorting is always tricky subject, especially if, as you have mentioned, one particular value represents 80% of the rows. In that case MySQL tends to prefer table scans (which are sequential) than index lookups (which are random – it may make sense to scan randomly a subset of rows but 80% is quite a lot for random reads).

    Having said that, you have managed to find the solution (FORCE INDEX on posted_at index). If you cannot modify the application, you can use a loadbalancer that allows you to rewrite queries that pass through it. One of the solutions that can do it is ProxySQL. With it, you can modify the query on the fly and add FORCE INDEX statement that will speed up the query.

    Of course, introducing another software adds to the overhead and it’s not something you can decide on lightly, but if your tests will deem it feasible to use ProxySQL, you will end up with a tool in your toolbox that will enable you to take a full control over the traffic and SQL that’s going to your database, no matter if you can modify the application or not.

    Below are some resources that you may find useful:

    Query rewrite with ProxySQL: use case scenario – ProxySQL

    ProxySQL: How To Rewrite A Query

    Login or Signup to reply.
  2. Lazy loading. SELECT * hauls around lots of stuff; let’s avoid that hauling around 6M rows when only 100 are being delivered.

    SELECT  d2.*, n2.*
        FROM ( SELECT d1.id AS d1_id,
                      n1.id AS n1_id
                   FROM addresses d1 
                   JOIN announces n1 ON d1.id = n1.address_id 
                   WHERE ( d1.city = 'city' ) 
                   ORDER BY n1.postedAt DESC LIMIT 0, 100;
             ) x
        JOIN addresses d2  d1.id = x.d1_id
        JOIN announces n2  n2.id = x.d2_id
    

    And have

    addresses:  INDEX(city, id)
    announces:  INDEX(postedAt. address_id,  id)
    announces:  INDEX(address_id, postedAt,  id)
    

    Then run EXPLAIN SELECT .... I think both n1 and d1 will say "Using index. This will give a clue of improved performance.

    Some other issues:

    • KEY idx_city (city) USING HASH — That will be silently turned into `BTree.
    • INDEX(city) is not needed if you also have INDEX(city, xx). Similarly for index_address_postedAt in the other table.
    • The (m,n) of double(10,8) is counterproductive and removed in a future version; suggest you either switch to DECIMAL(10,8) or simply DOUBLE.
    Login or Signup to reply.
  3. Query is facing perofrmance issue and need to perform a sort operation.
    You can’t modify the query directly.

    For Optimizing the queries You need to work on the optimization as :

    1. You need to work on index optimization and you don’t need to create another index for the address table as you are already having it for city column.
    2. You need to work on the conifguration of mysql and need to increase innodb_buffer_pool_size and sort_buffer_size.
    3. Table Optimization: Temporary table is being created for sorting. You can optimize it by adjusting tmp_table_size and max_heap_table_size configuration variables.

    I hope this will help you to minimize query performance.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search