skip to Main Content

Hi stackoverflow I have here a sql statement that is pretty slow performing which I think is due to the subquery seen in the sql below. My question is simply given that this subquery and the fact that it must set ‘exists’ to 0 or 1 given the logic can this be improved.

    SELECT
    p.id,
    p.name,
    (
        SELECT
            COUNT(*) > 0
        FROM
            product_log AS pl
        WHERE
            pl.product_id = p.id
            AND
            pl.state_name in ("Creation", "Auction", ...)
    ) AS 'has_log'
    from product as p;

Table creations for log and product

CREATE TABLE `product_log` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `product_id` int(11) DEFAULT NULL,
  `state_name` varchar(50) CHARACTER SET latin1 DEFAULT NULL,
  `create_datetime` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `product_id` (`product_id`),
  KEY `state_name` (`state_name`,`create_datetime`)
(`id`,`product_id`,`state_name`,`create_datetime`)
) ENGINE=InnoDB AUTO_INCREMENT=8132540 DEFAULT CHARSET=utf8;

Running explain on the query, is a bit different since much of the details where omitted compared to the actual query that is run but here is the output

|id |select_type       |table|partitions|type  |possible_keys                                                    |key                |key_len|ref                          |rows  |filtered|Extra                                       |
|---|------------------|-----|----------|------|-----------------------------------------------------------------|-------------------|-------|-----------------------------|------|--------|--------------------------------------------|
|1  |PRIMARY           |p    |          |eq_ref|PRIMARY,guarantee_list_index                                     |PRIMARY            |4      |ppd.product_id               |1     |100     |                                            |
|2  |DEPENDENT SUBQUERY|pl   |          |ref   |product_id,state_name                                            |product_id         |5      |p.id                         |3     |51.5    |Using where                                 |

2

Answers


  1. something like?

    select p.id, p.name, CASE WHEN pl.id IS NULL THEN 0 ELSE 1 END AS has_log
    from product as p
    left join product_log AS pl on pl.product_id = p.id AND pl.state_name in ("Creation", "Auction", ...)
    

    eliminating the sub select executed each row,
    and avoid using count() having to go over all records if you only need exists

    Login or Signup to reply.
  2. You don’t have to count all matches. Instead just check whether at least one match exists with EXISTS or IN.

    SELECT
      p.id,
      p.name,
      EXISTS
      (
        SELECT null
        FROM product_log AS pl
        WHERE pl.product_id = p.id
        AND pl.state_name in ('Creation', 'Auction', ...)
      ) AS has_log
    FROM product AS p;
    

    or

    SELECT
      p.id,
      p.name,
      p.id IN
      (
        SELECT pl.product_id
        FROM product_log AS pl
        WHERE pl.state_name in ('Creation', 'Auction', ...)
      ) AS has_log
    FROM product AS p;
    

    Both queries do the same thing, so the DBMS can can up with the same execution plan for both. But the queries show there are two ways the DBMS may approach the task:

    1. Loop through all products and look up each product’s logs.
    2. Find all products with a requested log and use this list for the lookup.

    These approaches require different indexes. If I want to lookup the logs for a product, I want:

    CREATE INDEX idx1 ON product_log (product_id, state_name);
    

    If I want to build the lookup list first, I want:

    CREATE INDEX idx2 ON product_log (state_name, product_id);
    

    I suggest you create both indexes. Then see the explain plan for the query, see which one is used and drop the other.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search