
I have a join query which takes a lot of time to process.

SELECT
COUNT(c.id)
FROM `customers` AS `c`
LEFT JOIN `setting` AS `ssh` ON `c`.`shop_id` = `ssh`.`id`
LEFT JOIN `customer_extra` AS `cx` ON `c`.`id` = `cx`.`customer_id`
LEFT JOIN `customers_address` AS `ca` ON `ca`.`id` = `cx`.`customer_default_address_id`
LEFT JOIN `lytcustomer_tier` AS `ct` ON `cx`.`lyt_customer_tier_id` = `ct`.`id`
WHERE (c.shop_id = '12121') AND ((DATE(cx.last_email_open_date) > '2019-11-08')); 

This is primarily because the table ‘customers’ has 2 million records.

I could look into indexing etc., but the larger point is that these 2.5 million rows could become a billion records one day.

I’m looking for solutions that can enhance performance.
I’ve given thought to:

a) Horizontal scaling: split the MySQL table into different sections and query the count on each independently (see the first sketch after this list).

b) Using composite indexes.

c) My favourite one: just create a separate collection in MongoDB or Redis which only houses the count (the output of this query). Since the count is just one number, it will not take up much space and should be fast to read. (The only question is how many such queries there are, because that determines the size of the new collection.) A rough in-MySQL version of this idea is sketched after this list.
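
As a rough illustration of option (a), if the customers table were split into pieces, each piece could be counted independently and the partial counts summed. This is only a sketch; the shard table names below are invented:

    SELECT SUM(cnt) AS total_count
    FROM (
        SELECT COUNT(*) AS cnt
        FROM customers_shard_1 AS c
        JOIN customer_extra AS cx ON cx.customer_id = c.id
        WHERE c.shop_id = '12121' AND DATE(cx.last_email_open_date) > '2019-11-08'
        UNION ALL
        SELECT COUNT(*)
        FROM customers_shard_2 AS c
        JOIN customer_extra AS cx ON cx.customer_id = c.id
        WHERE c.shop_id = '12121' AND DATE(cx.last_email_open_date) > '2019-11-08'
    ) AS per_shard;

Option (c) could also stay inside MySQL as a small summary table that holds the precomputed count and is refreshed on whatever schedule the data can tolerate. Again just a sketch with an invented table name, rather than the Redis/MongoDB variant:

    -- One row per (shop_id, cutoff date) holding the precomputed count.
    CREATE TABLE IF NOT EXISTS customer_count_cache (
        shop_id      INT      NOT NULL,
        since_date   DATE     NOT NULL,
        cnt          BIGINT   NOT NULL,
        refreshed_at DATETIME NOT NULL,
        PRIMARY KEY (shop_id, since_date)
    );

    -- Periodic refresh job: run the expensive count once, store the single number.
    REPLACE INTO customer_count_cache (shop_id, since_date, cnt, refreshed_at)
    SELECT 12121, '2019-11-08', COUNT(*), NOW()
    FROM customers AS c
    JOIN customer_extra AS cx ON cx.customer_id = c.id
    WHERE c.shop_id = '12121'
      AND DATE(cx.last_email_open_date) > '2019-11-08';

    -- Reads then become a primary-key lookup instead of a scan:
    SELECT cnt
    FROM customer_count_cache
    WHERE shop_id = 12121 AND since_date = '2019-11-08';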

2 Answers


  1. Try this and see if it improves performance:

    SELECT
    COUNT(c.id)
    FROM `customers` AS `c`
    INNER JOIN `customer_extra` AS `cx` ON `c`.`id` = `cx`.`customer_id`
    LEFT JOIN `setting` AS `ssh` ON `c`.`shop_id` = `ssh`.`id`
    LEFT JOIN `customers_address` AS `ca` ON `ca`.`id` = `cx`.`customer_default_address_id`
    LEFT JOIN `lytcustomer_tier` AS `ct` ON `cx`.`lyt_customer_tier_id` = `ct`.`id`
    WHERE (c.shop_id = '12121') AND ((DATE(cx.last_email_open_date) > '2019-11-08'));
    

    As I mentioned in the comment, the condition AND (DATE(cx.last_email_open_date) > '2019-11-08') already forces the customers table to effectively INNER JOIN with customer_extra, so you might as well change it to INNER JOIN customer_extra AS cx ON c.id = cx.customer_id and follow it with the other LEFT JOINs.

    The INNER JOIN will at least restrict the initial result set to customers that actually have a last_email_open_date value matching the specified condition.
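
    To confirm that the rewrite actually changes the execution plan, it may be worth comparing EXPLAIN output for both versions (a sketch only; the chosen plan depends on your MySQL version and the indexes that exist):

      -- Plan check: with the INNER JOIN first, MySQL is free to start from
      -- customer_extra if that yields fewer rows. The remaining LEFT JOINs are
      -- omitted here for brevity; prepend EXPLAIN to the full query the same way.
      EXPLAIN
      SELECT COUNT(c.id)
      FROM `customers` AS `c`
      INNER JOIN `customer_extra` AS `cx` ON `c`.`id` = `cx`.`customer_id`
      WHERE c.shop_id = '12121'
        AND DATE(cx.last_email_open_date) > '2019-11-08';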

  2. • Say COUNT(*), not COUNT(c.id)
    • Remove these; they slow down the query without adding anything that I can see:

      LEFT JOIN `setting` AS `ssh` ON `c`.`shop_id` = `ssh`.`id`
      LEFT JOIN `customers_address` AS `ca` ON `ca`.`id` = `cx`.`customer_default_address_id`
      LEFT JOIN `lytcustomer_tier` AS `ct` ON `cx`.`lyt_customer_tier_id` = `ct`.`id`
      
    • DATE(...) makes that test not “sargable”. The following works for either DATE or DATETIME columns and is much faster:

      cx.last_email_open_date > '2019-11-08'
      
    • Consider whether that should be >= instead of >.

    • Need an index on shop_id. (Please provide SHOW CREATE TABLE.)
    • Don’t use LEFT JOIN when JOIN would work equally well.
    • If customer_extra holds columns that should have been in customers, now is the time to move them in. That would let you use this composite index for even more performance:

      INDEX(shop_id, last_email_open_date)   -- in this order
      

    With those changes (sketched below), a billion rows in MySQL will probably not be a problem. If it is, there are still more fixes I can suggest.
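
    Putting those points together, the count could be sketched roughly as below. This is only an illustration built from the suggestions above, using the table and column names from the question; adjust > vs >= as noted:

      -- With the tables as they are: COUNT(*), only the join that matters,
      -- and a sargable date test.
      SELECT COUNT(*)
      FROM customers AS c
      JOIN customer_extra AS cx ON cx.customer_id = c.id
      WHERE c.shop_id = '12121'
        AND cx.last_email_open_date > '2019-11-08';

      -- If the customer_extra columns are folded into customers, the join
      -- disappears entirely and INDEX(shop_id, last_email_open_date) can
      -- satisfy the whole query:
      SELECT COUNT(*)
      FROM customers
      WHERE shop_id = '12121'
        AND last_email_open_date > '2019-11-08';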
