skip to Main Content
SELECT 
    d.USPS_IMB AS IMb,
    d.CorrelationId,
    d.EINNo AS EIN,
    d.UniqueFormId AS FormId,
    d.ProductionFacility AS Facility,
    CASE 
        WHEN d.ProductionStatusNo < 5 THEN 'Received'
        WHEN d.ProductionStatusNo = 5 THEN 'Printed'
        WHEN d.ProductionStatusNo = 6 THEN 'Folded / Sealed'
        WHEN d.ProductionStatusNo = 7 THEN 'Mailed'
        ELSE 'Unknown'
    END AS Status,
    d.StatusDateTime AS StatusDateUTC
FROM detail d
JOIN (
    SELECT 
        UniqueFormId,
        MAX(ProductionStatusNo) AS MaxProductionStatusNo
    FROM detail
    WHERE ProductionFacility='MAIL'
    GROUP BY UniqueFormId
) AS maxStatus
ON d.UniqueFormId = maxStatus.UniqueFormId AND d.ProductionStatusNo = maxStatus.MaxProductionStatusNo
WHERE d.ProductionFacility='MAIL';

I have above query on detail table and currently I have index on UniqueFormId, productionFacility as well on ProductionStatusNo .

I need suggestions to improve performance on this query so results can be loaded.

CREATE TABLE `detail` (
  `Id` int NOT NULL AUTO_INCREMENT,
 
  `EINNo` varchar(45) NOT NULL,
  `EmployeeNo` varchar(45) NOT NULL,
  `Form` varchar(45) NOT NULL,
  `ProductionStatus` varchar(45) NOT NULL,
  `ProductionStatusNo` int NOT NULL,
  
  `StatusDateTime` datetime DEFAULT NULL,
 
  `UniqueFormId` varchar(450) NOT NULL,
  `ProductionFacility` varchar(450) NOT NULL,
  
  `USPS_IMB` varchar(45) DEFAULT NULL,
  `StatusDate` date DEFAULT NULL
 
  PRIMARY KEY (`Id`,`ProductionStatusNo`),
  KEY `idx_detail_EINNo` (`EINNo`),
  KEY `idx_detail_EmployeeNo` (`EmployeeNo`),
  KEY `idx_detail_Form` (`Form`),
  KEY `idx_detail_UniqueFormId` (`UniqueFormId`),
  KEY `idx_detail_ProductionFacility` (`ProductionFacility`),
  KEY `idx_detail_ProductionStatusNo` (`ProductionStatusNo`),
  KEY `idx_detail_StatusDate` (`StatusDate`),
  
  KEY `idx_detail_PackageReference2` (`PackageReference2`),
  KEY `idx_detail_CartonID` (`CartonID`)
) ENGINE=InnoDB AUTO_INCREMENT=216695199 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
/*!50100 PARTITION BY RANGE (`ProductionStatusNo`)
(PARTITION p0 VALUES LESS THAN (1) ENGINE = InnoDB,
 PARTITION p1 VALUES LESS THAN (2) ENGINE = InnoDB,
 PARTITION p2 VALUES LESS THAN (3) ENGINE = InnoDB,
 PARTITION p3 VALUES LESS THAN (4) ENGINE = InnoDB,
 PARTITION p4 VALUES LESS THAN (5) ENGINE = InnoDB,
 PARTITION p5 VALUES LESS THAN (6) ENGINE = InnoDB,
 PARTITION p6 VALUES LESS THAN (7) ENGINE = InnoDB,
 PARTITION p7 VALUES LESS THAN (8) ENGINE = InnoDB,
 PARTITION p8 VALUES LESS THAN (9) ENGINE = InnoDB,
 PARTITION p9 VALUES LESS THAN (10) ENGINE = InnoDB,
 PARTITION p10 VALUES LESS THAN (11) ENGINE = InnoDB,
 PARTITION p11 VALUES LESS THAN (12) ENGINE = InnoDB,
 PARTITION p12 VALUES LESS THAN (13) ENGINE = InnoDB,
 PARTITION p13 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;

2

Answers


  1. To optimize your SQL query you can try the following:

    Create a composite index:

    CREATE INDEX idx_detail_facility_form_status ON detail (ProductionFacility, UniqueFormId, ProductionStatusNo);
    

    Reevaluate partitioning by possibly using ProductionFacility:

    ALTER TABLE detail
    PARTITION BY RANGE (ProductionFacility) (
        PARTITION p0 VALUES LESS THAN ('A'),
        PARTITION p1 VALUES LESS THAN ('B'),
        -- add additional partitions as needed
    );
    

    Optimize the subquery:

    SELECT 
        UniqueFormId,
        MAX(ProductionStatusNo) AS MaxProductionStatusNo
    FROM detail
    WHERE ProductionFacility = 'MAIL'
    GROUP BY UniqueFormId;
    

    Create an additional index:

    CREATE INDEX idx_detail_formid_status ON detail (UniqueFormId, ProductionStatusNo);
    

    Analyze the query execution plan:

    EXPLAIN SELECT 
        d.USPS_IMB AS IMb,
        d.CorrelationId,
        d.EINNo AS EIN,
        d.UniqueFormId AS FormId,
        d.ProductionFacility AS Facility,
        CASE 
            WHEN d.ProductionStatusNo < 5 THEN 'Received'
            WHEN d.ProductionStatusNo = 5 THEN 'Printed'
            WHEN d.ProductionStatusNo = 6 THEN 'Folded / Sealed'
            WHEN d.ProductionStatusNo = 7 THEN 'Mailed'
            ELSE 'Unknown'
        END AS Status,
        d.StatusDateTime AS StatusDateUTC
    FROM detail d
    JOIN (
        SELECT 
            UniqueFormId,
            MAX(ProductionStatusNo) AS MaxProductionStatusNo
        FROM detail
        WHERE ProductionFacility='MAIL'
        GROUP BY UniqueFormId
    ) AS maxStatus
    ON d.UniqueFormId = maxStatus.UniqueFormId AND d.ProductionStatusNo = maxStatus.MaxProductionStatusNo
    WHERE d.ProductionFacility='MAIL';
    

    Ensure data types and schema are optimized and perform regular maintenance tasks such as ANALYZE TABLE to keep statistics up-to-date.

    Login or Signup to reply.
  2. In the derived table (which will probably be run first), detail needs

    INDEX(ProductionFacility, UniqueFormId, ProductionStatusNo)
    

    with the columns in the indicated order.

    The outer SELECT will also like that index.

    The given PARTITIONing probably slows down the query. Please explain why you chose it.

    Given that composite index, this is no longer useful (so DROP it):

    KEY `idx_detail_ProductionFacility` (`ProductionFacility`)
    

    Note: Three Single-column indexes will not do as good as the "composite" index I suggest.

    The size of ProductionFacility varchar(450) NOT NULL — may give you trouble. Shrink the size and/or normalize it. Ditto for UniqueFormId.

    For further discussion, please provide EXPLAIN SELECT.

    (Alas, if you really have 216695199 rows, the ALTERs I am suggesting will take a long time.)

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search