skip to Main Content

I’m scratching my head on this one. The below essentially times out on both my local machine and AWS. Here’s the kicker, each subquery runs fine. The first subquery takes 55ms and returns 24000 records. The second takes 42ms and returns 4800 records.

I’m wondering if there’s a server configuration issue I’m overlooking?

The person_main table is fully indexed as well, and only has 115,212 records.

Here’s the query details:

-- This takes 20ms, 4800 records
SELECT phw.person_main_ref_id AS ID
FROM person_history_work phw
WHERE MATCH(phw.work_summary) AGAINST('finance');

-- This takes 20ms, 24000 records
SELECT phw.person_main_ref_id AS ID
FROM person_history_work phw
WHERE MATCH(phw.work_title) AGAINST('software');

-- This takes 700ms, returns 14000 records
SELECT pm.ID
FROM person_main pm      
WHERE
pm.ID IN  
(
    -- 49838 records
    SELECT phw.person_main_ref_id AS ID
    FROM person_history_work phw
    WHERE phw.ID < 50000
)
AND
pm.ID IN
(
    -- 299679 records
    SELECT phw.person_main_ref_id AS ID
    FROM person_history_work phw
    WHERE phw.ID > 10000
);

-- This times out at over 60000ms, should return at max 4800 records
SELECT pm.ID
FROM person_main pm      
WHERE
pm.ID IN  
(
    -- 4800 records
    SELECT phw.person_main_ref_id AS ID
    FROM person_history_work phw
    WHERE MATCH(phw.work_summary) AGAINST('finance')
)
AND
pm.ID IN
(
    -- 24000 records
    SELECT phw.person_main_ref_id AS ID
    FROM person_history_work phw
    WHERE MATCH(phw.work_title) AGAINST('software')
)

If I execute the query with just one of the subqueries it takes only 60ms.

The issue is if I use AND with either of them, it times out. The order doesn’t matter. It makes absolutely no sense as it’s only comparing IDs to being in the result sets.

If I change it to OR, the query completes in 800ms.

Here’s the EXPLAIN:

enter image description here

Any ideas what’s going on here?

===

Here’s the create for person_main and person_history_work.

    CREATE TABLE `person_main` (
  `ID` int NOT NULL AUTO_INCREMENT,
  `date_created` datetime NOT NULL DEFAULT (now()),
  `date_updated` datetime NOT NULL DEFAULT (now()) ON UPDATE CURRENT_TIMESTAMP,
  `name_first` varchar(64) COLLATE utf8mb4_unicode_ci NOT NULL,
  `name_middle` varchar(64) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `name_last` varchar(64) COLLATE utf8mb4_unicode_ci NOT NULL,
  `name_nick` varchar(64) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `name_full` text COLLATE utf8mb4_unicode_ci GENERATED ALWAYS AS (trim(concat(if((`name_nick` is null),_utf8mb4'',concat(`name_nick`,_utf8mb4' ')),if((`name_first` is null),_utf8mb4'',concat(`name_first`,_utf8mb4' ')),if((`name_middle` is null),_utf8mb4'',concat(`name_middle`,_utf8mb4' ')),if((`name_last` is null),_utf8mb4'',`name_last`)))) STORED NOT NULL,
  `date_birth` date DEFAULT NULL,
  `department_type_gc` smallint DEFAULT NULL,
  `designation` text COLLATE utf8mb4_unicode_ci,
  PRIMARY KEY (`ID`),
  UNIQUE KEY `ID_UNIQUE` (`ID`),
  KEY `IDX_person_main_name_first` (`name_first`),
  KEY `IDX_person_main_name_middle` (`name_middle`),
  KEY `IDX_person_main_name_last` (`name_last`),
  KEY `IDX_person_main_name_nick` (`name_nick`),
  KEY `IDX_person_main_date_created` (`date_created`),
  KEY `IDX_person_main_date_updated` (`date_updated`),
  KEY `IDX_person_main_date_birth` (`date_birth`),
  KEY `IDX_person_main_department_type_gc` (`department_type_gc`),
  KEY `IDX_person_main_designation` (`designation`(128)),
  FULLTEXT KEY `IDX_person_main_name_full` (`name_full`)
) ENGINE=InnoDB AUTO_INCREMENT=177557 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci AVG_ROW_LENGTH=55

CREATE TABLE `person_history_work` (
  `ID` int NOT NULL AUTO_INCREMENT,
  `person_main_ref_id` int NOT NULL COMMENT 'Pointer back to people_main to define owner.',
  `internal_company_name` text COLLATE utf8mb4_unicode_ci,
  `company_main_ref_id` int DEFAULT NULL,
  `work_title` text COLLATE utf8mb4_unicode_ci,
  `work_title_scrubbed` text COLLATE utf8mb4_unicode_ci GENERATED ALWAYS AS (regexp_replace(`work_title`,_utf8mb4'[^0-9A-Za-z]',_utf8mb4'')) STORED,
  `date_start` date DEFAULT NULL,
  `date_end` date DEFAULT NULL,
  `work_summary` text COLLATE utf8mb4_unicode_ci,
  `person_email_ref_id` int DEFAULT NULL,
  `person_phone_ref_id` int DEFAULT NULL,
  `person_address_ref_id` int DEFAULT NULL,
  `compensation_main_ref_id` int DEFAULT NULL,
  `history_work_details` json DEFAULT NULL,
  `processed_sit` tinyint NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`),
  UNIQUE KEY `UK_person_history_work_ID` (`ID`),
  KEY `fk_people_main_id_idx` (`person_main_ref_id`),
  KEY `IDX_person_history_work_company_main_ref_id` (`company_main_ref_id`),
  KEY `IDX_person_history_work_date_start` (`date_start`),
  KEY `IDX_person_history_work_date_end` (`date_end`),
  KEY `FK_person_history_work_compensation_main_ref_id` (`compensation_main_ref_id`),
  KEY `FK_person_history_work_person_email_ref_id` (`person_email_ref_id`),
  KEY `FK_person_history_work_person_phone_ref_id` (`person_phone_ref_id`),
  KEY `IDX_person_history_work_work_title_scrubbed` (`work_title_scrubbed`(128)),
  KEY `IDX_person_history_work_processed_sit` (`processed_sit`),
  FULLTEXT KEY `IDX_person_history_work_work_summary` (`work_summary`),
  FULLTEXT KEY `IDX_person_history_work_internal_company_name` (`internal_company_name`),
  FULLTEXT KEY `IDX_person_history_work_work_title` (`work_title`),
  CONSTRAINT `FK_person_history_work_compensation_main_ref_id` FOREIGN KEY (`compensation_main_ref_id`) REFERENCES `compensation_main` (`ID`),
  CONSTRAINT `FK_person_history_work_person_email_ref_id` FOREIGN KEY (`person_email_ref_id`) REFERENCES `person_email` (`ID`),
  CONSTRAINT `FK_person_history_work_person_phone_ref_id` FOREIGN KEY (`person_phone_ref_id`) REFERENCES `person_phone` (`ID`),
  CONSTRAINT `fk_phw_person_main_id` FOREIGN KEY (`person_main_ref_id`) REFERENCES `person_main` (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=369407 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci AVG_ROW_LENGTH=81

2

Answers


  1. Your indices have no statistics. Try running ANALYZE on the tables then run the query again.

    And I would be rather surprised if you don’t get a MASSIVE performance improvment if you use proper joins instead of IN (subquery).

    Login or Signup to reply.
  2. IN ( SELECT ) has historically been inefficiently optimized; avoid it.

    Your query starts with the large, unfiltered, table. Let’s rearrange things to start with the other table.

    SELECT  phw1.person_main_ref_id AS ID
        FROM  person_history_work phw1
        JOIN  person_history_work phw2 USING(person_main_ref_id)
        WHERE  MATCH(phw1.work_summary) AGAINST('finance')
          AND  MATCH(phw2.work_title) AGAINST('software');
    

    Better yet would be to combine the two FT indexes:

    FULLTEXT 'both' (work_summary, work_title) 
    
    SELECT  person_main_ref_id AS ID
        FROM  person_history_work
        WHERE  MATCH(pwork_summary, work_title)
             AGAINST('+finance +software' IN BOOLEAN MODE) 
    

    However, that may have a problem — the keywords may show up in either column. Consider whether this is a benefit or a drawback. And think about workarounds. If this is just the first of several filtering processes, this problem may not be serious.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search