I want to ask a question,
imagine i have this query:
SELECT COUNT(DISTINCT (`patient_id`)) AS `num`
FROM `visits` AS `Visit`
INNER JOIN `patients` AS `patient` ON `Visit`.`patient_id` = `patient`.`id`
WHERE `Visit`.`doctor_id` = 21293
AND `Visit`.`office_id` = 14712
AND `Visit`.`started_at` IS NOT NULL
LIMIT 1;
now the query is slow and the started_at is indexed.
if i do this instead :
SELECT COUNT(DISTINCT (`patient_id`)) AS `num`
FROM `visits` AS `Visit`
INNER JOIN `patients` AS `patient` ON `Visit`.`patient_id` = `patient`.`id`
WHERE `Visit`.`doctor_id` = 21293
AND `Visit`.`office_id` = 14712
AND `Visit`.`started_at` > '2000-01-01'
LIMIT 1;
and use comparison for specifying the nullability of the column, it is getting faster. now i wonder if this is the case for all of my queries and i can use that tactic for all of them. is it faster in all cases?
I didnt try anything im just asking if they know it for a fact
2
Answers
Data volume plays a major part in discussing about the efficiency of both these queries. The second query seems more efficient due to its specific date filter. But, the actual performance difference can be due to other factors as well, like indexing, data, and database optimization strategies.
It largely depends on your table configuration and your index. If you have a simple index on
started_at
, then it creates a data-structure under the hood that optimizes your searches upon the indexed field-set. Speed largely depends on how that data-structure is being implemented under the hood and how it handles your scenarios. See this article as an example reference: https://builtin.com/data-science/b-tree-indexNow, you have a first search that checks for not null. We know that null is the unknown and therefore the RDBMS will not be able to properly compare any timestamp with
null
, see this:so if your indexed graph’s nodes are being compared against null, then each of the items will likely be compared to null and that will amount to as many comparisons as many nodes you have whereas your second query will quickly evaluate entire trees of nodes by comparing the root to your search criteria, greatly reducing the number of comparisons.
However, if you define an index on whether a field is null (see this for reference:https://dev.mysql.com/doc/refman/8.4/en/is-null-optimization.html), then the is null or is not null comparisons will be quick.
If you want to make them both quick, then you can create a multiple-column index, the first index criteria checking whether the field is null and the second index criteria would depend upon its value. Then you could filter by