I have a MySQL dev_table
with thousands of products identified by their Name, containing changes to price and availability (among other values) over time. The table contains only the dates where a change occurred.
I am looking for an efficient way to select the latest results for all products available on a certain date – e.g. on 2016-01-01 – even though there may be no entry for the product for the given date.
SET GLOBAL sql_mode=(SELECT REPLACE(@@sql_mode,'ONLY_FULL_GROUP_BY',''));
CREATE TABLE `dev_table` (`uid` int(11) NOT NULL, `Name` varchar(200) NOT NULL, `Date` date NOT NULL, `Available` tinyint(1) NOT NULL, `Price` decimal(5,2) NOT NULL, `Size` tinyint(1) NOT NULL);
ALTER TABLE `dev_table` ADD PRIMARY KEY (`uid`);
INSERT INTO `dev_table` (`uid`, `Name`, `Date`, `Available`, `Price`, `Size`) VALUES
(1, 'product A', '2000-01-01', 1, '123.33', 5), (2, 'product B', '2000-01-01', 1, '17.50', 1), (3, 'product C', '2003-06-01', 1, '578.25', 6), (4, 'product A', '2015-08-07', 1, '131.67', 2), (5, 'product D', '2016-01-01', 1, '8.02', 2),
(6, 'product C', '2016-01-01', 0, '478.72', 6), (7, 'product A', '2023-01-27', 1, '134.07', 5), (8, 'product B', '2023-09-01', 0, '12.01', 1);
+-----+------------+-------------+-----------+--------+------+
| uid | Name | Date | Available | Price | Size |
+-----+------------+-------------+-----------+--------+------+
| 1 | product A | 2000-01-01 | 1 | 123.33 | 5 |
| 2 | product B | 2000-01-01 | 1 | 17.50 | 1 |
| 3 | product C | 2003-06-01 | 1 | 578.25 | 6 |
| 4 | product A | 2015-08-07 | 1 | 131.67 | 2 |
| 5 | product D | 2016-01-01 | 1 | 8.02 | 2 |
| 6 | product C | 2016-01-01 | 0 | 478.72 | 6 |
| 7 | product A | 2023-01-27 | 1 | 134.07 | 5 |
| 8 | product B | 2023-09-01 | 0 | 12.01 | 1 |
+-----+------------+-------------+-----------+--------+------+
I use 2 nested subqueries to achieve this:
SELECT * FROM `dev_table`
WHERE `uid` IN (
SELECT max(subquery1.`uid`) AS lastUid FROM `dev_table` AS subquery1
WHERE subquery1.`Date` <= '2016-01-01'
AND subquery1.`Name` NOT IN (
SELECT subquery2.`Name` FROM `dev_table` AS subquery2
WHERE subquery2.`Date` <= '2016-01-01'
AND subquery2.`Available` != 1
)
GROUP BY subquery1.`Name`
ORDER by lastUid DESC
);
This does return the rows for uid 2, 4 and 5 as expected, these being the latest entries for product B, A and D respectively, and not returning uid 6, as product D is not available on the given date.
This solution is inefficient on larger tables, however. It gets even worse when adding more criteria such as MATCH (`Name`) AGAINST ('product' IN BOOLEAN MODE)
. Also, ONLY_FULL_GROUP_BY
does seem a bit outdated.
How can I improve this query?
3
Answers
You can use the
HAVING
clause to exclude those with at least one unavailable flag, to do so, we must select only those with total available products. Total products ‘SUM(d.Available)
‘ equal total products ‘COUNT(*)
‘ :Demo here
You’re right that nested subqueries can become quite inefficient, especially as the table grows. One approach to optimize your query is by using window functions and common table expressions (CTEs).
Your current query suggests that a product cannot go from
Available = 0
back toAvailable = 1
. Is this correct? If not, both your query and SelVazi’s will return incorrect results, as they will not include products which become available again.Assuming products can become available again, you need to check the availability of the most recent row on, or before, the desired date. Whether a window function (as suggested by Amit) or max subquery is more efficient will depend on the scale and distribution of your data, as well as the rest of the query.
This query requires that both
uid
andDate
are in sequence, but should be slightly more efficient as it joins on PK, not a secondary index:This query joins on the secondary index, so may be slightly less efficient than the query above, and it will produce multiple rows for the same
Name
if there are multiple rows for the sameName
on the sameDate
(e.g. different sizes):Both of these queries require an index on
(Name, Date)
:Here’s a db<>fiddle.