I have a table as follows:
CREATE TABLE `table` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`cc` int(3) unsigned NOT NULL,
`number` int(10) NOT NULL,
`name` varchar(64) NOT NULL,
`datetime` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB
The DBMS is MariaDB 10.1.26 on Debian 9.1. I had been trying to get it to list ranges of consecutive numbers. With the following query, I am able to accomplish that:
SELECT min(number) first_number, max(number) last_number, count(*) AS no_records FROM (
SELECT c.*, @rn := @rn + 1 rn
from (SELECT number FROM table WHERE cc = 1 GROUP BY number ORDER BY number) AS c
CROSS JOIN (SELECT @rn := 0) r
) c
GROUP BY number - rn ORDER BY number ASC
But if I wanted items bunched together based on the value in an additional column, this doesn’t work. Say I want the items grouped only when their values for name
are all the same. Say this is my data:
INSERT INTO `table` (`id`, `cc`, `number`, `name`) VALUES
(1, 1, 12, 'Hello'),
(2, 1, 2, 'Apple'),
(3, 1, 3, 'Bean'),
(4, 1, 10, 'Hello'),
(5, 1, 11, 'Hello'),
(6, 1, 1, 'Apple'),
(7, 1, 14, 'Deer'),
(8, 1, 14, 'Door'),
(9, 1, 15, 'Hello'),
(10, 1, 17, 'Hello'),
I’d like to get a report like this:
first last count name
1 2 2 Apple
3 3 1 Bean
10 12 3 Hello
14 14 1 Deer
14 14 1 Door
15 15 1 Hello
17 17 1 Hello
In other words, in addition to grouping items that are consecutive, these groups are split up into separate groups when their values of name
differ. (In other words, items are only in an island together if they are all consecutive AND have the same exact name
). The closest (and it’s not very close) that I have come, is doing this:
SELECT min(number) first_number, max(number) last_number, count(*) AS no_records FROM (
SELECT c.*, @rn := @rn + 1 rn
from (SELECT number FROM table WHERE cc = 1 GROUP BY number, name ORDER BY number) AS c
CROSS JOIN (SELECT @rn := 0) r
) c
GROUP BY number - rn, name ORDER BY number ASC
This doesn’t work, though, and what happens is it seems to return the first appearance of a name as first
and the last appearance as last
, with no_records
being the difference in number between them, which is certainly not right at all.
I feel like this question might be related, but I’ve not been able to make much sense of it, and when I tried tweaking it to my table, it just did the equivalent of a simple SELECT *
more or less. What modifications to my query do I need to make to get it to work?
Keep in mind:
- items can be inserted in any order
- Numbers can be repeated
- Names can be repeated, not necessarily consecutively
2
Answers
Your example is not a gaps-and-islands problem. If it is representative of your actual problem, you can just use aggregation:
I say this because gaps-and-islands is much more challenging without window functions. And that begs the question of why you are not using a more recent version of MariaDB. The end-of-life for 10.1 is October of this year, anyway.
EDIT:
As a gaps-and-islands, this is a little tricky because each name has to be handled separately. The trick is to use
row_number()
with partitioning:If you have adjacent numbers for a name and subtract a sequential value, then the result is constant. For instance:
The
diff
identifies the groups for aggregation.Oops, I forget this was for a soon-to-be-obsolete version of MariaDB:
For performance, you want an index on
(name, number)
. The performance should be reasonable, unless names have more than a few hundred rows.Here is a db<>fiddle.
There is not much to change in your query. You basically need to select
name
andnumber
in the subquery and sort in the same order. Then you can group byname, number - rn
in the outer query.Result:
db<>fiddle
I usually advocate against the use of session variables in this way. The reason is that such solutions depend on internal implementation, and can be broken by version updates or settings changes. For example: Once MariaDB decided to ignore the ORDER BY clause in subqueries without LIMIT. This is why I included a huge LIMIT.
I also replaced
number
withfirst_number
in the outer ORDER BY clause to avoid problems with ONLY_FULL_GROUP_BY mode.A more stable way to generate row numbers is to use an AOTO_INCREMENT column in a temporary table:
The final SELECT query is identical with the outer query above:
db<>fiddle
In a more recent version (starting from MariaDB 10.2) you can use
ROW_NUMBER()
window function instead:db<>fiddle