I’m trying to group rows when it starts with 1 and has consecutive 0s, and stop grouping once it meets another 1 or 2.
Here is an example: (column ‘group’ is what I expect to see)
+-----+------+-------+
| seq | data | group |
+-----+------+-------+
| 1 | 1 | 1 |
| 2 | 0 | 1 |
| 3 | 0 | 1 |
| 4 | 0 | 1 |
| 5 | 2 | |
| 6 | 2 | |
| 7 | 1 | 2 |
| 8 | 0 | 2 |
| 9 | 0 | 2 |
| 10 | 1 | 3 |
| 11 | 0 | 3 |
| 12 | 0 | 3 |
| 13 | 2 | |
| 14 | 0 | |
| 15 | 0 | |
| 16 | 0 | |
| 17 | 0 | |
| 18 | 0 | |
| 19 | 2 | |
| 20 | 2 | |
| 21 | 1 | 4 |
| 22 | 0 | 4 |
| 23 | 0 | 4 |
| 24 | 0 | 4 |
| 25 | 2 | |
| 26 | 2 | |
| 27 | 0 | |
| 28 | 1 | 5 |
| 29 | 0 | 5 |
| 30 | 0 | 5 |
| 31 | 0 | 5 |
| 32 | 1 | 6 |
| 33 | 0 | 6 |
| 34 | 0 | 6 |
| 35 | 0 | 6 |
| 36 | 0 | 6 |
| 37 | 2 | |
| 38 | 0 | |
| 39 | 0 | |
| 40 | 0 | |
| 41 | 0 | |
| 42 | 0 | |
| 43 | 0 | |
| 44 | 0 | |
| 45 | 0 | |
| 46 | 2 | |
| 47 | 2 | |
| 48 | 2 | |
+-----+------+-------+
I’ve tried using coalesce to make it possible, but this is what I’ve got:
select seq, data, coalesce((select sum(data) +1
from mytable pre
where pre.seq <= mytable.seq and pre.data = 1),0) as s1
from mytable
+-----+------+-------+
| seq | data | group |
+-----+------+-------+
| 1 | 1 | 2 |
| 2 | 0 | 2 |
| 3 | 0 | 2 |
| 4 | 0 | 2 |
| 5 | 2 | 2 |
| 6 | 2 | 2 |
| 7 | 1 | 3 |
| 8 | 0 | 3 |
| 9 | 0 | 3 |
| 10 | 1 | 4 |
| 11 | 0 | 4 |
| 12 | 0 | 4 |
| 13 | 2 | 4 |
| 14 | 0 | 4 |
| 15 | 0 | 4 |
| 16 | 0 | 4 |
| 17 | 0 | 4 |
| 18 | 0 | 4 |
| 19 | 2 | 4 |
| 20 | 2 | 4 |
| 21 | 1 | 5 |
| 22 | 0 | 5 |
| 23 | 0 | 5 |
| 24 | 0 | 5 |
| 25 | 2 | 5 |
| 26 | 2 | 5 |
| 27 | 0 | 5 |
| 28 | 1 | 6 |
| 29 | 0 | 6 |
| 30 | 0 | 6 |
| 31 | 0 | 6 |
| 32 | 1 | 7 |
| 33 | 0 | 7 |
| 34 | 0 | 7 |
| 35 | 0 | 7 |
| 36 | 0 | 7 |
| 37 | 2 | 7 |
| 38 | 0 | 7 |
| 39 | 0 | 7 |
| 40 | 0 | 7 |
| 41 | 0 | 7 |
| 42 | 0 | 7 |
| 43 | 0 | 7 |
| 44 | 0 | 7 |
| 45 | 0 | 7 |
| 46 | 2 | 7 |
| 47 | 2 | 7 |
| 48 | 2 | 7 |
+-----+------+-------+
I want to stop grouping once it meets 1 or 2, could you help me figure this out? Thanks
++
I have a similar task, which is to group from bottom to top but with a slight different conditions.
I need to group data when it starts with data ‘2’, and close the group when it meets 1 or 2 but need to include them within the group too (0 or 1). Below is the expected output.
+-----+------+-------+ | seq | data | group | +-----+------+-------+ | 1 | 1 | 11 | | 2 | 0 | 11 | | 3 | 0 | 11 | | 4 | 0 | 11 | | 5 | 2 | 11 | | 6 | 2 | 10 | | 7 | 1 | | | 8 | 0 | | | 9 | 0 | | | 10 | 1 | 9 | | 11 | 0 | 9 | | 12 | 0 | 9 | | 13 | 2 | 9 | | 14 | 0 | 8 | | 15 | 0 | 8 | | 16 | 0 | 8 | | 17 | 0 | 8 | | 18 | 0 | 8 | | 19 | 2 | 8 | | 20 | 2 | 7 | | 21 | 1 | 6 | | 22 | 0 | 6 | | 23 | 0 | 6 | | 24 | 0 | 6 | | 25 | 2 | 6 | | 26 | 2 | 5 | | 27 | 0 | | | 28 | 1 | | | 29 | 0 | | | 30 | 0 | | | 31 | 0 | | | 32 | 1 | 4 | | 33 | 0 | 4 | | 34 | 0 | 4 | | 35 | 0 | 4 | | 36 | 0 | 4 | | 37 | 2 | 4 | | 38 | 0 | 3 | | 39 | 0 | 3 | | 40 | 0 | 3 | | 41 | 0 | 3 | | 42 | 0 | 3 | | 43 | 0 | 3 | | 44 | 0 | 3 | | 45 | 0 | 3 | | 46 | 2 | 3 | | 47 | 2 | 2 | | 48 | 2 | 1 | +-----+------+-------+
I’ve used the similar approach of @shawnt00 ‘s answer and below is what I got.
It gives what I expected to see, but I want to avoid left outer join if possible. Is there any ways to make this query clean?
with g1 as (
select *,
count(case when data = 2 then 1 end) over (order by seq desc) as grp
from mytable
), g2 as (
select *,
count(case when data > 0 then 1 end) over (partition by grp order by seq desc) as grp2
from g1
)
select curr.seq, curr.data, curr.grp,
CASE WHEN curr.data = 1 AND next.grp2 = 1 THEN 1
ELSE curr.grp2 END AS grp2
from g2 curr left outer join g2 next on curr.seq + 1 = next.seq
-- where grp2 = 1;
order by curr.seq
;
+-----+------+-----+------+
| seq | data | grp | grp2 |
+-----+------+-----+------+
| 1 | 1 | 11 | 1 |
| 2 | 0 | 11 | 1 |
| 3 | 0 | 11 | 1 |
| 4 | 0 | 11 | 1 |
| 5 | 2 | 11 | 1 |
| 6 | 2 | 10 | 1 |
| 7 | 1 | 9 | 3 |
| 8 | 0 | 9 | 2 |
| 9 | 0 | 9 | 2 |
| 10 | 1 | 9 | 1 |
| 11 | 0 | 9 | 1 |
| 12 | 0 | 9 | 1 |
| 13 | 2 | 9 | 1 |
| 14 | 0 | 8 | 1 |
| 15 | 0 | 8 | 1 |
| 16 | 0 | 8 | 1 |
| 17 | 0 | 8 | 1 |
| 18 | 0 | 8 | 1 |
| 19 | 2 | 8 | 1 |
| 20 | 2 | 7 | 1 |
| 21 | 1 | 6 | 1 |
| 22 | 0 | 6 | 1 |
| 23 | 0 | 6 | 1 |
| 24 | 0 | 6 | 1 |
| 25 | 2 | 6 | 1 |
| 26 | 2 | 5 | 1 |
| 27 | 0 | 4 | 3 |
| 28 | 1 | 4 | 3 |
| 29 | 0 | 4 | 2 |
| 30 | 0 | 4 | 2 |
| 31 | 0 | 4 | 2 |
| 32 | 1 | 4 | 1 |
| 33 | 0 | 4 | 1 |
| 34 | 0 | 4 | 1 |
| 35 | 0 | 4 | 1 |
| 36 | 0 | 4 | 1 |
| 37 | 2 | 4 | 1 |
| 38 | 0 | 3 | 1 |
| 39 | 0 | 3 | 1 |
| 40 | 0 | 3 | 1 |
| 41 | 0 | 3 | 1 |
| 42 | 0 | 3 | 1 |
| 43 | 0 | 3 | 1 |
| 44 | 0 | 3 | 1 |
| 45 | 0 | 3 | 1 |
| 46 | 2 | 3 | 1 |
| 47 | 2 | 2 | 1 |
| 48 | 2 | 1 | 1 |
+-----+------+-----+------+
2
Answers
Using subqueries and
lead
to build the runs of uniqueid
s for each valid group of rows:See fiddle
Tag the rows twice. First tally the blocks/runs beginning with a 1. Now within each of those, separate and number by runs that start with anything not zero. The "grouped" rows must then be the first ones in the second list.
The descending groups are fundamentally the same. Since the stop row is included and triggers the transition, the counter is lagged by a single row and the total is adjusted so that it starts counting at one.
This should be more efficient than self joins.