MySQL - Calculating difference between consecutive rows, then finding top 3 IDs with highest count of negative differences

Gerald
October 11, 2022
216 views
3 votes
3 Answers

I’m having a lot of difficulty writing an SQL query. I need to compute the difference between consecutive rows for 4 different regions across different years, then display only the top 3 regions with the highest count of negative differences. An example is shown below:

Region	Year	Difference
central	01	null
central	02	10
central	03	-9
east	01	-29
east	02	10
east	03	-9
central		-29
west	02	10
west	03	-9
central		-29
west	02	10
north	03	-9
central		-29
north	02	10
north	03	-9

So far, I have used the LAG function to compute the difference.

The current problems I’m facing are:

The first year for each region should have a null difference, since there is no year before it. However, only the first row of my select statement is null; the first year of every other region gets a non-null value, because the latest year of the previous region comes before it.
After obtaining all the differences, how do I show only the top 3 regions with the most negative differences?

Any help would be appreciated, I’m racking my brain at this query. Thank you!

Answers

Chosen as BEST ANSWER
- Gerald
- October 11, 2022 at 6:35 pm
- 0 votes
I managed to solve it after some thinking, using a nested select:
```
select region, count(*) 
from ("nested statement here") f
where difference < 0
group by region
order by count(*) desc
limit 3
```
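
The nested statement itself is not shown above. As a rough sketch only, it could compute each difference with LAG partitioned by region, so that the first year of every region gets NULL. The table name `yearly_values` and the column `amount` are hypothetical, since the question does not give its schema, and window functions need MySQL 8.0 or later.

```sql
-- Hypothetical table yearly_values(region, year, amount);
-- the original post does not show the real schema.
SELECT region, COUNT(*) AS negative_count
FROM (
    SELECT region,
           year,
           -- PARTITION BY restarts LAG for each region, so the first
           -- year of every region has no previous row and yields NULL.
           amount - LAG(amount) OVER (PARTITION BY region ORDER BY year) AS difference
    FROM yearly_values
) f
WHERE difference < 0          -- NULL differences are filtered out here
GROUP BY region
ORDER BY negative_count DESC
LIMIT 3;
```

Without the PARTITION BY clause, LAG would read the last year of the previous region, which matches the first problem described in the question.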


- WilliJakob
- October 11, 2022 at 6:16 pm
- 0 votes
You have to convert the null values in Difference into something, for example 0, if 0 works as a neutral value meaning there is no information.
```
SELECT Region
, SUM(CASE WHEN IFNULL(Difference,0) < 0 THEN 1 ELSE 0 END) CountDifference
FROM yourTable
GROUP BY Region
ORDER BY CountDifference DESC
LIMIT 3
```
Also here: https://dbfiddle.uk/83qE9RX2
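
For what it is worth, a comparison against NULL evaluates to NULL rather than true, so the CASE expression falls through to ELSE for NULL rows whether or not IFNULL is applied. A small check that can be run on its own (the column aliases are only illustrative):

```sql
-- NULL < 0 is NULL (not true), so the CASE takes the ELSE branch.
SELECT NULL < 0                                        AS raw_comparison,  -- NULL
       CASE WHEN NULL < 0 THEN 1 ELSE 0 END            AS without_ifnull,  -- 0
       CASE WHEN IFNULL(NULL, 0) < 0 THEN 1 ELSE 0 END AS with_ifnull;     -- 0
```

Both forms therefore count the same rows; the IFNULL mainly makes the intent explicit.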

- DannySlor
- October 11, 2022 at 6:44 pm
- 0 votes
First we fix the Difference column by making every first-year value null. Then we group by region and count all the cases where the difference is < 0. After that we use dense_rank() to find all the regions with the top count results (including ties).
```
select   Region
        ,negative_differece_count
from     (
         select   *
                  ,dense_rank() over(order by negative_differece_count desc) as dns_rnk
         from     (  
                  select   Region
                          ,count(case when (case when Year = 1 then null else Difference end) < 0 then 1 end) as negative_differece_count
                  from     t
                  group by Region
                  ) t
         ) t
where    dns_rnk <= 3
```
Region	negative_differece_count
north	2
central	1
east	1
west	1
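
Four regions come back even though the filter is dns_rnk <= 3 because DENSE_RANK gives tied counts the same rank. A minimal sketch, using the counts from the result above as inline data (the derived table `counts` and the column `negative_count` are made-up names for illustration, and MySQL 8.0 or later is assumed):

```sql
-- Inline copy of the counts shown above, ranked with DENSE_RANK.
SELECT region,
       negative_count,
       DENSE_RANK() OVER (ORDER BY negative_count DESC) AS dns_rnk
FROM (
    SELECT 'north' AS region, 2 AS negative_count
    UNION ALL SELECT 'central', 1
    UNION ALL SELECT 'east', 1
    UNION ALL SELECT 'west', 1
) counts;
-- north gets rank 1; central, east and west all share rank 2,
-- so every row passes a dns_rnk <= 3 filter.
```

If exactly three rows are wanted regardless of ties, ORDER BY with LIMIT 3 is the simpler choice, at the cost of dropping one of the tied regions arbitrarily.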
