I’m having a lot of difficulty writing an SQL query. I need to compute the difference between consecutive rows for 4 different regions across different years, then subsequently display only the top 3 regions with the highest count of negative differences. An example as shown below:
Region | Year | Difference | ||
---|---|---|---|---|
central | 01 | null | ||
central | 02 | 10 | ||
central | 03 | -9 | ||
east | 01 | -29 | ||
east | 02 | 10 | ||
east | 03 | -9 | central | -29 |
west | 02 | 10 | ||
west | 03 | -9 | central | -29 |
west | 02 | 10 | ||
north | 03 | -9 | central | -29 |
north | 02 | 10 | ||
north | 03 | -9 |
What I currently did to find the difference was to utilise the lag function in sql.
The current problems I’m facing are:
- The first year for each region should all have null differences, since there is no year before the first year. However, only my first row in the select statement has a null value, the rest of the first years for the other regions have a non-null as there is a value before it (the latest year in the previous region comes before it).
- After obtaining all the differences, how do I only show the top 3 regions that have the most instances of negative differences?
Any help would be appreciated, I’m racking my brain at this query. Thank you!
3
Answers
Managed to solve it after some thinking with a nested select
You have to convert null values in Differnce into something, for ex. 0 if 0 means a neutral value because there is no information.
Also here: https://dbfiddle.uk/83qE9RX2
First we fix the difference column by making every first year value
null
.Then we group by region
andcount
all the case wheredifference < 0
. After that we usedense_rank()
to find all the regions with the top count results (including ties).Fiddle