skip to Main Content

I’m having a lot of difficulty writing an SQL query. I need to compute the difference between consecutive rows for 4 different regions across different years, then subsequently display only the top 3 regions with the highest count of negative differences. An example as shown below:

Region Year Difference
central 01 null
central 02 10
central 03 -9
east 01 -29
east 02 10
east 03 -9 central -29
west 02 10
west 03 -9 central -29
west 02 10
north 03 -9 central -29
north 02 10
north 03 -9

What I currently did to find the difference was to utilise the lag function in sql.

The current problems I’m facing are:

  • The first year for each region should all have null differences, since there is no year before the first year. However, only my first row in the select statement has a null value, the rest of the first years for the other regions have a non-null as there is a value before it (the latest year in the previous region comes before it).
  • After obtaining all the differences, how do I only show the top 3 regions that have the most instances of negative differences?

Any help would be appreciated, I’m racking my brain at this query. Thank you!

3

Answers


  1. Chosen as BEST ANSWER

    Managed to solve it after some thinking with a nested select

    select region, count(*) 
    from ("nested statement here") f
    where difference < 0
    group by region
    order by count(*) desc
    limit 3
    

  2. You have to convert null values in Differnce into something, for ex. 0 if 0 means a neutral value because there is no information.

    SELECT Region
    , SUM(CASE WHEN IFNULL(Difference,0) < 0 THEN 1 ELSE 0 END) CountDifference
    FROM yourTable
    GROUP BY Region
    ORDER BY CountDifference DESC
    LIMIT 3
    

    Also here: https://dbfiddle.uk/83qE9RX2

    Login or Signup to reply.
  3. First we fix the difference column by making every first year value null. Then we group by region and count all the case where difference < 0. After that we use dense_rank() to find all the regions with the top count results (including ties).

    select   Region
            ,negative_differece_count
    from     (
             select   *
                      ,dense_rank() over(order by negative_differece_count desc) as dns_rnk
             from     (  
                      select   Region
                              ,count(case when (case when Year = 1 then null else Difference end) < 0 then 1 end) as negative_differece_count
                      from     t
                      group by Region
                      ) t
             ) t
    where    dns_rnk <= 3
    
    Region negative_differece_count
    north 2
    central 1
    east 1
    west 1

    Fiddle

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search