skip to Main Content

I have an unusual problem I’m trying to solve with SQL where I need to generate sequential numbers for partitioned rows but override specific numbers with values from the data, while not breaking the sequence (unless the override causes a number to be used greater than the number of rows present).

I feel I might be able to achieve this by selecting the rows where I need to override the generated sequence value and the rows I don’t need to override the value, then unioning them together and somehow using coalesce to get the desired dynamically generated sequence value, or maybe there’s some way I can utilise recursive.

I’ve not been able to solve this problem yet, but I’ve put together a SQL Fiddle which provides a simplified version:

http://sqlfiddle.com/#!17/236b5/5

The desired_dynamic_number is what I’m trying to generate and the generated_dynamic_number is my current work-in-progress attempt.

Any pointers around the best way to achieve the desired_dynamic_number values dynamically?

Update:

I’m almost there using lag:

http://sqlfiddle.com/#!17/236b5/24

3

Answers


  1. Chosen as BEST ANSWER

    The real world problem I was trying to solve did not have a nicely ordered secondary_order_by column, instead it would be something a bit more randomised (a created timestamp).

    For the benefit of people who stumble across this question with a similar problem to solve, a colleague solved this problem using a cartesian join, who's solution I'm posting below. The solution is Snowflake SQL which should be possible to adapt to Postgres. It does fall down on higher override_as_number values though unless the from table(generator(rowcount => 1000)) 1000 value is not increased to something suitably high.

    The SQL:

    with tally_table as (
        select row_number() over (order by seq4()) as gen_list
        from table(generator(rowcount => 1000))
    ),
    base as (
    select *,
        IFF(override_as_number IS NULL, row_number() OVER(PARTITION BY grouped_by, override_as_number order by random),override_as_number) as rownum
        from "SANDPIT"."TEST"."SAMPLEDATA" order by grouped_by,override_as_number,random
    ) --select * from base order by grouped_by,random;
    ,
    cart_product as (
        select *
        from tally_table cross join (Select distinct grouped_by from base ) as distinct_grouped_by
    )  --select * from cart_product;
    ,
    filter_product as (
        select *,
        row_number() OVER(partition by cart_product.grouped_by order by cart_product.grouped_by,gen_list) as seq_order
        from cart_product
      where CONCAT(grouped_by,'~',gen_list) NOT IN (select concat(grouped_by,'~',override_as_number) from base where override_as_number is not null)
    ) --select * from try2 order by 2,3 ;
    select base.grouped_by,
        base.random,
        base.override_as_number,
        base.answer,  -- This is hard coded as test data
        IFF(override_as_number is null, gen_list, seq_order) as computed_answer
    from base inner join filter_product on base.rownum = filter_product.seq_order and base.grouped_by = filter_product.grouped_by
    order by base.grouped_by,
    random;
    

    In the end I went for a simpler solution using a temporary table and cursor to inject override_as_number values and shuffle other numbers.


  2. Below query gives exact result, but you need to verify with all combinations

    select c.*,COALESCE(c.override_as_number,c.act) as final FROM
    (
    select b.*, dense_rank() over(partition by grouped_by order by grouped_by, actual) as act from
    (
    select a.*,COALESCE(override_as_number,row_num) as actual FROM  
    (
    select grouped_by , secondary_order_by  ,      
    dense_rank() over ( partition by grouped_by order by grouped_by, secondary_order_by ) as row_num                             
    ,override_as_number,desired_dynamic_number                                                                                   from fiddle    
    ) a
    ) b 
    ) c ;  
    

    column "final" is the result

    grouped_by | secondary_order_by | row_num | override_as_number | desired_dynamic_number | actual | act | final 
    ------------+--------------------+---------+--------------------+------------------------+--------+-----+-------
     AAA        | a                  |       1 |                  1 |                      1 |      1 |   1 |     1
     AAA        | b                  |       2 |                    |                      2 |      2 |   2 |     2
     AAA        | c                  |       3 |                  3 |                      3 |      3 |   3 |     3
     AAA        | d                  |       4 |                  3 |                      3 |      3 |   3 |     3
     AAA        | e                  |       5 |                    |                      4 |      5 |   4 |     4
     AAA        | f                  |       6 |                    |                      5 |      6 |   5 |     5
     AAA        | g                  |       7 |                999 |                    999 |    999 |   6 |   999
     XYZ        | a                  |       1 |                    |                      1 |      1 |   1 |     1
     ZZZ        | a                  |       1 |                    |                      1 |      1 |   1 |     1
     ZZZ        | b                  |       2 |                    |                      2 |      2 |   2 |     2
    (10 rows)
    

    Hope this helps!

    Login or Signup to reply.
  3. step-by-step demo:db<>fiddle

    SELECT 
        *,
        COALESCE(                                               -- 3
            first_value(override_as_number) OVER w              -- 2
            , 1
        )
        + row_number() OVER w - 1                               -- 4, 5
    FROM (
        SELECT
            *,
            SUM(                                                -- 1
                CASE WHEN override_as_number IS NOT NULL THEN 1 ELSE 0 END
            ) OVER (PARTITION BY grouped_by ORDER BY secondary_order_by)
                as grouped
        FROM sample
    ) s
    WINDOW w AS (PARTITION BY grouped_by, grouped ORDER BY secondary_order_by)
    
    1. Create a new subpartition within your partitions: This cumulative sum creates a unique group id for every group of records which starts with a override_as_number <> NULL followed by NULL records. So, for instance, your (AAA, d) to (AAA, f) belongs to the same subpartition/group.
    2. first_value() gives the first value of such subpartition.
    3. The COALESCE ensures a non-NULL result from the first_value() function if your partition starts with a NULL record.
    4. row_number() - 1 creates a row count within a subpartition, starting with 0.
    5. Adding the first_value() of a subpartition with the row count creates your result: Beginning with the one non-NULL record of a subpartition (adding the 0 row count), the first following NULL records results in the value +1 and so forth.
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search