PostgreSQL how to generate a partition row_number() with certain numbers overridden

Gavin
August 25, 2022
228 views
3 votes
3 Answers

I have an unusual problem I’m trying to solve with SQL where I need to generate sequential numbers for partitioned rows but override specific numbers with values from the data, while not breaking the sequence (unless the override causes a number to be used greater than the number of rows present).

I feel I might be able to achieve this by selecting the rows where I need to override the generated sequence value and the rows I don’t need to override the value, then unioning them together and somehow using coalesce to get the desired dynamically generated sequence value, or maybe there’s some way I can utilise recursive.

I’ve not been able to solve this problem yet, but I’ve put together a SQL Fiddle which provides a simplified version:

http://sqlfiddle.com/#!17/236b5/5

The desired_dynamic_number is what I’m trying to generate and the generated_dynamic_number is my current work-in-progress attempt.

Any pointers around the best way to achieve the desired_dynamic_number values dynamically?

Update:

I’m almost there using lag:

http://sqlfiddle.com/#!17/236b5/24

Tags: postgresql row-number

Answers

Chosen as BEST ANSWER

The real world problem I was trying to solve did not have a nicely ordered secondary_order_by column, instead it would be something a bit more randomised (a created timestamp).

For the benefit of people who stumble across this question with a similar problem to solve, a colleague solved this problem using a cartesian join, who's solution I'm posting below. The solution is Snowflake SQL which should be possible to adapt to Postgres. It does fall down on higher override_as_number values though unless the from table(generator(rowcount => 1000)) 1000 value is not increased to something suitably high.

The SQL:

with tally_table as (
    select row_number() over (order by seq4()) as gen_list
    from table(generator(rowcount => 1000))
),
base as (
select *,
    IFF(override_as_number IS NULL, row_number() OVER(PARTITION BY grouped_by, override_as_number order by random),override_as_number) as rownum
    from "SANDPIT"."TEST"."SAMPLEDATA" order by grouped_by,override_as_number,random
) --select * from base order by grouped_by,random;
,
cart_product as (
    select *
    from tally_table cross join (Select distinct grouped_by from base ) as distinct_grouped_by
)  --select * from cart_product;
,
filter_product as (
    select *,
    row_number() OVER(partition by cart_product.grouped_by order by cart_product.grouped_by,gen_list) as seq_order
    from cart_product
  where CONCAT(grouped_by,'~',gen_list) NOT IN (select concat(grouped_by,'~',override_as_number) from base where override_as_number is not null)
) --select * from try2 order by 2,3 ;
select base.grouped_by,
    base.random,
    base.override_as_number,
    base.answer,  -- This is hard coded as test data
    IFF(override_as_number is null, gen_list, seq_order) as computed_answer
from base inner join filter_product on base.rownum = filter_product.seq_order and base.grouped_by = filter_product.grouped_by
order by base.grouped_by,
random;

In the end I went for a simpler solution using a temporary table and cursor to inject override_as_number values and shuffle other numbers.

(Edit)

Below query gives exact result, but you need to verify with all combinations

select c.*,COALESCE(c.override_as_number,c.act) as final FROM
(
select b.*, dense_rank() over(partition by grouped_by order by grouped_by, actual) as act from
(
select a.*,COALESCE(override_as_number,row_num) as actual FROM  
(
select grouped_by , secondary_order_by  ,      
dense_rank() over ( partition by grouped_by order by grouped_by, secondary_order_by ) as row_num                             
,override_as_number,desired_dynamic_number                                                                                   from fiddle    
) a
) b 
) c ;

column "final" is the result

grouped_by | secondary_order_by | row_num | override_as_number | desired_dynamic_number | actual | act | final 
------------+--------------------+---------+--------------------+------------------------+--------+-----+-------
 AAA        | a                  |       1 |                  1 |                      1 |      1 |   1 |     1
 AAA        | b                  |       2 |                    |                      2 |      2 |   2 |     2
 AAA        | c                  |       3 |                  3 |                      3 |      3 |   3 |     3
 AAA        | d                  |       4 |                  3 |                      3 |      3 |   3 |     3
 AAA        | e                  |       5 |                    |                      4 |      5 |   4 |     4
 AAA        | f                  |       6 |                    |                      5 |      6 |   5 |     5
 AAA        | g                  |       7 |                999 |                    999 |    999 |   6 |   999
 XYZ        | a                  |       1 |                    |                      1 |      1 |   1 |     1
 ZZZ        | a                  |       1 |                    |                      1 |      1 |   1 |     1
 ZZZ        | b                  |       2 |                    |                      2 |      2 |   2 |     2
(10 rows)

Hope this helps!

- SMan
- August 25, 2022 at 10:36 am
- 0 votes
0
step-by-step demo:db<>fiddle
```
SELECT 
    *,
    COALESCE(                                               -- 3
        first_value(override_as_number) OVER w              -- 2
        , 1
    )
    + row_number() OVER w - 1                               -- 4, 5
FROM (
    SELECT
        *,
        SUM(                                                -- 1
            CASE WHEN override_as_number IS NOT NULL THEN 1 ELSE 0 END
        ) OVER (PARTITION BY grouped_by ORDER BY secondary_order_by)
            as grouped
    FROM sample
) s
WINDOW w AS (PARTITION BY grouped_by, grouped ORDER BY secondary_order_by)
```
1. Create a new subpartition within your partitions: This cumulative sum creates a unique group id for every group of records which starts with a override_as_number <> NULL followed by NULL records. So, for instance, your (AAA, d) to (AAA, f) belongs to the same subpartition/group.
2. first_value() gives the first value of such subpartition.
3. The COALESCE ensures a non-NULL result from the first_value() function if your partition starts with a NULL record.
4. row_number() - 1 creates a row count within a subpartition, starting with 0.
5. Adding the first_value() of a subpartition with the row count creates your result: Beginning with the one non-NULL record of a subpartition (adding the 0 row count), the first following NULL records results in the value +1 and so forth.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.