Mysql - Fill in Nulls with Row above

Chris
June 22, 2023
211 views
0 votes
2 Answers

I am really struggling with producing some code create ‘level1 question’ & ‘level2 question’ columns.

The logic to populate the columns is:

‘L1’ when level = 1 & type = group then question
‘L2’ when level = 2 & type = group then question

But I can not find an efficient way of creating ‘level1 question’ & ‘level2 question’ like in the example.

id	sort	type	question	level	answer	L1	L2	level1 question	level2 question
1	1	Group	Location	1	NULL	Location	NULL	Location	NULL
1	2	Question	UK	2	Yes	NULL	NULL	Location	NULL
1	3	Question	EU	2	No	NULL	NULL	Location	NULL
1	4	Group	Asia	2	Yes	NULL	ASIA	Location	ASIA
1	5	Question	AUS	3	Yes	NULL	NULL	Location	ASIA
1	6	Question	NZ	3	No	NULL	NULL	Location	ASIA
1	7	Group	Season	1	NULL	Season	NULL	Season	NULL
1	8	Question	Summer	2	Yes	NULL	NULL	Season	NULL
2	1	Group	City	1	NULL	City	NULL	City	NULL
2	2	Question	London	2	Yes	NULL	NULL	City	NULL
2	3	Group	Street	2	NULL	NULL	Street	City	Street
2	4	Question	King ST	3	No	NULL	NULL	City	Street

I have tried to use the following code but it is so slow. Just taken ~2 hours over 5 million rows.

'level1 question' = 
                (select top 1 L1 
                from table2 t2
                where L1 is not null
                and t1.Sort >= t2.Sort and t1.ID = t2.ID
                order by id , sort DESC)
, 'level2 question' = 
            (select top 1 L2 
            from table2 t2
            where L2 is not null AND (Level >2 OR Type = 'Group')
            and t1.Sort >= t2.Sort and t1.ID = t2.ID
            order by id , sort DESC)

Answers

- DaveMullin
- June 21, 2023 at 10:55 pm
- 0 votes
0
I think you’re experiencing an overflow in memory due to the self join. Create a CTE or 2 that removes the rows with null and then join the CTE(s) it to the table instead. I bet if you looked up your execution plan, you have nested loops and sorts that are killing hour query. My solution should reduce the impact of those.

Login or Signup to reply.

The syntax of your code suggests that you are running SQL Server.

In the 2022 version, we can use last_value() with ignore nulls to efficiently retrieve the latest non-null of the conditional computation:

select t.*,
    last_value(case when level = 1 and type = 'Group' then question end) 
        ignore nulls
        over(partition by id order by sort) as level1_question,
    last_value(case when level = 2 and type = 'Group' then question end) 
        ignore nulls
        over(partition by id order by sort) as level2_question
from mytable t

In earlier versions, that do not support ignore nulls, we can still use window functions (which should be more efficient that multiple subqueries on a large dataset), but that requires more work. The idea is to define groups of rows that contain one non-null value followed by 0 to N null values:

select t.*,
    max(question1) over(partition by id, grp1 order by sort) as level1_question,
    max(question2) over(partition by id, grp2 order by sort) as level2_question
from (
    select t.*,
        count(question1) over(partition by id order by sort) grp1,
        count(question2) over(partition by id order by sort) grp2
    from mytable t
    cross apply (
        values (
            case when level = 1 and type = 'Group' then question end,
            case when level = 2 and type = 'Group' then question end
        ) 
    ) as v(question1, question2)
) t

Please signup or login to give your own answer.

Click here to cancel reply.

Mysql – Fill in Nulls with Row above

Answers