I am really struggling with producing some code create ‘level1 question’ & ‘level2 question’ columns.
The logic to populate the columns is:
- ‘L1’ when level = 1 & type = group then question
- ‘L2’ when level = 2 & type = group then question
But I can not find an efficient way of creating ‘level1 question’ & ‘level2 question’ like in the example.
id | sort | type | question | level | answer | L1 | L2 | level1 question | level2 question |
---|---|---|---|---|---|---|---|---|---|
1 | 1 | Group | Location | 1 | NULL | Location | NULL | Location | NULL |
1 | 2 | Question | UK | 2 | Yes | NULL | NULL | Location | NULL |
1 | 3 | Question | EU | 2 | No | NULL | NULL | Location | NULL |
1 | 4 | Group | Asia | 2 | Yes | NULL | ASIA | Location | ASIA |
1 | 5 | Question | AUS | 3 | Yes | NULL | NULL | Location | ASIA |
1 | 6 | Question | NZ | 3 | No | NULL | NULL | Location | ASIA |
1 | 7 | Group | Season | 1 | NULL | Season | NULL | Season | NULL |
1 | 8 | Question | Summer | 2 | Yes | NULL | NULL | Season | NULL |
2 | 1 | Group | City | 1 | NULL | City | NULL | City | NULL |
2 | 2 | Question | London | 2 | Yes | NULL | NULL | City | NULL |
2 | 3 | Group | Street | 2 | NULL | NULL | Street | City | Street |
2 | 4 | Question | King ST | 3 | No | NULL | NULL | City | Street |
I have tried to use the following code but it is so slow. Just taken ~2 hours over 5 million rows.
'level1 question' =
(select top 1 L1
from table2 t2
where L1 is not null
and t1.Sort >= t2.Sort and t1.ID = t2.ID
order by id , sort DESC)
, 'level2 question' =
(select top 1 L2
from table2 t2
where L2 is not null AND (Level >2 OR Type = 'Group')
and t1.Sort >= t2.Sort and t1.ID = t2.ID
order by id , sort DESC)
2
Answers
I think you’re experiencing an overflow in memory due to the self join. Create a CTE or 2 that removes the rows with null and then join the CTE(s) it to the table instead. I bet if you looked up your execution plan, you have nested loops and sorts that are killing hour query. My solution should reduce the impact of those.
The syntax of your code suggests that you are running SQL Server.
In the 2022 version, we can use
last_value()
withignore nulls
to efficiently retrieve the latest non-null of the conditional computation:In earlier versions, that do not support
ignore nulls
, we can still use window functions (which should be more efficient that multiple subqueries on a large dataset), but that requires more work. The idea is to define groups of rows that contain one non-null value followed by 0 to N null values: