I have two SQL queries like this for PostgreSQL:
The first query fetches subscriptions that were never synced, or that were updated recently but synced a long time ago:
SELECT *
FROM subscriptions
WHERE (last_sync_at IS NULL OR last_sync_at < updated_at) AND updated_at < $1
ORDER BY last_sync_at NULLS FIRST
LIMIT $2
The second query also fetches subscriptions, but returns the ones synchronized the longest time ago first:
SELECT *
FROM subscriptions
ORDER BY last_sync_at NULLS FIRST
LIMIT $1
I want to combine these two queries like so: if the first query returns fewer rows than the limit, I want to execute the second query to fill the rest of the batch. For example, if the limit is 100 and the first query returns 90 rows, I want to execute the second one with a limit of 10.
I can do this easily with code, but I wonder if there's a way to do it with a single SQL query, and whether that would be more performant.
I tried something like this but obviously it doesn’t work:
(SELECT *, COUNT(*) AS nb_rows_selected
FROM subscriptions
WHERE (last_sync_at IS NULL OR last_sync_at < updated_at) AND updated_at < $1
ORDER BY last_sync_at NULLS FIRST
GROUP BY id
LIMIT $2)
UNION
(SELECT *
FROM subscriptions
ORDER BY last_sync_at NULLS FIRST
LIMIT nb_rows_selected - $2)
Any help would be appreciated, thanks!
2 Answers
I found the right answer by slightly modifying my first approach. I still use a UNION and a COUNT, but this time I've broken down the complexity by using a WITH clause:
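A sketch of what such a query can look like (the CTE name to_sync is just illustrative; $1 is the updated_at cutoff and $2 is the batch size, as in the question):

WITH to_sync AS (
    -- first query: never synced, or updated since the last sync
    SELECT *
    FROM subscriptions
    WHERE (last_sync_at IS NULL OR last_sync_at < updated_at)
      AND updated_at < $1
    ORDER BY last_sync_at NULLS FIRST
    LIMIT $2
)
SELECT * FROM to_sync
UNION
(
    -- second query: oldest-synced subscriptions, excluding rows already picked,
    -- limited to whatever is left of the batch
    SELECT *
    FROM subscriptions
    WHERE id NOT IN (SELECT id FROM to_sync)
    ORDER BY last_sync_at NULLS FIRST
    LIMIT $2 - (SELECT COUNT(*) FROM to_sync)
);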
Explanation:
With this approach the second query's limit is [the limit] - [the number of rows returned by the first query], which is very efficient: if the first query already fills the limit, the second query gets a limit of 0 and returns immediately.
Performance Considerations:
From my understanding, this combined approach is generally more efficient than executing two separate queries, as it avoids the overhead of a second round trip to the database.
The only scenario where two separate queries might be more performant is when the first query consistently fulfills the limit, in which case the second query is unnecessary. However, this situation is unlikely to occur frequently in my case.
Also, the NOT IN clause should have minimal impact on performance since the id column is indexed.
One trick you can use is to add a computed column to keep track of each half of the union. Then subquery and ORDER BY/LIMIT again, using that computed column to provide the ordering:
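A sketch of that trick (reusing the queries from the question, with a hard-coded batch size of 100 and a made-up sort_priority column name):

SELECT *
FROM (
    (
        -- first half: never synced, or updated since the last sync
        SELECT s.*, 1 AS sort_priority
        FROM subscriptions s
        WHERE (last_sync_at IS NULL OR last_sync_at < updated_at)
          AND updated_at < $1
        ORDER BY last_sync_at NULLS FIRST
        LIMIT 100
    )
    UNION ALL
    (
        -- second half: everything, oldest-synced first
        SELECT s.*, 2 AS sort_priority
        FROM subscriptions s
        ORDER BY last_sync_at NULLS FIRST
        LIMIT 100
    )
) AS combined
-- rows from the first half sort ahead of rows from the second half
ORDER BY sort_priority, last_sync_at NULLS FIRST
LIMIT 100;

Note that a subscription matching the first half can also show up in the second half, so if duplicates matter you would still need to filter them out (for example with the NOT IN exclusion used above).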
The above takes records from the first query with priority to fill the limit quota of 100. Only if that first query returns fewer than 100 records will records from the second query be used.