I have two SQL queries like this for PostgreSQL:
The first query fetches subscriptions that were never synced, or that were updated recently but synced a long time ago:
SELECT *
FROM subscriptions
WHERE (last_sync_at IS NULL OR last_sync_at < updated_at) AND updated_at < $1
ORDER BY last_sync_at NULLS FIRST
LIMIT $2
The second query also fetches subscriptions, but returns the ones synchronized the longest time ago first:
SELECT *
FROM subscriptions
ORDER BY last_sync_at NULLS FIRST
LIMIT $1
I want to combine these two queries like so: if the first query returns fewer rows than the limit, I want to execute the second query to fill the rest of the batch. For example, if the limit is 100 and the first query returns 90 rows, I want to execute the second one with a limit of 10.
I can do this easily with code, but I wonder if there's a way to do it with a single SQL query, and whether that would be more performant.
I tried something like this but obviously it doesn’t work:
(SELECT *, COUNT(*) AS nb_rows_selected
FROM subscriptions
WHERE (last_sync_at IS NULL OR last_sync_at < updated_at) AND updated_at < $1
ORDER BY last_sync_at NULLS FIRST
GROUP BY id
LIMIT $2)
UNION
(SELECT *
FROM subscriptions
ORDER BY last_sync_at NULLS FIRST
LIMIT nb_rows_selected - $2)
Any help would be appreciated, thanks!
2 Answers
I found the right answer by slightly modifying my first approach. I still use a UNION and a COUNT, but this time I've broken down the complexity by using a WITH clause:
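A sketch of what such a query can look like (the CTE name to_sync is just illustrative; $1 is the updated_at cutoff and $2 is the batch size, as in the question):

WITH to_sync AS (
    -- first query: never synced, or updated since the last sync
    SELECT *
    FROM subscriptions
    WHERE (last_sync_at IS NULL OR last_sync_at < updated_at)
      AND updated_at < $1
    ORDER BY last_sync_at NULLS FIRST
    LIMIT $2
)
SELECT * FROM to_sync
UNION
(
    -- second query: oldest-synced subscriptions, excluding rows already picked,
    -- limited to whatever is left of the batch
    SELECT *
    FROM subscriptions
    WHERE id NOT IN (SELECT id FROM to_sync)
    ORDER BY last_sync_at NULLS FIRST
    LIMIT $2 - (SELECT COUNT(*) FROM to_sync)
);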
Explanation:
With this approach the second query's limit is [the limit] - [the number of rows returned by the first query], which is very efficient: if the first query already fills the limit, the second query gets a limit of 0 and returns immediately.
Performance Considerations:
From my understanding, this combined approach is generally more efficient than executing two separate queries, as it avoids the overhead of a second round trip to the database.
The only scenario where two separate queries might be more performant is when the first query consistently fulfills the limit, in which case the second query is unnecessary. However, this situation is unlikely to occur frequently in my case.
Also, the NOT IN clause should have minimal impact on performance since the id column is indexed.
One trick you can use is to add a computed column to keep track of each half of the union. Then subquery and ORDER BY/LIMIT again, using that computed column to provide the ordering:
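A sketch of that trick (reusing the queries from the question, with a hard-coded batch size of 100 and a made-up sort_priority column name):

SELECT *
FROM (
    (
        -- first half: never synced, or updated since the last sync
        SELECT s.*, 1 AS sort_priority
        FROM subscriptions s
        WHERE (last_sync_at IS NULL OR last_sync_at < updated_at)
          AND updated_at < $1
        ORDER BY last_sync_at NULLS FIRST
        LIMIT 100
    )
    UNION ALL
    (
        -- second half: everything, oldest-synced first
        SELECT s.*, 2 AS sort_priority
        FROM subscriptions s
        ORDER BY last_sync_at NULLS FIRST
        LIMIT 100
    )
) AS combined
-- rows from the first half sort ahead of rows from the second half
ORDER BY sort_priority, last_sync_at NULLS FIRST
LIMIT 100;

Note that a subscription matching the first half can also show up in the second half, so if duplicates matter you would still need to filter them out (for example with the NOT IN exclusion used above).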
The above takes records from the first query with priority to fill the limit quota of 100. Only if that first query returns fewer than 100 records will records from the second query be used.