I am using a PostgreSQL database. I have a table for requests:
CREATE TABLE request (
request_id varchar(64) NOT NULL,
company_id varchar(64) NOT NULL,
department_id varchar(64) NOT NULL,
created_time timestamp NULL DEFAULT CURRENT_TIMESTAMP,
last_updated_time timestamp NULL,
status varchar(10) NULL,
retry_counter int4 NULL,
next_retry_time timestamp NULL,
lock_status varchar(10) NULL,
request_type varchar(16) NULL,
request_data varchar(16384) NULL,
priority int4 NULL,
CONSTRAINT request_pkey PRIMARY KEY (company_id, request_id)
);
This is a staging-area table into which the requests to be processed are dumped; it is cleaned up periodically upon completion. There are separate tables for tracking request status, triggering retries, etc.
Data volume: this table can potentially hold millions of records (worst-case scenario).
I need to retrieve data from the table using SQL such that, in each query:
- at most 100 requests are fetched for each company/department pair (company to department is a one-to-many relationship)
- at most 500 requests can be fetched for each company (c1d1 + c1d2 + c1d3 + c1d4 + c1d5 + c1d6 + c1d7 <= 500)
- the total number of requests should be at most 2000 in one query
- each company/department pair has a priority assigned, which is captured against each request; requests need to be fetched in descending order of priority
I am open to suggestions for adding indexes as required.
2 Answers
Try this query; it will provide the functionality you want. The approach uses ROW_NUMBER() and a common table expression (CTE) to rank the requests within each company and department by priority. The final query then keeps at most 100 records per department and 500 per company, orders them by descending priority, and limits the overall result to 2000 rows. To improve the query's efficiency, build an index covering the required columns.
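A sketch of such a query, using the column names from the table definition above (the index name and its exact column order are assumptions and may need tuning):

-- Covering index so the ranking can be done in priority order within each
-- company/department (column order is an assumption):
CREATE INDEX request_company_dept_priority_idx
    ON request (company_id, department_id, priority DESC);

-- Rank requests per company/department, then apply the three caps.
WITH dept_ranked AS (
    SELECT r.*,
           ROW_NUMBER() OVER (PARTITION BY company_id, department_id
                              ORDER BY priority DESC) AS dept_rn
    FROM request r
),
company_ranked AS (
    SELECT d.*,
           ROW_NUMBER() OVER (PARTITION BY company_id
                              ORDER BY priority DESC) AS company_rn
    FROM dept_ranked d
    WHERE dept_rn <= 100              -- max 100 requests per company/department
)
SELECT *
FROM company_ranked
WHERE company_rn <= 500               -- max 500 requests per company
ORDER BY priority DESC
LIMIT 2000;                           -- max 2000 requests overall

Filtering on dept_rn before computing company_rn ensures the per-company cap only counts rows that already survived the per-department cap.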
Hope it’s helpful 🙂
With an index on "priority", it should be efficient to select the few thousand highest-priority rows:
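A minimal sketch of that CTE, assuming a plain index on priority (the inner LIMIT of 3000 is explained below; names are illustrative):

-- Index so the highest-priority rows can be read without scanning the whole table:
CREATE INDEX request_priority_idx ON request (priority DESC);

-- Pre-filter to the few thousand highest-priority rows before applying any
-- window functions.
WITH request_top AS (
    SELECT *
    FROM request
    ORDER BY priority DESC
    LIMIT 3000
)
SELECT * FROM request_top;  -- plug request_top in place of the table "request" in the window-function query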
And then plug this CTE (in place of the table "request") into a query using window functions, like the one given in the other answer (which I’ve not tested, but it looks plausible).
This might short you some rows if many rows are filtered out by the other row-count restrictions and then can't be replaced because the inner LIMIT shown here kicks in. That is why I made the inner LIMIT a generous 3000 rather than 2000, but you may want it even higher than 3000 if you routinely get shorted. Of course, the higher it is, the less efficient the query will be.
If this doesn’t give you the performance you need, please share the output of
EXPLAIN (ANALYZE, BUFFERS)
with us.