
I am using a PostgreSQL database. I have a table for requests:

CREATE TABLE request (
    request_id varchar(64) NOT NULL,
    company_id varchar(64) NOT NULL,
    department_id varchar(64) NOT NULL,
    created_time timestamp NULL DEFAULT CURRENT_TIMESTAMP,
    last_updated_time timestamp NULL,
    status varchar(10) NULL,
    retry_counter int4 NULL,
    next_retry_time timestamp NULL,
    lock_status varchar(10) NULL,
    request_type varchar(16) NULL,
    request_data varchar(16384) NULL,
    priority int4 NULL,
    CONSTRAINT request_pkey PRIMARY KEY (company_id, request_id)
);

This is a staging-area table where the requests to be processed are dumped, and it is cleaned up periodically upon completion. There are separate tables to track request status, trigger retries, etc.

Data volume: this table can potentially hold millions of records (worst-case scenario).

I need to retrieve data from the table using SQL such that, in each query:

  • at most 100 requests are fetched per (company, department) pair (company to department is a one-to-many relationship)
  • at most 500 requests are fetched per company (c1d1 + c1d2 + c1d3 + c1d4 + c1d5 + c1d6 + c1d7 <= 500)
  • at most 2000 requests are returned in total in one query
  • each (company, department) pair has an assigned priority, which is captured against each request; requests must be fetched in descending order of priority

Open to suggestions to add indexes as required.

2 Answers


  1. Try this query; it should provide the functionality you want. It uses ROW_NUMBER() in a common table expression (CTE) to rank the requests by priority within each company and within each (company, department) pair. The final query then keeps at most 100 rows per (company, department) pair and at most 500 rows per company, orders them by priority, and limits the overall result to 2000 rows. An index covering the relevant columns is created to improve the query’s efficiency.

    -- Creating an index to optimize the query
    CREATE INDEX idx_company_department_priority
    ON request (company_id, department_id, priority DESC);
    
    -- Using ROW_NUMBER() to assign a row number based on priority within each company and department
    WITH ranked_requests AS (
        SELECT
            request_id,
            company_id,
            department_id,
            created_time,
            last_updated_time,
            status,
            retry_counter,
            next_retry_time,
            lock_status,
            request_type,
            request_data,
            priority,
            ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY priority DESC) AS row_num_company,
            ROW_NUMBER() OVER (PARTITION BY company_id, department_id ORDER BY priority DESC) AS row_num_department
        FROM
            request
    )
    
    -- Now keeping at most 100 requests per (company, department) and 500 per company, with a total limit of 2000 rows
    SELECT
        request_id,
        company_id,
        department_id,
        created_time,
        last_updated_time,
        status,
        retry_counter,
        next_retry_time,
        lock_status,
        request_type,
        request_data,
        priority
    FROM
        ranked_requests
    WHERE
        row_num_department <= 100 AND row_num_company <= 500
    ORDER BY
        company_id,
        department_id,
        priority DESC
    LIMIT 2000;
    

    Hope it’s helpful 🙂

  2. With an index on "priority", it should be efficient to select the highest-priority few thousand rows:

    with top_requests as (select * from request order by priority desc limit 3000)
    

    And then plug this CTE (in place of the table "request") into a query using window functions, like the one given in the other answer (which I’ve not tested, but it looks plausible).
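
    For example, the combined query could look like the following (again untested, so treat it as a sketch; idx_request_priority is just an assumed name for the priority index):

    -- Assumed index on priority, so the inner LIMIT can stop early
    CREATE INDEX idx_request_priority ON request (priority DESC);
    
    -- Pre-limit by priority, then apply the per-department and per-company caps
    WITH top_requests AS (
        SELECT * FROM request ORDER BY priority DESC LIMIT 3000
    ),
    ranked_requests AS (
        SELECT
            *,
            ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY priority DESC) AS row_num_company,
            ROW_NUMBER() OVER (PARTITION BY company_id, department_id ORDER BY priority DESC) AS row_num_department
        FROM top_requests
    )
    SELECT request_id, company_id, department_id, priority
    FROM ranked_requests
    WHERE row_num_department <= 100 AND row_num_company <= 500
    ORDER BY company_id, department_id, priority DESC
    LIMIT 2000;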

    This might leave you short some rows if many rows are filtered out by the other row-count restrictions and then cannot be replaced because the inner LIMIT shown here kicks in. That is why I made the inner LIMIT a generous 3000 rather than 2000; you may want it even higher than 3000 if you routinely come up short. Of course, the higher it is, the less efficient the query will be.

    If this doesn’t give you the performance you need, please share the EXPLAIN (ANALYZE, BUFFERS) output with us.
