skip to Main Content

I keep running into this one over and over again, and seeking if there is a better (much more optimized) way to achieve this.

Requirement :
From a table XYZ select a random n rows (n can be one or any number) out of first X rows ordered by updated_at

So, in short I am not looking for just random rows, but random 5 rows among last used 20 rows.

Reason :
We do multiple bulk operation and if we just select oldest update value, it will end up in multiple threads doing everything. So, some degree of randomization is needed.

Current Way :
At present we do it as follows,

SELECT * FROM TABLE ORDER BY updated_at ASC LIMIT 20

And then in the APP we use randomization to select random 5 of them.

Which is in-efficient, as we are literally throwing away 15 results that db gave us.

We are talking about almost a million row table here, so solution needs to be scale worthy.

2

Answers


  1. Use a subquery:

    SELECT *
    FROM  (
       SELECT * FROM tbl
       ORDER  BY updated_at
       LIMIT  20
       ) sub
    ORDER  BY random()
    LIMIT  5;
    

    Postgres has the function random(), MySQL has rand().

    Requires an index on (updated_at) to be fast, obviously.

    If you actually want to lock selected rows, consider (Postgres only):

    To optimize performance for generic random selection:

    Login or Signup to reply.
  2. I am not sure what you want to do, but maybe it relates to this answer here:
    How to randomly select rows in SQL?

    if you want to do updates with these rows, you should use
    select * from table mytable where orderstatus=’notfilled’ FOR UPDATE LIMIT 5;

    if you select with FOR UPDATE, other parallel processes that do the same query, do not get the rows locked by the first query.
    see here: When to use SELECT … FOR UPDATE?

    you release them when you update and commit or rollback the transaction.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search