I keep running into this one over and over again, and seeking if there is a better (much more optimized) way to achieve this.
Requirement :
From a table XYZ select a random n rows (n can be one or any number) out of first X rows ordered by updated_at
So, in short I am not looking for just random rows, but random 5 rows among last used 20 rows.
Reason :
We do multiple bulk operation and if we just select oldest update value, it will end up in multiple threads doing everything. So, some degree of randomization is needed.
Current Way :
At present we do it as follows,
SELECT * FROM TABLE ORDER BY updated_at ASC LIMIT 20
And then in the APP we use randomization to select random 5 of them.
Which is in-efficient, as we are literally throwing away 15 results that db gave us.
We are talking about almost a million row table here, so solution needs to be scale worthy.
2
Answers
Use a subquery:
Postgres has the function
random()
, MySQL hasrand()
.Requires an index on
(updated_at)
to be fast, obviously.If you actually want to lock selected rows, consider (Postgres only):
To optimize performance for generic random selection:
I am not sure what you want to do, but maybe it relates to this answer here:
How to randomly select rows in SQL?
if you want to do updates with these rows, you should use
select * from table mytable where orderstatus=’notfilled’ FOR UPDATE LIMIT 5;
if you select with FOR UPDATE, other parallel processes that do the same query, do not get the rows locked by the first query.
see here: When to use SELECT … FOR UPDATE?
you release them when you update and commit or rollback the transaction.