skip to Main Content

I’ve a very big table (around 19.000.000 records), and I want to select 10 rows randomly.

I found this solution:

select *
from my_table as one
where one.id >=
    (
        select FLOOR(rand() * (select max(two.id) from my_table as two)) as currid
    )
limit 10

That should work.
But, if I execute only the random part I get very often big ids (5, 6, 12, 15 milions, for example), but the complete query return always low ids (2k, 5k, 8k, …).

If I replace the where condition with a static value (e.g. 12 milion) i get the correct result.

Can’t understand how it is possible.

Regards

EDIT

Maybe I explained myself wrong.
Executing query more and more times, i got ALWAYS results with "random" id from 1 to 10k.
Statistically, on 19 million records, I’ve 0.05% of probability to get an id under 10k. Multiplied x10 I’ve 0.5% chances to get one of ten (my LIMIT 10) ids under 10k, heavily incongruous compared to 100% of times I’m getting ids < 10k.

Instead, if I execute subquery alone, I get ids that seems to be like equally probably distribuited.

2

Answers


  1. You have 19 million data.
    Now, you expected the result within 10 million.
    So, the probability to get 10 million within 19 million, 10/19 = 0.52.
    Try with this query:

    select '19 M' as total_record,
        concat(FLOOR((rand()*19)), ' M') as result_with_in19M;
    

    Check how many times you get record within 10M. It should be almost 50% approximately.

    This is not an issue. This is your misconception.

    Think logically.


    There are lots of logic to get random value.
    For example, you want to get lower value.

    Now we know, if we multiply two decimal values(x, y) which are less than 1 then, there multiplication result (x*y) is near to zero and both number is greater than multiplication result.

    So, according to this logic you may modify you query like:

    select '19 M' as total_record,
        concat(FLOOR((rand()*rand()*19)), ' M') as low_result_with_in19M;
    

    This will give you approximate low value than previous query.

    I hope this will correct your misconception.
    Thank you!!!!!!

    Login or Signup to reply.
  2. The reason why your query is not working is because the sub-query (select FLOOR(rand() * (select max(two.id) from my_table as two)) as currid) is executed multiple times.

    To get the first 10 random number do this:

    SELECT
      id,
      FLOOR(rand() * (select max(two.id) from my_table as two)) as currid
    FROM my_table
    ORDER BY currid
    LIMIT 10;
    

    see: DBFIDDLE

    NOTE: rounding/flooring the results of RAND() is not even needed.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search