skip to Main Content

I have a table that looks like this:

view   weight
A      1
B      1
C      2
D      1
E      1
F      1
G      3

I would like to sample from this table, but use the weight column to determine the probability of being chosen. For the above table, A, B, D, E, and F would all be chosen with probability 0.1, C would be with probability 0.2 and G would be probability 0.3

Answers with python or SQL would be great.

2

Answers


  1. It can be done with mysql using something like

    SELECT view, -LOG(RAND()) / weight AS priority FROM your_table ORDER BY priority LIMIT 1
    

    Although I haven’t tested it. I’d always prefer to to it in python, as I find it much more readable and extendable. In python, you can use numpy’s random.choice

    I got the following code to work

    import numpy as np
    
    views = ["A", "B", "C", "D", "E", "F", "G"]
    weights = [1, 1, 2, 1, 1, 1, 3]
    
    weights = np.array(weights) / np.sum(weights)
    
    sampled_view = np.random.choice(views, size=1, p=weights, replace=False)[0]
    
    
    print(f"Sampled view: {sampled_view}")
    
    Login or Signup to reply.
  2. You can accomplish this by multiplying each row by its respective weight, and then randomly selecting a single row, so a view assigned a weight of 3 will be represented three times, increasing its likelihood of being selected.:

    SELECT t.*
    FROM mytable t
    INNER JOIN (
      (VALUES ROW(1), ROW(2), ROW(3)) 
    ) d (weight) on d.weight <= t.weight
    ORDER BY RAND()
    LIMIT 1;
    

    Demo here

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search