skip to Main Content

I have

User table[user_id]
Disperse Table [order_id, disperse_amount]
Pool Table [user_id, order_id]

For each order_id in Disperse, I want to insert x random rows of [user.user_id, disperse.order_id] into pool table. (x is the current row’s disperse_amount)

Might be more understandable with the query I have so far:

WITH p as (
    SELECT *
    FROM pool p
    WHERE p.order_id= **disperse.order_id** (from current disperse row)
)
INSERT INTO pool (order_id, user_id)
SELECT **disperse.order_id**, user_id
FROM users
WHERE NOT EXISTS ( 
    SELECT 1 
    FROM p 
    WHERE user_id = users.user_id
    )
ORDER BY random()
LIMIT **disperse.disperse_amount** (from the current disperse row)

This could work for ONE row in the distributer table.

I just need to iterate through all rows of distributer table [order_id, disperse_amount], and for each row, perform this query.

There may be a much more efficient way to do this, but I wouldn’t know as I’m a novice to postgres and webdev in general.

2

Answers


  1. you can make use of a PL/pgSQL function in PostgreSQL. This solution leverages PostgreSQL’s procedural language to efficiently manage the insertion process with the required logic for random selection and duplication avoidance, offering a straightforward approach to performing bulk, conditional, and randomized data insertion operations within a PostgreSQL database:

    CREATE OR REPLACE FUNCTION disperse_users_to_pool()
    RETURNS void AS $$
    DECLARE
        row RECORD;
    BEGIN
        FOR row IN SELECT * FROM Disperse LOOP
            -- Perform the insert operation for each disperse row
            INSERT INTO Pool (user_id, order_id)
            SELECT u.user_id, row.order_id
            FROM (
                SELECT user_id 
                FROM User 
                WHERE NOT EXISTS (
                    SELECT 1
                    FROM Pool
                    WHERE user_id = User.user_id AND order_id = row.order_id
                )
                ORDER BY random()
                LIMIT row.disperse_amount
            ) u;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;
    

    Then you can use it like this:

    SELECT disperse_users_to_pool();
    
    
    Login or Signup to reply.
  2. Any time you want to iterate in SQL, there’s a better way.

    Here we can use a cross join lateral to match up each entry in disbursements with the appropriate random picks from users. A cross join matches every row in a one table with every row in another table. lateral evaluates the subquery for each row; this allows us to use limit d.disburse_amount.

    select
      u.id as user_id,
      d.order_id as order_id
    from disbursements d
    cross join lateral (
      select id
      from users
      order by random()
      limit d.disburse_amount
    ) u
    

    What if there are already entries in order_pool? Let’s assume we don’t want to go over the disburse_amount and we don’t want to repeat any user_ids for the same order.

    First, we need to know how many disbursements remain for each order.

    select
      d.order_id,
      d.disburse_amount - count(op.user_id) as remaining_disbursements
    from disbursements d
    left join order_pool op on op.order_id = d.order_id
    
    -- For this group by to work, order_id must be the primary key
    -- of disbursements. Then PostgreSQL knows each order_id can have
    -- only one disburse_amount; disburse_amount is 
    -- "functionally dependent" on order_id.
    -- If not, `sum(d.disburse_amount)`.
    group by d.order_id
    

    Then we select from the query above instead of disbursements, and cross join lateral as before with the extra clause to exclude users who are already in the pool. Cross join lateral, as before, lets us do this for each order.

    with order_disbursements_remaining as (
      select
        d.order_id,
        d.disburse_amount - count(op.user_id) as remaining_disbursements
      from disbursements d
      left join order_pool op on op.order_id = d.order_id
      group by d.order_id
    )
    select u.id, odr.order_id
    from order_disbursements_remaining odr
    cross join lateral (
      select id
      from users
      where id not in (
        select user_id
        from order_pool
        where order_id = odr.order_id
      )
      order by random()
      limit odr.remaining_disbursements
    ) u
    

    This can safely be run as an insert trigger on the disbursements table to keep the order pool up to date on new disbursements. Deletes and updates must be handled separately.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search