skip to Main Content

We store volunteer hours in a database for every user, i’m trying to figure out how to structure a Postgres SQL query to find the percentage of total hours a specific user has spent at a location vs the total number of hours at that location (by any user)

CREATE TABLE entries (
  id INT,
  userId INT,
  volunteerHours DOUBLE PRECISION NOT NULL,
  location INT
);

INSERT INTO entries (id, userId, volunteerHours, location) VALUES (1, 1, 3.0, 2);
INSERT INTO entries (id, userId, volunteerHours, location) VALUES (2, 1, 3.0, 1);
INSERT INTO entries (id, userId, volunteerHours, location) VALUES (3, 1, 3.0, 1);
INSERT INTO entries (id, userId, volunteerHours, location) VALUES (4, 2, 3.0, 1);
INSERT INTO entries (id, userId, volunteerHours, location) VALUES (5, 2, 3.0, 1);

There could be hundreds of these entries in the db, these are shown just for examples

Pseudo code:

SELECT
(SELECT location from entries ORDER BY id DESC FETCH NEXT 1 ROWS ONLY) as lastEntryLocation,
(SELECT SUM(volunteerHours) from entries WHERE userId = 2 AND location = lastEntryLocation) as userHours,
(SELECT SUM(volunteerHours) from entries AND location = lastEntryLocation) as totalHours,
(SELECT userHours / totalHours) as userContributionPercentage 

I’m still learning how to write custom SQL queries, hoping someone could at least point me in the right direction of how I should be structuring this query correctly, to return in this situation 0.50

Thought process from sudo query:

  1. Pull the latest entry in the db to determine the last location ID value inserted
  2. Use last location value ID to then query the SUM of all userId 2‘s hours
  3. Use last location value ID to query the SUM of all users hours
  4. Divide userId 2‘s total hours by the total hours at that specific location

I would like to try and do all of this in a SQL query if possible (I could break it out into separate SQL query calls and programmatically do the math, but I would prefer to learn how to do this through one SQL query call if at all possible.

Created a db fiddle as well for this:
https://www.db-fiddle.com/f/7dfFZTa2yN3jpSMVLcFXWc/4

If you could also explain why you chose a specific method used to create the query for this, that would greatly help me to understand the reasoning behind it and know where to look further to learn more myself. Thank you!

EDIT: Looking for a way to not have to do the query twice for the latest location (as it feels like the wrong way to do this)

2

Answers


  1. This is how you can write a query
    but this is not the only(best) way

    SELECT
      ( SELECT
          ( SELECT  SUM(volunteerHours) 
            from    entries 
            WHERE   userId = 2 
            AND     location = (
                SELECT  location 
                from    entries 
                ORDER BY id 
                DESC FETCH NEXT 1 ROWS ONLY))
            / 
          ( SELECT  SUM(volunteerHours) 
            from    entries 
            Where   location = (
                SELECT  location 
                from    entries 
                ORDER BY id DESC 
                FETCH NEXT 1 ROWS ONLY))
        ) as userContributionPercentage
    
    Login or Signup to reply.
  2. Window functions with sums for each row’s partition/group:

    select distinct on (e.userId) 
        sum(e.volunteerHours) over (partition by userId,location,id)
          / sum(e.volunteerHours) over (partition by userId,location) 
                as percentage_volunteered_here
    from entries e
    where userId=1
    order by userId, id desc;
    

    Distinct on (a) a,c...order by a,b lets you take the first row for each a, according to your desired order/priority – in your case, the latest location.

    An equivalent using subqueries:

    select distinct on (userId) 
      (  select sum(volunteerHours) 
         from entries e2 
         where e1.userId=e2.userId 
         and e1.location=e2.location
         and e1.id=e2.id) --hours_during_this_visit_at_location
      /( select sum(volunteerHours) --total_hours_at_this_location
         from entries e2 
         where e1.userId=e2.userId
         and e1.location=e2.location) as percentage_volunteered_here
    from entries e1
    where userId=1
    order by userId, id desc;
    

    You might enjoy how your line of thought can be mapped directly, step for step, into a few WITH common table expressions:

    with 
     last_location_per_user as (
        select distinct on (userId) 
                userId, 
                location,
                volunteerHours
        from entries
        where userId=1
        order by userId, id desc
     ),
    total_hours_at_last_location_per_user as (
        select  e.userId, 
                sum(e.volunteerHours)
        from    entries e
            inner join last_location_per_user l
                using (userId,location)
        group by userId
    )
    select  l.volunteerHours/t.sum as percentage_volunteered_here
    from    last_location_per_user l,
            total_hours_at_last_location_per_user t
    where   t.userId=l.userId;
    

    Demo

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search