We store volunteer hours in a database for every user, i’m trying to figure out how to structure a Postgres SQL query to find the percentage of total hours a specific user has spent at a location vs the total number of hours at that location (by any user)
CREATE TABLE entries (
id INT,
userId INT,
volunteerHours DOUBLE PRECISION NOT NULL,
location INT
);
INSERT INTO entries (id, userId, volunteerHours, location) VALUES (1, 1, 3.0, 2);
INSERT INTO entries (id, userId, volunteerHours, location) VALUES (2, 1, 3.0, 1);
INSERT INTO entries (id, userId, volunteerHours, location) VALUES (3, 1, 3.0, 1);
INSERT INTO entries (id, userId, volunteerHours, location) VALUES (4, 2, 3.0, 1);
INSERT INTO entries (id, userId, volunteerHours, location) VALUES (5, 2, 3.0, 1);
There could be hundreds of these entries in the db, these are shown just for examples
Pseudo code:
SELECT
(SELECT location from entries ORDER BY id DESC FETCH NEXT 1 ROWS ONLY) as lastEntryLocation,
(SELECT SUM(volunteerHours) from entries WHERE userId = 2 AND location = lastEntryLocation) as userHours,
(SELECT SUM(volunteerHours) from entries AND location = lastEntryLocation) as totalHours,
(SELECT userHours / totalHours) as userContributionPercentage
I’m still learning how to write custom SQL queries, hoping someone could at least point me in the right direction of how I should be structuring this query correctly, to return in this situation 0.50
Thought process from sudo query:
- Pull the latest entry in the db to determine the last location ID value inserted
- Use last location value ID to then query the
SUM
of alluserId 2
‘s hours - Use last location value ID to query the
SUM
of all users hours - Divide
userId 2
‘s total hours by the total hours at that specific location
I would like to try and do all of this in a SQL query if possible (I could break it out into separate SQL query calls and programmatically do the math, but I would prefer to learn how to do this through one SQL query call if at all possible.
Created a db fiddle as well for this:
https://www.db-fiddle.com/f/7dfFZTa2yN3jpSMVLcFXWc/4
If you could also explain why you chose a specific method used to create the query for this, that would greatly help me to understand the reasoning behind it and know where to look further to learn more myself. Thank you!
EDIT: Looking for a way to not have to do the query twice for the latest location (as it feels like the wrong way to do this)
2
Answers
This is how you can write a query
but this is not the only(best) way
Window functions with sums for each row’s partition/group:
Distinct on (a) a,c...order by a,b
lets you take the first row for eacha
, according to your desired order/priority – in your case, the latest location.An equivalent using subqueries:
You might enjoy how your line of thought can be mapped directly, step for step, into a few
WITH
common table expressions:Demo