I’m trying to join a new column to my current query that uses aggregate functions. I create this column with a new query that also uses an aggregate function from a different table but I’m not sure if a JOIN
will work for me since I need to join it to its respective row.
TABLE A (employees that are enrolled or were enrolled in a project)
ID | DEPARTMENT | ENROLLED | PROJECT |
---|---|---|---|
1 | MARKETING | Yes | ARQ |
2 | MARKETING | Yes | TC |
3 | MARKETING | No | ARQ |
4 | MARKETING | No | TC |
5 | FINANCE | Yes | ARQ |
6 | FINANCE | Yes | TC |
7 | FINANCE | No | ARQ |
8 | FINANCE | Yes | TC |
This table has more departments and more projects, but I simplified.
TABLE B (relation with departments and employees)
ID | DEPARTMENT | TOTAL_EMPLOYEES |
---|---|---|
1 | MARKETING | 2 |
2 | MARKETING | 3 |
3 | FINANCE | 4 |
4 | FINANCE | 8 |
In my first query I was asked to achieve the following result – using only table A:
(employees enrolled) (employees not enrolled)
DEPARTMENT | ARQ_E | TC_E | TOTAL_ENROLLED | ARQ_N | TC_N | TOTAL_NOT_ENROLLED | TOTAL |
---|---|---|---|---|---|---|---|
MARKETING | 1 | 1 | 2 | 1 | 1 | 2 | 4 |
FINANCE | 1 | 1 | 2 | 1 | 1 | 2 | 4 |
Using the following query:
SELECT tableA.department,
sum(case when enrolled = 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_E,
sum(case when enrolled = 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_E,
sum(case when enrolled = 'Yes' then 1 else 0 end) as TOTAL_ENROLLED,
sum(case when enrolled != 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_N,
sum(case when enrolled != 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_N,
sum(case when enrolled != 'Yes' then 1 else 0 end) as TOTAL_NOT_ENROLLED,
count (*) AS Total
FROM tableA
GROUP BY tableA.department;
My second query gets departments and their total employees from table B:
DEPARTMENT | TOTAL_EMPLOYEES |
---|---|
MARKETING | 5 |
FINANCE | 12 |
Using the following query:
SELECT tableB.department,
sum(tableB.total_employees) AS TOTAL_EMPLOYEES
FROM tableB
GROUP BY tableB.department;
I need to add the column TOTAL_EMPLOYEES
to my first query, next to TOTAL
will be TOTAL_EMPLOYEES
. But it has to be placed with its respective department row. I need this to compare this 2 columns and see how many employees were not assigned to any project.
This is my expected result.
(employees enrolled) (employees not enrolled)
DEPARTMENT | ARQ_E | TC_E | TOTAL_ENROLLED | ARQ_N | TC_N | TOTAL_NOT_ENROLLED | TOTAL | T_EMPL |
---|---|---|---|---|---|---|---|---|
MARKETING | 1 | 1 | 2 | 1 | 1 | 2 | 4 | 5 |
FINANCE | 1 | 1 | 2 | 1 | 1 | 2 | 4 | 12 |
I have tried to achieve this using the following query:
SELECT tableA.department,
sum(case when enrolled = 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_E,
sum(case when enrolled = 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_E,
sum(case when enrolled = 'Yes' then 1 else 0 end) as TOTAL_ENROLLED,
sum(case when enrolled != 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_N,
sum(case when enrolled != 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_N,
sum(case when enrolled != 'Yes' then 1 else 0 end) as TOTAL_NOT_ENROLLED,
count (*) AS Total,
sum (tableB.total_employees) AS T_EMPL
FROM tableA
JOIN tableB
ON tableA.department = tableB.department
GROUP BY tableA.department;
But the numbers I get in my query are completely wrong since the JOINS repeat my rows and my SUMS duplicate.
I don’t know if I really need to use a join or a subquery to place my sum(tableB.department)
in its respective row.
I’m using PostgreSQL but since I’m using Standard 92 any SQL solution will help.
3
Answers
Join the results of the two queries, using sub-queries, don’t join the tables.
That way you’re joining 1 row of enrollment data per department to 1 row of employee data per department.
As the join will multiply the summ, you can first sum the values and then join them
Your main issue stemmed from inadvertently multiplying rows with the join,
and has already been addressed. See:
But use the standard SQL aggregate
FILTER
clause. It’s shorter, cleaner, and noticeably faster. See:enrolled
should be aboolean
column. Make it so if it isn’t. Then you can use it directly. Smaller, faster, cleaner, shorter code.I replaced the
[INNER] JOIN
with aLEFT [OUTER] JOIN
on a suspicion. Typically, you want to keep all results, even if the same department is not found in the other table. Maybe even aFULL [OUTER] JOIN
?Also,
USING (department)
as join condition conveniently outputs that column only once, so we can make do withSELECT *
in the outerSELECT
.Finally, subqueries are shorter and faster than CTEs. Not much since Postgres 12, but still. See: