skip to Main Content

I’m trying to join a new column to my current query that uses aggregate functions. I create this column with a new query that also uses an aggregate function from a different table but I’m not sure if a JOIN will work for me since I need to join it to its respective row.

TABLE A (employees that are enrolled or were enrolled in a project)

ID DEPARTMENT ENROLLED PROJECT
1 MARKETING Yes ARQ
2 MARKETING Yes TC
3 MARKETING No ARQ
4 MARKETING No TC
5 FINANCE Yes ARQ
6 FINANCE Yes TC
7 FINANCE No ARQ
8 FINANCE Yes TC

This table has more departments and more projects, but I simplified.

TABLE B (relation with departments and employees)

ID DEPARTMENT TOTAL_EMPLOYEES
1 MARKETING 2
2 MARKETING 3
3 FINANCE 4
4 FINANCE 8

In my first query I was asked to achieve the following result – using only table A:

             (employees enrolled)               (employees not enrolled)     
DEPARTMENT ARQ_E TC_E TOTAL_ENROLLED ARQ_N TC_N TOTAL_NOT_ENROLLED TOTAL
MARKETING 1 1 2 1 1 2 4
FINANCE 1 1 2 1 1 2 4

Using the following query:

SELECT  tableA.department, 
        sum(case when enrolled = 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_E,
        sum(case when enrolled = 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_E,
        sum(case when enrolled = 'Yes' then 1 else 0 end) as TOTAL_ENROLLED,
        sum(case when enrolled != 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_N,
        sum(case when enrolled != 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_N,
        sum(case when enrolled != 'Yes' then 1 else 0 end) as TOTAL_NOT_ENROLLED,
        count (*) AS Total               
FROM    tableA
GROUP BY tableA.department;

My second query gets departments and their total employees from table B:

DEPARTMENT TOTAL_EMPLOYEES
MARKETING 5
FINANCE 12

Using the following query:

SELECT  tableB.department,
        sum(tableB.total_employees) AS TOTAL_EMPLOYEES
FROM tableB
GROUP BY tableB.department;

I need to add the column TOTAL_EMPLOYEES to my first query, next to TOTAL will be TOTAL_EMPLOYEES. But it has to be placed with its respective department row. I need this to compare this 2 columns and see how many employees were not assigned to any project.

This is my expected result.

                  (employees enrolled)               (employees not enrolled)     
DEPARTMENT ARQ_E TC_E TOTAL_ENROLLED ARQ_N TC_N TOTAL_NOT_ENROLLED TOTAL T_EMPL
MARKETING 1 1 2 1 1 2 4 5
FINANCE 1 1 2 1 1 2 4 12

I have tried to achieve this using the following query:

SELECT  tableA.department, 
        sum(case when enrolled  = 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_E,
        sum(case when enrolled  = 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_E,
        sum(case when enrolled  = 'Yes' then 1 else 0 end) as TOTAL_ENROLLED,
        sum(case when enrolled  != 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_N,
        sum(case when enrolled  != 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_N,
        sum(case when enrolled  != 'Yes' then 1 else 0 end) as TOTAL_NOT_ENROLLED,
        count (*) AS Total,
        sum (tableB.total_employees) AS T_EMPL               
FROM    tableA
JOIN    tableB
ON tableA.department = tableB.department
GROUP BY tableA.department;

But the numbers I get in my query are completely wrong since the JOINS repeat my rows and my SUMS duplicate.

I don’t know if I really need to use a join or a subquery to place my sum(tableB.department) in its respective row.

I’m using PostgreSQL but since I’m using Standard 92 any SQL solution will help.

3

Answers


  1. Join the results of the two queries, using sub-queries, don’t join the tables.

    That way you’re joining 1 row of enrollment data per department to 1 row of employee data per department.

    SELECT
      *
    FROM
    (
      SELECT  tableA.department, 
              sum(case when enrolled  = 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_E,
              sum(case when enrolled  = 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_E,
              sum(case when enrolled  = 'Yes' then 1 else 0 end) as TOTAL_ENROLLED,
              sum(case when enrolled  != 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_N,
              sum(case when enrolled  != 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_N,
              sum(case when enrolled  != 'Yes' then 1 else 0 end) as TOTAL_NOT_ENROLLED,
              count (*) AS Total
                   
      FROM    tableA
      GROUP BY tableA.department
    )
      AS enroll
    INNER JOIN
    (
      SELECT  tableB.department,
              sum(tableB.total_employees) AS Total_EMPLOYEES
      FROM tableB
      GROUP BY tableB.department
    )
      AS employee
        ON employee.department = enroll.department
    
    Login or Signup to reply.
  2. As the join will multiply the summ, you can first sum the values and then join them

    WITH CTE1 as (SELECT  tableA.department, 
            sum(case when enrolled  = 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_E,
            sum(case when enrolled  = 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_E,
            sum(case when enrolled  = 'Yes' then 1 else 0 end) as TOTAL_ENROLLED,
            sum(case when enrolled  != 'Yes' and tableA.project = 'ARQ' then 1 else 0 end) as ARQ_N,
            sum(case when enrolled  != 'Yes' and tableA.project = 'TC' then 1 else 0 end) as TC_N,
            sum(case when enrolled  != 'Yes' then 1 else 0 end) as TOTAL_NOT_ENROLLED,
            count (*) AS Total
                   
    FROM    tableA
    GROUP BY tableA.department),
    CTE2 as (SELECT  tableB.department,
            sum(tableB.total_employees) AS TOTAL_EMPLOYEES
    
    FROM tableB
    GROUP BY tableB.department)
    SELECT
    CTE1.department,    ARQ_E,  TC_E,   TOTAL_ENROLLED,     ARQ_N,  TC_N,   TOTAL_NOT_ENROLLED,     TOTAL,  T_EMPL,CTE2.TOTAL_EMPLOYEES
    FROM CTE1 JOIN CTE2 ON CTE1.department = CTE2.department
    
    Login or Signup to reply.
  3. Your main issue stemmed from inadvertently multiplying rows with the join,
    and has already been addressed. See:

    But use the standard SQL aggregate FILTER clause. It’s shorter, cleaner, and noticeably faster. See:

    SELECT *
    FROM  (
       SELECT department
            , count(*) FILTER (WHERE enrolled AND project = 'ARQ')     AS arq_e
            , count(*) FILTER (WHERE enrolled AND project = 'TC')      AS tc_e
            , count(*) FILTER (WHERE enrolled)                         AS total_enrolled
            , count(*) FILTER (WHERE NOT enrolled AND project = 'ARQ') AS arq_n
            , count(*) FILTER (WHERE NOT enrolled AND project = 'TC')  AS tc_n
            , count(*) FILTER (WHERE NOT enrolled)                     AS total_not_enrolled
            , count(*) AS total
       FROM   tableA a
       GROUP  BY 1
       ) a
    LEFT JOIN  (  -- !
       SELECT department
            , sum(total_employees) AS total_employees
       FROM   tableB b
       GROUP  BY 1
       ) b USING (department);
    

    enrolled should be a boolean column. Make it so if it isn’t. Then you can use it directly. Smaller, faster, cleaner, shorter code.

    I replaced the [INNER] JOIN with a LEFT [OUTER] JOIN on a suspicion. Typically, you want to keep all results, even if the same department is not found in the other table. Maybe even a FULL [OUTER] JOIN?

    Also, USING (department) as join condition conveniently outputs that column only once, so we can make do with SELECT * in the outer SELECT.

    Finally, subqueries are shorter and faster than CTEs. Not much since Postgres 12, but still. See:

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search