Postgresql - JOIN two SELECT queries with COUNT and derive additional column

drmrbrewer
November 5, 2022
251 views
2 votes
4 Answers

I have a table with the following structure:

 name | version | processed | processing | updated  | ref_time 
------+---------+-----------+------------+----------+----------
 abc  |       1 | t         | f          | 27794395 | 27794160
 def  |       1 | t         | f          | 27794395 | 27793440
 ghi  |       1 | t         | f          | 27794395 | 27793440
 jkl  |       1 | f         | f          | 27794395 | 27794160
 mno  |       1 | t         | f          | 27794395 | 27793440
 pqr  |       1 | f         | t          | 27794395 | 27794160

I can use the following query to count the total number within each ref_time:

SELECT ref_time, COUNT (*) AS total
FROM (SELECT * FROM status_table) AS _
GROUP BY ref_time;

 ref_time | total 
----------+-------
 27794160 |  2259
 27793440 |  2259

And the following query to count the total number within each ref_time where processed=true:

SELECT ref_time, COUNT (*) AS processed FROM (SELECT * FROM status_table WHERE processed=true) AS _ GROUP BY ref_time;

 ref_time | processed 
----------+-----------
 27794160 |      1057
 27793440 |      2259

I then try to merge the information using an INNER JOIN on ref_time:

SELECT * FROM
(SELECT ref_time, COUNT (*) AS total
 FROM (SELECT * FROM status_table) AS _ 
 GROUP BY ref_time) result_total
INNER JOIN
(SELECT ref_time, COUNT (*) AS processed
 FROM (SELECT * FROM status_table WHERE processed=true) AS _
 GROUP BY ref_time) result_processed
ON result_total.ref_time = result_processed.ref_time;

 ref_time | total | ref_time | processed 
----------+-------+----------+-----------
 27794160 |  2259 | 27794160 |      1057
 27793440 |  2259 | 27793440 |      2259

First question: how do I avoid the duplicated ref_time column?

Second question: how do I add an additional percent column derived as (100 * processed / total) (to one d.p.), i.e. to give:

 ref_time | total | processed | percent 
----------+-------+-----------+---------
 27794160 |  2259 |      1057 |    46.8
 27793440 |  2259 |      2259 |   100.0

Third question: is there a more efficient way to do this? Can I avoid making two separate SELECT queries?

Answers

- Sergey
- November 5, 2022 at 5:50 pm
- 0 votes
0
```
SELECT ref_time,count(*)as total,
SUM
(
  CASE
   WHEN processed='t' then 1
   else 0
  END
)processed
FROM YOUR_TABLE
GROUP BY ref_time
```
Login or Signup to reply.

- trillion
- November 5, 2022 at 6:01 pm
- 0 votes
0
```
with main as (
    select
    ref_time,
    sum(case when processed = 'true' then 1 else 0 end ) as total_processed,
    count(*) as total
    
    from <table_name>
    group by 1
)
select *, round((total_processed::numeric / nullif(total::numeric,0))  * 100),2) as percent from main
```
Login or Signup to reply.

- GMB
- November 5, 2022 at 11:40 pm
- 0 votes
0
Postgres has expressive aggregate functions.

To do the conditional count, we can use the standard filter clause directly against column processed, which is a boolean. As for the percentage (or ratio), we can cast the boolean to an integer (which yields 0 or 1, as you would expect), and take the average of that.

So:
```
select ref_time, 
    count(*) cnt_total,
    count(*) filter(where processed) cnt_processed,
    avg(processed::int) ratio_processed
from mytable
group by ref_time
```
Here is a demo with your sample data, which returns:

ref_time cnt_total cnt_processed ratio_processed

27794160 3 1 0.33333333333333333333

27793440 3 3 1.00000000000000000000
Login or Signup to reply.

- ErwinBrandstetter
- November 10, 2022 at 10:07 am
- 0 votes
0
```
SELECT ref_time
     , count(*) AS total
     , count(*) FILTER (WHERE processed) AS processed
     , round(count(*) FILTER (WHERE processed) * 100.0 / count(*), 2) AS percent
FROM   status_table
GROUP  BY 1;
```
fiddle

Returning actual, rounded percentage values.

avg(processed::int), like the currently accepted answer does, is smart but the additional aggregate function plus involved casting adds cost. (The repeated count(*) is only computed once.) This is typically faster.

About the aggregate FILTER clause:
- Aggregate columns with additional (distinct) filters
Aside: "name" is not a good name. Reconsider.
Login or Signup to reply.