I have a table with the following structure:
name | version | processed | processing | updated | ref_time
------+---------+-----------+------------+----------+----------
abc | 1 | t | f | 27794395 | 27794160
def | 1 | t | f | 27794395 | 27793440
ghi | 1 | t | f | 27794395 | 27793440
jkl | 1 | f | f | 27794395 | 27794160
mno | 1 | t | f | 27794395 | 27793440
pqr | 1 | f | t | 27794395 | 27794160
I can use the following query to count the total number within each ref_time
:
SELECT ref_time, COUNT (*) AS total
FROM (SELECT * FROM status_table) AS _
GROUP BY ref_time;
ref_time | total
----------+-------
27794160 | 2259
27793440 | 2259
And the following query to count the total number within each ref_time
where processed=true
:
SELECT ref_time, COUNT (*) AS processed FROM (SELECT * FROM status_table WHERE processed=true) AS _ GROUP BY ref_time;
ref_time | processed
----------+-----------
27794160 | 1057
27793440 | 2259
I then try to merge the information using an INNER JOIN
on ref_time
:
SELECT * FROM
(SELECT ref_time, COUNT (*) AS total
FROM (SELECT * FROM status_table) AS _
GROUP BY ref_time) result_total
INNER JOIN
(SELECT ref_time, COUNT (*) AS processed
FROM (SELECT * FROM status_table WHERE processed=true) AS _
GROUP BY ref_time) result_processed
ON result_total.ref_time = result_processed.ref_time;
ref_time | total | ref_time | processed
----------+-------+----------+-----------
27794160 | 2259 | 27794160 | 1057
27793440 | 2259 | 27793440 | 2259
First question: how do I avoid the duplicated ref_time
column?
Second question: how do I add an additional percent
column derived as (100 * processed / total)
(to one d.p.), i.e. to give:
ref_time | total | processed | percent
----------+-------+-----------+---------
27794160 | 2259 | 1057 | 46.8
27793440 | 2259 | 2259 | 100.0
Third question: is there a more efficient way to do this? Can I avoid making two separate SELECT
queries?
4
Answers
Postgres has expressive aggregate functions.
To do the conditional count, we can use the standard
filter
clause directly against columnprocessed
, which is a boolean. As for the percentage (or ratio), we can cast the boolean to an integer (which yields0
or1
, as you would expect), and take the average of that.So:
Here is a demo with your sample data, which returns:
fiddle
Returning actual, rounded percentage values.
avg(processed::int)
, like the currently accepted answer does, is smart but the additional aggregate function plus involved casting adds cost. (The repeatedcount(*)
is only computed once.) This is typically faster.About the aggregate
FILTER
clause:Aside: "name" is not a good name. Reconsider.