I have a PostgreSQL query that aggregates data from multiple tables and generates a report. Currently it works perfectly fine and it is quite simple. Here is the query I’m using:
SELECT
a.name AS account_name,
c.name AS campaign_name,
ad.name AS ad_name,
g.cpm AS cpm,
SUM(g.impressions) AS impressions,
SUM(g.clicks) AS clicks,
SUM(g.conversions) AS conversions,
SUM(g.impressions) * g.cpm AS cost,
SUM(g.revenue) AS order_revenue
FROM google_analytics g
JOIN ad ad ON ad.id = g.ad_id
JOIN campaign c ON c.id = ad.campaign_id
JOIN account a ON a.id = c.account_id
WHERE g.created_at >= ? AND g.created_at <= ?
GROUP BY 1, 2, 3, 4
ORDER BY 2
And my table schema:
CREATE TABLE public.google_analytics (
created_at TIMESTAMP WITHOUT TIME ZONE NULL,
ad_id INTEGER NULL,
impressions INTEGER NULL,
clicks INTEGER NULL,
conversions INTEGER NULL,
ecpm INTEGER NULL,
commission SMALLINT NULL,
revenue INTEGER NULL
);
The current report aggregates data properly with the SUM and GROUP BY clauses, and works as expected. However, the application code extends this report by handling additional calculations and aggregations, such as rollups, averaging certain fields, and building a hierarchical structure.
For example, the application generates a report like this:
account_name | campaign_name | ad_name | cpm | impressions | clicks | conversions | cost | order_revenue |
---|---|---|---|---|---|---|---|---|
Account A | 1.75 CPM | 2200 | 20 | 2 | 1.45 | $150 | ||
Campaign A | Ad 1 | 1.50 CPM | 1000 | 10 | 1 | 0.50 | $100 | |
Campaign A | Ad 2 | 2.00 CPM | 1200 | 10 | 1 | 2.40 | $50 |
In this structure:
- Impressions, clicks, and revenue are summed
- CPM and cost are averaged
- The data is grouped hierarchically by account, campaign, and ad
The application code handling these calculations is very large and complex, making it difficult to maintain. I am planning to replace this application logic with a SQL-based solution to simplify maintenance and improve performance.
I have experimented with the ROLLUP and CUBE features in PostgreSQL, and I got excited about the capabilities they offer. These features seem to perform much of the grunt work that the application currently does. My initial results were very promising and I got the very same tree-like structure for summing up all the metrics.
However, I am unsure if PostgreSQL’s built-in rollup supports averaging and summing at different levels of the hierarchy. My goal is to use these features to replace the application code and handle the calculations directly within the database.
For instance:
- I was unable to calculate the average instead of the sum: how can the rollup use average for a column, and sum for another column? It seems to always sum things
- I was unable to calculate the sum of cost, since cost is already a calculated field
In order to contextualize even more, and sake of understanding the tree, this is the final JSON I am planning to generate:
[
{
"name": "Account A",
"impressions": 236797,
"cpm": 2.48,
"children": [
{
"name": "Campaign A",
"impressions": 236797,
"cpm": 2.48,
"children": [
{
"name": "Ad 1",
"impressions": 236797,
"cpm": 2.48,
"children": []
}
]
}
]
}
]
2
Answers
It does: there’s a
grouping
function that tells you which "level" you’re on, and based on that you can alternate between theAVG
andSUM
the way you wanted: demo at db<>fiddleThe
avg_cpm
andsum_cpm
next tocpm
(same forcost
) shows you how thatcase
alternates between them on different levels based on thegrouping
:That being said, it seems that you could just plug in a
/count(*)
to yourSUM
s: in individual rows, that’ll be 1, so you’ll keep getting the sum. On higher levels, it’ll jump up to how many rows got aggregated, andSUM
divided byCOUNT
will result in anAVG
:Creating a SQL-based solution that involves different aggregation strategies such as summing some fields and averaging others while also producing hierarchical data is challenging.
PostgreSQL’s ROLLUP can simplify some parts of this, but doesn’t directly support applying different aggregation functions on different columns within a single query.
We can approach to accomplish the desired output: