I’ve been trying to make a query more efficient by bringing together counts from multiple tables. I have a schema similar to this:
create table impressions (
  id         serial primary key,
  ip         inet,
  ad_id      integer references ad(id),
  "user"     uuid references "user"(id),
  created_at timestamp default now()
);

create table purchases (
  id         serial primary key,
  ip         inet,
  ad_id      integer references ad(id),
  "user"     uuid references "user"(id),
  -- ... few other columns
  created_at timestamp default now()
);

create table clicks (
  id         serial primary key,
  ip         inet,
  ad_id      integer references ad(id),
  "user"     uuid references "user"(id),
  created_at timestamp default now()
);
And I’ve been trying to get a count(*) of impressions, purchases, and clicks, grouped by user (and later by ad_id). I’m currently doing 3 separate queries to the database and aggregating in application code, but wondered if there is a more efficient way of doing this.
2 Answers
You can combine all three counts into a single query to make it a single round-trip to the DB server, which saves some overhead, and reduces I/O.
OTOH, joining rows adds some cost (depending on the number of distinct users). But if you are going to do that anyway, it’s typically much cheaper in Postgres than in the client. Postgres is optimized for the task and can work with original data types.
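A sketch of such a combined query, assuming a "user" table with an id column (not shown in the question): each subquery aggregates once per table, then the pre-aggregated counts are joined to "user".

```sql
SELECT u.id AS user_id
     , i.impressions_count
     , c.clicks_count
     , p.purchases_count
FROM   "user" u
LEFT   JOIN (
   SELECT "user" AS user_id, count(*) AS impressions_count
   FROM   impressions
   GROUP  BY 1
   ) i ON i.user_id = u.id
LEFT   JOIN (
   SELECT "user" AS user_id, count(*) AS clicks_count
   FROM   clicks
   GROUP  BY 1
   ) c ON c.user_id = u.id
LEFT   JOIN (
   SELECT "user" AS user_id, count(*) AS purchases_count
   FROM   purchases
   GROUP  BY 1
   ) p ON p.user_id = u.id;
```

Aggregating *before* joining keeps the join cheap: each subquery produces at most one row per user, instead of multiplying rows from the three tables.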
By basing the query on the "user" table, users without any entries in the three subsidiary tables are listed, too. If you don’t want those, combine the subqueries with FULL [OUTER] JOIN instead.

Either query reports NULL where no rows are found. You may want to convert that to 0 with COALESCE(i.impressions_count, 0) AS impressions_count etc.

If table rows are wide (and your tables are vacuum’ed enough) you can gain performance from an index on ("user") on each subsidiary table. You probably have that anyway.
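The FULL [OUTER] JOIN variant mentioned above might look like this (again assuming the question’s schema; with USING, the shared user_id column is merged automatically across the full joins):

```sql
SELECT user_id
     , i.impressions_count
     , c.clicks_count
     , p.purchases_count
FROM        (SELECT "user" AS user_id, count(*) AS impressions_count FROM impressions GROUP BY 1) i
FULL   JOIN (SELECT "user" AS user_id, count(*) AS clicks_count      FROM clicks      GROUP BY 1) c USING (user_id)
FULL   JOIN (SELECT "user" AS user_id, count(*) AS purchases_count   FROM purchases   GROUP BY 1) p USING (user_id);
```

This only returns users that appear in at least one of the three tables, and it never touches the "user" table at all.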
Aside: "user" is a fully reserved word in SQL. I’d rather not use it as an identifier, even if that’s allowed with double-quotes. Too error-prone.
Maybe I’ve misunderstood, but is this what you want?
See https://dbfiddle.uk/yGX0Pv0k