I have two Postgres tables: ‘user’ with unique rows, and ‘address’ where a single user can have multiple addresses. I want to join specific ‘user’ row with one row from ‘address’ where the address.id is the highest max(id)
.
I wrote the following query which works OK but I suspect that when there are many addresses for one user, this query will produce a really large intermediate aggregate before it returns the the final results. Is there a better way to write this query?
select "user".id,
"user".username,
"user".trader_id,
address.id,
address.street,
address.trader_id
from "user", address
where "user".id = 6
and address.trader_id = "user".trader_id
order by address.id desc
limit 1;
3
Answers
Your query should work just fine and be performant, especially if you use appropriate indexes, such as on
trader_id
for both tables. (There are certain improvements that can be made, such as dropping the secondtrader_id
column and using an alias for one or bothid
columns.)The query will fail, however, if you want to get the information for multiple users, because only a single row will be returned due to the
LIMIT 1
clause. A more general solution then, where you can get the desired information for multiple"user".id
s at a time would use a window function:In terms of performance, you should focus on indexes rather than query structure – the query planner will deal with the latter.
You can skip using order by and limit in your query by using multicolumn subquery in order to get max id from address table of all/any user.
step 1: Write a get max address id of all user from address table .
Step 2: put the above query inside another query to fetch necessary column of address table.
Step 3: Join the above query with "user" table. You can get any users latest address id from this using where condition.
Your query is pretty good already for getting a single user. Definitely much faster than running a window function over the whole
address
table in a subquery.Your query, only slightly simplified with a
USING
clause:A multicolumn index on
address(trader_id, id)
will give you optimal performance. Plus, obviously, an index on"user"(id)
. (No "intermediate aggregate" like you apprehended withLIMIT 1
!) See:Alternatively, the same technique in a
LATERAL
subquery. Works for retrieving one or more users:About
LATERAL
subqueries:Also using
LEFT JOIN
to preserve users without any address.If you are going to use a window function, use
row_number()
rather thanrank()
. Do it in aLATERAL
subquery while only retrieving a single (or few) user(s), to only involve relevant rows. And, unless you run Postgres 16 or later, add the frame clauseROWS UNBOUNDED PRECEDING
for performance:Why
ROWS UNBOUNDED PRECEDING
? See:Or use
DISTINCT ON
:See:
__
Aside: Rather don’t use reserved words like "user" as identifier.