Let’s say I want to show a list of companies, along with a first/random employee of each company.
I could do something like:
    SELECT
        company.id,
        company.name,
        MIN(person.id) AS employee_person_id,
        MIN(person.name) AS employee_person_name
    FROM company
    LEFT OUTER JOIN person ON (person.company_id = company.id)
    GROUP BY company.id;
But I think with my code above, MIN(person.id) and MIN(person.name) could give info about two different people, right?
Is there a better way of retrieving just a “first” (or random) employee and showing that person’s ID and name?
4 Answers
I’d use the row_number window function to assign a numbering within each company, and then use that to query the first person.
This is how to get the correct name:
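The query that belonged here is not preserved. A minimal sketch, assuming the subquery alias s from the join condition quoted in this answer and otherwise reusing the question's query, might be:

```sql
-- Sketch only: the original answer's query was lost.
-- Step 1 (subquery s): pick one person id per company, as in the question.
-- Step 2: join back to person on that id, so the name comes from the same row.
SELECT
    s.id,
    s.name,
    s.employee_person_id,
    person.name AS employee_person_name
FROM (
    SELECT
        company.id,
        company.name,
        MIN(person.id) AS employee_person_id
    FROM company
    LEFT OUTER JOIN person ON person.company_id = company.id
    -- company.name is listed too, so this works even if id is not a declared primary key
    GROUP BY company.id, company.name
) AS s
LEFT OUTER JOIN person ON person.id = s.employee_person_id;
```

Companies without any person keep a row with NULL in both employee columns, because both joins are outer joins.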
You will have to join your result with the person table on person.id = s.employee_person_id.
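The row_number approach suggested in the first answer could be sketched as follows. The original query is not preserved, so the aliases c, p, numbered and rn, and the choice to order by p.id, are assumptions made for illustration:

```sql
-- Sketch only: number each company's people, then keep the first one.
SELECT id, name, employee_person_id, employee_person_name
FROM (
    SELECT
        c.id,
        c.name,
        p.id   AS employee_person_id,
        p.name AS employee_person_name,
        -- numbering restarts at 1 for every company; lowest person id comes first
        ROW_NUMBER() OVER (PARTITION BY c.id ORDER BY p.id) AS rn
    FROM company c
    LEFT OUTER JOIN person p ON p.company_id = c.id
) AS numbered
WHERE rn = 1;
```

Because the id and the name are read from the same numbered row, they always describe the same person.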
In Postgres, I would recommend distinct on. With ORDER BY c.id, p.id, this brings, for each company, the person with the smallest id; you control which person is picked with the ORDER BY clause (if you wanted the person with the greatest id, you would use ORDER BY c.id, p.id DESC).
This is correct: MIN(person.id) and MIN(person.name) can come from two different people. You can see that happening in the demo linked below.
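For reference, the distinct on approach recommended above might look like the sketch below. The original query is not preserved; the aliases c and p are inferred from the ORDER BY c.id, p.id DESC fragment, and the column aliases are taken from the question:

```sql
-- Sketch only: PostgreSQL-specific DISTINCT ON keeps the first row per c.id
-- according to the ORDER BY, so id and name belong to the same person.
SELECT DISTINCT ON (c.id)
    c.id,
    c.name,
    p.id   AS employee_person_id,
    p.name AS employee_person_name
FROM company c
LEFT JOIN person p ON p.company_id = c.id
ORDER BY c.id, p.id;  -- use "p.id DESC" to pick the greatest id instead
```

The leading ORDER BY expressions have to match the DISTINCT ON expressions, which is why the ordering starts with c.id.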
GMB’s distinct on is typically the most recommended, obvious choice, but it requires ordering, same as Mureinik’s. Meanwhile, you can let each company just get any single person, without having to order them first (demo).
It gets even simpler if you don’t have companies with no persons, or if you ignore such companies.
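The queries for this answer are not preserved. One way to get "any single person" per company without sorting is a LATERAL subquery with LIMIT 1; that this is the technique the answer used is an assumption, and the alias names are hypothetical:

```sql
-- Sketch only (assumed technique): for each company, fetch whichever
-- matching person row is found first, with no ORDER BY involved.
SELECT
    c.id,
    c.name,
    p.id   AS employee_person_id,
    p.name AS employee_person_name
FROM company c
LEFT JOIN LATERAL (
    SELECT person.id, person.name
    FROM person
    WHERE person.company_id = c.id
    LIMIT 1
) AS p ON true;  -- LEFT JOIN keeps companies that have no person

-- Simpler variant: companies without any person are dropped from the result.
SELECT
    c.id,
    c.name,
    p.id   AS employee_person_id,
    p.name AS employee_person_name
FROM company c
CROSS JOIN LATERAL (
    SELECT person.id, person.name
    FROM person
    WHERE person.company_id = c.id
    LIMIT 1
) AS p;
```

Without an ORDER BY inside the subquery, which person comes back is unspecified and may change between runs. An index on person.company_id (not mentioned in the source) would presumably let each per-company lookup stop at the first match.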
:It’s "retrieving just a “first” (or random) employee" the convenient way: it takes whatever it happens to find first, without having to find and order all possible matches before picking one. Each company just fetches any one of their people, which seemed to be the idea.
Thanks to not doing the extra work, it’s faster (16,000x on a 300,000-row sample) and it scales in proportion to only the company table, pretty much disregarding the person table’s size and growth.