Below, I’m using the following code to query two databases and retrieve information.
select p.id, p.first, p.last, p.employer, a.city, a.state, a.county
from "Person" p
LEFT JOIN "Address" a ON p.id = a.person
LEFT JOIN "Calllist" cl on p.id = cl.person
This is great and works well, with the exception that it will duplicate rows occasionally. This happens in the event where a person object is associated with multiple address objects — each duplicate will represent the identical person name with a different address. A much simpler example is below.
Name | State |
---|---|
George | Virginia |
Alex | Virginia |
Alex | Maryland |
Sam | West Virginia |
Instead of that being the result, where there are two lines of output for Alex, I want the results to aggregate by id and only show one output line per id.
Name | State |
---|---|
George | Virginia |
Alex | Virginia |
Sam | West Virginia |
I do not care which city or state shows up in Alex’s result. All I want is for his name to only show up once.
3
Answers
You can limit the results of the joins further to only include a single match which will prevent the duplication on the prior joins.
If you use tables like this
Then you can limit the addresses results by doing something like this
Which will give the results
You can use PostgreSQL’s brilliant distinct on (you may see the answer by MatBailie).
Don’t multiply rows in the outer
SELECT
to begin with. That’s expensive, especially when done repeatedly. See:De-duplicate early:
That’s assuming entries in
"Calllist"
are unique per person. Else, that table gets the same treatment.About
DISTINCT ON
: