Remove duplicates from query after LEFT JOIN - Postgresql

user2097371
March 29, 2023
167 views
2 votes
3 Answers

Below, I’m using the following code to query two databases and retrieve information.

select p.id, p.first, p.last, p.employer, a.city, a.state, a.county 
from "Person" p 
  LEFT JOIN "Address" a ON p.id = a.person 
  LEFT JOIN "Calllist" cl on p.id = cl.person

This is great and works well, with the exception that it will duplicate rows occasionally. This happens in the event where a person object is associated with multiple address objects — each duplicate will represent the identical person name with a different address. A much simpler example is below.

Name	State
George	Virginia
Alex	Virginia
Alex	Maryland
Sam	West Virginia

Instead of that being the result, where there are two lines of output for Alex, I want the results to aggregate by id and only show one output line per id.

Name	State
George	Virginia
Alex	Virginia
Sam	West Virginia

I do not care which city or state shows up in Alex’s result. All I want is for his name to only show up once.

Tags: postgresql

Answers

You can limit the results of the joins further to only include a single match which will prevent the duplication on the prior joins.

If you use tables like this

CREATE TABLE person (
  id INT PRIMARY KEY,
  name VARCHAR(25) NOT NULL
);

CREATE TABLE address (
  id INT PRIMARY KEY,
  person INT NOT NULL,
  state VARCHAR(25) NOT NULL
);

INSERT INTO person VALUES
(1, 'George'),
(2, 'Alex'),
(3, 'Sam');

INSERT INTO address VALUES
(1, 1, 'Virginia'), /* George */
(2, 2, 'Virginia'), /* Alex */
(3, 2, 'Maryland'), /* Alex */
(4, 3, 'West Virginia'); /* Sam */

Then you can limit the addresses results by doing something like this

SELECT
  p.name, a.state
FROM person AS p
LEFT JOIN address AS a
  /* only include the first address */
  ON a.id = (SELECT MIN(id) FROM address WHERE person = p.id)
;

Which will give the results

| George | Virginia      |
| Alex   | Virginia      |
| Sam    | West Virginia |

- Stefanovsm
- March 29, 2023 at 10:05 pm
- 0 votes
0
You can use PostgreSQL’s brilliant distinct on (you may see the answer by MatBailie).
```
select DISTINCT ON (p.id)
       p.id, p.first, p.last, p.employer, a.city, a.state, a.county
from "Person" p 
LEFT JOIN "Address" a ON p.id = a.person 
LEFT JOIN "Calllist" cl on p.id = cl.person;
```
Login or Signup to reply.

- ErwinBrandstetter
- March 29, 2023 at 11:34 pm
- 0 votes
0
Don’t multiply rows in the outer SELECT to begin with. That’s expensive, especially when done repeatedly. See:
- Two SQL LEFT JOINS produce incorrect result
De-duplicate early:
```
SELECT p.id, p.first, p.last, p.employer, a.*
FROM   "Person" p
LEFT   JOIN (
   SELECT DISTINCT ON (1)
          a.person AS id, a.city, a.state, a.county 
   FROM  "Address" a   
   ) a USING (id)
LEFT   JOIN "Calllist" cl ON p.id = cl.person;  -- seems unused?
```
That’s assuming entries in "Calllist" are unique per person. Else, that table gets the same treatment.

About DISTINCT ON:
- Select first row in each GROUP BY group?
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Remove duplicates from query after LEFT JOIN – Postgresql

Answers