Postgresql - Removing duplicate rows per each user and customer

EminM
April 19, 2023
231 views
1 vote
2 Answers

In postgres, I have two tables: blog and blog_history. blog_history is to keep track of users and which blogs they read.
Tables structures are as follows:

blog table:

CREATE TABLE blog (
    id serial PRIMARY KEY,
    title text,
    description text,
    body text,
    created_at timestamp without time zone,
);

blog_history table:

CREATE TABLE blog_history (
    customer_username text NOT NULL,
    created_by text NOT NULL,
    created_at timestamp without time zone,
    post_id integer,
    CONSTRAINT fk_post_id
        FOREIGN KEY(post_id) 
        REFERENCES blog(id)
);

I have some duplicate rows in blog_history table that has the same post_id which is unnecessary.
I want to remove all duplicate rows by keeping one distinct for each created_by and each customer_username.
Example:

SELECT customer_username, created_by, post_id from blog_history;

 customer_username |    created_by     | post_id 
-------------------+-------------------+---------
 companyA             | [email protected] |       1
 companyA             | [email protected] |       3
 companyA             | [email protected] |       2
 companyA             | [email protected] |       2
 companyA             | [email protected] |       2
 companyA             | [email protected] |       3
 companyB             | [email protected] |       3
 companyB             | [email protected] |       3
 companyA             | [email protected] |       1
 companyA             | [email protected] |       3
 companyA             | [email protected] |       3
 companyA             | [email protected] |       2

After deleting duplicates, result should be like this:

 customer_username |    created_by     | post_id 
-------------------+-------------------+---------
 companyA             | [email protected] |       1
 companyA             | [email protected] |       3
 companyA             | [email protected] |       2
 companyB             | [email protected] |       3
 companyA             | [email protected] |       1
 companyA             | [email protected] |       3
 companyA             | [email protected] |       2

So, I want to leave only one distinct post_id and delete all duplicate ones with the same post_id for same customer_username and same created_by.

Answers

- klin
- April 19, 2023 at 6:09 pm
- 0 votes
0
You need a column that would identify each row. Add the primary key to the table:
```
alter table blog_history
    add id int generated always as identity primary key;
```
Now you can easily identify rows with the lowest id in groups of duplicated rows.
```
delete from blog_history
where id not in (
    select distinct on (customer_username, created_by, post_id) id
    from blog_history
    order by customer_username, created_by, post_id, id
    );
```
Test it in db<>fiddle.
Login or Signup to reply.

delete 
from t
where ctid in
(
select ctid
from
(select *, ctid, row_number() over(partition by customer_username,created_by,post_id order by post_id) as rn
from   t
) t2
where rn > 1
);

select *
from   t

customer_username	created_by	post_id
companyA	[email protected]	1
companyA	[email protected]	3
companyA	[email protected]	2
companyB	[email protected]	3
companyA	[email protected]	1
companyA	[email protected]	3
companyA	[email protected]	2

Fiddle

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – Removing duplicate rows per each user and customer

Answers