skip to Main Content

In postgres, I have two tables: blog and blog_history. blog_history is to keep track of users and which blogs they read.
Tables structures are as follows:

blog table:

CREATE TABLE blog (
    id serial PRIMARY KEY,
    title text,
    description text,
    body text,
    created_at timestamp without time zone,
);

blog_history table:

CREATE TABLE blog_history (
    customer_username text NOT NULL,
    created_by text NOT NULL,
    created_at timestamp without time zone,
    post_id integer,
    CONSTRAINT fk_post_id
        FOREIGN KEY(post_id) 
        REFERENCES blog(id)
);

I have some duplicate rows in blog_history table that has the same post_id which is unnecessary.
I want to remove all duplicate rows by keeping one distinct for each created_by and each customer_username.
Example:

SELECT customer_username, created_by, post_id from blog_history;

 customer_username |    created_by     | post_id 
-------------------+-------------------+---------
 companyA             | [email protected] |       1
 companyA             | [email protected] |       3
 companyA             | [email protected] |       2
 companyA             | [email protected] |       2
 companyA             | [email protected] |       2
 companyA             | [email protected] |       3
 companyB             | [email protected] |       3
 companyB             | [email protected] |       3
 companyA             | [email protected] |       1
 companyA             | [email protected] |       3
 companyA             | [email protected] |       3
 companyA             | [email protected] |       2

After deleting duplicates, result should be like this:

 customer_username |    created_by     | post_id 
-------------------+-------------------+---------
 companyA             | [email protected] |       1
 companyA             | [email protected] |       3
 companyA             | [email protected] |       2
 companyB             | [email protected] |       3
 companyA             | [email protected] |       1
 companyA             | [email protected] |       3
 companyA             | [email protected] |       2

So, I want to leave only one distinct post_id and delete all duplicate ones with the same post_id for same customer_username and same created_by.

2

Answers


  1. You need a column that would identify each row. Add the primary key to the table:

    alter table blog_history
        add id int generated always as identity primary key;
    

    Now you can easily identify rows with the lowest id in groups of duplicated rows.

    delete from blog_history
    where id not in (
        select distinct on (customer_username, created_by, post_id) id
        from blog_history
        order by customer_username, created_by, post_id, id
        );
    

    Test it in db<>fiddle.

    Login or Signup to reply.
  2. delete 
    from t
    where ctid in
    (
    select ctid
    from
    (select *, ctid, row_number() over(partition by customer_username,created_by,post_id order by post_id) as rn
    from   t
    ) t2
    where rn > 1
    );
    
    select *
    from   t
    
    customer_username created_by post_id
    companyA [email protected] 1
    companyA [email protected] 3
    companyA [email protected] 2
    companyB [email protected] 3
    companyA [email protected] 1
    companyA [email protected] 3
    companyA [email protected] 2

    Fiddle

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search