Given a Postgres table containing the following rows:

tag_id,  duplicate_tag_id, tag_created_at_timestamp
14175,   14178,            ...
14175,   14177,            ...
14176,   null,             ...
14177,   14178,            ...
14178,   14179,            ...
14179,   null,             ...
14180,   null,             ...
14181,   null,             ...
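
For anyone who wants to reproduce this, the rows above could be loaded with a sketch like the following. The table name `tags` and the column types are assumptions (the table is not named here), and the timestamp values are left NULL because they are elided above.

```sql
-- Hypothetical setup: table name and column types are assumptions.
CREATE TABLE tags (
    tag_id                   integer NOT NULL,
    duplicate_tag_id         integer,
    tag_created_at_timestamp timestamp  -- values elided in the question, left NULL
);

-- Sample rows, in the order listed in the question.
INSERT INTO tags (tag_id, duplicate_tag_id) VALUES
    (14175, 14178),
    (14175, 14177),
    (14176, NULL),
    (14177, 14178),
    (14178, 14179),
    (14179, NULL),
    (14180, NULL),
    (14181, NULL);
```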

Is it possible to write a query that returns the tag_ids that do not appear in the duplicate_tag_id column, unless the tag_id they are a duplicate of itself already appears in duplicate_tag_id in a previous row?

So, in the case above, I would like the result to be:

14175
14176
14179
14180
14181

14179 is included because it is the duplicate_tag_id of 14178, which was itself already a duplicate_tag_id of 14175.

Thanks!

2 Answers


  1. Select the tag_ids that never appear as a duplicate_tag_id, and also keep those that equal the previous row's duplicate_tag_id, as long as they do not appear more than once in duplicate_tag_id:

    select distinct tag_id
    from tags
    where tag_id not in (        -- never listed as a duplicate
            select duplicate_tag_id
            from tags
            where duplicate_tag_id is not null)
       or (tag_id in (           -- equal to the previous row's duplicate_tag_id
                select case
                         when tag_id = lag(duplicate_tag_id) over (order by tag_id)
                         then tag_id
                       end
                from tags)
           and tag_id not in (   -- listed as a duplicate at most once
                select duplicate_tag_id
                from tags
                group by duplicate_tag_id
                having count(duplicate_tag_id) > 1))
    order by 1;
    

    Result

    tag_id
    14175
    14176
    14179
    14180
    14181
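
    To see what the second condition matches, it may help to run the window function on its own. This is only a sketch against the same `tags` table used in the query above; the alias `prev_duplicate_tag_id` is a name chosen here for illustration.

    ```sql
    -- Show each row next to the duplicate_tag_id of the row before it.
    -- Note: the two rows with tag_id 14175 tie under ORDER BY tag_id,
    -- so their relative order is not guaranteed.
    SELECT tag_id,
           duplicate_tag_id,
           LAG(duplicate_tag_id) OVER (ORDER BY tag_id) AS prev_duplicate_tag_id
    FROM tags
    ORDER BY tag_id;
    ```

    On the sample data, the rows for 14178 and 14179 are the ones where tag_id equals the previous row's duplicate_tag_id. 14178 is then dropped by the third condition because it appears twice in duplicate_tag_id, which leaves 14179.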

  2. I am not sure I completely understood the question. The following query returns every tag_id whose duplicate_tag_id is null (it has no duplicate) or whose duplicate_tag_id equals the tag_id of the previous row.

    Query

    WITH previous_tag_id AS (
      SELECT tag_id,
             duplicate_tag_id,
             tag_created_at_timestamp,
             LAG(tag_id, 1) OVER (ORDER BY tag_created_at_timestamp ASC) AS previous_tag_id
      FROM myTable
    )
    SELECT DISTINCT tag_id
    FROM previous_tag_id
    WHERE duplicate_tag_id IS NULL OR previous_tag_id = tag_id;
    
