Getting duplicate records in result set even after distinct keyword - PostgreSQL

Pratik
March 7, 2023
263 views
3 votes
3 Answers

I have a table as follows.

    entity_id|effective_date|value|
    A        |2023-09-09    |234  |
    A        |2023-09-06    |345  |
    B        |2023-09-02    |341  |
    C        |2023-09-01    |347  |

I want to find all unique entity IDs with the maximum effective date and their respective value. I am trying the following query.

```
select distinct entity_id, value, max(effective_date) start_date
from refdata.investment_raw ir
where attribute_id = 232
  and entity_id in (select invest.val as investment_id
                    from refdata.ved soi
                    inner join refdata.ved invest
                      on soi.entity_id = invest.entity_id
                     and current_date between invest.start_date and invest.end_date
                     and invest.attribute_code = 'IssuerId'
                     and soi.attribute_code = 'SO'
                     and soi.val in ('1','2')
                     and current_date between soi.start_date and soi.end_date)
group by entity_id, value
```

With this, I get the following result set.

    entity_id|effective_date|value|
    A        |2023-09-09    |234  |
    A        |2023-09-06    |345  |
    B        |2023-09-02    |341  |
    C        |2023-09-01    |347  |

My expected result set is

    entity_id|effective_date|value|
    A        |2023-09-09    |234  |
    B        |2023-09-02    |341  |
    C        |2023-09-01    |347  |

In the expected result, each entity ID appears once, with its maximum effective date and the value from that row. In the actual result set, entity ID A is duplicated. When I remove value from the select list and the group by clause, I get one row per entity, but without the value column. I want each distinct entity ID together with its respective value, without duplication. What is wrong with my query?

Tags: postgresql sql

Answers

```
WITH YOUR_TABLE_DATA(ENTITY_ID, EFFECTIVE_DATE, VALUE) AS
(
   SELECT 'A', '2023-09-09'::DATE, 234 UNION ALL
   SELECT 'A', '2023-09-06'::DATE, 345 UNION ALL
   SELECT 'B', '2023-09-02'::DATE, 341 UNION ALL
   SELECT 'C', '2023-09-01'::DATE, 347
)
SELECT SQ.ENTITY_ID, SQ.EFFECTIVE_DATE, SQ.VALUE
FROM
(
    SELECT C.ENTITY_ID, C.EFFECTIVE_DATE, C.VALUE,
           ROW_NUMBER() OVER (PARTITION BY C.ENTITY_ID ORDER BY C.EFFECTIVE_DATE DESC) AS XCOL
    FROM YOUR_TABLE_DATA AS C
) SQ
WHERE SQ.XCOL = 1
```

You can use the ROW_NUMBER filtering approach: number each entity's rows from the latest effective date down, then keep only row 1.

- SelVazi
- March 7, 2023 at 2:36 pm
- 0 votes
First, we produce a list of entities with their maximum effective date using group by, then we join the table to this list:
```
select t.*
from mytable t
inner join (
  select entity_id, max(effective_date) as max_effective_date
  from mytable
  group by entity_id
) as s on s.entity_id = t.entity_id and s.max_effective_date = t.effective_date
```
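One caveat worth checking with this join: if two rows for the same entity share the maximum effective_date, both are returned. A minimal sketch, assuming the question's sample rows are inlined as a CTE named mytable, plus one invented tied row for A (the 999 row is hypothetical and not in the question):

```sql
-- Hypothetical data: the question's rows plus one invented tie for entity A.
with mytable(entity_id, effective_date, value) as (
  values ('A', date '2023-09-09', 234),
         ('A', date '2023-09-09', 999),  -- invented tie, not in the question
         ('A', date '2023-09-06', 345),
         ('B', date '2023-09-02', 341),
         ('C', date '2023-09-01', 347)
)
select t.*
from mytable t
inner join (
  -- one row per entity with its latest date
  select entity_id, max(effective_date) as max_effective_date
  from mytable
  group by entity_id
) as s on s.entity_id = t.entity_id and s.max_effective_date = t.effective_date
order by t.entity_id, t.value;
-- Both 2023-09-09 rows for A come back (234 and 999), plus one row each for B and C.
```

If ties are possible in the real table, an extra tie-breaking rule is needed to get exactly one row per entity.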

- jjanes
- March 7, 2023 at 8:46 pm
- 0 votes
It seems like you want the PostgreSQL-specific DISTINCT ON, which is not the same thing as DISTINCT.
```
select distinct on (entity_id) entity_id, value, effective_date as start_date
from ...
ORDER BY entity_id, effective_date desc 
```
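To see the difference on the question's sample data: plain DISTINCT only removes rows that are identical across every selected column, and grouping by entity_id, value builds one group per (entity_id, value) pair, so A survives twice because its two rows carry different values. DISTINCT ON (entity_id) instead keeps the first row per entity_id according to the ORDER BY. A minimal sketch, assuming the sample rows are inlined in a hypothetical CTE named sample rather than read from the real tables:

```sql
-- Hypothetical CTE holding the four sample rows from the question.
with sample(entity_id, effective_date, value) as (
  values ('A', date '2023-09-09', 234),
         ('A', date '2023-09-06', 345),
         ('B', date '2023-09-02', 341),
         ('C', date '2023-09-01', 347)
)
-- Keep the first row per entity_id; the ORDER BY makes that the latest date.
select distinct on (entity_id) entity_id, value, effective_date as start_date
from sample
order by entity_id, effective_date desc;
-- A | 234 | 2023-09-09
-- B | 341 | 2023-09-02
-- C | 347 | 2023-09-01
```

The ORDER BY has to start with the DISTINCT ON expressions; the remaining sort keys decide which row is kept within each entity.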
