Postgresql - How can I request unique records (DISTINCT) together with a mechanism for selecting the desired value among the same ones?

Artem
September 8, 2024
71 views
0 votes
2 Answers

I need to select rows from the database that are unique by date. Ideally, there will be very few such strings, but if they come across, then you need to select only one value among them. DISTINCT assumes just a filter by uniqueness, but it does not focus on which records it selects.
Let’s use an example instead:

+--+---------+---------------+---------------+--------------------------+
|id|credit_id|some_property_1|some_property_2|date                      |
+--+---------+---------------+---------------+--------------------------+
|1 |1        |3              |null           |2024-01-01 00:00:00.000000|
|2 |1        |null           |false          |2024-02-01 00:00:00.000000|
|3 |1        |3              |true           |2024-03-01 00:00:00.000000|
|4 |1        |3              |null           |2024-03-01 00:00:00.000000|
|5 |1        |3              |null           |2024-03-01 00:00:00.000000|
+--+---------+---------------+---------------+--------------------------+

And so, here you need to return all records by credit_id (for the test, they are all 1) in descending order of date and if date is the same, then priority in such a sublist – let’s call it that – is given to records with some_property_2 != null (if there are such > 1, then any of them). That is, if date is equal, only one record will be output according to the above condition. You need to output all columns.

The result of the request should be as follows:

+--+---------+---------------+---------------+--------------------------+
|id|credit_id|some_property_1|some_property_2|date                      |
+--+---------+---------------+---------------+--------------------------+
|3 |1        |3              |true           |2024-03-01 00:00:00.000000|
|2 |1        |null           |false          |2024-02-01 00:00:00.000000|
|1 |1        |3              |null           |2024-01-01 00:00:00.000000|
+--+---------+---------------+---------------+--------------------------+

The database is PostgreSQL, but it is desirable to write a portable query, i.e. a standard one for all databases. If this is not possible, then, of course, I will be glad to use the native one. I also indicated Spring and Java in the labels – since I am writing this logic using these tools and I will be glad to solve it in the language, but there are subtleties here:

Pagination is needed
Just doing a "head-on" search for all records is not suitable, since there may be 10 thousand records, and, conditionally, only 20 need to be brought to the front and the performance of such a query will be zero.

Answers

Her is solution

with uniq as (
    select credit_id, date, max(some_property_2) some_property_2 from tmp
    group by credit_id, date
)
select t.* from uniq u left join tmp t 
 on u.credit_id=t.credit_id 
and u.date=t.date
and (u.some_property_2=t.some_property_2
    or u.some_property_2 is null and t.some_property_2 is null);

- TheImpaler
- September 8, 2024 at 2:46 pm
- 0 votes
0
A portable solution is to use ROW_NUMBER():
```
select *
from (
  select t.*, row_number() over(partition by credit_id, date 
                                order by some_property_2) as rn
  from t
) x
where rn = 1
```
Alternatively, a more performance solution (although not portable) could be:
```
select distint on (credit_id, date) *
from t
order by credit_id, date, some_property_2
```
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – How can I request unique records (DISTINCT) together with a mechanism for selecting the desired value among the same ones?

Answers