skip to Main Content

I need to select rows from the database that are unique by date. Ideally, there will be very few such strings, but if they come across, then you need to select only one value among them. DISTINCT assumes just a filter by uniqueness, but it does not focus on which records it selects.
Let’s use an example instead:

+--+---------+---------------+---------------+--------------------------+
|id|credit_id|some_property_1|some_property_2|date                      |
+--+---------+---------------+---------------+--------------------------+
|1 |1        |3              |null           |2024-01-01 00:00:00.000000|
|2 |1        |null           |false          |2024-02-01 00:00:00.000000|
|3 |1        |3              |true           |2024-03-01 00:00:00.000000|
|4 |1        |3              |null           |2024-03-01 00:00:00.000000|
|5 |1        |3              |null           |2024-03-01 00:00:00.000000|
+--+---------+---------------+---------------+--------------------------+

And so, here you need to return all records by credit_id (for the test, they are all 1) in descending order of date and if date is the same, then priority in such a sublist – let’s call it that – is given to records with some_property_2 != null (if there are such > 1, then any of them). That is, if date is equal, only one record will be output according to the above condition. You need to output all columns.

The result of the request should be as follows:

+--+---------+---------------+---------------+--------------------------+
|id|credit_id|some_property_1|some_property_2|date                      |
+--+---------+---------------+---------------+--------------------------+
|3 |1        |3              |true           |2024-03-01 00:00:00.000000|
|2 |1        |null           |false          |2024-02-01 00:00:00.000000|
|1 |1        |3              |null           |2024-01-01 00:00:00.000000|
+--+---------+---------------+---------------+--------------------------+

The database is PostgreSQL, but it is desirable to write a portable query, i.e. a standard one for all databases. If this is not possible, then, of course, I will be glad to use the native one. I also indicated Spring and Java in the labels – since I am writing this logic using these tools and I will be glad to solve it in the language, but there are subtleties here:

  1. Pagination is needed
  2. Just doing a "head-on" search for all records is not suitable, since there may be 10 thousand records, and, conditionally, only 20 need to be brought to the front and the performance of such a query will be zero.

2

Answers


  1. Her is solution

    with uniq as (
        select credit_id, date, max(some_property_2) some_property_2 from tmp
        group by credit_id, date
    )
    select t.* from uniq u left join tmp t 
     on u.credit_id=t.credit_id 
    and u.date=t.date
    and (u.some_property_2=t.some_property_2
        or u.some_property_2 is null and t.some_property_2 is null);
    
    Login or Signup to reply.
  2. A portable solution is to use ROW_NUMBER():

    select *
    from (
      select t.*, row_number() over(partition by credit_id, date 
                                    order by some_property_2) as rn
      from t
    ) x
    where rn = 1
    

    Alternatively, a more performance solution (although not portable) could be:

    select distint on (credit_id, date) *
    from t
    order by credit_id, date, some_property_2
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search