I need to select rows from the database that are unique by date. Ideally, there will be very few such strings, but if they come across, then you need to select only one value among them. DISTINCT assumes just a filter by uniqueness, but it does not focus on which records it selects.
Let’s use an example instead:
+--+---------+---------------+---------------+--------------------------+
|id|credit_id|some_property_1|some_property_2|date |
+--+---------+---------------+---------------+--------------------------+
|1 |1 |3 |null |2024-01-01 00:00:00.000000|
|2 |1 |null |false |2024-02-01 00:00:00.000000|
|3 |1 |3 |true |2024-03-01 00:00:00.000000|
|4 |1 |3 |null |2024-03-01 00:00:00.000000|
|5 |1 |3 |null |2024-03-01 00:00:00.000000|
+--+---------+---------------+---------------+--------------------------+
And so, here you need to return all records by credit_id
(for the test, they are all 1) in descending order of date
and if date
is the same, then priority in such a sublist – let’s call it that – is given to records with some_property_2 != null
(if there are such > 1
, then any of them). That is, if date
is equal, only one record will be output according to the above condition. You need to output all columns.
The result of the request should be as follows:
+--+---------+---------------+---------------+--------------------------+
|id|credit_id|some_property_1|some_property_2|date |
+--+---------+---------------+---------------+--------------------------+
|3 |1 |3 |true |2024-03-01 00:00:00.000000|
|2 |1 |null |false |2024-02-01 00:00:00.000000|
|1 |1 |3 |null |2024-01-01 00:00:00.000000|
+--+---------+---------------+---------------+--------------------------+
The database is PostgreSQL
, but it is desirable to write a portable query, i.e. a standard one for all databases. If this is not possible, then, of course, I will be glad to use the native one. I also indicated Spring
and Java
in the labels – since I am writing this logic using these tools and I will be glad to solve it in the language, but there are subtleties here:
- Pagination is needed
- Just doing a "head-on" search for all records is not suitable, since there may be 10 thousand records, and, conditionally, only 20 need to be brought to the front and the performance of such a query will be zero.
2
Answers
Her is solution
A portable solution is to use
ROW_NUMBER()
:Alternatively, a more performance solution (although not portable) could be: