I am trying to fetch unique aocDetailIds in Spring Boot. Should I use the DISTINCT keyword (Code Block 1) or collect the results into a Set of strings (Code Block 2)? Which will be more optimised?
Code Block 1:
@Query(value = "SELECT DISTINCT a.aocDetailId FROM aocParsedDetail a WHERE a.dateCreated >= ?1 AND a.aocParsingStatus = ?2 ORDER BY a.dateCreated")
List<String> findAocParsedDetailIdsOrderByDateCreated(@NotNull long dateCreated, EventStatus aocParsingStatus);
Code Block 2:
@Query(value = "SELECT a.aocDetailId FROM aocParsedDetail a WHERE a.dateCreated >= ?1 AND a.aocParsingStatus = ?2 ORDER BY a.dateCreated")
Set<String> findAocParsedDetailIdsOrderByDateCreated(@NotNull long dateCreated, EventStatus aocParsingStatus);
2 Answers
In database queries, the DISTINCT keyword is often relatively expensive because the database must deduplicate the result set, typically by sorting or hashing it. If the data volume is large, this operation may consume significant time and resources.
In contrast, receiving the query results in a Set can be cheaper on the database side, because a Set deduplicates automatically. The deduplication happens in application memory, without sorting the entire result set, although every row, including duplicates, must still be transferred from the database.
So, from an efficiency perspective, using a Set to receive the query results may be more efficient. Note, however, that a Set does not guarantee the order of the query results. If you need to preserve the order, use a List. If the data volume is small, using DISTINCT will not have a significant impact on performance.
Overall, if the order of the results does not matter and the amount of data is not particularly large, you can try receiving the query results in a Set. If the order matters or the data volume is large, it is recommended to deduplicate in the query with DISTINCT and receive the results in a List.
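The ordering caveat depends on which Set implementation ends up holding the rows. As a minimal sketch, assuming the rows arrive already ordered by dateCreated, a LinkedHashSet removes duplicates in memory while keeping first-seen order, whereas a HashSet makes no ordering promise. The class name DedupSketch, the method name and the sample ids are hypothetical, and this does not show what Spring Data itself returns for a Set return type.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DedupSketch {

    // Deduplicate in memory while keeping the order in which ids were first seen.
    static Set<String> dedupKeepingOrder(List<String> ids) {
        return new LinkedHashSet<>(ids);
    }

    public static void main(String[] args) {
        // Hypothetical rows as they might come back from the non-DISTINCT query,
        // already sorted by dateCreated and containing duplicates.
        List<String> rows = Arrays.asList("A3", "A1", "A3", "A2", "A1");

        // LinkedHashSet: duplicates removed, first-seen order preserved.
        System.out.println(dedupKeepingOrder(rows)); // prints [A3, A1, A2]

        // HashSet: duplicates removed, but iteration order is unspecified.
        Set<String> unordered = new HashSet<>(rows);
        System.out.println(unordered.size()); // prints 3
    }
}
```

Either way, all five rows are transferred to the application before any duplicate is dropped, which is the cost the DISTINCT variant avoids.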
This composite index should help performance significantly:
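The index definition itself is not shown here. As a hedged sketch only: the query filters aocParsingStatus by equality and dateCreated by range, so one plausible composite index puts aocParsingStatus first and dateCreated second. The mapping below assumes jakarta.persistence (Spring Boot 3; Spring Boot 2 would use javax.persistence), column names identical to the field names, and a hypothetical id field, index name and EventStatus values.

```java
import jakarta.persistence.Entity;
import jakarta.persistence.EnumType;
import jakarta.persistence.Enumerated;
import jakarta.persistence.Id;
import jakarta.persistence.Index;
import jakarta.persistence.Table;

// Assumption: column names match the field names used in the query.
@Entity(name = "aocParsedDetail")
@Table(indexes = @Index(
        name = "idx_status_date_created",              // hypothetical index name
        columnList = "aocParsingStatus, dateCreated")) // equality column first, range column second
public class AocParsedDetail {

    // Placeholder for the EventStatus type from the question; the real values are unknown.
    public enum EventStatus { PENDING, DONE }

    @Id
    private Long id; // hypothetical primary key

    private String aocDetailId;

    private long dateCreated;

    @Enumerated(EnumType.STRING)
    private EventStatus aocParsingStatus;
}
```

The indexes attribute only takes effect when the JPA provider generates the schema. If the schema is managed by hand or by a migration tool, the equivalent index has to be created there, and columnList should be adjusted to the real column names.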
The two queries you present are not equivalent.
The DISTINCT query potentially returns a 'short' list of different values of aocDetailId. It is slower because of having to de-dup.
The query without DISTINCT returns all values, even duplicates. It is slower because of returning more rows.
Maybe this is what you want?
This will always be as fast or faster than either of the other formulations, but it returns only one value for aocDetailId that matches the WHEREs. This is especially useful if you expect only one distinct value or simply want "any" value.