Here is a simple question I got but I want to know if there is anyone could tell me the mechanism behind the distinct and group by in window function.
I’ve read several posts but didn’t get any related to my doubt.
Here is my question:
Regarding the following input/output:
enter image description here
This was coming out by the following SQL query:
select
player_id
, first_value(event_date) over(partition by player_id order by event_date) as first_login
from Activity
group by player_id
I know that it can be solved by simple aggregation, don’t get me wrong.
I just wonder why the ‘group by’ seems not working in this question, while the ‘distinct’ can work and pass the testing. Thank you in advance.
select
DISTINCT player_id
, first_value(event_date) over(partition by player_id order by event_date) as first_login
from Activity
If I use cute or subquery, then the ‘group by’ works:
select
*
from
(select
player_id
, first_value(event_date) over(partition by player_id order by event_date) as first_login
from Activity) as cte
group by player_id, first_login
Dummy Data
WITH
Activity AS (
SELECT 1 AS player_id, 2 AS device_id, '2016-03-01' AS event_date, 5 AS games_played UNION ALL
SELECT 1, 2, '2016-05-02', 6 UNION ALL
SELECT 1, 3, '2015-06-25', 1 UNION ALL
SELECT 3, 1, '2016-03-02', 0 UNION ALL
SELECT 3, 4, '2016-02-03', 5 )
2
Answers
The explanation of the difference between your queries is that there is a sequence to the operations.
player_id
). In your example, this results in an erroneous value, ‘2016-03-01’ which is not the minimum value, it’s just the value from the first row read during execution. This error is caused because you didn’t usesql_mode=ONLY_FULL_GROUP_BY
.DISTINCT
applies after window functions, reducing rows to those with a unique combination of values. This eliminates repeating rows.