I have my table like this:
event_date | user_id |
---|---|
2021-10-01 | 1 |
2021-10-01 | 2 |
2021-10-02 | 1 |
2021-10-02 | 2 |
2021-10-02 | 3 |
2021-10-03 | 3 |
2021-10-03 | 4 |
2021-10-03 | 4 |
2021-10-04 | 5 |
2021-10-04 | 2 |
2021-10-06 | 3 |
2021-10-06 | 5 |
Sample data
WITH my_table (event_date, user_id) AS (
values ('2021-10-01', 1),
('2021-10-01', 2),
('2021-10-02', 1),
('2021-10-02', 2),
('2021-10-02', 3),
('2021-10-03', 3),
('2021-10-03', 4),
('2021-10-03', 4),
('2021-10-04', 5),
('2021-10-04', 2),
('2021-10-06', 3),
('2021-10-06', 5)
)
SELECT *
FROM my_table
And I want to calculate moving count distinct user_id for the last 3 days.
Desired table should look like this:
event_date | count_distinct_user_id_last_3_days |
---|---|
2021-10-01 | 2 |
2021-10-02 | 3 |
2021-10-03 | 4 |
2021-10-04 | 5 |
2021-10-05 | 4 |
2021-10-06 | 4 |
Thanks for your help!
2
Answers
You can get an approximate moving count distinct via the following query below. Unfortunately,
DISTINCT
is not currently supported in window functions, so you can’t useCOUNT(DISTINCT user_id) OVER (...)
Since you are missing some days in the data (and you want to show all dates) and Athena version of Presto does not support date ranges you can try approach inserting missing dates with window functions based on rows and leveraging array functions:
Output: