I have the following code in PostgreSQL, which I need to reproduce on Amazon Redshift:
select *
from
( select distinct
a.id
, sum(case when a.is_batch_empty then 1 else 0 end)
over (partition by a.client_id order by a.id) as empty_count
from my_temp_table a
) a
where a.id = 111
In Amazon Redshift, running this query fails with the following error:
Aggregate window functions with an ORDER BY clause require a frame clause
So, on Redshift, I need to add a ROWS BETWEEN frame clause.
The problem is that I cannot find a frame clause that gives the same result as on PostgreSQL.
What I tried so far:
- Without ORDER BY
- With a frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
- With a frame ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
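To make the differences concrete, here is a minimal sketch that runs the inner query with PostgreSQL's implicit frame and with each of the three attempts. It uses Python's built-in `sqlite3` module as a stand-in engine (SQLite 3.25 or later follows the same standard frame semantics), not PostgreSQL or Redshift. The rows are hypothetical: `id` 2 is deliberately duplicated, and `is_batch_empty` is stored as 0/1 because SQLite has no boolean type.

```python
import sqlite3

# Hypothetical rows for my_temp_table: (id, client_id, is_batch_empty).
# id 2 is duplicated on purpose; SQLite stores the boolean as 0/1.
SAMPLE_ROWS = [(1, 10, 1), (2, 10, 1), (2, 10, 1), (3, 10, 0), (4, 10, 1)]

# OVER (...) contents: PostgreSQL's original window, then the three attempts.
VARIANTS = {
    "postgres_default": "PARTITION BY a.client_id ORDER BY a.id",
    "no_order_by": "PARTITION BY a.client_id",
    "rows_to_current": (
        "PARTITION BY a.client_id ORDER BY a.id "
        "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW"
    ),
    "rows_whole_partition": (
        "PARTITION BY a.client_id ORDER BY a.id "
        "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"
    ),
}

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE my_temp_table (id INTEGER, client_id INTEGER, is_batch_empty INTEGER)")
con.executemany("INSERT INTO my_temp_table VALUES (?, ?, ?)", SAMPLE_ROWS)

results = {}
for name, over in VARIANTS.items():
    # The question's inner query; the final ORDER BY only makes output stable.
    sql = f"""
        SELECT DISTINCT a.id,
               SUM(CASE WHEN a.is_batch_empty THEN 1 ELSE 0 END)
                   OVER ({over}) AS empty_count
        FROM my_temp_table a
        ORDER BY 1, 2
    """
    results[name] = con.execute(sql).fetchall()
con.close()

for name, rows in results.items():
    print(f"{name:22} {rows}")
```

On this data, the ROWS ... CURRENT ROW attempt differs only for the duplicated `id`, where it yields two rows, (2, 2) and (2, 3), instead of one; the other two attempts return the partition total for every `id`.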
Each of these options gives a different result, and none of them matches the PostgreSQL result.
So, what do I have to change in this query to get identical results on PostgreSQL and Redshift?
Answers
Just remove the "order by a.id" clause. It doesn’t matter in which order you sum the values.
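In PostgreSQL, ORDER BY in the window specification implies the BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW frame clause. If you remove ORDER BY, the frame spans the whole partition. The PostgreSQL documentation on window function calls says that the default framing option is RANGE UNBOUNDED PRECEDING, the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, and that with ORDER BY this frame includes all rows from the partition start through the current row’s last ORDER BY peer. Right after that, it notes that without ORDER BY all rows of the partition are included in the window frame, since all rows become peers of the current row.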
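A minimal check of that equivalence, using Python's built-in `sqlite3` module as a stand-in for PostgreSQL (SQLite documents the same default frame) and hypothetical rows in which `id` 2 appears twice. The `DISTINCT` is dropped so both copies of `id` 2 stay visible: with the frame left out and with an explicit RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, both copies get the count through the last peer, 3.

```python
import sqlite3

# Hypothetical rows for my_temp_table: (id, client_id, is_batch_empty), id 2 duplicated.
SAMPLE_ROWS = [(1, 10, 1), (2, 10, 1), (2, 10, 1), (3, 10, 0), (4, 10, 1)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE my_temp_table (id INTEGER, client_id INTEGER, is_batch_empty INTEGER)")
con.executemany("INSERT INTO my_temp_table VALUES (?, ?, ?)", SAMPLE_ROWS)

# No DISTINCT here, so every input row is visible, including both id 2 rows.
QUERY = """
    SELECT a.id,
           SUM(CASE WHEN a.is_batch_empty THEN 1 ELSE 0 END)
               OVER (PARTITION BY a.client_id ORDER BY a.id {frame}) AS empty_count
    FROM my_temp_table a
    ORDER BY 1, 2
"""
# Frame omitted: the engine applies its default frame.
implicit = con.execute(QUERY.format(frame="")).fetchall()
# The same frame written out explicitly.
explicit = con.execute(
    QUERY.format(frame="RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW")
).fetchall()
con.close()

print(implicit)              # [(1, 1), (2, 3), (2, 3), (3, 3), (4, 4)]
print(implicit == explicit)  # True
```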
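Your second attempt was the closest, but note that the default is RANGE, not ROWS, and I don’t see RANGE in the Redshift window function documentation. If a.id isn’t unique, the results will differ: under RANGE, all rows of a client that share the same a.id are peers and all get the count up to and including the last of them, whereas under ROWS each duplicate can get a different partial count, so the DISTINCT in the outer query may return several rows for the same id.
You could try to emulate the RANGE frame clause behaviour with a correlated scalar subquery.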
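Here is a minimal sketch of that emulation, run through Python's built-in `sqlite3` module so it can be compared with the RANGE result; it has not been run on Redshift, and whether Redshift accepts this correlated subquery in the select list, and how it performs on a large table, is not verified here. The subquery sums over every row of the same client whose `id` is less than or equal to the current `id`, which is the same set of rows as a RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW frame, peers included. It assumes `client_id` and `id` are never NULL, because `b.client_id = a.client_id` never matches NULLs while PARTITION BY groups them together. The sample rows are hypothetical, with `is_batch_empty` stored as 0/1. The third query is a swapped-in alternative, not part of this answer: it collapses duplicate `id`s with GROUP BY first, so a plain ROWS frame can no longer split peers.

```python
import sqlite3

# Hypothetical rows for my_temp_table: (id, client_id, is_batch_empty), id 2 duplicated.
SAMPLE_ROWS = [(1, 10, 1), (2, 10, 1), (2, 10, 1), (3, 10, 0), (4, 10, 1)]

# Reference result: the PostgreSQL-style implicit RANGE frame.
RANGE_QUERY = """
    SELECT DISTINCT a.id,
           SUM(CASE WHEN a.is_batch_empty THEN 1 ELSE 0 END)
               OVER (PARTITION BY a.client_id ORDER BY a.id) AS empty_count
    FROM my_temp_table a
    ORDER BY 1, 2
"""

# Correlated scalar subquery: sum over the same client's rows with b.id <= a.id,
# i.e. the current row, every preceding row and every peer.
# Assumes client_id and id are never NULL.
CORRELATED_QUERY = """
    SELECT DISTINCT a.id,
           (SELECT SUM(CASE WHEN b.is_batch_empty THEN 1 ELSE 0 END)
              FROM my_temp_table b
             WHERE b.client_id = a.client_id
               AND b.id <= a.id) AS empty_count
    FROM my_temp_table a
    ORDER BY 1, 2
"""

# Swapped-in alternative: one row per (client_id, id) first, so a ROWS frame
# behaves like RANGE because no two rows in a partition share an id.
PREAGGREGATED_QUERY = """
    SELECT DISTINCT g.id,
           SUM(g.empty_in_id) OVER (
               PARTITION BY g.client_id ORDER BY g.id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS empty_count
    FROM (SELECT client_id, id,
                 SUM(CASE WHEN is_batch_empty THEN 1 ELSE 0 END) AS empty_in_id
            FROM my_temp_table
           GROUP BY client_id, id) g
    ORDER BY 1, 2
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE my_temp_table (id INTEGER, client_id INTEGER, is_batch_empty INTEGER)")
con.executemany("INSERT INTO my_temp_table VALUES (?, ?, ?)", SAMPLE_ROWS)

expected = con.execute(RANGE_QUERY).fetchall()
correlated = con.execute(CORRELATED_QUERY).fetchall()
preaggregated = con.execute(PREAGGREGATED_QUERY).fetchall()
con.close()

print(expected)                                           # [(1, 1), (2, 3), (3, 3), (4, 4)]
print(correlated == expected, preaggregated == expected)  # True True
```

On the sample rows both variants reproduce the RANGE result. The pre-aggregation variant needs only a ROWS frame, which Redshift accepted in the question's own attempts.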