I have a dataset looks like this:
Link_number | Houband | Time | Mean_speed | Sample_Number |
---|---|---|---|---|
Link 1 | 8 | 8:00 | 52 | 2 |
Link 1 | 8 | 8:30 | 55 | 5 |
Link 2 | 9 | 9:00 | 20 | 3 |
Link 2 | 9 | 9:30 | 40 | 4 |
I need to do duplicate each row X number of times, X = Sample_Number. So my result will look like this:
Link_number | Houband | Time | Mean_speed | Sample_Number |
---|---|---|---|---|
Link 1 | 8 | 8:00 | 52 | 2 |
Link 1 | 8 | 8:00 | 52 | 2 |
Link 1 | 8 | 8:30 | 55 | 5 |
Link 1 | 8 | 8:30 | 55 | 5 |
Link 1 | 8 | 8:30 | 55 | 5 |
Link 1 | 8 | 8:30 | 55 | 5 |
Link 1 | 8 | 8:30 | 55 | 5 |
Link 2 | 9 | 9:00 | 20 | 3 |
Link 2 | 9 | 9:00 | 20 | 3 |
Link 2 | 9 | 9:00 | 20 | 3 |
Link 2 | 9 | 9:30 | 40 | 4 |
Link 2 | 9 | 9:30 | 40 | 4 |
Link 2 | 9 | 9:30 | 40 | 4 |
Link 2 | 9 | 9:30 | 40 | 4 |
So then I can calculate the percentile of speed in each hourband for each link.
I have been able to do this in Postgresql using generate_series:
select Link_number, Hourband, percentile_cont(0.85) within group (order by Mean_Speed) as Speed_85th
from table, generate_series(1,Sample_Number)
How do I do the same in Amazon Athena?
2
Answers
As @Serg recommended, you can use a tally table to generate a sequence of numbers then use a cross join to duplicate the rows based on these values.
You can use
unnest
+sequence
(which is kind of analog ofgenerate_series
based on what I see):