I would like to create a table that will be automatically sorted by a column, but with null
values last.
I thought i could do it with sortkey
like this, but when querying the table it still comes back just in the order it was filled:
create temp table temp (
id integer,
date date sortkey ,
month integer,
cost integer);
insert into temp values
(1, '2022-09-01', 1, 5),
(3, '2023-01-01', 2, 3),
(1, '2022-09-01', 2, 20),
(2, '2022-11-01', 3, 26),
(1, null, 3, 33),
(1, null, 4, 39),
(1, '2022-09-01', 6, 60),
(1, null, 7, 72),
(2, '2022-11-01', 1, 11),
(2, null, 2, 14),
(2, '2022-11-01', 4, 55),
(2, '2022-11-01', 5, 99),
(3, null, 1, 1),
(1, '2022-09-01', 5, 44),
(3, '2023-01-01', 3, 4);
.
2
Answers
First off Redshift is a cluster and portions of the table are stored on each “slice” of the cluster. So at some level these slices aren’t “sorted” with respect to each other. You don’t specify the size of the cluster you are using so it is not clear how many slices are in play. When the data comes back it will be in the order in which the slices return their data.
Next, the data stored on Redshift’s disks are only resorted on a VACUUM command. Vacuum rewrites all the blocks of disk storage with blocks that are in sort order. So if you didn’t Vacuum the table no change will happen.
Lastly Redshift will skip the sort phase of vacuum if less than the threshold number of rows are unsorted – 95% by default. So you will need to vacuum “TO 100 PERCENT” to be sure that the entire table is sorted by vacuum.
I am a Redshift specialist.
If you’re just starting with Redshift, this PDF may be of interest;
https://www.redshiftresearchproject.org/white_papers/downloads/introduction_to_the_fundamentals_of_amazon_redshift.pdf