skip to Main Content

I would like to create a table that will be automatically sorted by a column, but with null values last.

I thought i could do it with sortkey like this, but when querying the table it still comes back just in the order it was filled:

create temp table temp (
    id integer,
    date date sortkey ,
    month integer,
    cost integer);

insert into temp values
    (1, '2022-09-01', 1, 5),
    (3, '2023-01-01', 2, 3),
    (1, '2022-09-01', 2, 20),
    (2, '2022-11-01', 3, 26),
    (1, null, 3, 33),
    (1, null, 4, 39),
    (1, '2022-09-01', 6, 60),
    (1, null, 7, 72),
    (2, '2022-11-01', 1, 11),
    (2, null, 2, 14),
    (2, '2022-11-01', 4, 55),
    (2, '2022-11-01', 5, 99),
    (3, null, 1, 1),
    (1, '2022-09-01', 5, 44),
    (3, '2023-01-01', 3, 4);

.

2

Answers


  1. First off Redshift is a cluster and portions of the table are stored on each “slice” of the cluster. So at some level these slices aren’t “sorted” with respect to each other. You don’t specify the size of the cluster you are using so it is not clear how many slices are in play. When the data comes back it will be in the order in which the slices return their data.

    Next, the data stored on Redshift’s disks are only resorted on a VACUUM command. Vacuum rewrites all the blocks of disk storage with blocks that are in sort order. So if you didn’t Vacuum the table no change will happen.

    Lastly Redshift will skip the sort phase of vacuum if less than the threshold number of rows are unsorted – 95% by default. So you will need to vacuum “TO 100 PERCENT” to be sure that the entire table is sorted by vacuum.

    Login or Signup to reply.
  2. I am a Redshift specialist.

    If you’re just starting with Redshift, this PDF may be of interest;

    https://www.redshiftresearchproject.org/white_papers/downloads/introduction_to_the_fundamentals_of_amazon_redshift.pdf

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search