Is it possible to use 2 dist keys in a table?
I have a query which makes use of 3 tables: Table 1 joins Table 2 on ID1, and Table 2 joins Table 3 on ID2.
I need to display the most recent data in this case.
So, according to my analysis, the sortkey can be updated_time, since I'd be limiting data on the basis of this field.
Further, the distkey should be the attribute that is used most often for joins.
That means the distkey for Table 1 can be ID1, and the distkey for Table 3 can be ID2.
Can there be 2 distkeys for table 2 – ID1 and ID2?
2 Answers
No, it is not possible to specify multiple DISTKEYs.
From CREATE TABLE – Amazon Redshift:
The distribution key is used to distribute rows amongst servers. Preferably, when JOINing tables, they should have the same DISTKEY so that data is co-located on the same server and does not need to be sent between servers.
You will need to optimize for your most important table. It is not always possible to optimize for all tables and queries.
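As a rough sketch of what that looks like in DDL: the table names (table1, table2), the column types, and the choice to favor the Table 1 to Table 2 join are all assumptions made for illustration, since the question gives no schema. Each table gets exactly one DISTKEY column, and the two tables share it so that their join on id1 can be co-located.

```sql
-- Hypothetical schema; names and types are assumed, not taken from the question.
CREATE TABLE table1 (
    id1          BIGINT NOT NULL,
    updated_time TIMESTAMP
)
DISTSTYLE KEY
DISTKEY (id1)            -- only one column may be the distribution key
SORTKEY (updated_time);  -- supports filtering on the most recent rows

CREATE TABLE table2 (
    id1          BIGINT NOT NULL,
    id2          BIGINT NOT NULL,
    updated_time TIMESTAMP
)
DISTSTYLE KEY
DISTKEY (id1)            -- matches table1, so the join on id1 stays local
SORTKEY (updated_time);
```

With this layout the join between table2 and a third table on id2 is the one that is not co-located, so it is the join that may need rows moved between nodes. If that join matters more for your workload, you would distribute table2 on id2 instead.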
It is also possible to specify an AUTO distribution style. From Distribution styles – Amazon Redshift:
I recommend studying Tuning query performance – Amazon Redshift and watching some AWS re:Invent videos about Redshift to discover how to best optimize tables and queries.
Adding to John R’s correct answer.
You should consider making either table 2 or table 3 DISTSTYLE ALL. You don’t mention sizes or other use patterns for these tables so this isn’t a slam dunk. It looks like these tables are likely dimensional in nature and the space increase for DISTSTYLE ALL is often minor or even a net win when dealing with smaller tables.
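A minimal sketch of that suggestion, assuming table 3 is the smaller, dimension-like table (the table name, column names, and types are hypothetical). DISTSTYLE ALL places a full copy of the table on every node, so a join to it does not depend on how the other table is distributed.

```sql
-- Hypothetical schema; assumes table3 is small enough to replicate to every node.
CREATE TABLE table3 (
    id2          BIGINT NOT NULL,
    updated_time TIMESTAMP
)
DISTSTYLE ALL            -- no DISTKEY is specified with DISTSTYLE ALL
SORTKEY (updated_time);

-- For a table that already exists, the style can be changed in place:
-- ALTER TABLE table3 ALTER DISTSTYLE ALL;
```

Table 2 can then keep ID1 as its single DISTKEY for the join to table 1. The trade-off is extra storage and slower loads for the replicated table, which is why this tends to suit smaller tables.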