New to DynamoDB.
Suppose I am building an e-commerce site with a single-table design, where there are many entities such as orders, deliveries, etc. I start with a partition key and a sort key with generic names, "PK" and "SK", since it makes no sense to name the partition key "order_id" when you also store delivery entities in it.
For example, suppose I have an order entity with the attributes below:
{"id": "O#123", "customer_id": "C#456", "order_name": "test", "order_date": "xx-xx-xx"}
If I use the order ID as PK and the customer ID as SK, and also give the item an entity_type attribute, it should look something like this:
| PK | SK | order_name | order_date | entity_type |
|----|----|------------|------------|-------------|
| O#123 | C#456 | test | xx-xx-xx | order |
Then I follow the same pattern for a GSI, whose key attributes GSI-PK and GSI-SK also get generic names, since I will put attributes from different entities into them whenever my base-table PK and SK cannot satisfy my query needs.
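A minimal sketch of what such a table and a query against its generic GSI could look like. It only builds the keyword arguments for boto3's low-level `create_table` and `query` calls, so it runs without AWS credentials; `YourTableName` comes from the question, while the index name `GSI1` and the ISO-8601 dates are assumptions (the question uses `xx-xx-xx` as a placeholder). Because `GSI-PK` and `GSI-SK` contain a hyphen, they are referenced through `ExpressionAttributeNames` placeholders in the key condition.

```python
# Sketch only: builds kwargs for boto3's low-level DynamoDB client.
# "YourTableName" is from the question; "GSI1" is a hypothetical index name.
TABLE_NAME = "YourTableName"
GSI_NAME = "GSI1"


def table_definition() -> dict:
    """kwargs for client.create_table(): generic base keys plus one generic GSI."""
    return {
        "TableName": TABLE_NAME,
        # Only table and index key attributes are declared; the rest is schemaless.
        "AttributeDefinitions": [
            {"AttributeName": "PK", "AttributeType": "S"},
            {"AttributeName": "SK", "AttributeType": "S"},
            {"AttributeName": "GSI-PK", "AttributeType": "S"},
            {"AttributeName": "GSI-SK", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "PK", "KeyType": "HASH"},
            {"AttributeName": "SK", "KeyType": "RANGE"},
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": GSI_NAME,
                "KeySchema": [
                    {"AttributeName": "GSI-PK", "KeyType": "HASH"},
                    {"AttributeName": "GSI-SK", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def orders_between(start: str, end: str) -> dict:
    """kwargs for client.query(): orders whose GSI-SK (order date) is in [start, end]."""
    return {
        "TableName": TABLE_NAME,
        "IndexName": GSI_NAME,
        # Hyphenated attribute names go through #placeholders in expressions.
        "KeyConditionExpression": "#gpk = :type AND #gsk BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#gpk": "GSI-PK", "#gsk": "GSI-SK"},
        "ExpressionAttributeValues": {
            ":type": {"S": "order"},
            ":start": {"S": start},
            ":end": {"S": end},
        },
    }
```

With a client from `boto3.client("dynamodb")`, these would be used as `client.create_table(**table_definition())` and `client.query(**orders_between("2024-01-01", "2024-01-31"))`. ISO-8601 date strings are assumed because string sort keys compare byte by byte, so that format sorts chronologically.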
Now, using the Python insert syntax, suppose that for the same record above I want to use entity_type as GSI-PK and order_date as GSI-SK. How should I insert it?
Should I insert it like this?
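```python
import boto3

dynamodb = boto3.client('dynamodb')

dynamodb.put_item(
    TableName='YourTableName',
    Item={
        # Key names are case-sensitive and must match the table's key schema.
        'PK': {'S': 'O#123'},
        'SK': {'S': 'C#456'},
        'order_name': {'S': 'test'},
        'GSI-SK': {'S': 'xx-xx-xx'},  # order_date is stored only here
        'GSI-PK': {'S': 'order'},     # entity_type is stored only here
    },
)
```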
If so, this is really confusing for others to read, since they do not know that GSI-PK "refers" to the old "entity_type" column. Also, other entities need to have an "entity_type" attribute, but not every record needs to appear in a GSI (GSIs are sparse). Then, if I want to see which entity a record belongs to, why should I have to look at two different attributes (some records store their type in GSI-PK, others in entity_type)?
Or should I insert it like this, duplicating the two attributes?
```python
import boto3

dynamodb = boto3.client('dynamodb')

dynamodb.put_item(
    TableName='YourTableName',
    Item={
        'PK': {'S': 'O#123'},
        'SK': {'S': 'C#456'},
        'order_name': {'S': 'test'},
        'GSI-SK': {'S': 'xx-xx-xx'},     # copy of order_date
        'GSI-PK': {'S': 'order'},        # copy of entity_type
        'order_date': {'S': 'xx-xx-xx'},
        'entity_type': {'S': 'order'},
    },
)
```
If so, am I storing a lot of duplicated information? What if I need another GSI? Do I duplicate again?
Is there something wrong with my understanding? Or is this just what "single-table design" looks like?
2 Answers
Your understanding of single-table design is correct, and yes, duplicating attributes for GSIs is common in DynamoDB.
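DynamoDB only writes an item into a GSI (Global Secondary Index) if the item contains the attributes named in that index's key schema, so if your GSI is keyed on the generic `GSI-PK` and `GSI-SK` attributes, every item you want in the index must carry them explicitly. DynamoDB maintains the index automatically, but it cannot know that `GSI-PK` should hold the value of `entity_type`; your application has to write that value, even if it feels like duplication. It’s a trade-off for the high performance and flexibility DynamoDB offers.

About Your Two Scenarios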
1 – Without duplication: if you only add `GSI-PK` and `GSI-SK` without keeping the original attributes (`entity_type` and `order_date`), it works technically, but it’s harder to understand. It’s not clear what `GSI-PK` refers to unless someone knows the schema well, which makes your design harder to maintain.

2 – With duplication: including both the original attributes (`entity_type` and `order_date`) and their duplicates (`GSI-PK` and `GSI-SK`) makes the schema more readable and consistent. You might worry about wasting storage, but this kind of duplication is normal in DynamoDB: the extra bytes are usually negligible in cost, and they help with clarity and querying.

If you need another GSI, you just duplicate the attributes required for that index. For example, if you want to index by `delivery_status`, you would copy that value into the key attributes of the new index. This doesn’t mean duplicating all attributes, only the ones that help you query efficiently.
The question you’re asking comes down to personal preference. Many people choose to duplicate the key attributes on each item, which lets them make different indexing decisions later, for example if they want to build a sparse index on only one type of entity.
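The sparse-index idea can be sketched like this: every item keeps `entity_type`, so there is one place to look up a record’s type, but only the entity types you choose get the GSI key attributes. DynamoDB’s rule that an item is written to a GSI only if it has all of the index’s key attributes is modeled locally here so the sketch runs without AWS; the `INDEXED_IN_GSI1` policy and the `D#789` delivery key are hypothetical.

```python
# Sketch of a sparse GSI1: every item keeps entity_type, but only the entity
# types in INDEXED_IN_GSI1 (a hypothetical policy) get GSI1 key attributes.
INDEXED_IN_GSI1 = {"order"}


def with_gsi1_keys(item: dict, gsi_pk: str, gsi_sk: str) -> dict:
    """Return a copy of item, adding GSI-PK/GSI-SK only for indexed entity types."""
    out = dict(item)
    if item["entity_type"]["S"] in INDEXED_IN_GSI1:
        out["GSI-PK"] = {"S": gsi_pk}
        out["GSI-SK"] = {"S": gsi_sk}
    return out


def appears_in_gsi1(item: dict) -> bool:
    """Local model of DynamoDB's rule: an item is written to a GSI only if it
    has both of the index's key attributes."""
    return "GSI-PK" in item and "GSI-SK" in item


order = {"PK": {"S": "O#123"}, "SK": {"S": "C#456"}, "entity_type": {"S": "order"}}
delivery = {"PK": {"S": "D#789"}, "SK": {"S": "O#123"}, "entity_type": {"S": "delivery"}}
items = [with_gsi1_keys(order, "order", "2024-01-15"),
         with_gsi1_keys(delivery, "delivery", "2024-01-16")]
print([i["PK"]["S"] for i in items if appears_in_gsi1(i)])  # ['O#123']
```

Changing the policy later (for example, adding `"delivery"`) only changes which items get the extra attributes; the readable `entity_type` stays the same everywhere.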
Not duplicating the attributes is also fine, as long as the application reading the data knows how to parse it. Some customers store a mapping item that describes the schema, which they can change dynamically.
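A minimal sketch of the mapping-item idea. The answer doesn’t say how such a mapping is structured, so the `SCHEMA_ITEM` layout, the assumption that an entity can be recognized from its PK prefix (such as `O#`), and the `logical_view` helper are all hypothetical. Items are shown as plain Python dicts, the form the higher-level boto3 `Table` resource returns.

```python
# Hypothetical mapping item stored in the same table (PK="SCHEMA", SK="GSI1").
# For each PK prefix it records which logical attribute each generic key holds.
SCHEMA_ITEM = {
    "PK": "SCHEMA",
    "SK": "GSI1",
    "prefixes": {
        "O#": {"GSI-PK": "entity_type", "GSI-SK": "order_date"},
    },
}


def logical_view(item: dict, schema_item: dict) -> dict:
    """Return a copy of item with generic GSI attribute names replaced by logical ones."""
    prefix = item["PK"].split("#", 1)[0] + "#"
    renames = schema_item["prefixes"].get(prefix, {})  # unknown prefix: no renames
    return {renames.get(name, name): value for name, value in item.items()}


stored = {"PK": "O#123", "SK": "C#456", "order_name": "test",
          "GSI-PK": "order", "GSI-SK": "2024-01-15"}
print(logical_view(stored, SCHEMA_ITEM))
# {'PK': 'O#123', 'SK': 'C#456', 'order_name': 'test', 'entity_type': 'order', 'order_date': '2024-01-15'}
```

The trade-off is that every reader has to load and apply the mapping, whereas duplicated attributes are readable directly from the item.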
But it’s all personal preference, so do what is best for you and your application. Don’t let some duplication of data throw you off, as long as it doesn’t push your average item size past the next 1 KB boundary, which would increase your write costs, since standard writes are billed per 1 KB of item size.
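To check whether duplication crosses that boundary, you can estimate item size. DynamoDB counts an item’s size as the sum of its attribute-name lengths and value sizes, and a string value’s size is its number of UTF-8 bytes. This sketch handles string attributes only (numbers, binaries, and nested types are sized differently) and ignores the extra write capacity consumed by each GSI that contains the item; the `notes` attribute is a hypothetical filler used to push an item near 1 KB.

```python
import math


def approx_item_size(item: dict) -> int:
    """Approximate DynamoDB item size in bytes for string-only items:
    UTF-8 length of each attribute name plus its string value."""
    return sum(len(name.encode("utf-8")) + len(value.encode("utf-8"))
               for name, value in item.items())


def write_units(size_bytes: int) -> int:
    """Standard writes are billed per started 1 KB (1024 bytes), minimum 1."""
    return max(1, math.ceil(size_bytes / 1024))


# The question's two variants, as plain Python values.
lean = {"PK": "O#123", "SK": "C#456", "order_name": "test",
        "GSI-PK": "order", "GSI-SK": "xx-xx-xx"}
dup = {**lean, "entity_type": "order", "order_date": "xx-xx-xx"}
print(approx_item_size(lean), approx_item_size(dup))  # 53 87

# Near the boundary, the same 34 extra bytes cost a second write unit.
big = {**lean, "notes": "a" * 960}
big_dup = {**dup, "notes": "a" * 960}
print(write_units(approx_item_size(big)), write_units(approx_item_size(big_dup)))  # 1 2
```

For small items like the ones in the question, duplication adds a few dozen bytes and stays within one write unit; it only matters for items already close to a 1 KB multiple.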