Amazon web services - DynamoDB insertion with GSI

Hang
December 1, 2024
60 views
1 vote
2 Answers

New to DynamoDB.

Suppose I am building an e-commerce site with a single-table design that stores many entity types, such as orders and deliveries. I give the partition key and sort key generic names, "PK" and "SK", because naming the partition key "order_id" makes no sense when the table also stores delivery entities.

For example, take an order entity with the following attributes:

{"id": "O#123", "customer_id": "C#456", "order_name": "test", "order_date": "xx-xx-xx"}

If I use the order ID as PK and the customer ID as SK, and also give the item an entity_type attribute, it looks something like this:

|  PK   |   SK   |  order_name  |  order_date  |  entity_type  |
|-------|--------|--------------|--------------|---------------|
| O#123 | C#456  |   test       |  xx-xx-xx    |      order    |

Then I follow the same pattern for the GSI: its partition and sort keys also get generic names, GSI-PK and GSI-SK, because different entities will put different attributes into them whenever the base-table PK and SK cannot satisfy a query.
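For reference, a table with generic key names like this could be defined roughly as below. This is a sketch under assumptions: the index name `GSI1`, on-demand billing and the `ALL` projection are illustrative choices, and the helper `table_definition` is hypothetical. It only builds the keyword arguments for boto3's `create_table`.

```python
def table_definition(table_name: str) -> dict:
    """Build create_table arguments for a single table with one generic GSI.

    PK, SK, GSI-PK, GSI-SK and GSI1 are illustrative names, not required ones.
    """
    return {
        "TableName": table_name,
        # Only key attributes are declared; all other attributes are schemaless.
        "AttributeDefinitions": [
            {"AttributeName": "PK", "AttributeType": "S"},
            {"AttributeName": "SK", "AttributeType": "S"},
            {"AttributeName": "GSI-PK", "AttributeType": "S"},
            {"AttributeName": "GSI-SK", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "PK", "KeyType": "HASH"},
            {"AttributeName": "SK", "KeyType": "RANGE"},
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "GSI1",
                "KeySchema": [
                    {"AttributeName": "GSI-PK", "KeyType": "HASH"},
                    {"AttributeName": "GSI-SK", "KeyType": "RANGE"},
                ],
                # ALL copies every attribute into the index; KEYS_ONLY or INCLUDE store less.
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        # On-demand billing, so no ProvisionedThroughput settings are needed.
        "BillingMode": "PAY_PER_REQUEST",
    }
```

With boto3 installed and credentials configured, this could be passed as `boto3.client("dynamodb").create_table(**table_definition("YourTableName"))`.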

Now, using the Python insert syntax, suppose that for this same record I want entity_type as GSI-PK and order_date as GSI-SK. How should I insert it?

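Should I insert it like this?

```python
import boto3

dynamodb = boto3.client('dynamodb')

# Option 1: entity_type and order_date are stored only as the generic GSI keys
dynamodb.put_item(
    TableName='YourTableName',
    Item={
        'PK': {'S': 'O#123'},  # key names are case-sensitive and must match the table's PK/SK
        'SK': {'S': 'C#456'},
        'order_name': {'S': 'test'},
        'GSI-PK': {'S': 'order'},     # holds the entity type
        'GSI-SK': {'S': 'xx-xx-xx'},  # holds the order date
    },
)
```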

If so, this is confusing for other readers, since nothing tells them that GSI-PK holds what would otherwise be the entity_type attribute. Also, other entities still need an entity_type attribute, but not every item needs GSI keys (a GSI can be sparse). So if I want to see which entity an item belongs to, why should I have to check two different attributes (some items store their type in GSI-PK, others in entity_type)?

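Or should I insert it like this, duplicating the two attributes?

```python
import boto3

dynamodb = boto3.client('dynamodb')

# Option 2: keep the readable attributes and also copy them into the GSI keys
dynamodb.put_item(
    TableName='YourTableName',
    Item={
        'PK': {'S': 'O#123'},
        'SK': {'S': 'C#456'},
        'order_name': {'S': 'test'},
        'order_date': {'S': 'xx-xx-xx'},
        'entity_type': {'S': 'order'},
        'GSI-PK': {'S': 'order'},     # copy of entity_type
        'GSI-SK': {'S': 'xx-xx-xx'},  # copy of order_date
    },
)
```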

If so, am I storing a lot of duplicated information? What if I need another GSI? Do I duplicate again?

Is something wrong with my understanding, or is this simply what single-table design looks like?

Tags: amazon-dynamodb amazon-web-services

Answers

- FediBounouh
- December 1, 2024 at 1:41 am
- 0 votes
Your understanding of single-table design is correct, and yes, duplicating attributes for GSIs is common in DynamoDB.

A GSI (global secondary index) is defined on specific attribute names, here GSI-PK and GSI-SK, and DynamoDB only writes an item into the index if the item contains those key attributes. The index is maintained automatically from the base table, but it can only use attributes the item actually has, so the key values must be stored on the item under the index's attribute names, even if that feels like duplication. It's a trade-off for the query performance and flexibility DynamoDB offers.
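The rule that only items carrying the index's key attributes end up in a GSI is what makes sparse indexes possible. The snippet below is a local model of that rule, not DynamoDB code: `items_in_index` is a hypothetical helper, items are plain dicts rather than DynamoDB's typed attribute-value format, and the ISO dates are illustrative.

```python
def items_in_index(items, pk_attr="GSI-PK", sk_attr="GSI-SK"):
    """Return the base-table items a GSI keyed on (pk_attr, sk_attr) would contain.

    Models the sparse-index rule: an item without both key attributes is left out.
    """
    return [item for item in items if pk_attr in item and sk_attr in item]


items = [
    # An order that opted into the index by carrying both GSI key attributes.
    {"PK": "O#123", "SK": "C#456", "entity_type": "order",
     "GSI-PK": "order", "GSI-SK": "2024-12-01"},
    # A delivery with no GSI keys, so it never appears in the index.
    {"PK": "D#789", "SK": "O#123", "entity_type": "delivery"},
]

indexed = items_in_index(items)
```

Only the order item is returned here, because the delivery item has no GSI-PK or GSI-SK.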

About Your Two Scenarios

1 – Without duplication: if you only add GSI-PK and GSI-SK without keeping the original attributes (entity_type and order_date), it technically works, but it's harder to understand. It's not clear what GSI-PK refers to unless someone knows the schema well, which makes the design harder to maintain.

2 – With duplication: keeping both the original attributes (entity_type and order_date) and their copies (GSI-PK and GSI-SK) makes the schema more readable and consistent. You might worry about wasted storage, but a few short duplicated string attributes usually add only tens of bytes per item, and single-table designs routinely accept that cost in exchange for clarity and flexible querying.
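One way to keep duplicated attributes from drifting apart is to derive the GSI keys from the readable attributes in a single place. This is a minimal sketch, assuming the low-level client's `{'S': ...}` item format from the question; `GSI_SOURCES` and `build_order_item` are hypothetical names and the date is illustrative.

```python
# Hypothetical mapping: which readable attribute feeds each generic GSI key.
# Supporting another GSI means adding its key attributes here.
GSI_SOURCES = {
    "GSI-PK": "entity_type",
    "GSI-SK": "order_date",
}


def build_order_item(order_id: str, customer_id: str, order_name: str, order_date: str) -> dict:
    """Return a low-level put_item Item holding readable attributes plus GSI key copies."""
    item = {
        "PK": {"S": order_id},
        "SK": {"S": customer_id},
        "entity_type": {"S": "order"},
        "order_name": {"S": order_name},
        "order_date": {"S": order_date},
    }
    # Derive the GSI keys from the readable attributes so the two cannot disagree.
    for gsi_attr, source_attr in GSI_SOURCES.items():
        if source_attr in item:  # a missing source keeps the item out of the (sparse) index
            item[gsi_attr] = dict(item[source_attr])
    return item
```

The result could be passed as `Item=build_order_item("O#123", "C#456", "test", "2024-12-01")` to `put_item`. Writers then never set GSI-PK or GSI-SK by hand, and readers can always rely on entity_type.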

If you need another GSI, you’ll just duplicate the attributes required for that index. For example, if you want to index by delivery_status, you might add something like this:
```
'GSI2-PK': {'S': 'delivered'},
'GSI2-SK': {'S': 'O#123'},
```
This doesn't mean duplicating every attribute, only the ones an index needs as its keys so you can query efficiently.
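Querying such an index could look like the sketch below. It assumes a GSI named `GSI2` keyed on GSI2-PK and GSI2-SK, and a hypothetical helper `delivered_orders_query` that only builds the arguments. Because the attribute names contain a hyphen, they are referenced through `#` placeholders in `ExpressionAttributeNames` rather than written directly in the key condition.

```python
def delivered_orders_query(table_name: str, status: str = "delivered") -> dict:
    """Build query() arguments for a hypothetical GSI2 keyed on GSI2-PK / GSI2-SK."""
    return {
        "TableName": table_name,
        "IndexName": "GSI2",
        # Hyphenated attribute names go through #placeholders, values through :placeholders.
        "KeyConditionExpression": "#pk = :status AND begins_with(#sk, :prefix)",
        "ExpressionAttributeNames": {"#pk": "GSI2-PK", "#sk": "GSI2-SK"},
        "ExpressionAttributeValues": {
            ":status": {"S": status},
            ":prefix": {"S": "O#"},  # only order IDs in the index sort key
        },
    }
```

With boto3 this could be run as `boto3.client("dynamodb").query(**delivered_orders_query("YourTableName"))`. Large result sets are paginated via `LastEvaluatedKey`.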

- LeeroyHannigan
- December 1, 2024 at 10:41 am
- 0 votes
The question you're asking comes down to personal preference. Many people duplicate the key attributes on each item, which lets them make different indexing decisions later, for example if they want to build a sparse index on only one entity type.

Not duplicating the attributes is also fine, as long as the application reading the data knows how to parse it. Some customers store a mapping item that describes the schema, which they can change dynamically.
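A mapping item of that kind might look like the sketch below. Everything here is hypothetical: the `SCHEMA`/`v1` key, the `prefixes` and `attributes` layout, and the `to_logical` helper. Items are plain dicts rather than the low-level `{'S': ...}` format. The entity type is inferred from the PK prefix, so no entity_type attribute needs to be stored.

```python
# Hypothetical schema-mapping item, stored in the same table under a fixed key.
SCHEMA_ITEM = {
    "PK": "SCHEMA",
    "SK": "v1",
    # PK prefix -> entity type
    "prefixes": {"O#": "order", "D#": "delivery"},
    # Per entity type: generic attribute name -> logical attribute name
    "attributes": {
        "order": {"PK": "order_id", "SK": "customer_id",
                  "GSI-PK": "entity_type", "GSI-SK": "order_date"},
        "delivery": {"PK": "delivery_id", "SK": "order_id"},
    },
}


def to_logical(item: dict, schema: dict = SCHEMA_ITEM) -> dict:
    """Return a copy of item with generic attribute names replaced by logical ones."""
    pk = item["PK"]
    entity = next(
        (name for prefix, name in schema["prefixes"].items() if pk.startswith(prefix)),
        None,
    )
    if entity is None:
        raise ValueError(f"unknown entity prefix in PK {pk!r}")
    renames = schema["attributes"][entity]
    logical = {renames.get(key, key): value for key, value in item.items()}
    logical["entity_type"] = entity
    return logical
```

In practice the application would read the mapping item once (for example with `get_item`) and cache it. Changing the mapping item then changes how every reader interprets the generic attributes.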

But it's all personal preference: do what is best for you and your application. Don't let some duplication of data put you off, as long as it doesn't push your average item size past the next 1 KB boundary. Write capacity is consumed per 1 KB of item size, so crossing that boundary increases your write costs.
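The 1 KB rule can be checked with a rough calculation. This sketch assumes an item made only of string attributes, whose size is approximated as the UTF-8 length of each attribute name plus its value. Numbers, maps and lists are sized differently and are not handled. It also assumes a standard (non-transactional) write, which uses one write unit per started 1 KB. Writes that touch a GSI also consume write capacity on the index, which this does not count. The helper names are hypothetical.

```python
import math


def string_item_size(item: dict) -> int:
    """Approximate size in bytes of an item whose attribute values are all strings."""
    return sum(
        len(name.encode("utf-8")) + len(value.encode("utf-8"))
        for name, value in item.items()
    )


def write_units(size_bytes: int) -> int:
    """Write units for one standard write: 1 per started 1 KB, minimum 1."""
    return max(1, math.ceil(size_bytes / 1024))


# The duplicated order from the question, with an illustrative ISO date.
order = {
    "PK": "O#123", "SK": "C#456", "order_name": "test",
    "order_date": "2024-12-01", "entity_type": "order",
    "GSI-PK": "order", "GSI-SK": "2024-12-01",
}
size = string_item_size(order)
```

By this approximation the item is 91 bytes. The two duplicated GSI attributes account for 27 of them, and the write still costs a single unit.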
