I use Dynamodb
to store User
data. Each user has many fields like age, gender, first/last name, address etc
. I need to support a query API which response first, last, middle name
only, without other fields.
In order to provide a better performance, I have two solutions:
-
Create a GSI which only includes those query fields. It will make each row very small.
-
Query the table with projection fields parameter including those query fields.
The item size is 1KB with 20 attributes. 1MB is the maximum data returned from one query. So I should receive 1024 items from querying the main index. If I use field projection to reduce the number of fields, will it give me more items in the response?
Based on dynamodb only response maximum 1MB data, which solution is better for me to use?
2
Answers
What you are trying to achieve is called "Sparse indexes".
Without knowing the table traffic pattern and historical amount of data. Another consideration is the amount of RCU (read capacity units) used for the operation.
With that in mind, the amount of RCU used by the
FilterExpression
solution will grow based on the number of fields/data the item has.You are increasing your costs over time and need to worry about the item size and amount of fields it has.
A review of how RCU works:
You can use GSI to have a separate throughput and control the used RCU capacity. The amount of data that will be transferred can be predictable. The RCU utilization will be based on the index entries only (
first
,last
,middle
andname
)You will need to update your application to use the new index and work with eventually consistent reads. GSI doesn’t have support for a strongly consistent read.
Returning to your question: "which solution is better for me to use?"
Do you need strongly consistent reads? You need to use the table base index with
FilterExpression
. Otherwise, use GSI.A good reading is this article: When to use (and when not to use) DynamoDB Filter Expressions
First of all it’s important to note that DynamoDBs 1MB limit is not a blocker, it’s there for performance reasons.
Your use case seems to want to unnecessarily reduce your payload to below the 1MB limit. However, you should just introduce pagination.
Ref
GSI or ProjectionExpression
This ultimately depends on what you need. For example, if you simply just want certain attributes and the base tables keys are suitable for your access patterns then I would 100% use a ProjectionExpression and paginate the results until I have all the data.
You should only create a GSI should the keys of the base table not suit your access pattern needs. GSI will increase your table costs and you will be storing more data and consuming extra throughput when your use-case doesn’t need to.