I use DynamoDB to store user data. Each user has many fields, such as age, gender, first/last name, and address. I need to support a query API that returns only the first, middle, and last name, without the other fields.

To get better performance, I am considering two solutions:

  • Create a GSI that projects only those query fields, which makes each index entry very small.

  • Query the table with a ProjectionExpression that lists only those query fields.

Each item is about 1 KB with 20 attributes, and a single query returns at most 1 MB of data, so I should receive roughly 1024 items per query against the base table. If I use a projection to reduce the number of fields, will I get more items in the response?

Given that DynamoDB returns at most 1 MB of data per query, which solution is better for me?

2 Answers


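  1. What you are describing is a GSI that projects only a subset of attributes. (Strictly speaking, a "sparse index" is something different: an index that contains only the items that have the index key attributes.)

    Without knowing the table's traffic pattern and historical data volume, it is hard to give a definitive recommendation. Another consideration is the amount of RCU (read capacity units) consumed by the operation.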

    FilterExpression is applied after a Query finishes, but before the results are returned.

    Link to Documentation

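    With that in mind, the RCU consumed by the FilterExpression solution is based on the full size of each item read, so it grows with the number and size of the item's attributes. The same applies to a ProjectionExpression: consumed capacity is calculated from the item size, not from the amount of data returned.

    Your costs will therefore increase over time as items grow, and you have to keep an eye on item size and attribute count.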

    A review of how RCU works:

    DynamoDB read requests can be either strongly consistent, eventually consistent, or transactional.

    • A strongly consistent read request of an item up to 4 KB requires one read request unit.
    • An eventually consistent read request of an item up to 4 KB requires one-half read request unit.
    • A transactional read request of an item up to 4 KB requires two read request units.

    Link to documentation
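The rules quoted above can be turned into a small calculation. The following is a minimal sketch, assuming 4 KB means 4096 bytes and that an item larger than 4 KB needs additional units for each further 4 KB block; `read_request_units` is a hypothetical helper name, not part of any AWS SDK.

```python
import math

# Read request units charged per 4 KB block, following the three rules quoted above.
UNITS_PER_BLOCK = {
    "strong": 1.0,
    "eventual": 0.5,
    "transactional": 2.0,
}

BLOCK_SIZE_BYTES = 4 * 1024  # assumption: 4 KB = 4096 bytes


def read_request_units(item_size_bytes: int, consistency: str = "strong") -> float:
    """Estimate the read request units needed to read a single item."""
    # Every read is charged for at least one block, rounded up to whole blocks.
    blocks = max(1, math.ceil(item_size_bytes / BLOCK_SIZE_BYTES))
    return blocks * UNITS_PER_BLOCK[consistency]
```

For the 1 KB items in the question, `read_request_units(1024, "eventual")` evaluates to `0.5`, and the same item read with `"strong"` evaluates to `1.0`.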

    You can use a GSI to get separate throughput and control the consumed RCU. The amount of data transferred becomes predictable, because RCU consumption is based on the index entries only (the key attributes plus first, middle, and last name).

    You will need to update your application to use the new index and to work with eventually consistent reads. GSIs do not support strongly consistent reads.
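As a rough illustration of what such an index could look like, the sketch below builds the parameters for an `update_table` call that adds a GSI with an `INCLUDE` projection. Everything named here is an assumption for illustration: the index name `name-only-index`, the attribute names `first_name`, `middle_name`, and `last_name`, the string-typed partition key, and the helper `build_name_index_update` itself. The question does not state the real key schema.

```python
from typing import Any, Dict


def build_name_index_update(table_name: str, key_attribute: str) -> Dict[str, Any]:
    """Build update_table parameters that add a GSI projecting only name fields."""
    return {
        "TableName": table_name,
        # The index key attribute must be declared (assumed to be a string here).
        "AttributeDefinitions": [
            {"AttributeName": key_attribute, "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": "name-only-index",  # hypothetical name
                    "KeySchema": [
                        {"AttributeName": key_attribute, "KeyType": "HASH"},
                    ],
                    # INCLUDE copies only the listed non-key attributes into the index.
                    "Projection": {
                        "ProjectionType": "INCLUDE",
                        "NonKeyAttributes": ["first_name", "middle_name", "last_name"],
                    },
                    # Tables in provisioned mode also need "ProvisionedThroughput" here.
                }
            }
        ],
    }
```

With boto3, the result would be passed as `boto3.client("dynamodb").update_table(**params)`. Queries against the index then add `IndexName` to the request and must leave `ConsistentRead` unset or `False`.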

    Global secondary indexes support eventually consistent reads, each of which consume one half of a read capacity unit. This means that a single global secondary index query can retrieve up to 2 × 4 KB = 8 KB per read capacity unit.

    For global secondary index queries, DynamoDB calculates the provisioned read activity in the same way as it does for queries against tables. The only difference is that the calculation is based on the sizes of the index entries, rather than the size of the item in the base table.

    Link to documentation
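To make the difference concrete, here is a rough estimate of the read capacity for one query that reads 1024 results. It is a sketch under stated assumptions: a Query sums the sizes of the items it reads and rounds up to the next 4 KB boundary, and the 100-byte index entry size is a made-up figure for illustration (the real size depends on the key and name values). `query_read_capacity` is a hypothetical helper.

```python
import math

BLOCK_SIZE_BYTES = 4 * 1024  # assumption: 4 KB = 4096 bytes


def query_read_capacity(item_count: int, item_size_bytes: int,
                        strongly_consistent: bool) -> float:
    """Estimate the read capacity units consumed by one Query call."""
    # A Query is charged on the total size of everything it reads,
    # rounded up to whole 4 KB blocks.
    total_bytes = item_count * item_size_bytes
    blocks = math.ceil(total_bytes / BLOCK_SIZE_BYTES)
    return blocks * (1.0 if strongly_consistent else 0.5)


# 1024 full items of 1 KB each, read from the base table.
base_table_units = query_read_capacity(1024, 1024, strongly_consistent=False)

# 1024 index entries of an assumed 100 bytes each, read from the GSI.
gsi_units = query_read_capacity(1024, 100, strongly_consistent=False)
```

Under these assumptions the base-table query costs 128 read capacity units and the index query costs 12.5, which is why the size of the index entries matters more than the projection applied to the response.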


    Returning to your question: "which solution is better for me to use?"

    If you need strongly consistent reads, you have to query the base table (your second option). Otherwise, use a GSI.

    A good read on this topic is the article: When to use (and when not to use) DynamoDB Filter Expressions

  2. First of all, it's important to note that DynamoDB's 1 MB limit is not a blocker; it is there for performance reasons.

    Your use case seems to be an attempt to shrink the payload to fit under the 1 MB limit, which is unnecessary. You should just introduce pagination instead.

    DynamoDB paginates the results from Query operations. With pagination, the Query results are divided into "pages" of data that are 1 MB in size (or less). An application can process the first page of results, then the second page, and so on.

    The LastEvaluatedKey from a Query response should be used as the ExclusiveStartKey for the next Query request. If there is not a LastEvaluatedKey element in a Query response, then you have retrieved the final page of results. If LastEvaluatedKey is not empty, it does not necessarily mean that there is more data in the result set. The only way to know when you have reached the end of the result set is when LastEvaluatedKey is empty.

    Ref
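The pagination loop described above can be sketched as follows. This is a minimal example, assuming `client` is any object with a `query` method shaped like the boto3 DynamoDB client's; `iter_query_items` is a hypothetical helper, not an SDK function.

```python
from typing import Any, Dict, Iterator


def iter_query_items(client: Any, **query_kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item of a Query by following LastEvaluatedKey across pages."""
    kwargs = dict(query_kwargs)
    while True:
        response = client.query(**kwargs)
        # A page may be empty even though more pages follow.
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            # No LastEvaluatedKey: this was the final page.
            return
        # Resume the next request where this page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

A caller would use it as `for item in iter_query_items(client, TableName=..., KeyConditionExpression=..., ...)`. boto3 also ships a built-in paginator, `client.get_paginator("query")`, which handles the same loop.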

    GSI or ProjectionExpression

    This ultimately depends on what you need. For example, if you just want certain attributes and the base table's keys are suitable for your access patterns, then I would 100% use a ProjectionExpression and paginate the results until I have all the data.

    You should only create a GSI if the keys of the base table do not suit your access pattern. A GSI increases your table costs: you store more data and consume extra throughput even though your use case does not need it.
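A query restricted to the name attributes might look like the sketch below. The attribute names `first_name`, `middle_name`, and `last_name`, the string-typed partition key, and the helper `build_name_query` are all assumptions for illustration, since the question does not give the real schema.

```python
from typing import Any, Dict


def build_name_query(table_name: str, partition_key_name: str,
                     partition_key_value: str) -> Dict[str, Any]:
    """Build Query parameters that return only the three name attributes."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "#pk = :pk",
        # Only these attributes are returned to the caller.
        "ProjectionExpression": "#first, #middle, #last",
        # Placeholders avoid clashes with DynamoDB reserved words.
        "ExpressionAttributeNames": {
            "#pk": partition_key_name,
            "#first": "first_name",
            "#middle": "middle_name",
            "#last": "last_name",
        },
        "ExpressionAttributeValues": {":pk": {"S": partition_key_value}},
    }
```

With boto3 the result would be passed as `client.query(**params)`, repeating the call with `ExclusiveStartKey` set to the previous `LastEvaluatedKey` until no key is returned.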
