
I’m using Azure.Data.Tables (12.6.1) and I need to query a single record from multiple partitions of a single table (so the result would be multiple records, 1 from each partition). Each entity needs to be looked up by its partition key and row key – for a single TableClient.GetEntity() call this would be a point query.

After reading the documentation, I’m unsure whether it’s efficient to call TableClient.QueryAsync() with multiple partition key / row key pairs, and the search results I found give contradictory suggestions.

Is it efficient to do this (for a number of partition key / row key combinations, up to ~50) or is it just better to call GetEntity() one by one, for each entity?

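// OData operators are lowercase; QueryAsync<T> returns an AsyncPageable<T>, which is enumerated rather than awaited.
var filter = "(PartitionKey eq 'p1' and RowKey eq 'r1') or " +
    "(PartitionKey eq 'p2' and RowKey eq 'r2') or ...";
var results = tableClient.QueryAsync<TableEntity>(filter, 500, null, cancelToken);
await foreach (var entity in results) { /* use entity */ }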

2 Answers


  1. I don’t know if there is a definitive answer here, as it probably depends on your specific requirements. I would suggest testing the different options and tuning accordingly.

    For reference, here is a general guide to query performance for tables: https://learn.microsoft.com/azure/storage/tables/table-storage-design-for-query

  2. I settled on parallelizing point queries for this scenario, which has given good results. I have heavy-burst read scenarios where I may have tens or hundreds of thousands of lookups to do against hundreds of millions of records. I prefer that over a query with a series of ORs, as those tended to give worse throughput (I don’t have any stats to hand right now).

    For me, parallelization happens at two levels:

    1. lower level: awaiting a batch of Tasks, each making an individual point query
    2. higher level: architecting a particularly heavy workload to scale out over multiple instances, each making parallel queries as in point 1
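
    The lower-level approach in point 1 might look roughly like the sketch below. It is an illustration rather than the exact code I run: the `PointQueryBatch` class, the `GetManyAsync` helper and the `maxConcurrency` default of 16 are hypothetical names and values chosen for the example, and the right concurrency limit is something to measure for your own workload. It assumes that a missing entity surfaces as a `RequestFailedException` with status 404, which is how `GetEntityAsync` reports it.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;
    using Azure;
    using Azure.Data.Tables;

    public static class PointQueryBatch
    {
        // Hypothetical helper: issues one point query per (PartitionKey, RowKey) pair,
        // with at most maxConcurrency requests in flight at any time.
        public static async Task<IReadOnlyList<TableEntity>> GetManyAsync(
            TableClient tableClient,
            IEnumerable<(string PartitionKey, string RowKey)> keys,
            int maxConcurrency = 16, // assumed starting point; tune by measurement
            CancellationToken cancellationToken = default)
        {
            using var throttle = new SemaphoreSlim(maxConcurrency);

            var tasks = keys.Select(async key =>
            {
                await throttle.WaitAsync(cancellationToken);
                try
                {
                    Response<TableEntity> response = await tableClient.GetEntityAsync<TableEntity>(
                        key.PartitionKey, key.RowKey, cancellationToken: cancellationToken);
                    return response.Value;
                }
                catch (RequestFailedException ex) when (ex.Status == 404)
                {
                    return null; // no entity with this key pair
                }
                finally
                {
                    throttle.Release();
                }
            }).ToList();

            // Wait for every lookup to finish, then drop the keys that were not found.
            TableEntity[] results = await Task.WhenAll(tasks);
            return results.Where(e => e != null).ToList();
        }
    }
    ```

    For the ~50 lookups in the question, usage would be something like `var entities = await PointQueryBatch.GetManyAsync(tableClient, new[] { ("p1", "r1"), ("p2", "r2") });`. The semaphore is there so that a large burst of keys does not open thousands of requests at once.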