I’m using Azure.Data.Tables (12.6.1) and I need to fetch a single record from each of several partitions of a single table (so the result would be multiple records, one from each partition). Each entity needs to be looked up by its partition key and row key; for a single TableClient.GetEntity() call this would be a point query.
After reading the documentation I’m unsure whether it’s efficient to call TableClient.QueryAsync() with multiple partition key / row key pairs, and the search results I found give contradictory suggestions.
Is it efficient to do this (for a number of partition key / row key combinations, up to ~50), or is it better to call GetEntity() one by one for each entity?
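var filter = "(PartitionKey eq 'p1' And RowKey eq 'r1') Or " +
    "(PartitionKey eq 'p2' And RowKey eq 'r2') Or ...";
// QueryAsync<T> returns an AsyncPageable<T>; it is enumerated with
// "await foreach" rather than awaited directly.
var results = tableClient.QueryAsync<TableEntity>(filter, 500, null, cancelToken);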
2 Answers
I don’t know if there is a definitive answer here, as it probably depends on your specific requirements. I would suggest testing different options and tuning accordingly.
For reference, here is a general guide to query performance for tables: https://learn.microsoft.com/azure/storage/tables/table-storage-design-for-query
I settled on parallelizing point queries for this scenario, and it has given good results. I have heavy-burst read scenarios (I may have tens or hundreds of thousands of lookups to do against hundreds of millions of records). I prefer that over a query with a series of ORs, as those tended to give worse throughput (I don’t have any stats to hand right now).
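As a rough illustration of what parallelized point queries could look like (a minimal sketch, not necessarily how this was actually implemented): the helper below runs one lookup per key with a cap on how many are in flight at once. The names PointQueryBatch, RunAsync and maxParallel are hypothetical, and the usage comment assumes an existing TableClient called tableClient.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs one async lookup per key, with bounded concurrency.
public static class PointQueryBatch
{
    public static async Task<TResult[]> RunAsync<TKey, TResult>(
        IReadOnlyList<TKey> keys,
        Func<TKey, CancellationToken, Task<TResult>> lookup,
        int maxParallel,
        CancellationToken ct = default)
    {
        // The semaphore limits how many lookups run at the same time.
        using var gate = new SemaphoreSlim(maxParallel);

        async Task<TResult> Throttled(TKey key)
        {
            await gate.WaitAsync(ct);
            try
            {
                return await lookup(key, ct);
            }
            finally
            {
                gate.Release();
            }
        }

        // Task.WhenAll returns results in the same order as the input keys.
        return await Task.WhenAll(keys.Select(k => Throttled(k)));
    }
}

// Usage sketch (assumes Azure.Data.Tables is referenced and tableClient exists):
//
// var keys = new[] { ("p1", "r1"), ("p2", "r2") };
// var entities = await PointQueryBatch.RunAsync(
//     keys,
//     async (k, token) =>
//         (await tableClient.GetEntityAsync<TableEntity>(
//             k.Item1, k.Item2, cancellationToken: token)).Value,
//     maxParallel: 16);
//
// Note: GetEntityAsync throws RequestFailedException (status 404) when the
// entity does not exist, so a real lookup delegate would likely handle that.
```

The value of maxParallel (16 above) is an arbitrary placeholder; a suitable limit would need to be found by measuring against the actual table and workload.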
For me parallelization happens through 2 means: