
I’m using Azure AI Search and need advice on handling data submission efficiently. My system has hundreds of GBs of data, and I could generate multiple entries per second—potentially dozens depending on traffic.

Here Are My Challenges:

  • Each individual data entry is small (e.g., user creation, messages).

  • Undecided on whether to send each entry immediately or batch them.

  • Concerned about sending too many requests to Azure AI Search simultaneously. Could this cause issues?

Main Questions:

  1. Should I send each individual data entry immediately, or accumulate entries and send them every 10, 20, or 30 seconds?

  2. If I send entries individually, what frequency is considered too frequent and might lead to throttling or performance issues?

  3. If I batch the data, what strategies work best for interim storage? Should I use a database, Redis, or something else?

  4. What are the best practices for optimizing data submission intervals and ensuring efficient indexing?

Any insights or experiences with managing high-volume data submissions in Azure Cognitive Search would be greatly appreciated!

2 Answers


  1. Chosen as BEST ANSWER

    I wanted to share the solution I found for improving performance when submitting documents to Azure AI Search.

    The documentation wasn't very clear on this, but it was suggested that batching documents before sending them significantly improves performance. Using indexers wasn't suitable for my needs, since their minimum schedule interval is 5 minutes.

    I conducted a test using a free-tier Azure AI Search instance. I sent 10, 50, 100, and eventually 200 concurrent POST calls, and then repeated the 200-concurrent-call run 100 times. Despite the free-tier limitations, the overall response time remained stable, and I didn't encounter any 503 errors.

    In conclusion, even with the free tier, the service's compute power and throughput are robust enough to handle 500-1000 push requests effectively. I hope this helps anyone facing similar challenges!
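
    For a concrete starting point on the batching side: the Python SDK (azure-search-documents) ships a SearchIndexingBufferedSender that accumulates actions and flushes them to the service in batches on a timer, which maps directly onto the "accumulate and send on an interval" approach. A minimal sketch, assuming a hypothetical endpoint, admin key, and an index with "id" and "content" fields:

    ```python
    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchIndexingBufferedSender

    # The endpoint, index name, key, and field names below are placeholders;
    # substitute your own service and index schema.
    with SearchIndexingBufferedSender(
        endpoint="https://<service>.search.windows.net",
        index_name="<index>",
        credential=AzureKeyCredential("<admin-key>"),
        auto_flush_interval=20,  # flush buffered actions roughly every 20 seconds
    ) as sender:
        # Call this from your event path; the sender buffers the action and
        # submits it as part of the next batched request.
        sender.upload_documents([{"id": "1", "content": "user created"}])
        # Any actions still buffered are flushed when the context exits.
    ```

    This keeps the producer side fire-and-forget while the actual HTTP traffic to the service stays batched.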


  2. Store each individual data entry in either a SQL database or Azure Blob Storage as it arrives.

    In Blob Storage, the data can be JSON, text, CSV, etc.; here are the supported formats.

    Next, index it by following the steps in this document for using Blob Storage as a data source.

    When new data arrives in Blob Storage, just run the indexer at a regular interval, such as hourly or daily.

    It automatically adds the new documents to the index.


    If you want to reindex all documents, reset the indexer and then run it again (Reset and Run).

    Next, if you want to remove documents from the index, enable deletion tracking in the data source.


    This removes the documents from the index when you rerun the indexer after deleting them from Blob Storage.

    Using the above approach, you can index data from Blob Storage at a regular interval.
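
    To make this concrete, here is a sketch using the Python SDK (azure-search-documents) that wires up the blob data source with deletion tracking and an hourly indexer schedule. The service, key, container, and index names are placeholders, and the target index is assumed to exist already:

    ```python
    from datetime import timedelta

    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents.indexes import SearchIndexerClient
    from azure.search.documents.indexes.models import (
        IndexingSchedule,
        NativeBlobSoftDeleteDeletionDetectionPolicy,
        SearchIndexer,
        SearchIndexerDataContainer,
        SearchIndexerDataSourceConnection,
    )

    client = SearchIndexerClient(
        endpoint="https://<service>.search.windows.net",
        credential=AzureKeyCredential("<admin-key>"),
    )

    # Data source over the blob container; deletion detection relies on the
    # blob service's native soft delete, so removed blobs drop out of the index.
    client.create_or_update_data_source_connection(
        SearchIndexerDataSourceConnection(
            name="events-blob-ds",
            type="azureblob",
            connection_string="<storage-connection-string>",
            container=SearchIndexerDataContainer(name="<container>"),
            data_deletion_detection_policy=NativeBlobSoftDeleteDeletionDetectionPolicy(),
        )
    )

    # Scheduled indexer; only new or changed blobs are processed on each run.
    client.create_or_update_indexer(
        SearchIndexer(
            name="events-indexer",
            data_source_name="events-blob-ds",
            target_index_name="<index>",
            schedule=IndexingSchedule(interval=timedelta(hours=1)),
        )
    )

    client.run_indexer("events-indexer")        # on-demand run
    # client.reset_indexer("events-indexer")    # reset first to reindex everything
    ```

    Note that the schedule interval cannot go below 5 minutes, which is the same limitation the first answer ran into.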
