So I have a large Cosmos DB database of documents, and I have an index and an indexer that iterates through it and builds this index. All works great.
But if a document is removed from the source database, the search index still contains it.
I understand that there is a data deletion policy, but that seems to require a property in the source database that marks a document as soft deleted. The document has been deleted for real; there is no soft delete in the database.
So why can I not get the indexer to remove all documents that are no longer in the source data?
Answers
Because it doesn't know a document has been deleted. You can think of it this way: the indexer keeps a cursor holding the last processed change, based on `_ts` (the time of the last modification to a document). Each time the scheduler triggers, a query checks for documents whose `_ts` is later than that value. This detects inserts and updates, but it cannot detect deletions, because a deleted document is simply no longer returned by the query. If you want it to work, there are a few things you can do:

- Add a soft-delete property (for example `isDeleted`) and point the data deletion policy at it. Setting it in Cosmos DB is an update, so the indexer picks it up and removes the document from Azure Search.
- Combine that with a `ttl` and a time-to-live policy on your Cosmos DB container, so the item is also deleted from Cosmos DB some time later, with a timespan large enough that the scheduler is 'guaranteed' to remove it from the index first.

An indexer will help you extract data from data sources. It's not a two-way binding, though. Whenever you delete data in your Azure Cosmos DB, you somehow need to delete it in your Azure Cognitive Search too.
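To make the `_ts` cursor argument concrete, here is a toy in-memory model of a scheduled indexer pass. It is not Azure's actual implementation. `ToyIndexer`, its fields, and the `isDeleted`/`True` marker are hypothetical. In the real service the equivalent settings live on the data source, as a `HighWaterMarkChangeDetectionPolicy` on `_ts` and a `SoftDeleteColumnDeletionDetectionPolicy` with `softDeleteColumnName` and `softDeleteMarkerValue`.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndexer:
    """Toy model of a _ts high-water-mark indexer (not Azure's real implementation)."""

    soft_delete_column: str = "isDeleted"  # hypothetical soft-delete flag name
    soft_delete_marker: object = True      # value that marks a document as deleted
    high_water_mark: int = 0               # largest _ts processed so far
    index: dict = field(default_factory=dict)  # id -> indexed document

    def run(self, source: dict) -> None:
        """One scheduled pass: only documents with _ts above the mark are seen."""
        changed = sorted(
            (doc for doc in source.values() if doc["_ts"] > self.high_water_mark),
            key=lambda doc: doc["_ts"],
        )
        for doc in changed:
            if doc.get(self.soft_delete_column) == self.soft_delete_marker:
                self.index.pop(doc["id"], None)  # soft-deleted: remove from index
            else:
                self.index[doc["id"]] = doc      # insert or update
            self.high_water_mark = doc["_ts"]
        # A hard-deleted document is absent from `source`, so it never shows up
        # in `changed` and nothing here can remove it from the index.


source = {
    "1": {"id": "1", "_ts": 100},
    "2": {"id": "2", "_ts": 101},
}
indexer = ToyIndexer()
indexer.run(source)                      # index now holds "1" and "2"

del source["1"]                          # hard delete: the document just disappears
source["2"] = {"id": "2", "_ts": 105, "isDeleted": True}  # soft delete bumps _ts
indexer.run(source)

print(sorted(indexer.index))             # ['1']
```

Document 1 was hard-deleted and stays in the toy index indefinitely. Document 2 was soft-deleted, which is an update with a newer `_ts`, so the next pass saw it and removed it.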
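The easiest way is to use the change feed: subscribe to it, then push changes and deletions to your Azure Cognitive Search index. Note that in its default (latest version) mode the change feed does not record deletes, so you still need a soft-delete flag (optionally with a TTL) or the "all versions and deletes" change feed mode to see them.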
More info:
https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/read-change-feed#azure-functions
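A minimal sketch of the mapping step such a change feed subscriber could perform. It assumes the soft-delete pattern, because the Azure Functions Cosmos DB trigger uses the latest version change feed mode and does not deliver hard deletes. `KEY_FIELD`, `SOFT_DELETE_FIELD`, `DROPPED_FIELDS` and `to_index_actions` are hypothetical names, and the index schema is assumed to mirror the item's non-system properties. The function only builds the JSON body for the Index Documents REST operation (`POST .../indexes/{index}/docs/index`), where each entry carries an `@search.action` such as `mergeOrUpload` or `delete`. Sending the request is left out.

```python
import json

# Hypothetical schema choices (assumptions, adjust to your index):
KEY_FIELD = "id"                 # key field of the search index
SOFT_DELETE_FIELD = "isDeleted"  # soft-delete flag written to Cosmos DB
DROPPED_FIELDS = {"ttl", SOFT_DELETE_FIELD}  # Cosmos-only fields the index does not define


def to_index_actions(changed_items):
    """Turn a batch of changed Cosmos DB items into an Index Documents request body."""
    actions = []
    for item in changed_items:
        if item.get(SOFT_DELETE_FIELD) is True:
            # Deleting from the index only needs the key value.
            actions.append({"@search.action": "delete", KEY_FIELD: item[KEY_FIELD]})
        else:
            # Skip Cosmos DB system properties (_rid, _self, _etag, _ts, ...) and
            # fields that are not part of the (assumed) index schema.
            doc = {
                k: v
                for k, v in item.items()
                if not k.startswith("_") and k not in DROPPED_FIELDS
            }
            doc["@search.action"] = "mergeOrUpload"
            actions.append(doc)
    return {"value": actions}


# Example batch, shaped like what a change feed handler might receive.
batch = [
    {"id": "1", "title": "kept", "_ts": 200, "_etag": "x"},
    {"id": "2", "isDeleted": True, "ttl": 3600, "_ts": 201},
]
print(json.dumps(to_index_actions(batch), indent=2))
```

Because the `delete` action only needs the key, the soft-deleted item can later be purged from Cosmos DB by its `ttl` without any further work on the index side.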