We have a collection that may contain hundreds of billions of documents.
Now we want to get its count.
Thanks!
When I use count()
to get the number of documents:
ref = db.collection('my_collection').count()
print(ref.get())
it always returns an error like this:
raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.DeadlineExceeded: 504 Aggregation query timed out. Please try either limiting the entities scanned, or running with an updated index configuration.
It times out every time. Am I missing something? What’s the proper way to count a large collection?
2 Answers
When dealing with a large collection in Firestore, counting the documents with count() may lead to timeouts or performance issues, because it tries to count every document in the collection in one go, and Firestore’s query and data-retrieval operations are optimized for smaller result sets. Make sure you have the relevant indexes configured for your collection: Firestore needs proper indexes for efficient queries, and you can check the Firestore console to verify that indexes are in place for the fields you query or filter on. Without proper indexes, queries will be slower. Instead of counting all documents at once, paginate through the collection and count the documents in smaller chunks.

If you have hundreds of billions of documents in a single collection, then count() doesn’t seem to be the right solution. If the count() function cannot return a result within 60 seconds, a DEADLINE_EXCEEDED error is thrown. Please note that performance always depends on the size of the collection. So a possible workaround is to use counters for such large collections. Alternatively, you can create and maintain your own counter as explained in this resource.
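As a sketch of the chunked counting that the first answer describes: the Firestore calls themselves need a real project and credentials, so the paging loop below is written against a generic `fetch_page` callable (a hypothetical helper), with the assumed google-cloud-firestore wiring left in comments and an in-memory stand-in so the example runs on its own.

```python
# Sketch: count a large collection in pages instead of one count() aggregation.
# The paging loop is generic; the Firestore wiring further down is an assumed
# usage and is commented out so this example is runnable standalone.

def count_in_chunks(fetch_page, page_size=1000):
    """Count documents by fetching pages until a short page signals the end.

    fetch_page(cursor, page_size) must return (docs, next_cursor).
    """
    total = 0
    cursor = None
    while True:
        docs, cursor = fetch_page(cursor, page_size)
        total += len(docs)
        if len(docs) < page_size:  # last (possibly empty) page reached
            return total

# Hypothetical Firestore-backed page fetcher (assumed google-cloud-firestore
# usage; needs a real `db = firestore.Client()` and credentials):
#
# def firestore_page(cursor, page_size):
#     query = db.collection("my_collection").order_by("__name__").limit(page_size)
#     if cursor is not None:
#         query = query.start_after(cursor)
#     docs = list(query.stream())
#     return docs, (docs[-1] if docs else None)

# In-memory stand-in so the paging logic can be exercised here:
def make_fake_source(n_docs):
    def fetch_page(cursor, page_size):
        start = cursor or 0
        docs = list(range(start, min(start + page_size, n_docs)))
        return docs, start + len(docs)
    return fetch_page

print(count_in_chunks(make_fake_source(2500), page_size=1000))  # prints 2500
```

Keep in mind that even chunked counting still reads every document, so at hundreds of billions of documents it will be slow and expensive; the maintained-counter approach from the second answer (updating a counter on every write) is usually the only option that stays fast at that scale.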