
We have a collection that may contain hundreds of billions of documents, and we want to get a count of them.

When I use count() to get the number of documents:

ref = db.collection('my_collection').count()
print(ref.get())

it always returns an error like this:

raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.DeadlineExceeded: 504 Aggregation query timed out. Please try either limiting the entities scanned, or running with an updated index configuration.

It times out every time. Am I missing something? What’s the proper way to count a large collection?

Thanks!

2 Answers


  1. When dealing with a very large collection in Firestore, counting documents with count() can lead to timeouts or performance issues because the aggregation tries to count every document in the collection in one go, and Firestore’s query and data-retrieval operations are optimized for smaller result sets.

    First, make sure you have the relevant indexes configured for your collection: Firestore needs proper indexes for efficient queries. You can check the Firestore console to confirm that indexes are in place for the fields you query or filter on; without them, queries will be slower.

    Instead of counting all documents at once, paginate through the collection and count documents in smaller chunks. Here’s a general outline of how you might approach it:

    import firebase_admin
    from firebase_admin import credentials, firestore

    # Initialize the Firebase Admin SDK with a service account
    cred = credentials.Certificate('path-to-serviceAccountKey.json')
    firebase_admin.initialize_app(cred)

    # Firestore client from the Admin SDK
    db = firestore.client()

    # Your collection reference
    collection_ref = db.collection('my_collection')

    batch_size = 1000  # Adjust as needed
    total_count = 0
    last_doc = None  # Pagination cursor: last document of the previous batch

    while True:
        # Order by document ID so the cursor is stable across pages
        query = collection_ref.order_by('__name__').limit(batch_size)
        if last_doc is not None:
            query = query.start_after(last_doc)

        docs = list(query.stream())
        batch_count = len(docs)
        total_count += batch_count

        # Fewer documents than the batch size means we reached the end
        if batch_count < batch_size:
            break

        # Remember where this batch ended so the next one starts after it
        last_doc = docs[-1]

    print("Total Count:", total_count)
    
  2. If a single collection contains hundreds of billions of documents, then count() doesn’t seem to be the right solution: when the count() aggregation cannot return a result within 60 seconds, a DEADLINE_EXCEEDED error is thrown, and its performance always depends on the size of the collection. A possible workaround for such large collections is to use counters; for example, you can create and maintain your own counter, as explained in this resource (a minimal sketch is shown below).
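
    As a rough illustration only (the counters collection, its document name, and the helper functions below are hypothetical, not from the question), such a counter can be maintained with the Admin SDK by bumping a dedicated counter document atomically in the same batched write that creates or deletes a document, using the Increment field transform:

    from firebase_admin import firestore
    from google.cloud.firestore_v1 import Increment

    # Assumes firebase_admin.initialize_app(...) has already been called
    db = firestore.client()

    # Hypothetical counter document tracking the size of my_collection
    counter_ref = db.collection('counters').document('my_collection')

    def create_with_count(data):
        # Create a document and bump the counter in one atomic batch
        batch = db.batch()
        batch.set(db.collection('my_collection').document(), data)
        batch.set(counter_ref, {'count': Increment(1)}, merge=True)
        batch.commit()

    def delete_with_count(doc_ref):
        # Delete a document and decrement the counter in one atomic batch
        batch = db.batch()
        batch.delete(doc_ref)
        batch.set(counter_ref, {'count': Increment(-1)}, merge=True)
        batch.commit()

    # Reading the total is then a single document read, regardless of size
    snapshot = counter_ref.get()
    print('Total Count:', snapshot.get('count') if snapshot.exists else 0)

    Note that every write to the collection has to go through such helpers (or a trigger such as a Cloud Function) for the counter to stay accurate, and a single counter document is limited by Firestore’s per-document write rate, so heavily written collections usually shard the counter across several documents.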
