coll = MongoClient(uri)[database][collection]
cursor = coll.find(query, sort=sort) # I want the query to start executing here!
# It doesn't matter to my application if we're stuck here for a couple of seconds
# ...
doc = next(cursor) # this must not be slow!

Note: I don't want to fetch all documents right away, just the first document or batch of documents, so that the lazy loading happens up front.

2 Answers


  1. You might try using hasNext immediately after creating the cursor.

    cursor = coll.find(query, sort=sort)
    cursor.hasNext()
    

    hasNext will return true if there is a document in the iterator, false if not. The point is that it must consult the server, and therefore get the first batch of results, before it can return.

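    Note that hasNext will block until an answer is available. Also note that hasNext is the cursor method in mongosh and the Java driver; PyMongo's Cursor has no hasNext method, so in Python the same effect comes from fetching the first document with next(cursor) and keeping it for later.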
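Since PyMongo's Cursor does not provide hasNext, a rough Python equivalent is to pull the first document eagerly and then stitch it back in front of the rest. The sketch below is one way to do that, not the driver's own API: `prime` and `fake_cursor` are hypothetical names, and `fake_cursor` is a plain generator standing in for `coll.find(query, sort=sort)` so the example runs without a server. It assumes a real cursor behaves like any other lazy iterator.

```python
from itertools import chain

_MISSING = object()  # sentinel: distinguishes "no document" from a None value


def prime(cursor):
    """Force the first fetch now; return (has_docs, iterator over all docs).

    `cursor` can be any iterator; a PyMongo Cursor is assumed to behave the same way.
    """
    first = next(cursor, _MISSING)  # blocks here, much as hasNext would
    if first is _MISSING:
        return False, iter(())  # empty result: no StopIteration leaks out
    return True, chain([first], cursor)


# Stand-in for coll.find(...): a generator that records when it is first touched.
events = []


def fake_cursor():
    events.append("query executed")
    yield from [{"n": 1}, {"n": 2}, {"n": 3}]


has_docs, docs = prime(fake_cursor())
events.append("primed")
result = list(docs)
```

The returned boolean plays the role of hasNext's return value, and the chained iterator still yields every document, including the one that was already fetched.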

  2. Here’s how you could do it: set up a class that fetches the first item from the cursor, then use a generator to yield that first item followed by the remaining items in the cursor:

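    class MyCursor:
        def __init__(self, query: dict, sort: list):
            # coll is the pymongo Collection from the question
            self.cursor = coll.find(query, sort=sort)
            # next() forces the query to run now and fetches the first batch;
            # it raises StopIteration if the query matches nothing
            self.first_item = next(self.cursor)
            print(self.first_item)
    
        def find(self):
            # Replay the prefetched document, then continue with the live cursor
            yield self.first_item
            yield from self.cursor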
    

    In a worked example with some random data:

    from pymongo import MongoClient, ASCENDING
    import random
    import string
    import time
    
    client = MongoClient()
    db = client["mydatabase"]
    coll = db["mycollection"]
    
    coll.insert_many([
        {
            "age": random.randint(0, 100),
            "email": ''.join(random.choices(string.ascii_lowercase, k=8)) + "@example.com",
        }
        for _ in range(100000)
    ])
    
    
    class MyCursor:
        def __init__(self, query: dict, sort: list):
            self.cursor = coll.find(query, sort=sort)
            self.first_item = next(self.cursor)
            print(f"Here's the first record to show we return it. Remove this line of code when happy: {self.first_item}")
    
        def find(self):
            yield self.first_item
            yield from self.cursor
    
    
    query = {'age': {'$gte': 60}}
    sort = [('email', ASCENDING)]
    start_time = time.time()
    my_cursor = MyCursor(query, sort)
    print(f'Initial setup: {(time.time() - start_time) * 1000} milliseconds.')
    start_time = time.time()
    for index, doc in enumerate(my_cursor.find()):
        print(f'Document {index}: {(time.time() - start_time) * 1000} milliseconds.: {doc}')
        start_time = time.time()
        if index >= 2:
            break
    

    prints:

    Here's the first record to show we return it. Remove this line of code when happy: {'_id': ObjectId('65745a45c4e22463bfbfbb7b'), 'age': 96, 'email': '[email protected]'}
    Initial setup: 90.20113945007324 milliseconds.
    Document 0: 0.0 milliseconds.: {'_id': ObjectId('65745a45c4e22463bfbfbb7b'), 'age': 96, 'email': '[email protected]'}
    Document 1: 0.0 milliseconds.: {'_id': ObjectId('65745a45c4e22463bfc0a9e2'), 'age': 67, 'email': '[email protected]'}
    Document 2: 0.0 milliseconds.: {'_id': ObjectId('65745a45c4e22463bfc002fc'), 'age': 83, 'email': '[email protected]'}
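The 0.0 millisecond readings above may partly reflect the resolution of time.time() on some platforms; time.perf_counter() is the standard-library clock intended for measuring short intervals. Below is a minimal sketch of the same measurement under that assumption. `slow_cursor` is a hypothetical generator standing in for `coll.find(...)`, with a sleep simulating the server round trip, so the example runs without a database.

```python
import time
from itertools import chain


def slow_cursor(delay_s=0.05):
    """Stand-in for coll.find(...): all the latency sits before the first document."""
    time.sleep(delay_s)  # simulated server round trip
    yield from ({"n": i} for i in range(3))


# Setup phase: create the cursor and pay the latency up front.
start = time.perf_counter()
cursor = slow_cursor()
first = next(cursor)
docs = chain([first], cursor)  # put the prefetched document back in front
setup_ms = (time.perf_counter() - start) * 1000

# Iteration phase: time each document as the worked example does.
per_doc_ms = []
start = time.perf_counter()
for doc in docs:
    per_doc_ms.append((time.perf_counter() - start) * 1000)
    start = time.perf_counter()

print(f"Initial setup: {setup_ms:.3f} ms")
for index, ms in enumerate(per_doc_ms):
    print(f"Document {index}: {ms:.3f} ms")
```

With a finer clock the per-document times should show up as small non-zero values, which makes it easier to confirm that the wait really moved into the setup step.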
    