skip to Main Content

I want to delete data from multiple collection based on ids received using match query from aggregation .

Currently in python i am doing like this but it is taking a lot of time to execute .
motor is used

data = studentSource.aggregate([
            {"$match": {'primary_source.utm_source': source_name}},
            {'$project': {'student_id': 1, '_id': 0}}
        ])

        students = [stud async for stud in data]
        if len(students) != 0:
            for i in range(len(students)):
                await studentsPrimaryDetails.delete_many({'_id': students[i]['student_id']})
                await studentSecondaryDetails.delete_many({'student_id': students[i]['student_id']})
                await studentTimeline.delete_many({'student_id': students[i]['student_id']})
                await studentApplicationForms.delete_many({'student_id': students[i]['student_id']})
                await queries.delete_many({'student_id': students[i]['student_id']})
                await leadsFollowUp.delete_many({'student_id': students[i]['student_id']})
                await lead_details.delete_many({'student_id': students[i]['student_id']})

            await studentSource.delete_many({'primary_source.utm_source': source_name})

    ```
Is there any improvement can be done to execute this much faster . Is bulkWrite() useful . But i have to delete data from multiple collection

2

Answers


  1. You can put the ID’s into an array and use the array for condition. I have no clue about python, the JavaScript syntax would be this one:

    student_ids = db.studentSource.aggregate([
        {"$match": {'primary_source.utm_source': source_name}},
        {'$project': {'student_id': 1, '_id': 0}}
    ]).toArray().map( x => x.student_id)
    
    db.studentsPrimaryDetails.deleteMany({'_id': {$in: student_ids }})
    db.studentSecondaryDetails.deleteMany({'_id': {$in: student_ids }})
    db.studentTimeline.deleteMany({'_id': {$in: student_ids }})
    db.studentApplicationForms.deleteMany({'_id': {$in: student_ids }})
    db.queries.deleteMany({'_id': {$in: student_ids }})
    db.leadsFollowUp.deleteMany({'_id': {$in: student_ids }})
    db.lead_details.deleteMany({'_id': {$in: student_ids }})
    
    Login or Signup to reply.
  2. I second Wernfried Domscheit. Delete all matching documents with $in. In Python would be something like this:

        data = studentSource.aggregate([
            {"$match": {'primary_source.utm_source': source_name}},
            {'$project': {'student_id': 1, '_id': 0}}
        ])
    
        students = await data.to_list(length=None)
        ids = [id for id in set([s.get('student_id') for s in students ]) if id ] 
        deletes = [
            studentsPrimaryDetails.delete_many({'_id': {'$in': ids }}),
            studentSecondaryDetails.delete_many({'student_id': {'$in': ids }}),
            studentTimeline.delete_many({'student_id': {'$in': ids }}),
            studentApplicationForms.delete_many({'student_id': {'$in': ids }}),
            queries.delete_many({'student_id': {'$in': ids }}),
            leadsFollowUp.delete_many({'student_id': {'$in': ids }}),
            lead_details.delete_many({'student_id': {'$in': ids }}),
            studentSource.delete_many({'primary_source.utm_source': source_name})
        ]
        await asyncio.gather(*deletes)
    

    asyncio.gather fires all 8 requests in parallel, but depending on the driver’s connection pool settings, some may wait for a free socket.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search