skip to Main Content

We have a collection with documents called notes like:

{
  "_id": {
    "$oid": "64b42008f622157dd5aead17"
  },
  "isbn": 9789401466097,
  "chapter": 2,
  "notes": [
    {
      "note": "a note",
      "dt": 2374238742345,
    },
    {
      "note": "another note about the same book and chapter",
      "dt": 2345234234,
    }
  ]
}

and a second similar document (and more)

{
  "_id": {
    "$oid": "64b42234f622157dd5aead1c"
  },
  "isbn": 9789401466097,
  "chapter": 1,
  "notes": [
    {
      "note": "A note about the same book but another chapter",
      "dt": 23742387423
    },
  ]
}

{
  "_id": {
    "$oid": "64b42234f6223de7dd5aead1c"
  },
  "isbn": 9789401488088,
  "chapter": 1,
  "notes": [
    {
      "note": "something about another book",
      "dt": 23742384555
    },
    {
      "note": "something else",
      "dt": 23452333333
      }
    }
  ]
}

Other similar documents may have the same or other isbs or other chapters.

The combination of isbn and chapter form the unique key for a document.

Then there’s the collection of books (hence the isbn). So with:

collection = 'books' # using pythons pymongo
match = {'$match': {'isbn': isbn}}
projection = {'$project':
        {
            '_id': '$_id',
            'isbn': '$isbn',
            'title': '$title',
        }
    }

pipeline = [
        {'$facet':
            {
                'the_book':
                    [
                        match,
                        projection,
                    ],
            }
        }
    ]

I consult and project the collection of books.

What I try to accomplish is to count all of the notes that are made for all of the chapters for the book with a given isbn.

What I’ve tried:

lookup_notes = {'$lookup':
        {
            'from': 'notes',
            'localField': 'isbn',
            'foreignField': 'isbn',
            'as': 'note',
        },
    }
project_notes = {'$project':
        {
            "teller": {
                "$size": '$note.notes'
            }
        }
    }
pipeline = [
        {'$facet':
            {
                'the_book':
                    [
                        match,
                        projection,
                    ],
                'the_notes':
                    [
                        lookup_notes,
                        match,
                        project_notes,
                    ],
            }
        }
    ]

as well as a lot of other attempts using $group, $size and $count. They result in errors or only counting the amount of relevant documents in de notes collection, but not the combined amount of notes inside all of the relevant documents.

For the above example, with isbn=9789401466097 I should get the result 3. Not 2 or 5.

Thanks for the help.

2

Answers


  1. isbn = 9789401466097
    match = {"$match": {"isbn": isbn}}
    
    # join 'notes' collection.
    lookup_notes = {
        "$lookup": {
            "from": "notes",
            "localField": "isbn",
            "foreignField": "isbn",
            "as": "related_notes",
        },
    }
    # deconstruct the notes array.
    unwind_notes = {
        "$unwind": "$related_notes"
    }
    # deconstruct the notes inside the related_notes.
    unwind_inner_notes = {
        "$unwind": "$related_notes.notes"
    }
    # count the notes.
    group_notes = {
        "$group": {
            "_id": "$isbn",
            "count": {"$sum": 1}
        },
    }
    pipeline = [match, lookup_notes, unwind_notes, unwind_inner_notes, group_notes]
    result = books_collection.aggregate(pipeline)
    
    Login or Signup to reply.
  2. One option is:

    1. Start from the notes collection, as there are more documents there. Group them by isbn and count the notes.
    2. Add the book data using $lookup
    3. Format the result
    db.notes.aggregate([
      {$group: {
          _id: "$isbn",
          the_notes: {$sum: {$size: "$notes"}}
      }},
      {$lookup: {
          from: "books",
          localField: "_id",
          foreignField: "isbn",
          as: "the_book",
          pipeline: [{$project: {_id: "$_id", isbn: "$isbn", title: "$title"}}]
      }},
      {$project: {the_book: {$first: "$the_book"}, the_notes: 1, _id: 0}}
    ])
    

    See how it works on the playground example

    Note that $facet is a costly option, is does not support indexing.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search