
I have a source collection containing the documents below. It is indexed on the first four fields (state, city, zip, and store).

[{state: 'NY', city: 'New York', zip: '10000', store: '1234', item: '1234', size: 'L'}, 
{state: 'NY', city: 'New York', zip: '10000', store: '1234', item: '1234', size: 'L'}, 
{state: 'NY', city: 'New York', zip: '10100', store: '1234', item: '1234', size: 'L'}, 
{state: 'NY', city: 'New York', zip: '10100', store: '1234', item: '1234', size: 'L'}, 
{state: 'NJ', city: 'Newark', zip: '08800', store: '2345', item: '2345', size: 'M'}, 
{state: 'NJ', city: 'Newark', zip: '08800', store: '2345', item: '2345', size: 'M'}, 
{state: 'NJ', city: 'Newark', zip: '08810', store: '2345', item: '2345', size: 'M'}, 
{state: 'NJ', city: 'Newark', zip: '08810', store: '2345', item: '2345', size: 'M'}]

I’d like to copy the distinct documents (distinct on the first four fields) from my source collection to another collection, which should end up with the documents below. The source collection is huge, so performance is an important consideration in how I do the copy.

[{state: 'NY', city: 'New York', zip: '10000', store: '1234', item: '1234', size: 'L'}, 
{state: 'NY', city: 'New York', zip: '10100', store: '1234', item: '1234', size: 'L'}, 
{state: 'NJ', city: 'Newark', zip: '08800', store: '2345', item: '2345', size: 'M'}, 
{state: 'NJ', city: 'Newark', zip: '08810', store: '2345', item: '2345', size: 'M'}]

I tried an aggregation pipeline, but I could only get the distinct values of the four fields, not the entire documents:

[{state: 'NY', city: 'New York', zip: '10000', store: '1234'}, 
{state: 'NY', city: 'New York', zip: '10100', store: '1234'}, 
{state: 'NJ', city: 'Newark', zip: '08800', store: '2345'}, 
{state: 'NJ', city: 'Newark', zip: '08810', store: '2345'}]

2 Answers


  1. Use $first with $$ROOT to keep the first document in each group (use $last to keep the last one instead):

    db.collection.aggregate([
      {
        $group: {
          "_id": {
            state: "$state",
            city: "$city",
            zip: "$zip",
            store: "$store"
          },
          doc: { $first: "$$ROOT" }
        }
      },
      { $replaceWith: "$doc" },
      { $project: { _id: 0 } },  // to have new IDs in the target collection
      { $out: "output-collection" }
    ])
    

    Note that $out replaces the specified collection if it already exists.


    Resulting docs:

    [
      { "city": "Newark", "item": "2345", "size": "M", "state": "NJ", "store": "2345", "zip": "08800" },
      { "city": "Newark", "item": "2345", "size": "M", "state": "NJ", "store": "2345", "zip": "08810" },
      { "city": "New York", "item": "1234", "size": "L", "state": "NY", "store": "1234", "zip": "10000" },
      { "city": "New York", "item": "1234", "size": "L", "state": "NY", "store": "1234", "zip": "10100" }
    ]
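
    To make the grouping logic easier to follow, here is a rough plain-JavaScript model of what the pipeline above does to each document. It is only a sketch, not MongoDB code: the helper name `firstPerGroup` and the `_id` values in the sample data are made up for illustration, and it keeps the first document it encounters per key, whereas in MongoDB the document picked by $first is only well defined if the input order is defined (for example by a preceding $sort).

    ```javascript
    // Illustrative model only; `firstPerGroup` is a hypothetical helper, not a MongoDB API.
    function firstPerGroup(docs, keyFields) {
      const groups = new Map(); // a Map keeps keys in first-insertion order
      for (const doc of docs) {
        // Composite key built from the grouping fields, like the $group _id.
        const key = JSON.stringify(keyFields.map((f) => doc[f]));
        // Like $first: "$$ROOT", remember only the first whole document per key.
        if (!groups.has(key)) groups.set(key, doc);
      }
      // Like $replaceWith + $project { _id: 0 }: emit the stored document without its _id.
      return [...groups.values()].map(({ _id, ...rest }) => rest);
    }

    // Sample data; the _id values are assumptions added for the example.
    const source = [
      { _id: 1, state: "NY", city: "New York", zip: "10000", store: "1234", item: "1234", size: "L" },
      { _id: 2, state: "NY", city: "New York", zip: "10000", store: "1234", item: "1234", size: "L" },
      { _id: 3, state: "NY", city: "New York", zip: "10100", store: "1234", item: "1234", size: "L" },
      { _id: 4, state: "NJ", city: "Newark", zip: "08800", store: "2345", item: "2345", size: "M" },
      { _id: 5, state: "NJ", city: "Newark", zip: "08800", store: "2345", item: "2345", size: "M" }
    ];

    const deduped = firstPerGroup(source, ["state", "city", "zip", "store"]);
    console.log(deduped.length); // 3
    ```

    In the question's data the duplicates are identical in every field, so which duplicate survives makes no difference to the output.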
    
  2. You can use $setWindowFields (available since MongoDB 5.0) to compute $rank within each partition, i.e. each combination of the first four fields. Then $match the documents with rank 1 and use $merge to write them to another collection.

    db.collection.aggregate([
      {
        "$setWindowFields": {
          "partitionBy": {
            "state": "$state",
            "city": "$city",
            "zip": "$zip",
            "store": "$store"
          },
          "sortBy": {
            "_id": 1
          },
          "output": {
            "rank": {
              "$rank": {}
            }
          }
        }
      },
      {
        "$match": {
          "rank": 1
        }
      },
      {
        "$unset": "rank"
      },
      {
        "$merge": {
          "into": "new_collection",
          "on": "_id"
        }
      }
    ])
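
    As a rough illustration of the selection rule, the plain-JavaScript sketch below models what ranking by `_id` within a partition and keeping rank 1 amounts to: for each combination of the four fields, the document with the smallest `_id` survives, and it keeps its original `_id` (which is what lets $merge match on `_id`). This is not MongoDB code; the helper name `lowestIdPerPartition` and the sample `_id` values are assumptions made for the example.

    ```javascript
    // Illustrative model only; `lowestIdPerPartition` is a hypothetical helper, not a MongoDB API.
    function lowestIdPerPartition(docs, keyFields) {
      const best = new Map();
      for (const doc of docs) {
        // Partition key built from the same fields as partitionBy.
        const key = JSON.stringify(keyFields.map((f) => doc[f]));
        const current = best.get(key);
        // With sortBy { _id: 1 }, rank 1 is the document with the smallest _id.
        if (current === undefined || doc._id < current._id) best.set(key, doc);
      }
      // Unlike the $group approach with $project { _id: 0 }, the _id is kept.
      return [...best.values()];
    }

    // Sample data in arbitrary order; the _id values are assumptions.
    const docs = [
      { _id: 7, state: "NY", city: "New York", zip: "10000", store: "1234", item: "1234", size: "L" },
      { _id: 3, state: "NY", city: "New York", zip: "10000", store: "1234", item: "1234", size: "L" },
      { _id: 5, state: "NJ", city: "Newark", zip: "08800", store: "2345", item: "2345", size: "M" }
    ];

    const kept = lowestIdPerPartition(docs, ["state", "city", "zip", "store"]);
    console.log(kept.map((d) => d._id)); // [ 3, 5 ]
    ```

    Because `_id` is unique, exactly one document per partition has rank 1, so no ties need to be handled.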
    
