skip to Main Content

I am trying to search my MongoDB of products. The dataset has multiple of each product to record price over time. I would like to search for a phrase then limit the results to 1 of each UPC. My current code works well but will return multiple of the same UPC.

Current Code, will return multiple of the same UPC:

response = self.DB.find({'$text': {'$search': f'/{search}/'}}, {'Response': 0, '_id': 0}).sort("timestamp", -1)

Example Data Set:

{
  "_id": {
    "$oid": "64cf05707844ef1a25ee57fa"
  },
  "upc": "032622013625",
  "name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
  "salePrice": 29.99,
  "timestamp": "2023-08-05 22:29:04 EDT-0400",

  }
}
{
  "_id": {
    "$oid": "64cf057c7844ef1a25ee57fd"
  },
  "upc": "048894970887",
  "name": "Basic Window Fan - Holmes",
  "salePrice": 54.99,
  "available": false,
  "timestamp": "2023-08-05 22:29:16 EDT-0400",
 
    }
  }
}
{
  "_id": {
    "$oid": "64cf05707844ef1a25ee57fa"
  },
  "upc": "032622013625",
  "name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
  "salePrice": 29.97,
  "timestamp": "2023-08-04 13:25:09 EDT-0400",

  }
}

Not sure if I should be using distinct, or find?

2

Answers


  1. I filtered the array of results by using a dictionary storing the upc id, and append into a list of documents if upc id is not existing.

    import pymongo
    
    myclient = pymongo.MongoClient("mongodb://mongoadmin:ansible@localhost:27017/")
    mydb = myclient["mydatabase"]
    mycol = mydb["product"]
    mycol.drop()
     
    data=[
     {
      "_id": {
        "oid": "64cf05707844ef1a25ee57fa"
      },
      "upc": "032622013625",
      "name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
      "salePrice": 29.99,
      "timestamp": "2023-08-05 22:29:04 EDT-0400"
    
     },
     {
      "_id": {
        "oid": "64cf057c7844ef1a25ee57fd"
      },
      "upc": "048894970887",
      "name": "Basic Window Fan - Holmes",
      "salePrice": 54.99,
      "available": False,
      "timestamp": "2023-08-05 22:29:16 EDT-0400"
     
    },
    {
      "_id": {
        "oid": "64cf05707844ef1a25ee57fb"
      },
      "upc": "032622013625",
      "name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
      "salePrice": 29.97,
      "timestamp": "2023-08-04 13:25:09 EDT-0400"
    
    }
    ]
    
    for d in data:   
        x = mycol.insert_one(d)
        
    resp=mycol.create_index(
        [
             ("upc", "text")
        ]
    )
    print(resp)   
        
    search="032622013625"
    response = mycol.find( { "$text": { "$search": f"/{search}/"}}, {'Response': 0, '_id': 0}).sort("timestamp", -1)
    
    list_upc_already_seen=[]
    list_documents=[]
    
    for doc in response:
        upc=doc.get("upc")
        if upc not in list_upc_already_seen:
            list_documents.append(doc)
            list_upc_already_seen.append(upc)
    
    print(list_documents)
    

    [{‘upc’: ‘032622013625’, ‘name’: ‘Luigi Bormioli Michelangelo Beverage 20oz Set of 4’, ‘salePrice’: 29.99, ‘timestamp’: ‘2023-08-05 22:29:04 EDT-0400’}]

    Login or Signup to reply.
  2. You could use "$top" with "$group" in an aggregation pipeline to get your result. If you only want certain fields returned, you could use a "$project" stage.

    response = self.DB.aggregate([
      {
        "$match": {'$text': {'$search': f'/{search}/'}}
      },
      {
        "$group": {
          "_id": "$upc",
          "mostRecent": {
            "$top": {
              "sortBy": {
                "timestamp": -1
              },
              "output": "$$ROOT"
            }
          }
        }
      },
      {
        "$replaceWith": "$mostRecent"
      }
    ])
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search