I need each document in a collection to be updated only if its content is different, regardless of the order of the elements in nested lists.
Fundamentally, two versions should be the same if the elements are identical regardless of their order. MongoDB does not do that, by default.
def upsert(query, update):
# collection is a pymongo.collection.Collection object
result = collection.update_one(query, update, upsert=True)
print("tFound match: ", result.matched_count > 0)
print("tCreated: ", result.upserted_id is not None)
print("tModified existing: ", result.modified_count > 0)
query = {"name": "Some name"}
update = {"$set": {
"products": [
{"product_name": "a"},
{"product_name": "b"},
{"product_name": "c"}]
}}
print("First update")
upsert(query, update)
print("Same update")
upsert(query, update)
update = {"$set": {
"products": [
{"product_name": "c"},
{"product_name": "b"},
{"product_name": "a"}]
}}
print("Update with different order of products")
upsert(query, update)
Output:
First update
Found match: False
Created: True
Modified existing: False
Same update
Found match: True
Created: False
Modified existing: False
Update with different order of products
Found match: True
Created: False
Modified existing: True
The last update does modify the document because the order of products are indeed different.
I did find a working solution which is to compare a sorting of the queried document’s content and a sorting of the new one.
Thanks to Zero Piraeus‘s response for the short and convenient way to sort for comparison.
def ordered(obj):
if isinstance(obj, dict):
return sorted((k, ordered(v)) for k, v in obj.items())
if isinstance(obj, list):
return sorted(ordered(x) for x in obj)
else:
return obj
I apply it to compare the current and the new versions of the document. If their sorting are different, I apply the update.
new_update = {
"products": [
{"product_name": "b"},
{"product_name": "c"},
{"product_name": "a"}]
}
returned_doc = collection.find_one(query)
# Merging remote document with local dictionary
merged_doc = {**returned_doc, **new_update}
if ordered(returned_doc) != ordered(merged_doc):
upsert(query, {"$set": new_update})
print("Updated")
else:
print("Not Updated")
Output:
Not Updated
That works, but that relies on python to do the comparison, introducing a delay between the read and the write.
Is there a way to do it atomically ? Or, even better, a way to set a MongoDB Collection to adopt some kind of "order inside arrays doesn’t matter" mode ?
This is part of a generic implementation. Documents can have any kind of nesting in their structure.
2
Answers
EDIT:
According to your comment (and the resemblance to other question) I suggest:
See how it works on the playground example
Original answer:
One option is to use the query part of the update to work only on documents that are matching your condition:
See how it works on the playground example
@nimrodserok correctly pointed out a flaw in my first answer. Here’s my updated answer that’s a slight variation of his answer.
This should also preserve the
upsert
option.Here’s a mongoplayground.net example to demonstrate the concept. You can change the
"name"
value to verify theupsert
option.I’m curious what the
result
values are for thisupdate_one
.