The upsert statement below does not work and get:
MongoServerError: E11000 duplicate key error collection: db.emails index: _id_ dup key: { _id: "8hh58975fw" }
My goal is to edit an existing document ( or create if exists ) with the same _id
The code in api is:
await db.collection('emails')
.updateOne(
{
_id: userId,
type: "profileCompletion",
time: new Date(),
},
{
$inc: { "count": 1},
},
{upsert: true},
)
But I coped it from this which works:
await db.collection('analytics')
.updateOne(
{
_id: getDateMonYear(new Date(), "mmddyyyy"),
type: "Geo",
},
{
$inc: { "count": 1},
},
{upsert: true},
)
2
Answers
The solution was to remove:
from the query:
That was indeed the difference between the working and non working queries in my question.
Summary: Upsert with same _id is possible no matter what the _id is as long as there is a unique index. In my case,
_id
was being derived and was the same plus it was the unique index.The link posted by the user who commented above is valid but the comments about multi threads etc are not relevant for this question.
As @Kal shared in the answer, the main issue in this particular question was associated with the usage of
time: new Date()
. This answer is intended to help explore why that is the case, help clarify how concurrency could still be a concern with the adjustedupdate
in that answer, and what a saferupdate
operation may look like.Upsert Modifications
When a new document is inserted as the result of an
upsert
, there are a few ways the new document gets constructed. This is currently documented here. Relevant to this situation is the "Set" behavior (due to the change being defined as{ $inc: { "count": 1}, }
) which uses an update operator. The relevant paragraph:So given a (new)
userId
of123
, anupsert
executed on January 1st would query the collection looking for a document that matches the following:Finding none, the
upsert
behavior would trigger and the operation would insert a document similar to the following into the collection:Now we move on to the what happens on a subsequent execution of the operation.
Upsert Matching
Let’s say that the same operation (with
userId
of123
is executed a month later at the beginning of February. The collection would be searched for a document that matches the following:While a document with the given
_id
(andtype
) does exist, the overall document does not match due to the difference intime
. Similar to before, this triggers theupsert
behavior and the operation attempts to insert the following document:This of course fails with the aforementioned duplicate key error due to the existing document that contains the
_id
value of123
. Stated another way, one of the problems is the fact that the timestamp is different every time this operation is executed.Therefore the change to remove
time: new Date()
from the<query>
portion of the operation does two things at the same time:time
field from being present in the document that gets originally inserted.<query>
when matching resulting in the existing document being updated as expected.Concurrency
I was (obviously) incorrect when I claimed in the comments that the problem that you were experiencing was the result of concurrency. More correctly I should have stated that it may be the result of concurrency. This remains true after the change to remove
time
from the operation.The documentation about upsert includes the following note:
It then links to another section that begins with:
I originally pointed to that documentation as "proof" that concurrency can be an issue. I believe @Kal originally correctly pointed out that the following text at the end of the section implies that the second operation should
update
as expected (emphasis added):This was news to me! I went back in time and took a look at how that section was worded for version 4.0 (emphasis added again):
That text changed for version
4.2
. After digging around a bit I was able to find this improvement request titled "Retry on predicate unique index violations of update + upsert -> insert when possible". The Issue Description suggests that there were some improvements made to this behavior in some situations for4.2
, more on this in the following section.Alternative (Safer) Rewrite
The aforementioned improvement shows that the server can now transparently retry an
upsert
that encounters a duplicate key exception under certain conditions. There is a table in that ticket that lays out what those conditions are, but they look to me like the ones that would be the result of concurrency between writes as opposed to other causes of the error including thetime: new Date()
problem in this particular question.In fact, we can see that even the modified operation in the other answer will not transparently retry if it encountered a duplicate key exception. If I understand correctly, the modified operation would now have a
<query
> of{ _id: userId, type: "profileCompletion" }
. This falls under row 6 described in that table astype
is not part of the unique index definition.In any case, while the modified operation no longer generates the duplicate key exception without concurrency, you may still wish to rewrite the operation. You can absolutely still add a timestamp to the document when it is inserted, but it shouldn’t be specified in the
<query>
portion of the operation. Rather, it should be expressed either via$set
or$setOnInsert
as part of the described change, depending on exactly what behavior you want when the operation performs anupdate
Something along these lines:
It feels like the documentation is a bit misleading so I’ll send some feedback on the site.