How to Create a Case-Insensitive Unique Compound Index in MongoDB?
Question Body:
I’m trying to create a compound unique text index in MongoDB with the following fields:
tenant_id: 1
seller_id: 1
seller_sku_code: text
However, I’m encountering an error when trying to create the index:
Index build failed: 4df81a22-042f-4e51-bbec-b0f8a7dafe2e: Collection product_db.skus ( 83cc8978-b0fe-4c4b-ae58-691e32db7f95 ) :: caused by :: E11000 duplicate key error collection: product_db.skus index: tenant_id_1_seller_id_1_seller_sku_code_text dup key: { tenant_id: "1", seller_id: "113", _fts: "acc", _ftsx: 0.75 }
I need to ensure that the seller_sku_code is unique within the database and is case-insensitive. For example:
"ABC" and "abc" should not be allowed simultaneously.
"ACC-2001" and "ACC-2000" should be allowed to coexist.
How can I create this unique, case-insensitive index on seller_sku_code while still ensuring the uniqueness of the tenant_id and seller_id fields?
2
Answers
The same index will do the job. Once the index creation statement has been successfully executed, all new documents will be stored according to the new rule you would like to be in place. To demonstrate the same point, please look at the action log below.
Please note that the second document did not create due to key violation. It means no new documents will be stored violating the rule you are looking for. The reason for this behavior can be understood from the quote below from the MongoDB manual. For more, refer link 1 below.
Now the issue at present you are facing is with the duplicated information already stored in the collection in the absence of this index. This needs to be addressed. The action required is to remove such documents from the collection and enable the same index. If it is a productive database, as you are aware, it requires careful analysis, action planning and execution.
Note:
As the final clarifications, the following two statements will be successful since the values for the index prefixed keys – a and b, are new.
Links:
1) Text Index Properties
Text index is probably not what you are looking for.
"ACC-2001" and "ACC-2000" are strings, and contain some alphanumeric characters, but they are not text.
To ensure that no two documents in the entire database have the same seller_sku_code, create a unique index on just that field, which a case-insensitive collation, like:
Use the locale that is appropriate, and strength of 1 or 2 for case-insensitive.
The problem you were encountering with the index build is the text index tokenizer sees the dash
-
as a separator, so instead of indexing the values "ACC-2001" it was indexing the values "ACC" and "2001". That would prevent any other document with the same tenant and seller_id from having a seller_sku_code starting with "ACC-".