I am inserting a word using the following sql query from a php script.
$sql = "INSERT IGNORE into cb_words (word, user_id) VALUES('$word', $user_id)";
The problem arises in this particular scenario –
The cb_words table contains words in Bangla language.
- word – আর which encoded to Unicode format is %u0986%u09B0
- Another word – আঁর which encoded to Unicode format is %u0986%u0981%u09B0
As one may notice from the Unicode encoding for both the words they are almost similar with an additional letter in the second word.
Now, the table – cb_words already contains the second word, but when I am trying to insert the first word it fails.
It does not insert the word and simply says 0 rows inserted.
INSERT IGNORE into cb_words (word, user_id) VALUES('আর', 2)
Shockingly enough, when I search in the table for the word as –
SELECT * FROM `cb_words` where `cb_words`.`word` = 'আর'
The returned result is –
N.B: The table only has the word (2) as described above. and I am trying to insert (1)
So, despite the fact, these two words are not 100% identical, why does mysql fail to differentiate between the two?
I am adding the table structure in case it may be necessary:
2
Answers
Insert being ignored, is at my guess because your user_id is set to unique? can’t tell from your screenshots and you’re inserting with IGNORE supressing any errors. Remove the unique index from user_id.
You can change your collation to utf8_bin.
Created your table, locally:
similar question with some interesting reading: How can I enter data using non English (Bangla) language into this database table?
That is, declare the column to have
COLLATE utf8mb4_0900_ai_ci
;The two item you mention have these HEX values:
E0A686 E0A6B0
andE0A686 E0A681 E0A6B0
E0A681
is a non-spacing-modifier "BENGALI SIGN CANDRABINDU". When collating with "Accent Insensitive" and (or?) "Case Insensitive", the two are considered equal.If you want them to be treated as not-equal, use a different collation.