skip to Main Content

I am inserting a word using the following sql query from a php script.

$sql = "INSERT IGNORE into cb_words (word, user_id) VALUES('$word', $user_id)";

The problem arises in this particular scenario –
The cb_words table contains words in Bangla language.

  1. word – আর which encoded to Unicode format is %u0986%u09B0
  2. Another word – আঁর which encoded to Unicode format is %u0986%u0981%u09B0

As one may notice from the Unicode encoding for both the words they are almost similar with an additional letter in the second word.

Now, the table – cb_words already contains the second word, but when I am trying to insert the first word it fails.
It does not insert the word and simply says 0 rows inserted.

INSERT IGNORE into cb_words (word, user_id) VALUES('আর', 2)

When tried from PHPMyAdmin
Insert query result

Shockingly enough, when I search in the table for the word as –

SELECT * FROM `cb_words` where `cb_words`.`word` = 'আর'

The returned result is –

The result of the SELECT SQL statement

N.B: The table only has the word (2) as described above. and I am trying to insert (1)

So, despite the fact, these two words are not 100% identical, why does mysql fail to differentiate between the two?

I am adding the table structure in case it may be necessary:
Table structure for cb_words

2

Answers


  1. Insert being ignored, is at my guess because your user_id is set to unique? can’t tell from your screenshots and you’re inserting with IGNORE supressing any errors. Remove the unique index from user_id.

    You can change your collation to utf8_bin.

    Created your table, locally:

    table entries
    result from your select

    similar question with some interesting reading: How can I enter data using non English (Bangla) language into this database table?

    Login or Signup to reply.
  2. mysql> select 'আর'='আর' COLLATE utf8mb4_0900_ai_ci;
    +-------------------------------------------------+
    | 'আর'='আঁর' COLLATE utf8mb4_0900_ai_ci           |
    +-------------------------------------------------+
    |                                               1 |
    +-------------------------------------------------+
    

    That is, declare the column to have COLLATE utf8mb4_0900_ai_ci;

    The two item you mention have these HEX values:
    E0A686 E0A6B0 and E0A686 E0A681 E0A6B0

    E0A681 is a non-spacing-modifier "BENGALI SIGN CANDRABINDU". When collating with "Accent Insensitive" and (or?) "Case Insensitive", the two are considered equal.

    If you want them to be treated as not-equal, use a different collation.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search