BACKGROUND — I’m creating an Arabic-English dictionary that uses transliterations as the unique identifiers for terms (e.g. to distinguish between أكل /ʔakl/ & أكل /ʔakal/). Many Arabic letters don’t have Latin-script equivalents, so I use certain special characters like āēīōū, ṣḍṭẓ & so on.

I have the following query:

$asInflection = Inflection::where('translit', $term->slug);

I’ve just noticed that my use of special characters returns incorrect query results.

In one case, $term->slug is ʔamal and there is an Inflection whose translit is ʔāmāl. Laravel returns a match, which it should not; it’s absolutely imperative for the dictionary’s proper functioning that these characters not be treated the same. I’m not sure if the issue is with PHP or with MySQL; I’m pretty sure the issue has nothing to do with Laravel itself, but I imagine there is something I can do via Laravel to solve it.

Any advice is appreciated.

3 Answers


  1. I think the problem is in how the database compares characters.

    If you want special characters to be matched exactly, change the collation of the translit column. utf8_general_ci ignores accents when comparing, and so does utf8_unicode_ci, so an exact comparison needs a binary collation such as utf8_bin (or utf8mb4_bin if the column uses utf8mb4).

    So here you need to change the column’s collation from utf8_general_ci to a binary one.

  2. You can try this:

    $asInflection = Inflection::where('translit', $term->slug)->get();
    
  3. āēīōū, ṣḍṭẓ can be represented in utf8mb4, which is the MySQL CHARACTER SET that you should be using. latin1 should not be used for Arabic.

    select 'أكل /ʔakl/' = 'أكل /ʔakal/';
    

    returns false (0), and that is only because of the last a.

    Here is أكل /ʔakal/ encoded in utf8mb4, shown as its HEX():

    D8A3 D983 D984 20 2F CA94 61 6B 61 6C 2F 
    

    so I don’t see the need for āēīōū, ṣḍṭẓ. Maybe it is the keyboard entry that is lacking?

    The Unicode Consortium is the controlling organization. PHP, MySQL, Laravel, etc. simply follow its rules. (I don’t know about Inflection.)

    Run this to see what collations you have. I don’t see any that are specific to Arabic:

    SHOW COLLATION LIKE 'utf8mb4%';
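
    To see how a given collation treats the two transliterations from the question, a comparison along these lines may help. It is only a sketch: it assumes a MySQL connection using utf8mb4, and the table name inflections, the VARCHAR(255) length and the NOT NULL attribute are assumptions that are not taken from the question.

    ```sql
    -- Make sure string literals are sent and interpreted as utf8mb4.
    SET NAMES utf8mb4;

    -- Compare the two transliterations from the question under three collations.
    SELECT
        'ʔamal' = 'ʔāmāl' COLLATE utf8mb4_general_ci AS general_ci, -- 1: macrons ignored
        'ʔamal' = 'ʔāmāl' COLLATE utf8mb4_unicode_ci AS unicode_ci, -- 1: macrons ignored
        'ʔamal' = 'ʔāmāl' COLLATE utf8mb4_bin        AS bin;        -- 0: exact comparison

    -- Exact lookup for a single query, without changing the schema.
    -- The table name `inflections` is an assumption.
    SELECT * FROM inflections
    WHERE translit = 'ʔamal' COLLATE utf8mb4_bin;

    -- Schema-level change so that every comparison on translit is exact.
    -- MODIFY replaces the whole column definition, so restate the column's
    -- real type and attributes here; VARCHAR(255) NOT NULL is assumed.
    ALTER TABLE inflections
        MODIFY translit VARCHAR(255)
        CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL;
    ```

    A binary collation also makes comparisons case-sensitive, which is probably acceptable for identifiers like these. MySQL 8.0 additionally provides accent-sensitive collations such as utf8mb4_0900_as_cs, which could be used instead.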
    