In PHP,

mb_strtolower('İspanyolca');

returns

U+0069  i  LATIN SMALL LETTER I
U+0307  ̇   COMBINING DOT ABOVE
U+0073  s  LATIN SMALL LETTER S
U+0070  p  LATIN SMALL LETTER P
etc.

I need to get rid of the "U+0307 ̇ COMBINING DOT ABOVE".

I tried this:

$TheUrl=mb_strtolower('İspanyolca');
$TheUrl=normalizer_normalize($TheUrl,Normalizer::FORM_C);

The combining dot above persists.

Any help would be appreciated.

2 Answers


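  1. You can try a custom PHP function that decomposes the string with Unicode normalization and then removes every character outside the basic Latin alphabet. Use FORM_D rather than FORM_C: "i" followed by U+0307 has no precomposed form, so FORM_C leaves the pair unchanged, while FORM_D also splits letters such as "ç" into a base letter plus a combining mark.
    For example:

    function removeDiacritics($str) {
        // Decompose accented letters into base letter + combining mark(s); needs the intl extension.
        $normalizedStr = Normalizer::normalize($str, Normalizer::FORM_D);

        // Keep only A-Z and a-z. This also removes digits, spaces and punctuation.
        $cleanStr = preg_replace('/[^a-zA-Z]/', '', $normalizedStr);
        return $cleanStr;
    }

    $TheUrl = mb_strtolower('İspanyolca');
    $TheUrl = removeDiacritics($TheUrl);
    echo $TheUrl; // ispanyolca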
    
  2. To handle this case, you can use the strtr function to replace specific character sequences in the string, as in the example below.

    $TheUrl = 'İspanyolca';
    $TheUrl = mb_strtolower($TheUrl, 'UTF-8');
    // "i\u{0307}" is "i" followed by COMBINING DOT ABOVE, written as an escape so the mark is visible.
    $TheUrl = strtr($TheUrl, array("i\u{0307}" => 'i', 'İ' => 'i'));
    

    This replaces both the lowercase 'i' followed by a combining dot above (U+0307) and the uppercase 'İ' (U+0130) with a plain lowercase 'i'.
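
If the combining dot is the only thing that needs to go, a narrower variant is to lowercase first and then delete U+0307 wherever it appears. The following is a minimal sketch, not a definitive fix: the function name lowerWithoutCombiningDot is made up for illustration, and it assumes UTF-8 input, the mbstring extension, and PHP 7.0 or later for the "\u{...}" escape.

```php
<?php
// Hypothetical helper (not part of PHP): lowercases a UTF-8 string and
// removes every COMBINING DOT ABOVE (U+0307) left in the result.
function lowerWithoutCombiningDot(string $text): string
{
    // With full case mapping, U+0130 becomes "i" followed by U+0307.
    $lower = mb_strtolower($text, 'UTF-8');

    // Delete the combining mark; all other characters are left untouched.
    return str_replace("\u{0307}", '', $lower);
}

echo lowerWithoutCombiningDot("\u{0130}spanyolca"), PHP_EOL; // ispanyolca
```

Unlike stripping everything outside a-z, this keeps digits, hyphens and other letters intact. The trade-off is that it also removes a U+0307 that was already present in the input for a different reason, so it suits slugs and URLs better than text that must round-trip.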
