
I’ve stumbled across a problem in PHP and it’s proving much harder to solve than I would have expected.

On the English version of my site, I have a plaintext-fragment:

about-us

which I can straightforwardly change into the capitalised text form:

About Us

using the following:

$Text_Array = explode('-', $Plain_Text_Fragment); // ['about', 'us']

for ($i = 0; $i < count($Text_Array); $i++) {
  $Text_Array[$i] = strtoupper($Text_Array[$i][0]) . substr($Text_Array[$i], 1);
}

$Capitalised_Text = implode(' ', $Text_Array); // 'About Us'

It turns out, it’s not nearly so straightforward to turn the plaintext fragment:

über-uns

into the capitalised text form:

&Uuml;ber Uns

TLDR: What’s the most straightforward approach in PHP to achieve this?


Problem #1 : Ascertaining whether the first letter is multi-byte

I only need to capitalise the first letter of each word in the plaintext-fragment, so, whilst I can easily tell that the plaintext-fragment contains one or more multibyte characters, using:

strlen('über') === mb_strlen('über') // FALSE

that still doesn’t tell me whether the first letter of the plaintext fragment is multibyte or not. (The multibyte character might be any one or more of the other letters.)

I can’t isolate and test $Text_Array[$i][0] because, of course, the 'ü' in 'über' is both $Text_Array[$i][0] and $Text_Array[$i][1].

It also appears that mb_str_split() does not exist in my PHP version (it was only added in PHP 7.4).
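The difference between bytes and characters described above can be made visible with a short sketch. It assumes the mbstring extension is loaded and that the string is UTF-8; the variable names are only illustrative.

```php
<?php
// Assumption: mbstring is available and this file is saved as UTF-8.
$word = 'über';

// Square brackets index bytes, so $word[0] is only the first half of 'ü'.
$firstByte = $word[0];

// mb_substr() counts characters, so this returns the complete 'ü'.
$firstChar = mb_substr($word, 0, 1, 'UTF-8');

// The first character is multibyte when it occupies more than one byte.
$firstIsMultibyte = strlen($firstChar) > 1;

var_dump(strlen($firstByte));  // int(1)
var_dump(strlen($firstChar));  // int(2)
var_dump($firstIsMultibyte);   // bool(true)
```

Comparing strlen() of the first character against 1 answers the question for the first letter only, regardless of what the rest of the word contains.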


Problem #2 : Capitalising 'ü'

Once I am past Problem #1 (having confirmed that the first letter of 'über' is multibyte), it’s not clear to me how to capitalise it. I want to use mb_strtoupper(), but I need to apply it to both $Text_Array[$i][0] and $Text_Array[$i][1] and to no other character (unless there are other multibyte characters in $Text_Array[$i]).

I think I can solve Problem #2 something like this:

$Text_Array[$i] = mb_strtoupper(substr($Text_Array[$i], 0, 2)) . substr($Text_Array[$i], 2);

I have checked this and it definitely works. One down, two to go.
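One caveat is worth spelling out: the two-byte slice only behaves when the first character really is two bytes long, which is why Problem #1 has to be solved first. The sketch below wraps the line above in a function with a hypothetical name, byteSliceUcfirst(), and assumes mbstring and UTF-8 input.

```php
<?php
// Assumption: mbstring is available and input strings are UTF-8.
// byteSliceUcfirst() is a hypothetical name for the two-byte approach.
function byteSliceUcfirst(string $word): string
{
    // Uppercases whatever sits in the first two bytes of the word.
    return mb_strtoupper(substr($word, 0, 2), 'UTF-8') . substr($word, 2);
}

echo byteSliceUcfirst('über'), PHP_EOL; // Über
echo byteSliceUcfirst('uns'), PHP_EOL;  // UNs (two ASCII letters get uppercased)
```

A first character that is three or four bytes long would be cut in half by the same slice, so the approach does not generalise beyond two-byte characters.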


Problem #3 : Outputting &Uuml; instead of Ü

Although I am working using UTF-8 encoding, I’d much prefer to output the HTML-escape &Uuml; than a raw Ü. I figured there would be a PHP native function to allow me to convert between the two and there is:

htmlentities()

But I really can’t tell if htmlentities() is working or not because both my DOM Inspector and my View Source are telling me that they see Ü, not &Uuml;. I appreciate that they might be seeing the latter and they are just trying to be helpful, but I can’t be absolutely sure whether the PHP function htmlentities() is working or not.
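One way to take the browser tooling out of the picture is to check the return value in PHP itself. This is a minimal sketch that assumes UTF-8 input and passes the encoding explicitly instead of relying on default_charset.

```php
<?php
// Assumption: the input string is UTF-8.
$escaped = htmlentities('Über Uns', ENT_QUOTES, 'UTF-8');

// Compare on the server side, where nothing re-interprets the entity.
var_dump($escaped === '&Uuml;ber Uns'); // bool(true)

// Escaping a second time turns '&' into '&amp;', so a browser rendering
// this output shows the literal text "&Uuml;ber Uns".
echo htmlentities($escaped, ENT_QUOTES, 'UTF-8'), PHP_EOL; // &amp;Uuml;ber Uns
```

If the comparison prints bool(false), the function is not producing the entity, which usually points at the encoding argument rather than at the browser.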


Question:

What’s the most straightforward approach in PHP to convert:

über-uns

into:

&Uuml;ber Uns ?

2 Answers


  1. You are pretty close there, but stick to mb_* functions all the way:

    $Text_Array = explode('-', $Plain_Text_Fragment); // ['about', 'us']
    
    for ($i = 0; $i < count($Text_Array); $i++) {
        $Text_Array[$i] = mb_strtoupper(mb_substr($Text_Array[$i],0,1)) . mb_substr($Text_Array[$i], 1);
    }
    
    $Capitalised_Text = implode(' ', $Text_Array); // 'About Us'
    

    Problem 1: Use mb_substr()

    Use mb_substr() to access the first character. Square brackets access the first byte, not the first multibyte code point.

    Problem 2: Use mb_strtoupper()

    This is not an issue once you get the first multibyte character, just stick to mb_strtoupper and you are fine.

    Problem 3: Specify charset for htmlentities()

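    This is sorted out by specifying the charset for htmlentities(). Pass a real flag constant as the second argument, since passing null there is deprecated as of PHP 8.1, e.g.:

    htmlentities($Capitalised_Text, ENT_QUOTES, 'UTF-8')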
    

    Of course, if your default_charset is set to UTF-8, you may skip the extra arguments and call htmlentities() directly.

  2. Try using mb_convert_case

    $string = "über-uns";
    
    $string = str_replace("-", " ", $string);
    
    $capitalised = mb_convert_case($string, MB_CASE_TITLE, "UTF-8");
    
    echo htmlentities($capitalised, ENT_HTML5, "UTF-8");
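The two answers give the same result for über-uns but not for every input, which the following sketch tries to show side by side. The function names are hypothetical, and it assumes mbstring is loaded and the input is UTF-8.

```php
<?php
// Assumption: mbstring is available and input is UTF-8.
// Both function names are hypothetical, chosen for this comparison.

// Approach from answer 1: uppercase only the first character of each word.
function titleFromSlugKeepCase(string $slug): string
{
    $words = explode('-', $slug);
    foreach ($words as $i => $word) {
        $words[$i] = mb_strtoupper(mb_substr($word, 0, 1, 'UTF-8'), 'UTF-8')
            . mb_substr($word, 1, null, 'UTF-8');
    }
    return htmlentities(implode(' ', $words), ENT_QUOTES, 'UTF-8');
}

// Approach from answer 2: let mb_convert_case() title-case the whole string.
function titleFromSlugTitleCase(string $slug): string
{
    $text = mb_convert_case(str_replace('-', ' ', $slug), MB_CASE_TITLE, 'UTF-8');
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

echo titleFromSlugKeepCase('über-uns'), PHP_EOL;  // &Uuml;ber Uns
echo titleFromSlugTitleCase('über-uns'), PHP_EOL; // &Uuml;ber Uns

// The results differ when a word already contains capitals,
// because MB_CASE_TITLE also lowercases the rest of each word.
echo titleFromSlugKeepCase('über-PHP'), PHP_EOL;  // &Uuml;ber PHP
echo titleFromSlugTitleCase('über-PHP'), PHP_EOL; // &Uuml;ber Php
```

For all-lowercase fragments like the ones in the question, either approach should do; the first is the safer choice if fragments may contain acronyms.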
    
    