I’ve stumbled across a problem in PHP and it’s proving much harder to solve than I would have expected.
On the English version of my site, I have a plaintext-fragment:
about-us
which I can straightforwardly change into the capitalised text form:
About Us
using the following:
$Text_Array = explode('-', $Plain_Text_Fragment); // ['about', 'us']
for ($i = 0; $i < count($Text_Array); $i++) {
    $Text_Array[$i] = strtoupper($Text_Array[$i][0]) . substr($Text_Array[$i], 1);
}
$Capitalised_Text = implode(' ', $Text_Array); // 'About Us'
It turns out, it’s not nearly so straightforward to turn the plaintext fragment:
über-uns
into the capitalised text form:
Über Uns
TLDR: What’s the most straightforward approach in PHP to achieve this?
Problem #1 : Ascertaining whether the first letter is multi-byte
I only need to capitalise the first letter of each word in the plaintext-fragment, so, whilst I can easily tell that the plaintext-fragment contains one or more multibyte characters, using:
strlen('über') === mb_strlen('über') // FALSE
that still doesn’t tell me whether the first letter of the plaintext fragment is multibyte or not. (It might be one or more of any of the other letters).
I can’t isolate and test $Text_Array[$i][0] because, of course, the 'ü' in 'über' is both $Text_Array[$i][0] and $Text_Array[$i][1].
It also appears that mb_str_split() does not exist (at least in the PHP version I’m running).
Problem #2 : Capitalising 'ü'
Once I am past Problem #1 (having confirmed that the first letter of 'über' is multibyte), it’s not clear to me how to capitalise it. I want to use mb_strtoupper() but I need to use this on both $Text_Array[$i][0] and $Text_Array[$i][1] and no other character (unless there are other multibyte characters in $Text_Array[$i]).
I think I can solve Problem #2 something like this:
$Text_Array[$i] = mb_strtoupper(substr($Text_Array[$i], 0, 2)) . substr($Text_Array[$i], 2);
I have checked this and it definitely works. One down, two to go.
Problem #3 : Outputting &Uuml; instead of Ü
Although I am working using UTF-8 encoding, I’d much prefer to output the HTML-escape &Uuml; than a raw Ü. I figured there would be a PHP native function to allow me to convert between the two and there is:
htmlentities()
But I really can’t tell if htmlentities() is working or not because both my DOM Inspector and my View Source are telling me that they see Ü, not &Uuml;. I appreciate that they might be seeing the latter and they are just trying to be helpful, but I can’t be absolutely sure whether the PHP function htmlentities() is working or not.
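One way to take the browser out of the picture is to inspect the string inside PHP itself, for example from the command line. The following is only a sketch: it assumes the script file is saved as UTF-8, and the variable names are made up for illustration.

```php
<?php
// Assumption: this file is saved as UTF-8, so 'Ü' is the two-byte sequence C3 9C.
$raw = 'Ü';

// Pass the encoding explicitly so the result does not depend on default_charset.
$escaped = htmlentities($raw, ENT_QUOTES, 'UTF-8');

// Compare the actual bytes instead of trusting what a browser chooses to display.
var_dump($escaped);               // string(6) "&Uuml;"
var_dump($escaped === '&Uuml;');  // bool(true)
```

If the comparison is true, htmlentities() is doing its job and any raw Ü seen later is down to how the output is being displayed.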
Question:
What’s the most straightforward approach in PHP to convert:
über-uns
into:
Über Uns ?
2 Answers
You are pretty close there, but stick to mb_* functions all the way:
Problem 1: Use mb_substr()
Use mb_substr() to access the first character. Square brackets access the first byte, not the first multibyte code point.
Problem 2: Use mb_strtoupper()
This is not an issue once you have the first multibyte character; just stick to mb_strtoupper() and you are fine.
Problem 3: Specify the charset for htmlentities()
This is sorted out by passing the charset to htmlentities() as its encoding argument. Of course, if your default_charset is set to UTF-8 you may skip this and use htmlentities() directly.
Try using mb_convert_case().
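Putting the two answers together, a minimal sketch might look like the following. It assumes the mbstring extension is loaded and that the input is UTF-8; the function name capitaliseFragment() and the variable names are hypothetical, chosen only for this example.

```php
<?php
// Hypothetical helper: turns a hyphenated UTF-8 fragment into capitalised words.
function capitaliseFragment(string $fragment): string
{
    $words = explode('-', $fragment);
    foreach ($words as $i => $word) {
        // mb_substr() counts characters, not bytes, so position 0 is the whole 'ü'.
        $first = mb_substr($word, 0, 1, 'UTF-8');
        $rest  = mb_substr($word, 1, null, 'UTF-8');
        $words[$i] = mb_strtoupper($first, 'UTF-8') . $rest;
    }
    return implode(' ', $words);
}

$capitalised = capitaliseFragment('über-uns');                // 'Über Uns'
$escaped = htmlentities($capitalised, ENT_QUOTES, 'UTF-8');   // '&Uuml;ber Uns'

// Alternative along the lines of the second answer: MB_CASE_TITLE capitalises
// the first letter of each word and lowercases the remaining letters.
$viaConvertCase = mb_convert_case(
    str_replace('-', ' ', 'über-uns'),
    MB_CASE_TITLE,
    'UTF-8'
);                                                            // 'Über Uns'
```

The two approaches differ once a word contains capitals after the first letter: the mb_substr() version leaves the rest of each word untouched, whereas MB_CASE_TITLE lowercases it, so which one fits depends on the fragments you expect.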