I’m building a CLI app in PHP that has a method to output text:
$out->line('Morbi leo risus, porta ac consectetur ac, vestibulum at eros. Aenean lacinia bibendum nulla sed consectetur. Nullam id dolor id nibh ultricies vehicula ut id elit. Aenean lacinia bibendum nulla sed consectetur. Curabitur blandit tempus porttitor.');
I’m limiting the line output to 80 characters within line() via:
public function line(string $text): void
{
    $this->rawLine(wordwrap($text, 80, PHP_EOL));
}
This prints the output across multiple lines:
Morbi leo risus, porta ac consectetur ac, vestibulum at eros. Aenean lacinia
bibendum nulla sed consectetur. Nullam id dolor id nibh ultricies vehicula ut id
elit. Aenean lacinia bibendum nulla sed consectetur. Curabitur blandit tempus
porttitor.
Now, I can also style parts of the text using ANSI escape codes:
$out->line('Morbi leo risus, ' . Style::inline('porta ac consectetur', ['color' => 'blue', 'attribute' => 'bold']) . ' ac, vestibulum at eros. Aenean lacinia bibendum nulla sed consectetur. Nullam id dolor id nibh ultricies vehicula ut id elit. Aenean lacinia bibendum nulla sed consectetur. Curabitur blandit tempus porttitor.');
Which gets converted to this:
Morbi leo risus, \x1b[34;1mporta ac consectetur\x1b[39;22m ac, vestibulum at
eros. Aenean lacinia bibendum nulla sed consectetur. Nullam id dolor id nibh
ultricies vehicula ut id elit. Aenean lacinia bibendum nulla sed consectetur.
Curabitur blandit tempus porttitor.
And when passed to line(), it is printed out like this:
Morbi leo risus, porta ac consectetur ac, vestibulum at eros.
Aenean lacinia bibendum nulla sed consectetur. Nullam id dolor id nibh ultricies
vehicula ut id elit. Aenean lacinia bibendum nulla sed consectetur. Curabitur
blandit tempus porttitor.
Here "porta ac consectetur" is blue and bold, but notice that the first line is shorter than before and doesn’t break at the same place.
Even though these are non-printing characters, wordwrap() (and strlen()) counts them toward the length, so the calculated width is wrong.
The first line is originally 76 characters without ANSI escape codes:
Morbi leo risus, porta ac consectetur ac, vestibulum at eros. Aenean lacinia
But after adding styles, it comes back as 97 characters:
Morbi leo risus, \x1b[34;1mporta ac consectetur\x1b[39;22m ac, vestibulum at eros. Aenean lacinia
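The mismatch can be reproduced with a short script. This is only a sketch: the variable names are illustrative, and it assumes the escape codes are exactly the two SGR sequences shown above. Note that when the escape is a real ESC byte (a double-quoted "\x1b"), strlen() reports 91; the figure of 97 corresponds to counting the four-character literal \x1b as it is displayed.

```php
<?php
// Plain first line: 76 visible characters.
$plain = 'Morbi leo risus, porta ac consectetur ac, vestibulum at eros. Aenean lacinia';

// Same line with real ESC bytes ("\x1b" inside double quotes is a single byte).
$styled = "Morbi leo risus, \x1b[34;1mporta ac consectetur\x1b[39;22m ac, vestibulum at eros. Aenean lacinia";

// Same line with the escape written literally (single quotes keep \x1b as four characters).
$literal = 'Morbi leo risus, \x1b[34;1mporta ac consectetur\x1b[39;22m ac, vestibulum at eros. Aenean lacinia';

echo strlen($plain), PHP_EOL;   // 76
echo strlen($styled), PHP_EOL;  // 91
echo strlen($literal), PHP_EOL; // 97
```

Either way, the extra bytes are invisible on screen but still count toward the width that wordwrap() sees.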
In other parts of the app, like a table, I "solved" this by having a method to set the column value and then a separate method to style said column. That way, I can reliably get the length, but also output the text in the defined style.
I could pass both an unstyled version and a styled version of the text, but that doesn’t feel right. Nor does it solve the problem of then splitting the styled version accurately.
To solve the issue with line(), I thought about stripping out the ANSI escape codes to get the actual length, then adding the PHP_EOL break where needed, and then injecting the style back in, but that doesn’t feel like the right solution and it seems complicated. How would I even go about doing that?
So my question is: How can I reliably split text containing ANSI escape codes based on text length?
2 Answers
This is the input:
The following method strips out escape codes from styled text and saves a copy as clean text.
The clean text is used to add line breaks using wordwrap based on the desired column width. It then loops over the styled text and injects a line break after every word where PHP added a line break in the clean text.
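A minimal sketch of this strip, wrap, and re-inject idea might look like the following. It is not necessarily the original implementation: the function name wrapStyled() is hypothetical, the pattern only recognises SGR sequences of the form ESC [ ... m, and the text is assumed to be single-byte, since wordwrap() counts bytes. It uses "\n" as the break so that the clean and wrapped copies stay the same length and can be walked in step.

```php
<?php
// Hypothetical helper: wrap by visible width while keeping ANSI SGR codes intact.
function wrapStyled(string $styled, int $width = 80): string
{
    $ansi = '/\x1b\[[0-9;]*m/';

    // 1. Make a clean copy without escape codes and let PHP wrap it.
    //    wordwrap() swaps a space for "\n", so the length is unchanged.
    $clean = preg_replace($ansi, '', $styled);
    $wrapped = wordwrap($clean, $width, "\n");

    // 2. Walk the styled text. Escape codes are copied through untouched;
    //    every other character is taken from the wrapped copy, which holds
    //    "\n" wherever wordwrap() replaced a space.
    $out = '';
    $j = 0; // position in $wrapped
    $len = strlen($styled);
    for ($i = 0; $i < $len; ) {
        // The A modifier anchors the match at offset $i.
        if (preg_match($ansi . 'A', $styled, $m, 0, $i)) {
            $out .= $m[0];
            $i += strlen($m[0]);
            continue;
        }
        $out .= $wrapped[$j];
        $i++;
        $j++;
    }

    return $out;
}
```

Calling wrapStyled($text, 80) on the styled string from the question should break at the same places as wordwrap() does on the unstyled text, with the blue and bold codes still in place.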
Note that this can't easily be tested on the web, since the escape codes only style text appropriately when used via a CLI.
Based on an approach I’ve used to truncate text in another answer (Truncate a multibyte String to n chars), counting the length of segments just needs to ignore the ANSI sequences while counting characters.
To have clean breaks in the text, the snippet below only replaces spaces with newlines (it is not designed to break on hyphens).
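The counting idea can also be sketched without a regex-driven replacement, using a plain word loop. This is an illustrative alternative, not the pattern from this answer: visibleLength() and wrapAnsi() are hypothetical names, only SGR sequences (ESC [ ... m) are ignored, and strlen() is used for counting, so multibyte text would need a multibyte-aware length function instead.

```php
<?php
// Hypothetical helper: length of a string as displayed, ignoring ANSI SGR codes.
function visibleLength(string $text): int
{
    return strlen(preg_replace('/\x1b\[[0-9;]*m/', '', $text));
}

// Hypothetical helper: wrap on spaces only, measuring visible characters.
function wrapAnsi(string $text, int $width = 80): string
{
    // Replace existing newlines with spaces, then split into words.
    $words = explode(' ', str_replace(["\r\n", "\n"], ' ', $text));

    $lines = [];
    $line = '';
    $lineLength = 0;

    foreach ($words as $index => $word) {
        $wordLength = visibleLength($word);

        if ($index === 0) {
            // The first word always starts the first line.
            $line = $word;
            $lineLength = $wordLength;
        } elseif ($lineLength + 1 + $wordLength > $width) {
            // Adding this word (plus a space) would exceed the width: start a new line.
            $lines[] = $line;
            $line = $word;
            $lineLength = $wordLength;
        } else {
            $line .= ' ' . $word;
            $lineLength += 1 + $wordLength;
        }
    }
    $lines[] = $line;

    return implode("\n", $lines);
}
```

Because a style that spans a line break is simply left open across the newline, the terminal keeps applying it on the next line until the closing sequence arrives.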
Code: (Demo) (Regex101 Demo)
Effectively, the script replaces all newlines with spaces, then injects new newlines where deemed appropriate to return:
I’ve added the character counts at the end of each line for clarity.
Which will be visually presented without ANSI sequences as:
Pattern Breakdown: