It’s very late and I think I’ve been staring at this too long to figure it out, but: I have been provided a bunch of raw text where anything within tildes (~) is a title, and everything else is just plain text. However, the text may or may not include newlines; for example:
Title & text on the same line:
~THE BURGER MINI~A tiny little burger patty in a tiny little bun.
Title & text on different lines:
~THE BURGER MAX~
A gigantic hunk of steak in between two toasted baguettes, each stuffed with beef & cheese.
A combination of both:
~THE BURGER ZERO~
No burger, no bun, just air.
~THE BURGER ITALIANO~
A soft mix of ground beef & mozzarella stuffed between two pillowy pieces of pasta.~NOTE~This is basically giant ravioli.
Ultimately the kind of output I’m trying to achieve would be something like:
Array
(
[0] => Array
(
[title] => THE BURGER ZERO
)
[1] => Array
(
[text] => No burger, no bun, just air.
)
[2] => Array
(
[title] => THE BURGER ITALIANO
)
[3] => Array
(
[text] => A soft mix of ground beef & mozzarella stuffed between two pillowy pieces of pasta.
)
[4] => Array
(
[title] => NOTE
)
[5] => Array
(
[text] => This is basically giant ravioli.
)
)
…so I can then differentiate between titles & text, but crucially in the order they appear.
I can split the string on newlines into an array with the following:
$tempArray = preg_split('/\s*\R\s*/', trim($str), NULL, PREG_SPLIT_NO_EMPTY);
But after that, I get stuck. Using preg_split on any group within tildes (preg_split('/~(.*?)~/uim', $line);) will give me all of the paragraph text, but loses the titles (as they’re being used for the split). I’ve been banging my head against various forms of preg_match & preg_match_all but all I’m getting is a headache.
Is there a straightforward way to get what I’m after that would work with all of the above examples?
3 Answers
So, match a tilde, followed by one or more of anything but a tilde, followed by another tilde, and capture what’s between the tildes. Follow that with zero or more newlines. Then match one or more of anything but tildes and newlines, and capture that.
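The three pieces described above can be assembled into one pattern. The following is a minimal sketch, not necessarily the exact expression this answer used; the sample string in $str and the variable name $match are assumptions made for illustration.

```php
<?php
// Hypothetical sample input: titles on their own line and titles inline with text.
$str = "~THE BURGER ZERO~\nNo burger, no bun, just air.\n"
     . "~THE BURGER ITALIANO~\nA soft mix of ground beef & mozzarella stuffed between two pillowy pieces of pasta."
     . "~NOTE~This is basically giant ravioli.";

// ~([^~]+)~     a tilde, one or more non-tilde characters (group 1), a tilde
// \R*           zero or more newlines of any style
// ([^~\r\n]+)   one or more characters that are neither tildes nor newlines (group 2)
preg_match_all('/~([^~]+)~\R*([^~\r\n]+)/', $str, $match);

print_r($match[1]); // the titles
print_r($match[2]); // the descriptions
```

With the default flags, preg_match_all groups results by capture group, so $match[0] holds the full matches and $match[1] and $match[2] line up index by index.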
This will give you the titles in $match[1] and the descriptions in $match[2], which you might then combine into a single array.
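Combining the two capture groups into the structure the question asks for could look roughly like this. It is a sketch under the same assumed pattern as described above; $str, $match, and $result are illustrative names.

```php
<?php
// Hypothetical sample input: one title inline with its text, one on a separate line.
$str = "~THE BURGER MINI~A tiny little burger patty in a tiny little bun.\n"
     . "~THE BURGER ZERO~\nNo burger, no bun, just air.";

// Group 1 is the title, group 2 is the text that follows it.
preg_match_all('/~([^~]+)~\R*([^~\r\n]+)/', $str, $match);

// Interleave titles and texts so the original order is preserved.
$result = [];
foreach ($match[1] as $i => $title) {
    $result[] = ['title' => $title];
    $result[] = ['text'  => $match[2][$i]];
}

print_r($result);
```

Because the pattern always expects text after a title, a title with no text after it would be skipped, and text that spans several lines would only have its first line captured.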
This solution does not use any regex.
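One regex-free approach that fits this description is to split on the tilde itself with explode. This is a sketch and may differ from what this answer did; it assumes tildes always come in pairs and never appear inside plain text, and $str and $result are illustrative names.

```php
<?php
// Hypothetical sample input taken from the question's combined example.
$str = "~THE BURGER ZERO~\nNo burger, no bun, just air.\n"
     . "~THE BURGER ITALIANO~\nA soft mix of ground beef & mozzarella stuffed between two pillowy pieces of pasta."
     . "~NOTE~This is basically giant ravioli.";

$result = [];
foreach (explode('~', $str) as $i => $piece) {
    $piece = trim($piece); // drop the newlines around titles and text
    if ($piece === '') {
        continue;          // nothing between two delimiters, or before the first one
    }
    // Splitting on every tilde puts titles at odd indices and text at even ones.
    $key = ($i % 2 === 1) ? 'title' : 'text';
    $result[] = [$key => $piece];
}

print_r($result);
```

The odd/even test relies on the index from explode, which is why empty pieces are skipped rather than filtered out beforehand.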
How it works is that
The output is:
A way with preg_split, which has the useful option PREG_SPLIT_DELIM_CAPTURE that returns the captured parts of the delimiter.
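A minimal sketch of that idea follows. The exact pattern from this answer is not shown here, so the one below is an assumption, as are the names $str, $parts, and $result.

```php
<?php
// Hypothetical sample input taken from the question's combined example.
$str = "~THE BURGER ZERO~\nNo burger, no bun, just air.\n"
     . "~THE BURGER ITALIANO~\nA soft mix of ground beef & mozzarella stuffed between two pillowy pieces of pasta."
     . "~NOTE~This is basically giant ravioli.";

// The delimiter is a whole "~TITLE~" group plus any surrounding whitespace.
// PREG_SPLIT_DELIM_CAPTURE also returns the captured title, so nothing is lost.
$parts = preg_split('/\s*~([^~]+)~\s*/', trim($str), -1, PREG_SPLIT_DELIM_CAPTURE);

$result = [];
foreach ($parts as $i => $part) {
    if ($part === '') {
        continue; // empty text before the first title or between adjacent titles
    }
    // Split pieces (text) sit at even indices, captured delimiters (titles) at odd ones.
    $result[] = [($i % 2 === 1 ? 'title' : 'text') => $part];
}

print_r($result);
```

PREG_SPLIT_NO_EMPTY is deliberately left out here: dropping empty pieces inside preg_split would shift the indices and break the odd/even distinction between titles and text.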