$string = 'Audi MODEL 80 ENGINE 1.9 TDi';
list($make,$model,$engine) = preg_split('/( MODEL | ENGINE )/',$string);
Anything before "MODEL" is the "MAKE string".
Anything between "MODEL" and "ENGINE" is the "MODEL string".
Anything after "ENGINE" is the "ENGINE string".
But we usually have more information in this string.
// possible variations:
$string = 'Audi MODEL 80 ENGINE 1.9 TDi POWER 90Hk TORQUE 202Nm GEAR man DRIVE 2wd YEAR 1996';
$string = 'Audi MODEL 80 ENGINE 1.9 TDi YEAR 1996 NOTE this engine needs custom stage GEAR auto';
$string = 'Audi MODEL 80 ENGINE 1.9 TDi GEAR man YEAR 1996';
$string = 'Audi MODEL 80 ENGINE 1.9 TDi YEAR 1996 DRIVE 2wd';
MODEL and ENGINE are always present, and always come first in the string.
The rest (POWER, TORQUE, GEAR, DRIVE, YEAR, NOTE) may vary, both in their order and in whether they are present at all.
Since we can’t know for sure how the ENGINE string ends, or which of the other keywords will be the first to come right after, I thought it would be possible to create an array with the keywords.
Then do some sort of string search for the first occurrence of a word that matches one of the keywords in the array.
I do need to keep the matched word.
Another way of putting this might be: "How to split the string on/before each occurrence of words in array"
3 Answers
To keep the "bits" intact with the keyword included, you can use preg_split with a lookahead that will split on a space followed by any one of your keywords.
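A minimal sketch of that lookahead split, assuming the keyword list from the question (the variable names are only illustrative):

```php
<?php
// Keywords as listed in the question. They are plain words, so no
// preg_quote() is needed here; add it if a keyword could contain regex
// metacharacters.
$keywords = ['MODEL', 'ENGINE', 'POWER', 'TORQUE', 'GEAR', 'DRIVE', 'YEAR', 'NOTE'];
$string = 'Audi MODEL 80 ENGINE 1.9 TDi YEAR 1996 NOTE this engine needs custom stage GEAR auto';

// Split on a space only when it is followed by "<KEYWORD> ".
// The lookahead consumes nothing, so each keyword stays with its value.
$pattern = '/ (?=(?:' . implode('|', $keywords) . ') )/';
$bits = preg_split($pattern, $string);

print_r($bits);
// $bits holds: 'Audi', 'MODEL 80', 'ENGINE 1.9 TDi', 'YEAR 1996',
// 'NOTE this engine needs custom stage', 'GEAR auto'
```

Because the trailing space is part of the lookahead, a value that merely starts with a keyword's letters is not split.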
If you want to parse these into key/value pairs, that is simple as well.
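One way this could look, as a sketch: split as before, then cut each piece at its first space. The MAKE key for the leading, unlabelled piece is my own assumption, not something the input string contains.

```php
<?php
$string = 'Audi MODEL 80 ENGINE 1.9 TDi GEAR man YEAR 1996';
$bits = preg_split('/ (?=(?:MODEL|ENGINE|POWER|TORQUE|GEAR|DRIVE|YEAR|NOTE) )/', $string);

// The first piece has no label, so store it under an assumed MAKE key.
$car = ['MAKE' => array_shift($bits)];

foreach ($bits as $bit) {
    // A limit of 2 keeps spaces inside the value, e.g. "1.9 TDi".
    list($key, $value) = explode(' ', $bit, 2);
    $car[$key] = $value;
}

print_r($car);
// $car holds: MAKE => Audi, MODEL => 80, ENGINE => 1.9 TDi, GEAR => man, YEAR => 1996
```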
If you’d prefer a non-RegEx method, you could also just break the string into individual tokens (words) and build an array. This approach makes some assumptions about whitespace; if that is a problem, it could be addressed by normalizing the whitespace with a replace first.
Demo: https://3v4l.org/D4pvl
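A rough sketch of the token-by-token idea, not necessarily the code behind the demo link. It assumes words are separated by exactly one space, and the MAKE label for the leading words is an assumption of mine:

```php
<?php
$keywords = ['MODEL', 'ENGINE', 'POWER', 'TORQUE', 'GEAR', 'DRIVE', 'YEAR', 'NOTE'];
$string = 'Audi MODEL 80 ENGINE 1.9 TDi POWER 90Hk TORQUE 202Nm GEAR man DRIVE 2wd YEAR 1996';

$current = 'MAKE';          // assumed label for the words before MODEL
$words = [$current => []];  // label => list of words collected so far

foreach (explode(' ', $string) as $token) {
    if (in_array($token, $keywords, true)) {
        // A keyword starts a new entry.
        $current = $token;
        $words[$current] = [];
    } else {
        // Any other word belongs to the entry currently being built.
        $words[$current][] = $token;
    }
}

// Join each entry's words back into a single string value.
$car = array_map(function ($list) {
    return implode(' ', $list);
}, $words);

print_r($car);
```

If the input may contain repeated or mixed whitespace, running something like preg_replace('/\s+/', ' ', trim($string)) first should make the single-space assumption hold.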
If you’d like to have a dynamic associative array:
1. Prepend MAKE to the string.
2. Use preg_match_all() to capture pairs of labels and values in the formatted string.
3. Use array_column() to restructure the columns of matches into an associative array.
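The steps above could be sketched roughly like this. The exact pattern is my guess at one that fits the description, with labels matched as runs of uppercase letters:

```php
<?php
$string = 'Audi MODEL 80 ENGINE 1.9 TDi YEAR 1996 NOTE this engine needs custom stage GEAR auto';

// Group 1 is the label, group 2 the value. The lazy value stops right
// before the next " LABEL " or at the end of the string.
preg_match_all(
    '/([A-Z]+) (.*?)(?= [A-Z]+ |$)/',
    'MAKE ' . $string,   // prepend a label for the leading make
    $matches,
    PREG_SET_ORDER
);

// Use column 1 (label) as the key and column 2 (value) as the value.
$car = array_column($matches, 2, 1);

print_r($car);
// $car holds: MAKE => Audi, MODEL => 80, ENGINE => 1.9 TDi, YEAR => 1996,
// NOTE => this engine needs custom stage, GEAR => auto
```

Note that with this pattern an all-uppercase word inside a value (say, an abbreviation in a NOTE) would be mistaken for a label.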
This is not a new concept or technique. The only adjustment to make is how to identify the keys/labels in the original string. Instead of [A-Z]+ you may wish to explicitly name each label and separate them in the pattern with pipes.
Alternatively, instead of using a regex to parse the string, you could manipulate the string into a standardized format that a native PHP function can parse.
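As one possible reading of the native-function idea (my assumption, since the text does not say which function was meant), the string can be rewritten as INI-style "LABEL=value" lines and handed to parse_ini_string(). The labels are named explicitly and joined with pipes:

```php
<?php
$labels = 'MAKE|MODEL|ENGINE|POWER|TORQUE|GEAR|DRIVE|YEAR|NOTE';
$string = 'Audi MODEL 80 ENGINE 1.9 TDi YEAR 1996 DRIVE 2wd';

// Turn "LABEL value LABEL value" into one "LABEL=value" line per label.
$ini = preg_replace(
    '/(?:^| )(' . $labels . ') /',
    "\n" . '$1=',
    'MAKE ' . $string   // prepend a label for the leading make
);

// INI_SCANNER_RAW leaves values as plain strings instead of converting them.
$car = parse_ini_string($ini, false, INI_SCANNER_RAW);

print_r($car);
```

This only holds up while the values contain no characters with special meaning in INI syntax, so treat it as a sketch rather than a general solution.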