I need to parse strings like
k1=v1, k2=v2, ... kn=vn
where the primary delimiter (,) and the key-value delimiter (=) may vary.
The values may be quoted (" or ') and may contain the primary delimiter within the quotes.
As I saw no restrictions for this kind of usage in the PHP docs, I considered str_getcsv() suitable.
But my first test showed that the enclosure character was ignored:
$ php -v
PHP 8.1.2-1ubuntu2.14 (cli) (built: Aug 18 2023 11:41:11) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.1.2, Copyright (c) Zend Technologies
with Zend OPcache v8.1.2-1ubuntu2.14, Copyright (c), by Zend Technologies
$ php -a
Interactive shell
php > $S2 = 'name=ABCD, value=17.3, autoID, onclick="getinfo($this, 99);"';
php > var_dump($S2);
string(60) "name=ABCD, value=17.3, autoID, onclick="getinfo($this, 99);""
php > var_dump(str_getcsv($S2));
array(5) {
[0]=>
string(9) "name=ABCD"
[1]=>
string(11) " value=17.3"
[2]=>
string(7) " autoID"
[3]=>
string(23) " onclick="getinfo($this"
[4]=>
string(6) " 99);""
}
php > $S3 = 'name=ABCD, value=17.3, autoID, onclick=|getinfo($this, 99);|';
php > var_dump($S3);
string(60) "name=ABCD, value=17.3, autoID, onclick=|getinfo($this, 99);|"
php > var_dump(str_getcsv($S3, ',', '|'));
array(5) {
[0]=>
string(9) "name=ABCD"
[1]=>
string(11) " value=17.3"
[2]=>
string(7) " autoID"
[3]=>
string(23) " onclick=|getinfo($this"
[4]=>
string(6) " 99);|"
}
The quoted string "getinfo($this, 99);" includes the primary delimiter (,) and is therefore split into two elements, ignoring the quotes. Using a different enclosure character does not change anything.
So I’m wondering whether this is a bug, wrong usage, or whether my assumption that str_getcsv() is suitable for this operation is wrong. Any help and comments are appreciated.
Tested on different PHP environments with the same result.
2 Answers
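It’s more a case of where the enclosure characters are. They are meant to enclose the whole field and not just part of it.
So if you move the quotes so that they wrap the whole field, as in "onclick=getinfo($this, 99);", str_getcsv() will keep that field in one piece.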
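The answer's original snippet did not survive here, so the following is only a minimal sketch of the idea: the question's string with the quotes moved to wrap the whole last field. The variable names `$s`, `$fields` and `$attrs`, and the follow-up split on `=`, are illustrative assumptions rather than part of the original answer.

```php
<?php
// Same data as in the question, but the quotes now wrap the whole last field.
$s = 'name=ABCD, value=17.3, autoID, "onclick=getinfo($this, 99);"';

// Delimiter, enclosure and escape character are passed explicitly for clarity.
$fields = str_getcsv($s, ',', '"', '\\');

// Split each field into key and value at the first "=" only;
// a bare key such as autoID gets null as its value (an assumption).
$attrs = [];
foreach ($fields as $field) {
    $parts = explode('=', trim($field), 2);
    $attrs[$parts[0]] = $parts[1] ?? null;
}

var_dump($attrs);
```

Limiting `explode()` to two parts keeps any `=` inside a value intact, and `trim()` removes the space that follows each comma in the unquoted fields.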
It feels like a pretty unstable string format that you are playing with.
In a pinch, you might get away with a regex pattern, but I wouldn’t use this approach in a professional application.
The pattern quickly parses substrings that contain no double quotes or commas before a delimiting comma. Otherwise, it endeavours to match pairs of double quotes. There will be scenarios where this approach will have false positive splitting.
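The answer's own pattern is not reproduced here. As a rough sketch of the same idea, and not necessarily the pattern the answer used, a field can be matched as a run of chunks that are either quote-free, comma-free text or a balanced pair of double quotes. The variable names are illustrative.

```php
<?php
$s = 'name=ABCD, value=17.3, autoID, onclick="getinfo($this, 99);"';

// One field is a run of chunks, each chunk being either
//   [^",]++   text containing no double quote and no comma, or
//   "[^"]*"   a balanced pair of double quotes (commas allowed inside).
preg_match_all('/(?:[^",]++|"[^"]*")+/', $s, $matches);

// Remove the space left over after each delimiting comma.
$fields = array_map('trim', $matches[0]);

var_dump($fields);
```

The quotes stay part of the matched value, so they would have to be stripped separately if unwanted. An unbalanced or nested quote will still produce wrong splits, which is the false-positive risk mentioned above.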
I wonder how this comma-space-delimited string of element attributes is being generated. I advise that you tackle that earlier problem rather than burning time grappling with the current string format. Maybe you could scrape the attributes in a different fashion, then pass that payload to PHP as a JSON string.
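To illustrate the JSON suggestion, here is a small sketch that assumes the producing side can serialise the attributes as a JSON object. The payload shape, including `true` for the bare `autoID` attribute, is hypothetical.

```php
<?php
// Hypothetical payload: the same attributes, serialised as JSON at the source.
$json = '{"name":"ABCD","value":17.3,"autoID":true,"onclick":"getinfo($this, 99);"}';

// Decode into an associative array; malformed input throws a JsonException.
$attrs = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

var_dump($attrs['onclick']);
```

With this format, commas and quotes inside a value are handled by the JSON encoder, so no custom delimiter or enclosure logic is needed on the PHP side.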