
I need to parse strings like
k1=v1, k2=v2, ... kn=vn
where the primary delimiter (,) and the key-value delimiter (=) may vary.
The values may be quoted (" or ') and may contain the primary delimiter within the quotes.

Since the PHP docs mention no restrictions on this kind of usage, str_getcsv() looked suitable.

But my first test showed that the enclosure character was ignored:

$ php -v
PHP 8.1.2-1ubuntu2.14 (cli) (built: Aug 18 2023 11:41:11) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.1.2, Copyright (c) Zend Technologies
    with Zend OPcache v8.1.2-1ubuntu2.14, Copyright (c), by Zend Technologies
$ php -a
Interactive shell

php > $S2 = 'name=ABCD, value=17.3, autoID, onclick="getinfo($this, 99);"';
php > var_dump($S2);
string(60) "name=ABCD, value=17.3, autoID, onclick="getinfo($this, 99);""
php > var_dump(str_getcsv($S2));
array(5) {
  [0]=>
  string(9) "name=ABCD"
  [1]=>
  string(11) " value=17.3"
  [2]=>
  string(7) " autoID"
  [3]=>
  string(23) " onclick="getinfo($this"
  [4]=>
  string(6) " 99);""
}
php > $S3 = 'name=ABCD, value=17.3, autoID, onclick=|getinfo($this, 99);|';
php > var_dump($S3);
string(60) "name=ABCD, value=17.3, autoID, onclick=|getinfo($this, 99);|"
php > var_dump(str_getcsv($S3, ',', '|'));
array(5) {
  [0]=>
  string(9) "name=ABCD"
  [1]=>
  string(11) " value=17.3"
  [2]=>
  string(7) " autoID"
  [3]=>
  string(23) " onclick=|getinfo($this"
  [4]=>
  string(6) " 99);|"
}

The quoted string "getinfo($this, 99);" includes the primary delimiter (,) and is therefore split into two elements, ignoring the quotes. Using a different enclosure character does not change anything.

So I’m wondering whether this is a bug, wrong usage, or whether my assumption that str_getcsv() is suitable for this operation is wrong. Any help and comments are appreciated.

Tested on different PHP environments with the same result.

2 Answers


  1. It’s more a case of where the enclosure characters are. They are meant to enclose the whole field, not just part of it.

    So if you change the input to

    $S3 = 'name=ABCD, value=17.3, autoID, |onclick=getinfo($this, 99);|';
    var_dump(str_getcsv($S3, ',', '|'));

    you will get the result

    array(4) {
      [0]=>
      string(9) "name=ABCD"
      [1]=>
      string(11) " value=17.3"
      [2]=>
      string(7) " autoID"
      [3]=>
      string(27) "onclick=getinfo($this, 99);"
    }
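
    Building on that, the fields can be turned into key/value pairs. This is only a minimal sketch: parse_pairs() is a hypothetical helper name, it assumes the enclosure wraps the whole field as described above, and it passes an empty escape string (accepted since PHP 7.4) so that backslashes get no special treatment.

    ```php
    <?php
    // Hypothetical helper (not part of PHP): parse "k=v, k2=v2" where a whole
    // field may be wrapped in the enclosure character.
    function parse_pairs(string $input, string $sep = ',', string $enc = '|', string $kv = '='): array
    {
        $pairs = [];
        foreach (str_getcsv($input, $sep, $enc, '') as $field) {
            // Unenclosed fields keep their leading space, so trim them.
            $field = trim((string) $field);
            if ($field === '') {
                continue;
            }
            // Split on the first key-value delimiter only; later ones stay in the value.
            $parts = explode($kv, $field, 2);
            // Bare keys such as autoID have no value and are stored as null.
            $pairs[$parts[0]] = $parts[1] ?? null;
        }
        return $pairs;
    }

    $S3 = 'name=ABCD, value=17.3, autoID, |onclick=getinfo($this, 99);|';
    var_export(parse_pairs($S3));
    ```

    Because both delimiters and the enclosure are parameters, the same helper covers inputs where the primary and key-value delimiters vary.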
    
  2. It feels like a pretty unstable string format that you are playing with.

    In a pinch, you might get away with a regex pattern, but I wouldn’t use this approach in a professional application.

    The pattern first consumes runs of characters that contain no double quotes or commas; otherwise, it endeavours to match pairs of double quotes. It then splits on the comma-space that follows. There will be scenarios where this approach splits in the wrong place (false positives).

    Code:

    $S2 = 'name=ABCD, value=17.3, autoID, onclick="getinfo($this, 99);", height=6" tall, mouseover="alert(\"wow\")", width=3" wide';
    var_export(preg_split('/(?>[^",]+|"[^"]*")*\K, /', $S2));
    

    Output:

    array (
      0 => 'name=ABCD',
      1 => 'value=17.3',
      2 => 'autoID',
      3 => 'onclick="getinfo($this, 99);"',
      4 => 'height=6" tall',
      5 => 'mouseover="alert(\\"wow\\")"',
      6 => 'width=3" wide',
    )
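
    For comparison, here is a character-by-character scanner. It is a different technique from the regex above, offered only as a rough sketch: split_outside_quotes() is a hypothetical name, the separator is assumed to be a single byte, and quotes are assumed to be balanced.

    ```php
    <?php
    // Hypothetical helper: split on $sep, but ignore any $sep that sits
    // between a matching pair of quote characters.
    function split_outside_quotes(string $input, string $sep = ',', string $quotes = '"\''): array
    {
        $fields = [];
        $current = '';
        $open = null; // the quote character currently open, or null

        $length = strlen($input);
        for ($i = 0; $i < $length; $i++) {
            $ch = $input[$i];
            if ($open !== null) {
                if ($ch === $open) {
                    $open = null;          // the matching quote closes the quoted run
                }
                $current .= $ch;
            } elseif (strpos($quotes, $ch) !== false) {
                $open = $ch;               // a quote character opens a quoted run
                $current .= $ch;
            } elseif ($ch === $sep) {
                $fields[] = trim($current); // separator outside quotes ends the field
                $current = '';
            } else {
                $current .= $ch;
            }
        }
        $fields[] = trim($current);        // last field; an unclosed quote runs to the end
        return $fields;
    }

    $S2 = 'name=ABCD, value=17.3, autoID, onclick="getinfo($this, 99);"';
    var_export(split_outside_quotes($S2));
    ```

    Unlike the regex, this sketch does not recover from a stray quote such as the inch mark in height=6" tall: everything up to the next double quote is treated as quoted, so the result would differ on that sample.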
    

    I wonder how this comma-space-delimited string of element attributes is being generated. I advise that you tackle that earlier problem rather than burning time grappling with the current string format. Maybe you could scrape the attributes in a different fashion, then pass that payload to PHP as a JSON string.
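
    If the producer can be changed to emit JSON, decoding becomes a single call. The payload below is hypothetical: it assumes the same attributes as the sample string, with the bare autoID flag represented as true.

    ```php
    <?php
    // Hypothetical payload: the sample attributes re-expressed as a JSON object.
    $json = '{"name":"ABCD","value":17.3,"autoID":true,"onclick":"getinfo($this, 99);"}';

    // Decode into an associative array; JSON_THROW_ON_ERROR (PHP 7.3+) raises a
    // JsonException on malformed input instead of silently returning null.
    $attrs = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    var_export($attrs);
    ```

    Commas and quotes inside values are then handled by the JSON encoder on the producing side, so no custom delimiter logic is needed.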
