
I am trying to parse 1 line that is constructed in this format:

Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)

I have this working perfectly in C# using named capture groups, but I need it in PHP, and I have no idea how to separate each field and build an associative array I can iterate.

I can retrieve the first item in double-quotes "textfile1.txt" using

$string = 'Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)';
preg_match("/(?:(?:\"(?:\\\\\"|[^\"])+\")|(?:'(?:\\\\'|[^'])+'))/is", $string, $match);
print_r($match);
Array
(
    [0] => "textfile1.txt"
)

I can't figure it out. I have tried different expressions that account for both the string and long fields, but no luck.

Is there something I am missing?

The end result should have each filename/size pair added to an array that I can access later.

Any help is appreciated.

https://regex101.com/r/naSdng/1

My C# implementation looks like this:

MatchCollection result = Regex.Matches(file, @"(?:\G(?!\A)\s*,\s*|\w+\()(?:""(?<filename>.*?)""|'(?<filename>.*?)')\s*,\s*(?<filesize>\d+)");
matchCol = result;
foreach (Match match in result)
{
    ListViewItem ItemArray = new(new string[] {
        match.Groups["filename"].Value.Trim(), BytesToReadableString(Convert.ToInt64(match.Groups["filesize"].Value)), "Ready"
    });
    fileList.Items.Add(ItemArray);
}

2 Answers


  1. Convert the input string into a valid JSON string and decode it, which ensures that the numeric values are cast as integers. Chunk the flat array into pairs and assign each pair as an associative element in the result array.

    Code:

    var_export(
        array_reduce(
            array_chunk(
                json_decode('[' . substr($string, 6, -1) . ']'),
                2
            ),
            function ($result, $row) {
                $result[$row[0]] = $row[1];
                return $result;
            }
        )
    );
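The JSON approach above hard-codes the offset 6 for "Files(" and silently yields null if the inner text is not valid JSON. A minimal sketch of a more defensive variant follows, assuming 64-bit PHP 7.3+ (for JSON_THROW_ON_ERROR and integer sizes above 2^31). The function name parseFilesJson is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper (name is an assumption, not from the answer above).
function parseFilesJson(string $line): array
{
    // Locate the parentheses instead of assuming the prefix is exactly "Files(".
    $open  = strpos($line, '(');
    $close = strrpos($line, ')');
    if ($open === false || $close === false || $close < $open) {
        throw new InvalidArgumentException('Expected Name(...) syntax');
    }
    $inner = substr($line, $open + 1, $close - $open - 1);

    // JSON_THROW_ON_ERROR raises a JsonException on malformed input.
    $flat = json_decode('[' . $inner . ']', false, 512, JSON_THROW_ON_ERROR);
    if (count($flat) % 2 !== 0) {
        throw new InvalidArgumentException('Expected filename/size pairs');
    }

    // Pair up the flat list: [name, size, name, size, ...].
    $result = [];
    foreach (array_chunk($flat, 2) as [$name, $size]) {
        $result[$name] = $size;
    }
    return $result;
}

$string = 'Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)';
$files = parseFilesJson($string);
foreach ($files as $name => $size) {
    echo $name, ': ', $size, " bytes\n";
}
```

This still relies on the filenames being valid JSON string literals (double-quoted, with JSON-style escapes); single-quoted names would need one of the regex approaches instead.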
    

    or split the inner text on every second comma-space, then parse each resulting "name", size string with sscanf().

    Code:

    var_export(
        array_reduce(
            preg_split('/[^,]+,[^,]+\K, /', substr($string, 6, -1)),
            function ($result, $string) {
                [$key, $result[$key]] = sscanf($string, '"%[^"]", %d');
                return $result;
            }
        )
    );
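To make the splitting step easier to follow, here is a small sketch that prints the intermediate array produced by preg_split() before sscanf() is applied. It assumes the same $string as in the question; the variable names $inner and $pairs are illustrative only.

```php
<?php
$string = 'Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)';

// Strip the leading 'Files(' and the trailing ')'.
$inner = substr($string, 6, -1);

// [^,]+,[^,]+ consumes one "name", size pair; \K then discards what was
// matched so far, so only the following ', ' is treated as the delimiter.
$pairs = preg_split('/[^,]+,[^,]+\K, /', $inner);

var_export($pairs);
```

Each element of $pairs holds one quoted filename and its size, which is the shape the sscanf() format '"%[^"]", %d' expects.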
    

    or use preg_match_all() with \G (the "continue" metacharacter, which anchors each match at the end of the previous one), then pair up the results in a foreach() so that you can explicitly cast the numbers as int-type values.

    Code:

    $result = [];
    preg_match_all('/(?:^\w+\(|\G, )"([^"]+)", (\d+)/', $string, $matches, PREG_SET_ORDER);
    foreach ($matches as [1 => $key, 2 => $val]) {
        $result[$key] = (int) $val;
    }
    var_export($result);
    

    or explode the content inside the parentheses and iterate over the individual values, alternating between treating a value as a key and as the size belonging to that key.

    Code:

    $result = [];
    foreach (explode(', ', substr($string, 6, -1)) as $val) {
        if (!isset($key)) {
            $key = trim($val, '"');
        } else {
            $result[$key] = (int) $val;
            unset($key);
        }
    }
    var_export($result);
    
  2. The regex you have shown in C# can be easily adapted to work in PHP as well.

    You may use:

    (?:\w+\(\h*|(?<!\A)\G\h*,\h*)"(?<filename>[^"]+)"\h*,\h*(?<filesize>\d+)
    

    Note that I have refactored your regex a bit to make it more efficient.

    Code:

    <?php
    $s = 'Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)';
    
    if (preg_match_all('/(?:\w+\(\h*|(?<!\A)\G\h*,\h*)"(?<filename>[^"]+)"\h*,\h*(?<filesize>\d+)/', $s, $m)) {
       $out = array_combine ( $m['filename'], $m['filesize'] );
       print_r($out);
    }
    ?>
    

    Output:

    Array
    (
        [textfile1.txt] => 7268474425
        [textfile2.txt] => 661204928
        [textfile3.txt] => 121034
    )
    

    RegEx Details:

    • (?:: Start a non-capture group
      • \w+\(\h*: Match 1+ word characters followed by ( and 0 or more horizontal whitespace characters
      • |: OR
      • (?<!\A)\G: Assert that the position is the end of the previous match; (?<!\A) stops \G from matching at the start of the string
      • \h*,\h*: Match a comma surrounded by 0 or more horizontal whitespace characters
    • ): End non-capture group
    • "(?<filename>[^"]+)": Match a double-quoted string, with the named capture group filename matching 1+ of any character that is not a "
    • \h*,\h*: Match a comma surrounded by 0 or more horizontal whitespace characters
    • (?<filesize>\d+): Named capture group filesize to match 1+ digits
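With array_combine() the sizes stay strings. If integer sizes are wanted, and a loop closer to the C# foreach over matches, one possible variation is to request PREG_SET_ORDER and read the named groups per match. This is only a sketch under the assumption of 64-bit PHP (so that 7268474425 fits in an int); the variable names are illustrative.

```php
<?php
$s = 'Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)';
$pattern = '/(?:\w+\(\h*|(?<!\A)\G\h*,\h*)"(?<filename>[^"]+)"\h*,\h*(?<filesize>\d+)/';

$files = [];
// PREG_SET_ORDER yields one array per match, each holding its named groups.
if (preg_match_all($pattern, $s, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $files[$m['filename']] = (int) $m['filesize'];
    }
}

// Iterate the associative array later, as the question asks.
foreach ($files as $name => $bytes) {
    printf("%s => %d bytes\n", $name, $bytes);
}
```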