
I have the following function to get all of the different types of end-of-line delimiters in a file. There may be one or more, so I want to return an array of all types.

function ddtt_get_file_eol( $file_contents, $incl_code = true ) {
    $types = [
        '\r\n',
        '\n',
        '\r'
    ];
    $found = [];
    foreach ( $types as $type ) {
        if ( $type == '\r\n' ) {
            $regex = "/\r\n/";
        } elseif ( $type == '\n' ) {
            $regex = "/(?<!\r)\n/";
        } else {
            $regex = "/\r(?!\n)/";
        }
        if ( preg_match( $regex, $file_contents ) ) {
            $found[] = ( $incl_code ) ? '<code class="hl">'.$type.'</code>' : $type;
        }
    }
    return $found;
} // End ddtt_get_file_eol()

The problem I am having is that it recognizes \r\n as two separate types and outputs [ '\n', '\r' ]. I want to output [ '\r\n' ] if the file uses only that type, or [ '\r\n', '\n' ] if it uses both types, etc. How do I modify my code to correctly detect all types used?

2 Answers


  1. In my opinion, your code is fine. It is your input that contains a mixture of line endings.

    <?php
    
    $LINE_TYPES = [
        '\r\n' => '/\r\n/',
        '\n' => '/(?<!\r)\n/',
        '\r' => '/\r(?!\n)/',
    ];
    
    $inputs = [
        'Windows' => "Dog\r\nCat\r\nMouse",
        'Linux' => "Bicycle\nCar\nTrain\nAirplane",
        'Mac' => "iPhone\riPod\rMacBook",
        'Win + Linux' => "int main() {\n   return 0;\r\n}\n",
        'All mixed up' => "This is a Windows new line\r\n, followed by a Linux new line\n and finally an old Mac with a single carriage return\rat the end",
    ];
    
    foreach ($inputs as $label => $input) {
        $found_types = [];
        foreach ($LINE_TYPES as $type => $regex) {
            if (preg_match($regex, $input)) {
                $found_types[] = $type;
            }
        }
        print "Found types for $label is " . implode(', ', $found_types) . PHP_EOL;
    }
    
    

    Outputs the following:

    Found types for Windows is \r\n
    Found types for Linux is \n
    Found types for Mac is \r
    Found types for Win + Linux is \r\n, \n
    Found types for All mixed up is \r\n, \n, \r
    

    which seems completely normal.

    You can test/play with it here: https://onlinephp.io/c/4ce47
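
    For reference, here is a minimal sketch of the same pattern map wrapped in a function, closer to what the question asks for. The name detect_eol_types is hypothetical and not part of the original code; the patterns are the ones shown above.

    ```php
    <?php

    // Hypothetical helper (not from the original post): returns the label of
    // every line-ending style present in $contents.
    function detect_eol_types(string $contents): array
    {
        // Keys are single-quoted, so they stay as literal backslash labels.
        // PCRE itself interprets the \r and \n escapes inside the patterns.
        $patterns = [
            '\r\n' => '/\r\n/',
            '\n'   => '/(?<!\r)\n/', // LF not preceded by CR
            '\r'   => '/\r(?!\n)/',  // CR not followed by LF
        ];

        $found = [];
        foreach ($patterns as $label => $regex) {
            if (preg_match($regex, $contents) === 1) {
                $found[] = $label;
            }
        }
        return $found;
    }

    var_export(detect_eol_types("a\r\nb"));    // CRLF only
    var_export(detect_eol_types("a\r\nb\nc")); // CRLF and LF
    ```

    The lookbehind and lookahead are what keep a CRLF pair from also being counted as a lone LF or a lone CR.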

  2. Let me guess, you are a developer who wants perfect identification of newline sequences regardless of the environment AND you want to keep all of your hair?

    PHP has had a solution for this for a long time and it doesn’t involve Minoxidil; just use \R. I’ll replace each newline sequence with an asterisk to show how it reliably respects all possible newline sequences across all environments and treats them as whole newline sequences whenever appropriate.

    Code: (Demo)

    $inputs = [
        'Windows' => "Dog\r\nCat\r\nMouse",
        'Linux' => "Bicycle\nCar\nTrain\nAirplane",
        'Mac' => "iPhone\riPod\rMacBook",
        'Win + Linux' => "int main() {\n   return 0;\r\n}\n",
        'All mixed up' => "This is a Windows new line\r\n, followed by a Linux new line\n and finally an old Mac with a single carriage return\rat the end",
    ];
    
    var_export(
        preg_replace('/\R/', '*', $inputs)
    );
    

    Output:

    array (
      'Windows' => 'Dog*Cat*Mouse',
      'Linux' => 'Bicycle*Car*Train*Airplane',
      'Mac' => 'iPhone*iPod*MacBook',
      'Win + Linux' => 'int main() {*   return 0;*}*',
      'All mixed up' => 'This is a Windows new line*, followed by a Linux new line* and finally an old Mac with a single carriage return*at the end',
    )
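
    To get back to the original question (which types are present, not just where they are), one possible sketch collects the \R matches and de-duplicates them. The function name list_newline_sequences is hypothetical. The (*BSR_ANYCRLF) prefix is a PCRE option that limits \R to CR, LF, and CRLF; without it, \R can also match other vertical whitespace such as form feed.

    ```php
    <?php

    // Hypothetical helper (not from the original answer): lists each distinct
    // newline sequence in $contents, in order of first appearance.
    function list_newline_sequences(string $contents): array
    {
        // \R consumes \r\n as a single unit, so a CRLF pair is never split.
        preg_match_all('/(*BSR_ANYCRLF)\R/', $contents, $matches);

        $labels = [];
        foreach (array_unique($matches[0]) as $eol) {
            // Turn the raw control characters into printable labels.
            $labels[] = strtr($eol, ["\r" => '\r', "\n" => '\n']);
        }
        return $labels;
    }

    var_export(list_newline_sequences("int main() {\n   return 0;\r\n}\n"));
    ```

    Unlike the fixed pattern map, this reports types in the order they first occur in the input rather than in a predefined order.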
    

    Relevant reading on implementations of \R:
