
I’m working with CSV files and need to detect which newline character a file uses.

This function works fine:

function detectNewlineCharacter(csvContent)
{
  if (csvContent.includes('\r\n')) {
    return '\r\n'; // Windows-style CRLF
  } else if (csvContent.includes('\n')) {
    return '\n'; // Unix-style LF
  } else if (csvContent.includes('\r')) {
    return '\r'; // Old Mac-style CR
  } else {
    return null; // No recognizable newline characters found
  }
}

function fixCsv()
{
  // ...code...
  const newlineCharacter = detectNewlineCharacter(fileContent);
  const rows = fileContent.split(newlineCharacter);
  // ...code...

}

Problem:

csvContent is very large. I need a method that stops at the first match, just like .some() does for arrays. ChatGPT said .includes() stops at the first match, but I’m not sure, and I can’t find that in the documentation. The same question applies to .match(), .indexOf(), etc.

My last resort would be to limit the string using .substring() before searching.
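
A minimal sketch of that fallback, assuming the first line break appears within the first 64 KiB of the file. The name detectNewlineInPrefix and the maxChars parameter are hypothetical, chosen only for illustration:

```javascript
// Hypothetical helper: inspects only the first `maxChars` characters.
function detectNewlineInPrefix(csvContent, maxChars = 64 * 1024) {
  let prefix = csvContent.slice(0, maxChars);
  // If the cut lands right after a '\r', take one more character so that a
  // '\r\n' pair is not split and misreported as a lone '\r'.
  if (prefix.endsWith('\r')) {
    prefix = csvContent.slice(0, maxChars + 1);
  }
  if (prefix.includes('\r\n')) return '\r\n'; // Windows-style CRLF
  if (prefix.includes('\n')) return '\n';     // Unix-style LF
  if (prefix.includes('\r')) return '\r';     // Old Mac-style CR
  return null; // No newline within the inspected prefix
}
```

The obvious trade-off is that a file whose first line is longer than maxChars returns null even though it does contain newlines.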

3

Answers


  1. The specification of String.includes() includes the following step:

    Let index be StringIndexOf(S, searchStr, start).

    When you follow the link to the description of the StringIndexOf() abstract operation, it’s described as stepping through the string sequentially, returning the index as soon as it finds a match.

    So it’s expected to stop at the first match.

    However, since this is an abstract specification, any implementation that returns the same result would conform. Realistically, there’s not likely to be any sane algorithm that works otherwise. If I think outside the box, I can imagine a parallel implementation that distributes the search to multiple processors, each searching a different part of the string. But this gets complicated because of matches that straddle segment boundaries, and at the end it still needs to report the result from the earliest segment.
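
    To make the sequential stepping concrete, here is a rough sketch of what StringIndexOf describes, written as ordinary JavaScript. It is an illustration of the specification's steps, not what any engine actually runs, and the name stringIndexOfSketch is made up for this example:

    ```javascript
    // Illustrative re-implementation of the spec's steps, not engine code.
    function stringIndexOfSketch(string, searchValue, fromIndex = 0) {
      const len = string.length;
      // An empty search string matches immediately at fromIndex.
      if (searchValue === '' && fromIndex <= len) return fromIndex;
      const searchLen = searchValue.length;
      // Try each candidate position in ascending order.
      for (let i = fromIndex; i <= len - searchLen; i++) {
        const candidate = string.substring(i, i + searchLen);
        if (candidate === searchValue) return i; // stop at the first match
      }
      return -1; // no match anywhere
    }
    ```

    In these terms, includes() amounts to checking whether that index is not -1, so nothing after the first match needs to be examined.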

  2. Here is a function that stops after finding the first newline of any type, using a regular expression:

    function detectNewlineCharacter(csvContent)
    {
      return /\r\n|\n|\r/.exec(csvContent)?.[0] ?? null;
    }
    

    Test cases:

    detectNewlineCharacter('') === null
    detectNewlineCharacter('AB\r\nCDEF') === '\r\n'
    detectNewlineCharacter('A\nB\r\nC') === '\n'
    detectNewlineCharacter('AB\rCD\nEF\r\n') === '\r'
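
    One detail worth noting about the regular expression above: JavaScript tries alternatives from left to right at each position, so the CRLF alternative has to come before the lone CR. A small sketch of the difference (firstMatch and the two pattern names are only for illustration):

    ```javascript
    // CRLF listed first: a '\r\n' pair is reported as one unit.
    const crlfFirst = /\r\n|\n|\r/;
    // CR listed first: the '\r' alternative wins before '\r\n' is tried.
    const crFirst = /\r|\n|\r\n/;

    // Returns the first matched newline sequence, or null if there is none.
    function firstMatch(regex, text) {
      return regex.exec(text)?.[0] ?? null;
    }

    const sample = 'A\r\nB';
    const good = firstMatch(crlfFirst, sample); // '\r\n'
    const bad = firstMatch(crFirst, sample);    // '\r'
    ```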
    
  3. Well, if you are unwilling to trust system routines (no judgement implied), then you will need to write your own code. Perhaps it would look like this:

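    function detectNewlineCharacter(s) {
      for (let i = 0; i < s.length; i++) {
        if (s[i] === '\n') return '\n'; // Unix-style LF
        if (s[i] === '\r') {
          // CR followed by LF is Windows-style CRLF; a lone CR is old Mac-style
          if (i + 1 < s.length && s[i + 1] === '\n') return '\r\n';
          return '\r';
        }
      }
      return null; // No newline characters found
    }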
    