let myString = "Hello 'How are you' foo bar abc 'Strings are cool' d b s ;12gh gh76;"

const myRegEx = / \w+ "\w* +" | ;\w +; +/g  // This is what I have figured out, but it's not working :(

const splitedString = myString.split(myRegEx)

console.log(splitedString)

Expected Output: ["Hello", "How are you", "foo", "bar", "abc", "Strings are cool", "d", "b", "s", "12gh-gh76"]

Let me try to explain more:

First of all, split the whole string on spaces " ", except for text inside '' or ;;, like:
"Hello 'Yo what's up'" –> ["Hello", "Yo-what's-up"] (Notice there's an extra ' in what's, so handle that too.)

Then, if a string is inside ;;, concatenate its words (I believe that's the right name) with -, like:
Hello ;hi there; –> ["Hello", "hi-there"]

And in the end, return an array with all the formatting done, as in the expected output.

3 Answers


  1. You can use matchAll instead of split to match the content of quotes, the content of a pair of semicolons, or separate words, with the regex ([';]).+?\1|\w+.

    Afterwards, remove the wrapping characters and replace spaces where needed.

    const myRegEx = /([';]).+?\1|\w+/g
    
    const message = "Hello 'How are you' foo bar abc 'Strings are cool' d b s ;12gh gh76; ;a 'b c' d; 'a ;b c; d' d" // Try edit me
    
    const matches = Array.from(message.matchAll(myRegEx))
    
    const finalResult = matches.map(str => {
      const value = str.shift()
      if(value.match(/^;.*;$/))
        return value.substring(1, value.length-1).replaceAll(' ', '-')
      else if(value.match(/^'.*'$/))
        return value.substring(1, value.length-1)
      else
        return value
    })
    
    // Log to console
    console.log(finalResult)

    Note that this solution works on the assumption that the wrappers (quotes and semicolons) are not nested.

    If you need to account for nested wrappers, regex is not the best tool for the job, since you'll need to check "parenthesis" balance, and while that's possible with regex, there are easier ways to do it.
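
    As an illustration of a non-regex route, here is a minimal sketch of a character scanner. The function name tokenize is hypothetical, and the sketch rests on one assumption that is not stated in the question: a wrapper only closes when the same character is followed by whitespace or the end of the input. Under that assumption an apostrophe inside a word, or a different wrapper inside the current one, is kept as part of the content.

    ```javascript
    // Hypothetical helper, not part of the answer above.
    // Assumption: a wrapper (' or ;) closes only when the same character
    // is followed by whitespace or by the end of the input.
    function tokenize(input) {
      const tokens = [];
      let i = 0;
      while (i < input.length) {
        // Skip whitespace between tokens.
        if (/\s/.test(input[i])) { i++; continue; }

        const ch = input[i];
        if (ch === "'" || ch === ';') {
          // Search for the closing wrapper character.
          let end = -1;
          for (let j = i + 1; j < input.length; j++) {
            const atBoundary = j + 1 === input.length || /\s/.test(input[j + 1]);
            if (input[j] === ch && atBoundary) { end = j; break; }
          }
          if (end !== -1) {
            const inner = input.slice(i + 1, end);
            // Semicolon ranges get their whitespace replaced by dashes.
            tokens.push(ch === ';' ? inner.trim().split(/\s+/).join('-') : inner);
            i = end + 1;
            continue;
          }
          // No closing wrapper found: fall through and read a plain word.
        }

        // Plain word: read up to the next whitespace.
        let j = i;
        while (j < input.length && !/\s/.test(input[j])) j++;
        tokens.push(input.slice(i, j));
        i = j;
      }
      return tokens;
    }

    console.log(tokenize("Hello 'Yo what's up' ;hi there;"));
    ```

    This is not a full balance check; it only shows that tracking the currently open wrapper by hand is straightforward once the closing rule is decided.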

  2. You might capture the parts that you want to reformat, and then process them afterwards by checking which capture group matched:

    '([^']+(?:'[^\s'][^']*)*)'|;([^;]+);|\S+
    

    The pattern matches:

    • ' Match '
    • ( Capture group 1
      • [^']+ Match 1+ chars other than '
      • (?:'[^\s'][^']*)* Optionally repeat matching ' followed by a single non-whitespace char other than ' and then optional chars other than '
    • ) Close group
    • ' Match '
    • | Or
    • ;([^;]+); Match from ;...; and capture in group 2 what is inside
    • | Or
    • \S+ Match 1+ non-whitespace chars


    const regex = /'([^']+(?:'[^\s'][^']*)*)'|;([^;]+);|\S+/g;
    [
      `Hello 'Yo what's up'`,
      `Hello 'How are you' foo bar abc 'Strings are cool' d b s ;12gh gh76;`,
      `Hello ;hi there;`
    ].forEach(s =>
      console.log(
        Array.from(
          s.matchAll(regex), m => {
            // Group 1: content of a quoted string, kept as is.
            if (m[1]) return m[1]
            // Group 2: content of a ;...; range, whitespace becomes dashes.
            else if (m[2]) return m[2].replace(/\s+/g, "-");
            // Otherwise: a plain word.
            else return m[0];
          }
        )
      )
    );
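
    For reuse, the same pattern can be wrapped in a function and checked against the input from the question. This is only a sketch: the names tokenPattern and splitTokens are hypothetical, and the body simply repeats the group checks from the snippet above.

    ```javascript
    // Hypothetical wrapper around the pattern from this answer.
    const tokenPattern = /'([^']+(?:'[^\s'][^']*)*)'|;([^;]+);|\S+/g;

    function splitTokens(input) {
      return Array.from(input.matchAll(tokenPattern), (m) => {
        if (m[1] !== undefined) return m[1];                       // quoted string
        if (m[2] !== undefined) return m[2].replace(/\s+/g, '-');  // ;...; range
        return m[0];                                               // plain word
      });
    }

    console.log(
      splitTokens("Hello 'How are you' foo bar abc 'Strings are cool' d b s ;12gh gh76;")
    );
    ```

    Because matchAll works on a copy of the regex, the shared tokenPattern can be reused across calls without resetting lastIndex.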
  3. One needs at least a two-fold approach.

    • First, one has to process every semicolon-delimited range by replacing each of its whitespace sequences with a single dash and dropping the semicolons, which looks like …

      `Hello 'how\\'re you feeling' foo bar abc 'Strings are cool' d b s ;12gh gh76;`
        .replace(/;([^;]*);/g, (match, capture) => capture.replace(/\s+/g, '-'))
      

      … where the regex is … /;([^;]*);/g … and the result will be …

      "Hello 'how\\'re you feeling' foo bar abc 'Strings are cool' d b s 12gh-gh76"
      
    • Secondly, one needs a splitting regex which splits at any whitespace (sequence), but only if that whitespace is not part of a single-quoted substring. The quoted substring needs to be captured in order to be preserved while splitting. The above example code then continues to look like …

      `Hello 'how\\'re you feeling' foo bar abc 'Strings are cool' d b s ;12gh gh76;`
        .replace(/;([^;]*);/g, (match, capture) => capture.replace(/\s+/g, '-'))
        .split(/'(.*?(?<!\\))'|\s+/)
      

      … where the splitting regex is … /'(.*?(?<!\\))'|\s+/ … and the resulting array contains a lot of empty items, both empty strings and undefined values. Thus the split task needs to be followed by a reduce-based cleanup task …

      `Hello 'how\\'re you feeling' foo bar abc 'Strings are cool' d b s ;12gh gh76;`
        .replace(/;([^;]*);/g, (match, capture) => capture.replace(/\s+/g, '-'))
        .split(/'(.*?(?<!\\))'|\s+/)
        .reduce((result, item) => item && result.concat(item) || result, [])
      

    The following example code demonstrates the approach explained above …

    const sampleString =
      `Hello 'how\\'re you feeling' foo bar abc 'Strings are cool' d b s ;12gh gh76;`;
    
    // see ... [https://regex101.com/r/ZShVPL/1]
    const regXSplitAlternation = /'(.*?(?<!\\))'|\s+/;
    
    // see ... [https://regex101.com/r/ZShVPL/2]
    const regXSemicolonRange = /;([^;]*);/g
    
    console.log(
      sampleString
        // first ... 
        // ... replace any semicolon delimited range by replacing 
        //     each of its whitespace sequence(s) with a single dash.
        .replace(regXSemicolonRange, (match, capture) => capture.replace(/\s+/g, '-'))
    );
    console.log(
      sampleString
        .replace(regXSemicolonRange, (match, capture) => capture.replace(/\s+/g, '-'))
        // second ...
        // ... split the intermediate replacement string at
        //      - either a single quoted character sequence (capturing it)
        //      - or a whitespace (sequence) (not capturing the latter).
        .split(regXSplitAlternation)
        // ... and third ... do omit any empty (undefined, empty string) item.
        .reduce((result, item) => item && result.concat(item) || result, [])
    );
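
    To see why the cleanup step is needed, the following sketch logs the raw result of split next to the cleaned one. It uses a shortened sample string, and it swaps the reduce for filter(Boolean), which is an alternative that should drop the same empty items (empty strings and undefined); the variable names are only illustrative.

    ```javascript
    // Shortened sample; the doubled backslash puts one real backslash into the string.
    const sample = `Hello 'how\\'re you' foo ;12gh gh76;`;

    const raw = sample
      .replace(/;([^;]*);/g, (match, capture) => capture.replace(/\s+/g, '-'))
      // A capturing group in the split pattern inserts the capture into the
      // result, or undefined whenever the whitespace alternative matched.
      .split(/'(.*?(?<!\\))'|\s+/);

    // filter(Boolean) removes every falsy item: '' and undefined.
    const cleaned = raw.filter(Boolean);

    console.log(raw);
    console.log(cleaned);
    ```

    Note that the escaping backslash stays in the captured substring, and that the lookbehind (?<!\\) requires a JavaScript engine with lookbehind support.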