
I have this string

PENNER,JANET E TR-50% & PENNER,MICHAEL G TR - 50% & SOURCE LLC & LARRY & FREDDY INC

and I want to split it up into groups like this

PENNER,JANET E TR

PENNER,MICHAEL G TR

SOURCE LLC

LARRY & FREDDY INC

I'm using JavaScript (Node) with matchAll, and this is my attempt so far:

/\s*([^&]+) ((L\.?L\.?C\.?)|(T\.?R\.?)|(I\.?N\.?C\.?)|(C\.?O\.?)|(REV LIV))(?=-)?/g

The hard part is that some business names include ampersands (&) so how would I do this?

4 Answers


  1. Use a positive lookbehind: (?<=% )& matches an ampersand and a space only when a percent sign and a space come directly before them => % & . You can then apply the same logic, joined with the regex OR operator |, to the literal string LLC and a space followed by an ampersand and a space => LLC & . Finally, split the string using this regex => str.split(regex).

    const regex = / (?<=% )& | (?<=LLC )& /g;
    

    It is very straightforward and will likely not cover every possible name in your string, but you can always extend the regex with more alternatives using the same positive lookbehind => (?<=...)

    const str = 'PENNER,JANET E TR-50% & PENNER,MICHAEL G TR - 50% & SOURCE LLC & LARRY & FREDDY INC & SOMEBUSINESS,JOHN T TR - 75%';
    // just add your positive lookbehind to accommodate new 'endings' in your business naming scheme
    // I add a positive look behind for INC with a space followed by an ampersand and space
    // provided this rule is always constant this approach should work
    const regex = / (?<=% )& | (?<=LLC )& | (?<=INC )& /g; 
    const results = str.split(regex);
    
    console.log(results);

    A dynamic version:

    const str = 'PENNER,JANET E TR-50% & PENNER,MICHAEL G TR - 50% & SOURCE LLC & LARRY & FREDDY INC & SOMEBUSINESS,JOHN T TR - 75%';
    
    // add an array with the matches for the possible endings
    // this should work provided you use the same logic of ending followed by a space + positive lookbehind of & with a space 
    const matches = ['%', 'LLC', 'INC']
    
    // function to take in the array of endings
    const dynamicRegex = (arr) => {
      // empty array to hold result
      let result = []
      // push each end into the result and build the regex string
      arr.forEach(end => result.push(` (?<=${end} )& `));
      // return regex string
      return result;
    }
    
    // join the result of the looped endings into a string
    const joined = `${dynamicRegex(matches).join('|')}`;
    // create a new regexp pattern and format 
    const reg = new RegExp(joined, 'g')
    // split the string using the regex and return an array of the desired output
    const results = str.split(reg);
    
    console.log(results)
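
The dynamic version above interpolates each ending into the pattern as-is, which works for %, LLC and INC. If an ending contains regex metacharacters (for example L.L.C., which the question's own attempt anticipates), it would need escaping first. A minimal sketch, assuming the same "ending, space, ampersand, space" rule; the helper names escapeRegExp and buildSplitRegex and the sample string are made up for illustration:

```javascript
// Hypothetical helper: escape regex metacharacters in a literal ending.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical helper: one lookbehind alternative per ending, joined with |.
const buildSplitRegex = (endings) =>
  new RegExp(
    endings.map((end) => ` (?<=${escapeRegExp(end)} )& `).join('|'),
    'g'
  );

// Made-up sample input with a dotted ending.
const owners =
  'PENNER,JANET E TR-50% & SOURCE L.L.C. & LARRY & FREDDY INC & ACME CO.';

const ownerParts = owners.split(buildSplitRegex(['%', 'L.L.C.', 'INC']));

console.log(ownerParts);
```

Without the escaping step, each dot in L.L.C. would match any character, so the pattern would be looser than the literal ending suggests.
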
  2. Judging by your description, maybe this regex could work for you:

    (?!\s|[^A-Z]*&).+? (?:L\.?L\.?C\.?|T\.?R\.?|I\.?N\.?C\.?|C\.?O\.?|REV LIV)
    
    const text = 'PENNER,JANET E TR-50% & PENNER,MICHAEL G TR - 50% & SOURCE LLC & LARRY & FREDDY INC';
    
    const regex = /(?!\s|[^A-Z]*&).+? (?:L\.?L\.?C\.?|T\.?R\.?|I\.?N\.?C\.?|C\.?O\.?|REV LIV)/g;
    
    const companies = text.match(regex);
    
    console.log(companies);
  3. If a positive lookbehind is supported for your environment, you could split using:

    (?:\s*-\s*\d+%|(?<=\bLLC))\s*&\s*
    

    The pattern matches:

    • (?: Non capture group for the 2 alternatives:
      • \s*-\s* Match a hyphen between optional whitespace chars
      • \d+% Match 1+ digits followed by a percentage sign
      • | Or
      • (?<=\bLLC) Positive lookbehind, assert LLC (starting at a word boundary) directly to the left
    • ) Close non capture group
    • \s*&\s* Match an ampersand between optional whitespace chars


    const s = `PENNER,JANET E TR-50% & PENNER,MICHAEL G TR - 50% & SOURCE LLC & LARRY & FREDDY INC`;
    const regex = /(?:\s*-\s*\d+%|(?<=\bLLC))\s*&\s*/;
    console.log(s.split(regex));
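
For an environment without lookbehind support, one possible workaround is to capture LLC instead of asserting it, put it back in a replacer function, and split on a placeholder. This is only a sketch of that idea, not part of the answer above; the SEPARATOR constant is an assumption and must be a character that never occurs in the input:

```javascript
const ownersText =
  'PENNER,JANET E TR-50% & PENNER,MICHAEL G TR - 50% & SOURCE LLC & LARRY & FREDDY INC';

// Assumed placeholder; chosen because it should never appear in real data.
const SEPARATOR = '\u0000';

// Same two alternatives as the split pattern, but LLC is captured and
// re-inserted by the replacer rather than asserted with a lookbehind.
const marked = ownersText.replace(
  /(?:\s*-\s*\d+%|\b(LLC))\s*&\s*/g,
  (match, llc) => (llc || '') + SEPARATOR
);

const owners = marked.split(SEPARATOR);

console.log(owners);
```

For the sample string this yields the same four groups as the lookbehind version.
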
  4. Given the limited information about the criteria for splitting the string, the most generic, yet still precise enough, regex one can come up with might look similar to

    /\s*-.*?\s+&\s+|(?<=INC|LLC)\s+&\s+/g
    

    It consists of two alternatives:

    • /\s*-.*?\s+&\s+/g … either splits at a minus char which is preceded by an optional whitespace sequence and lazily followed by any characters up to and including the next whitespace-ampersand-whitespace sequence.

    • /(?<=INC|LLC)\s+&\s+/g … or splits at any whitespace-ampersand-whitespace sequence which is directly preceded by either an LLC or an INC abbreviation; this second alternative uses a positive lookbehind.

    const sampleText =
      'PENNER,JANET E TR-50% & PENNER,MICHAEL G TR - 50% & SOURCE LLC & LARRY & FREDDY INC';
    
    const regXSplit =
      // see ... [https://regex101.com/r/xM9elf/1]
      /\s*-.*?\s+&\s+|(?<=INC|LLC)\s+&\s+/g;
    
    console.log(
      sampleText.split(regXSplit)
    );
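
One case the split alone does not cover is a percentage on the very last entry, since no ampersand follows it. A possible follow-up step, sketched here under the assumption that shares always look like "- 75%" or "-50%" at the end of an entry (the sample string and variable names are made up):

```javascript
const ownerLine =
  'PENNER,JANET E TR-50% & SOURCE LLC & LARRY & FREDDY INC & SOMEBUSINESS,JOHN T TR - 75%';

// Same pattern as above; the g flag is not needed for split.
const splitPattern = /\s*-.*?\s+&\s+|(?<=INC|LLC)\s+&\s+/;

const names = ownerLine
  .split(splitPattern)
  // Strip a trailing "-NN%" share that the split could not remove.
  .map((name) => name.replace(/\s*-\s*\d+%\s*$/, ''));

console.log(names);
```

Because the first alternative starts at any minus char, a hyphenated name that appears before an ampersand would also be consumed, so this approach assumes hyphens only introduce percentages.
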