I have this string
PENNER,JANET E TR-50% & PENNER,MICHAEL G TR - 50% & SOURCE LLC & LARRY & FREDDY INC
and I want to split it up into groups like this
PENNER,JANET E TR
PENNER,MICHAEL G TR
SOURCE LLC
LARRY & FREDDY INC
I’m using JavaScript (Node) with matchAll,
and this is my attempt so far:
/\s*([^&]+) ((L\.?L\.?C\.?)|(T\.?R\.?)|(I\.?N\.?C\.?)|(C\.?O\.?)|(REV LIV))(?=-)?/g
The hard part is that some business names include ampersands (&), so how would I do this?
4 Answers
A Positive Lookbehind
(?<=% )&
This matches an ampersand that is directly preceded by a percent sign and a space, as in "% &". You can use the same logic with the regex OR operator |, using the literal string LLC and a space followed by an ampersand, as in "LLC &". Then split the string using this regex: str.split(regex).
It is very straightforward and will likely not accommodate all potential occurrences of names in your string, though you can always add more logic to the regex using the same positive lookbehind, (?<=...).
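As a minimal sketch of the split described above, the two lookbehind alternatives can be combined into one pattern. The variable names are illustrative, the pattern also consumes the whitespace after the ampersand, and the cleanup step that strips the trailing percentage is an assumption added here, not part of the answer.

```javascript
const record =
  'PENNER,JANET E TR-50% & PENNER,MICHAEL G TR - 50% & SOURCE LLC & LARRY & FREDDY INC';

// Split at an ampersand only when "% " or "LLC " sits directly to its left.
const lookbehindSplit = /(?<=% |LLC )&\s*/;
const rawParts = record.split(lookbehindSplit);

// Assumed cleanup: drop a trailing "-50%" / " - 50%" and surrounding whitespace.
const names = rawParts.map((part) => part.replace(/\s*-\s*\d+%\s*$/, '').trim());

console.log(names);
// [ 'PENNER,JANET E TR', 'PENNER,MICHAEL G TR', 'SOURCE LLC', 'LARRY & FREDDY INC' ]
```

Without the cleanup step, the percentages stay attached to the names, because the lookbehind only asserts them and does not consume them.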
A dynamic version:
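One way such a pattern could be built dynamically is sketched below, assuming a hypothetical list of markers that may precede a separating ampersand. The list, the helper escapeRegExp and the variable names are assumptions for illustration and do not come from the answer.

```javascript
// Escape characters that have a special meaning inside a regular expression.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical markers: an ampersand only separates owners when one of these precedes it.
const markers = ['% ', 'LLC ', 'INC '];

// Builds (?<=% |LLC |INC )&\s*
const dynamicSplit = new RegExp(`(?<=${markers.map(escapeRegExp).join('|')})&\\s*`);

const line =
  'PENNER,JANET E TR-50% & PENNER,MICHAEL G TR - 50% & SOURCE LLC & LARRY & FREDDY INC';
const pieces = line.split(dynamicSplit).map((piece) => piece.trim());

console.log(pieces);
// [ 'PENNER,JANET E TR-50%', 'PENNER,MICHAEL G TR - 50%', 'SOURCE LLC', 'LARRY & FREDDY INC' ]
```

Adding another company suffix then only means adding one entry to the list.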
Judging by your description, maybe this regex could work for you:
If a positive lookbehind is supported in your environment, you could split using:
(?:\s*-\s*\d+%|(?<=\bLLC))\s*&\s*
The pattern matches:
(?:          Non-capturing group for the 2 alternatives:
\s*-\s*      Match a hyphen between optional whitespace chars
\d+%         Match 1+ digits followed by a percent sign
|            Or
(?<=\bLLC)   Positive lookbehind, assert LLC directly to the left
)            Close the non-capturing group
\s*&\s*      Match an ampersand between optional whitespace chars
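The parts listed above, concatenated in order, can be tried against the string from the question. This is a small sketch; the variable names are illustrative.

```javascript
const input =
  'PENNER,JANET E TR-50% & PENNER,MICHAEL G TR - 50% & SOURCE LLC & LARRY & FREDDY INC';

// Either "- NN%" (with optional whitespace) or an LLC directly to the left,
// followed by the separating ampersand.
const splitter = /(?:\s*-\s*\d+%|(?<=\bLLC))\s*&\s*/;

const groups = input.split(splitter);

console.log(groups);
// [ 'PENNER,JANET E TR', 'PENNER,MICHAEL G TR', 'SOURCE LLC', 'LARRY & FREDDY INC' ]
```

Because the percentage is part of the match rather than a lookbehind, it is removed together with the ampersand. The ampersand in LARRY & FREDDY INC is left alone, since neither alternative applies there.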
From the limited information about the criteria by which one has to split the string, the most generic, yet precise enough, regex one can come up with might look similar to …
… and that …
/\s*-.*?\s+&\s+/g
… either splits at a minus char which is preceded by an optional whitespace sequence and followed by any sequence that does not contain an ampersand char, up to the next occurring whitespace-ampersand-whitespace sequence.
/(?<=INC|LLC)\s+&\s+/g
… or splits at any whitespace-ampersand-whitespace sequence which is preceded by either an LLC or an INC abbreviation, where the latter pattern utilizes a positive lookbehind.
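A sketch of how the two patterns might be used together, assuming they are joined with | into a single splitting regex. The joined form and the variable names are assumptions for illustration.

```javascript
const ownersText =
  'PENNER,JANET E TR-50% & PENNER,MICHAEL G TR - 50% & SOURCE LLC & LARRY & FREDDY INC';

// Assumption: both alternatives from above combined into one pattern.
// 1) minus char ... up to the next " & "   2) " & " preceded by INC or LLC
const eitherOr = /\s*-.*?\s+&\s+|(?<=INC|LLC)\s+&\s+/g;

const owners = ownersText.split(eitherOr);

console.log(owners);
// [ 'PENNER,JANET E TR', 'PENNER,MICHAEL G TR', 'SOURCE LLC', 'LARRY & FREDDY INC' ]
```

A percentage after the very last owner would not be removed by this split, since no ampersand follows it.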