I’m new to regex (and stackoverflow btw) and trying to extract “real” words out of this using R:
"\n\n\nclone\nstar\n\n\n\n\nbrain\nstar\n\n\n\n\ncalculator\nstar\n\n\n\n\nadding machine\nstar\n\n\n\n\nartificial intelligence\nstar"
So I would like to match: clone, brain, calculator, adding machine, artificial intelligence.
I tried it with (?<=\n)(.*?)(?=\nstar)
which seems close, but it still doesn't give me what I want. I guess I don't have to specify \n, but can instead use some command to omit newlines?
Answers
Try this. See demo:
https://regex101.com/r/vD5iH9/63
.*? will capture everything, including \n, so use a lookahead to check that \n is not being captured.
Are you trying to pull out the words, or match them?
If you want to get a vector of words:
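A minimal sketch of the lookahead idea in R, assuming the input is the string from the question with real newline characters. The pattern (?<=\n)(?:(?!\n).)+(?=\nstar) is an illustration of the technique, not necessarily the regex behind the demo link: every character is checked by a negative lookahead, so a match can never run across a newline. Lookarounds need perl = TRUE in base R.

```r
# Input from the question (real newline characters, not the letter "n")
x <- "\n\n\nclone\nstar\n\n\n\n\nbrain\nstar\n\n\n\n\ncalculator\nstar\n\n\n\n\nadding machine\nstar\n\n\n\n\nartificial intelligence\nstar"

# (?<=\\n)       the match must be preceded by a newline
# (?:(?!\\n).)+  one or more characters, each checked so it is not a newline
# (?=\\nstar)    the match must be followed by "\nstar" (not consumed)
pattern <- "(?<=\\n)(?:(?!\\n).)+(?=\\nstar)"

# gregexpr() finds every match and regmatches() extracts the matched text.
# perl = TRUE is required because the default engine has no lookarounds.
words <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
print(words)
```

The result is a plain character vector with one element per word, so it can be used directly with functions such as length() or paste().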
This does it with a relatively simple regular expression:
giving:
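A hedged sketch of a relatively simple regular expression that works with R's default (non-Perl) engine; the pattern below is an assumption for illustration, not necessarily the one used in this answer. It matches each run of non-newline characters that is directly followed by \nstar, then strips that suffix.

```r
x <- "\n\n\nclone\nstar\n\n\n\n\nbrain\nstar\n\n\n\n\ncalculator\nstar\n\n\n\n\nadding machine\nstar\n\n\n\n\nartificial intelligence\nstar"

# "[^\n]+\nstar": one or more non-newline characters followed by "\nstar".
# The \n is a real newline character inside the R string, so the bracket
# expression excludes newlines without relying on a regex escape.
m <- regmatches(x, gregexpr("[^\n]+\nstar", x))[[1]]

# Drop the trailing "\nstar" from each match, leaving just the word
words <- sub("\nstar$", "", m)
print(words)
```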
Just split on \nstar or \n, and optionally remove the leading newline characters first to avoid the empty first string.
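A minimal sketch of the split approach in base R. The split pattern \nstar\n*|\n+ is an assumed rendering of "\nstar or \n"; perl = TRUE is used so the alternatives are tried in the order written, preferring \nstar (plus any newlines after it) and falling back to a plain run of newlines.

```r
x <- "\n\n\nclone\nstar\n\n\n\n\nbrain\nstar\n\n\n\n\ncalculator\nstar\n\n\n\n\nadding machine\nstar\n\n\n\n\nartificial intelligence\nstar"

# Split on "\nstar" (plus any following newlines) or on a run of newlines
split_pattern <- "\nstar\n*|\n+"

# Splitting as-is yields an empty first element, because x starts with newlines
parts <- strsplit(x, split_pattern, perl = TRUE)[[1]]
print(parts)

# Removing the leading newlines before splitting avoids that empty element
words <- strsplit(sub("^\n+", "", x), split_pattern, perl = TRUE)[[1]]
print(words)
```

Dropping the first element afterwards (parts[-1]) would work just as well; stripping the leading newlines first simply keeps the result clean in one step.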