
I’m new to regex (and Stack Overflow btw) and trying to extract “real” words out of this using R:

"\n\n\nclone\nstar\n\n\n\n\nbrain\nstar\n\n\n\n\ncalculator\nstar\n\n\n\n\nadding machine\nstar\n\n\n\n\nartificial intelligence\nstar"

So I would like to match: clone, brain, calculator, adding machine, artificial intelligence.

I tried it with (?<=\n)(.*?)(?=\nstar), which seems to be close… but it still doesn’t give me what I want. I guess I don’t have to specify \n but can instead use some command that omits newlines?

5 Answers


  1. (?<=\n)((?:(?!\n).)*?)(?=\nstar)
    

    Try this. See demo.

    https://regex101.com/r/vD5iH9/63

    .*? will match everything, including \n. So use a negative lookahead to check that \n is not being captured.

  2. Are you trying to pull out the words?

    strsplit(x,'\n+')
    

    or match them?

    gsub('[a-zA-Z]+','HELLOWORLD',x)
    
  3. If you want to get a vector of words

    x <- "\n\n\nclone\nstar\n\n\n\n\nbrain\nstar\n\n\n\n\ncalculator\nstar\n\n\n\n\nadding machine\nstar\n\n\n\n\nartificial intelligence\nstar"
    x <- gsub("\n"," ",x)
    x <- unlist(strsplit(x," "))
    x <- x[x != ""]
    
  4. This does it with a relatively simple regular expression:

    library(gsubfn)
    
    strapplyc(x, "([^\n]*).star", simplify = c)
    

    giving:

    [1] "clone"  "brain"  "calculator"  "adding machine"         
    [5] "artificial intelligence"
    

    Note: the regular expression ([^\n]*).star was also shown as a visualization (Debuggex Demo).

  5. Just split on \nstar or \n, and optionally remove the leading newlines first to avoid an empty first string.

    strsplit(x, "(\nstar|\n)+")   # OR
    strsplit(gsub("^\n*", "", x), "(\n|star)+")[[1]]
    [1] "clone"                   "brain"                   "calculator"             
    [4] "adding machine"          "artificial intelligence"
    
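For anyone who wants to try the regex-based answers without an extra package, here is a minimal base R sketch. It assumes the separators in the input are real newline characters (written "\n" in the R string literal). The names pat_lookaround, pat_simple, words1 and words2 are only illustrative, and the second pattern is a simplified lookahead variant that does not appear in the answers above.

```r
# Input string; assumption: the separators are real newline characters.
x <- "\n\n\nclone\nstar\n\n\n\n\nbrain\nstar\n\n\n\n\ncalculator\nstar\n\n\n\n\nadding machine\nstar\n\n\n\n\nartificial intelligence\nstar"

# Pattern from answer 1, with backslashes doubled for an R string literal.
# Lookbehind and lookahead require perl = TRUE.
pat_lookaround <- "(?<=\\n)((?:(?!\\n).)*?)(?=\\nstar)"
words1 <- regmatches(x, gregexpr(pat_lookaround, x, perl = TRUE))[[1]]

# Simplified variant (an assumption, not taken from the answers):
# one or more non-newline characters that are followed by "\nstar".
pat_simple <- "[^\\n]+(?=\\nstar)"
words2 <- regmatches(x, gregexpr(pat_simple, x, perl = TRUE))[[1]]

print(words1)
print(words2)
```

Both calls should return the five phrases from the question, with "adding machine" and "artificial intelligence" kept as single elements, because each pattern only stops at a newline and never at a space.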