
I have this text stored in a variable called description:

`This is a code update`



*Official Name:

*Pub:

*Agency:

*Reference: https://docs.google.com/document/d/1FFTgytIIcMYnCCgp2cKuUWIwdz7MFolLzCci_-OQn9c/edit#heading=h.81ay6ysgrxtb
https://docs.google.com/document/d/1FFTgytIIcMYnCCgp2cKuUWIwdz7MFolLzCci_-OQn9c/edit#heading=h.81ay6ysgrxtb

*Citation: rg



*Draft Doc Title:

*Draft Source Doc:

*Draft Drive:



*Final Doc Title:

*Final Source Doc:

*Final Drive:
    
*Effective Date:

Using my code below, it returns an array with two elements:

//3. Extract Reference    
var reference = description.search("Reference");
if(reference != -1){        
    reference = description.match(/(?<=^\*\s*Reference\s*:)[\s]*[\n]*.*?(?=\n\*)/ms);
    reference = reference?.[0].trim();
    reference = reference.split(/[\r\n]+/);
}else{
    reference = '';
}
console.log('Reference:');
console.log(reference);

Output:

["https://docs.google.com/document/d/1FFTgytIIcMYnCCgp2cKuUWIwdz7MFolLzCci_-OQn9c/edit#heading=h.81ay6ysgrxtb","https://docs.google.com/document/d/1FFTgytIIcMYnCCgp2cKuUWIwdz7MFolLzCci_-OQn9c/edit#heading=h.81ay6ysgrxtb"]

However, when I change my description text to:

`This is a code update`   


*Official Name:

*Pub:

*Agency:

*Reference: 

*Citation: rg



*Draft Doc Title:

*Draft Source Doc:

*Draft Drive:



*Final Doc Title:

*Final Source Doc:

*Final Drive:



*Effective Date:

The code returns `*Citation: rg`. It should return an empty string. Where did I go wrong? Thanks.

2 Answers


  1. Problems

    Here’s a trick used by many programmers as their first step in debugging: they explain their code to a rubber duck.

    /
      (?<=^\*\s*Reference\s*:)  # Match after '* Reference:'
      [\s]*[\n]*                # 0 or more whitespaces followed by 0 or more line breaks
      .*?                       # 0 or more characters, including line breaks, lazily
      (?=\n\*)                  # until we meet a line break followed by '*'
    /ms                         # with 'm' so '^' matches at the start of each line, and 's' so '.' matches every character
    

    Which matches:

    /
      '*Reference:'
      '\n'
      '*Citation: rg\n\n'
      '\n*'               # Followed by 'Draft Doc Title'
    /
    

    Also, [\s]*[\n]* is effectively the same as \s*, as ‘\n’ is a subset of ‘\s’.
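To see the over-match concretely, here is a minimal sketch that runs the question's pattern against a shortened, hypothetical version of the second description (only a few of the fields are kept). The whitespace run swallows the blank line after `*Reference:`, so the lazy part starts on the next field and keeps going until it finds a line break followed by `*`.

```javascript
// Shortened stand-in for the second description (empty Reference field).
const description = [
  "This is a code update",
  "",
  "*Agency:",
  "",
  "*Reference: ",
  "",
  "*Citation: rg",
  "",
  "",
  "",
  "*Draft Doc Title:",
].join("\n");

// The pattern from the question, unchanged.
const original = /(?<=^\*\s*Reference\s*:)[\s]*[\n]*.*?(?=\n\*)/ms;

// [\s]* consumes " \n\n", then .*? runs from "*Citation" to just before "\n*Draft".
const rawMatch = description.match(original)[0];
const trimmed = rawMatch.trim();

console.log(JSON.stringify(rawMatch));
console.log(JSON.stringify(trimmed)); // "*Citation: rg"
```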

    Solution

    /
      (?<=^\*\s*Reference\s*:)  # Match after '* Reference:'
      (?:                       # a non-capturing group, consists of
        .                       # a character
        (?!^\*)                 # which is not followed by a '*' at the start of a line
      )*                        # 0 or more times
    /ms
    

    Try it on regex101.com
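As a quick check of the pattern above, the sketch below applies it to two shortened, hypothetical descriptions: one with two URLs under `*Reference:` and one with the field left empty. The `example.com` URLs are placeholders, not the ones from the question.

```javascript
// The solution pattern written on one line (JavaScript has no verbose/x flag).
const tempered = /(?<=^\*\s*Reference\s*:)(?:.(?!^\*))*/ms;

// Placeholder inputs; only the fields around Reference are kept.
const filled = "*Agency:\n\n*Reference: https://example.com/a\nhttps://example.com/b\n\n*Citation: rg\n";
const empty = "*Agency:\n\n*Reference: \n\n*Citation: rg\n";

// The match stops just before the line that starts with '*'.
const filledText = filled.match(tempered)[0].trim();
const emptyText = empty.match(tempered)[0].trim();

console.log(filledText.split(/[\r\n]+/)); // [ 'https://example.com/a', 'https://example.com/b' ]
console.log(JSON.stringify(emptyText));   // ""
```

With an empty field the match is only the leftover whitespace, so trimming it yields the empty string the question asks for.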

  2. You could update the pattern from my previous answer and match at least a single non-whitespace character after *Reference: on the same line.

    Using JavaScript:

    (?<=^\*\s*Reference[^\S\n]*:[^\S\n]*)\S[^]*?(?=^\s*\*)
    

    Explanation

    • (?<= Positive lookbehind, assert that to the left is
      • ^\*\s*Reference Match *Reference at the start of a line (this assumes the m flag; without it, ^ only matches at the start of the string)
      • [^\S\n]*:[^\S\n]* Match : between optional spaces without newlines
    • ) Close the lookbehind
    • \S Match a non-whitespace character
    • [^]*? Match any character including newlines, as few as possible
    • (?=^\s*\*) Positive lookahead, assert the start of a line, then match optional whitespace characters followed by *

    See a match here and no match here.
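A small sketch of how this first pattern behaves, assuming the `m` flag and two hypothetical one-field inputs (the `example.com` URL is a placeholder). Because `\S` has to match on the same line as `*Reference:`, an empty field produces no match at all rather than an empty string.

```javascript
// Requires at least one non-whitespace character on the Reference line.
const nonEmptyOnly = /(?<=^\*\s*Reference[^\S\n]*:[^\S\n]*)\S[^]*?(?=^\s*\*)/m;

const filled = "*Reference: https://example.com/a\n\n*Citation: rg\n";
const empty = "*Reference: \n\n*Citation: rg\n";

const filledMatch = filled.match(nonEmptyOnly);
const emptyMatch = empty.match(nonEmptyOnly);

console.log(filledMatch && filledMatch[0].trim()); // https://example.com/a
console.log(emptyMatch);                           // null
```

Code that calls this variant therefore needs a null check before trimming or splitting the result.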

    If you want to return an empty string instead of no match, you can omit matching a non-whitespace character:

    (?<=^\*\s*Reference[^\S\n]*:)[^]*?(?=^\s*\*)
    

    See a match here and matching a space here.
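Putting the second pattern into the shape of the question's code could look like the sketch below. The function name `extractReference` and the placeholder URLs are assumptions for illustration. It also guards against `match` returning null, which in the question's code would make `reference.split(...)` throw.

```javascript
// Second pattern from this answer, with the m flag so '^' matches at line starts.
const allowEmpty = /(?<=^\*\s*Reference[^\S\n]*:)[^]*?(?=^\s*\*)/m;

// Hypothetical helper: returns '' when the field is empty or missing,
// otherwise an array with one entry per non-empty line.
function extractReference(description) {
  const match = description.match(allowEmpty);
  if (!match) return "";
  const text = match[0].trim();
  return text === "" ? "" : text.split(/[\r\n]+/);
}

const filled = "*Agency:\n\n*Reference: https://example.com/a\nhttps://example.com/b\n\n*Citation: rg\n";
const empty = "*Agency:\n\n*Reference: \n\n*Citation: rg\n";

console.log(extractReference(filled)); // [ 'https://example.com/a', 'https://example.com/b' ]
console.log(JSON.stringify(extractReference(empty))); // ""
```

One limitation of this sketch: the lookahead needs a following line that starts with `*`, so if `*Reference:` were the last field in the text the pattern would not match and the helper would return the empty string.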
