
I have some text:

const str = `This <a href="https://regex101.com/" data-link-id="431ebea7-1426-65a5-8383-55a27313dc51">is a test link</a> which has a hyperlink, and <a href="https://regex102.com/" data-link-id="d62dc3eb-7b3d-953e-4d7a-987448e6928d">this is also</a> a hyperlink.`

I’m trying to match all a tags, but my regular expression just returns the whole thing:

str.match(/<a href=".+ data-link-id="[0-9A-Z-a-z]{1,}">(.*?)<\/a>/)

What am I doing wrong here? I expect the result to be an array of two elements. Instead of (.*?), I’ve tried .+ and [A-Za-z0-9\s]+, with the same result.

2 Answers


  1. Your current regex pattern has one bug: the .+ in href=".+ is greedy. It consumes as much as it can, so the match runs from the first href=" through to the last data-link-id=" in the string and swallows everything between the two anchors. If you use the lazy form .+? instead, each match stops at the nearest data-link-id=" and the pattern behaves as you want.

    var str = 'This <a href="https://regex101.com/" data-link-id="431ebea7-1426-65a5-8383-55a27313dc51">is a test link</a> which has a hyperlink, and <a href="https://regex102.com/" data-link-id="d62dc3eb-7b3d-953e-4d7a-987448e6928d">this is also</a> a hyperlink.';
    var matches = str.match(/<a href=".+? data-link-id="[0-9A-Z-a-z]{1,}">(.*?)<\/a>/g);
    console.log(matches);

    Note also that you need the global /g flag for match() to return all matches; without it, only the first match is returned.
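With the /g flag, match() returns only the full matched strings, so the (.*?) capture group is not available in the result. If the link text or attribute values are needed as well, one possible approach is matchAll(), sketched below. The names anchorPattern and links are illustrative, and the negated class [^"]+ is swapped in for .+? as an assumption about the input (an href value that contains no double quote).

```javascript
const str = 'This <a href="https://regex101.com/" data-link-id="431ebea7-1426-65a5-8383-55a27313dc51">is a test link</a> which has a hyperlink, and <a href="https://regex102.com/" data-link-id="d62dc3eb-7b3d-953e-4d7a-987448e6928d">this is also</a> a hyperlink.';

// [^"]+ cannot run past the closing quote of href, so it cannot span two anchors.
// The /g flag is required: matchAll() throws a TypeError without it.
const anchorPattern = /<a href="([^"]+)" data-link-id="([0-9A-Za-z-]+)">(.*?)<\/a>/g;

// Each match is an array: [fullMatch, group1, group2, group3].
const links = Array.from(str.matchAll(anchorPattern), (m) => ({
  href: m[1],
  linkId: m[2],
  text: m[3],
}));

console.log(links);
```

This still assumes the attributes always appear in exactly this order with this spacing, which is one reason a parser is the safer choice for real HTML.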

  2. As a further suggestion, if this is the actual scenario you need to solve, it is better to use a real HTML parser than regular expressions. HTML can contain nested elements or other embedded content that a regex cannot extract reliably.

    https://developer.mozilla.org/en-US/docs/Web/API/DOMParser/parseFromString

    The parseFromString() method of the DOMParser interface parses a
    string containing either HTML or XML, returning an HTMLDocument or an
    XMLDocument.

    const str = `This <a href="https://regex101.com/" data-link-id="431ebea7-1426-65a5-8383-55a27313dc51">is a test link</a> which has a hyperlink, and <a href="https://regex102.com/" data-link-id="d62dc3eb-7b3d-953e-4d7a-987448e6928d">this is also</a> a hyperlink.`;
    
    const parser = new DOMParser();
    const doc = parser.parseFromString(str, 'text/html');
    const anchors = doc.querySelectorAll('a');
    
    console.log(anchors);
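querySelectorAll() returns a NodeList of elements rather than an array of strings. If plain data is wanted, a minimal sketch along these lines could convert it. The helper name describeAnchors and the shape of the returned objects are hypothetical, and the DOMParser part only runs in a browser, so it is guarded here.

```javascript
// Hypothetical helper: turns any iterable of anchor elements into plain objects.
function describeAnchors(anchors) {
  return Array.from(anchors, (a) => ({
    href: a.getAttribute('href'),
    linkId: a.getAttribute('data-link-id'),
    text: a.textContent,
  }));
}

const html = 'This <a href="https://regex101.com/" data-link-id="431ebea7-1426-65a5-8383-55a27313dc51">is a test link</a> which has a hyperlink, and <a href="https://regex102.com/" data-link-id="d62dc3eb-7b3d-953e-4d7a-987448e6928d">this is also</a> a hyperlink.';

// DOMParser is a browser API; skip this part where it is not defined (e.g. Node.js).
if (typeof DOMParser !== 'undefined') {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  console.log(describeAnchors(doc.querySelectorAll('a')));
}
```

If the original markup of each link is wanted instead, mapping each element to its outerHTML property would be the analogous step.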