I have some text:
const str = `This <a href="https://regex101.com/" data-link-id="431ebea7-1426-65a5-8383-55a27313dc51">is a test link</a> which has a hyperlink, and <a href="https://regex102.com/" data-link-id="d62dc3eb-7b3d-953e-4d7a-987448e6928d">this is also</a> a hyperlink.`
I’m trying to match all <a> tags, but my regular expression just returns the whole thing:
str.match(/<a href=".+ data-link-id="[0-9A-Z-a-z]{1,}">(.*?)<\/a>/)
What am I doing wrong here? I expect the result to be an array of two elements. Instead of (.*?), I’ve tried .+ and [A-Za-z0-9\s]+, same result.
2 Answers
Your current regex pattern has one small bug: it uses href=".+ to match the start of the anchor tag. The .+ is problematic because it is greedy, so it matches across all the anchors up to the last one. If you use the lazy .+? instead, it will behave as you want.
Note also that you should use the global /g flag with match() to get all matches.
As a further suggestion, if this is your actual use case, it would be better to use a real HTML parser instead of regular expressions, since HTML may contain nested elements or special embedded content that a regex cannot reliably handle:
https://developer.mozilla.org/en-US/docs/Web/API/DOMParser/parseFromString
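To make both suggestions concrete, here is a minimal sketch that reuses the string and pattern from the question, changing only the greedy .+ to the lazy .+? and adding the g flag. The names anchorPattern, tags, linkTexts, and linkTextsFromDom are made up for illustration, and the matchAll() step and the a[data-link-id] selector are additions that the answer itself does not mention. The DOMParser helper assumes a browser environment, so it is only defined here and never called.

```javascript
const str = `This <a href="https://regex101.com/" data-link-id="431ebea7-1426-65a5-8383-55a27313dc51">is a test link</a> which has a hyperlink, and <a href="https://regex102.com/" data-link-id="d62dc3eb-7b3d-953e-4d7a-987448e6928d">this is also</a> a hyperlink.`;

// Same pattern as in the question, with two changes:
//   .+  ->  .+?   (lazy, so it stops before the first data-link-id instead of the last)
//   g flag added  (so match() returns every match, not just the first)
const anchorPattern = /<a href=".+? data-link-id="[0-9A-Z-a-z]{1,}">(.*?)<\/a>/g;

// With the g flag, match() returns the full matched substrings (or null if none).
const tags = str.match(anchorPattern) ?? [];
console.log(tags.length); // 2

// With the g flag, match() drops capture groups; matchAll() keeps them,
// so this collects only the text between <a ...> and </a>.
const linkTexts = Array.from(str.matchAll(anchorPattern), (m) => m[1]);
console.log(linkTexts); // [ 'is a test link', 'this is also' ]

// Parser-based alternative (hypothetical helper, browser only).
// DOMParser builds a real document, so nested markup inside the link is handled.
function linkTextsFromDom(html) {
  const doc = new DOMParser().parseFromString(html, "text/html");
  return Array.from(doc.querySelectorAll("a[data-link-id]"), (a) => a.textContent);
}
```

In a browser, linkTextsFromDom(str) should give the same two link texts without relying on the exact attribute order that the regex assumes.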