JavaScript - Regular expression isn't matching all hyperlinks in a string

MikeK
July 2, 2024
213 views
0 votes
2 Answers

I have some text:

const str = `This <a href="https://regex101.com/" data-link-id="431ebea7-1426-65a5-8383-55a27313dc51">is a test link</a> which has a hyperlink, and <a href="https://regex102.com/" data-link-id="d62dc3eb-7b3d-953e-4d7a-987448e6928d">this is also</a> a hyperlink.`

I’m trying to match all a tags, but my regular expression just returns the whole thing:

str.match(/<a href=".+ data-link-id="[0-9A-Z-a-z]{1,}">(.*?)<\/a>/)

What am I doing wrong here? I expect the result to be an array of two elements. Instead of (.*?), I’ve tried .+ and [A-Za-z0-9\s]+, with the same result.

Tags: javascript regex

Answers

- TimBiegeleisen
- July 1, 2024 at 8:51 am
- 0 votes
Your current regex pattern has one bug: the .+ that follows href=" is greedy, so it consumes as much text as it can and matches across every anchor, up to the last data-link-id attribute in the string. If you use the lazy form .+? instead, it stops at the first point where the rest of the pattern can match, which is the behavior you want.
var str = 'This <a href="https://regex101.com/" data-link-id="431ebea7-1426-65a5-8383-55a27313dc51">is a test link</a> which has a hyperlink, and <a href="https://regex102.com/" data-link-id="d62dc3eb-7b3d-953e-4d7a-987448e6928d">this is also</a> a hyperlink.';
// Lazy .+? keeps each match inside a single anchor; the g flag returns every match.
var matches = str.match(/<a href=".+? data-link-id="[0-9A-Z-a-z]{1,}">(.*?)<\/a>/g);
console.log(matches);
Note also that match() needs the global g flag to return all matches. Without it, only the first match (together with its capture groups) is returned.
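
To make the greedy versus lazy difference concrete, here is a minimal sketch that runs both patterns on the string from the question, then uses matchAll() to pull out the URL and link text through capture groups. The variable names and the extra capture group around the href value are illustrative assumptions, not part of the original question. matchAll() throws if the regex lacks the g flag.

```javascript
// Sample input copied from the question.
const str = 'This <a href="https://regex101.com/" data-link-id="431ebea7-1426-65a5-8383-55a27313dc51">is a test link</a> which has a hyperlink, and <a href="https://regex102.com/" data-link-id="d62dc3eb-7b3d-953e-4d7a-987448e6928d">this is also</a> a hyperlink.';

// Greedy: ".+" backtracks only as far as the LAST ' data-link-id="',
// so one match spans both anchors.
const greedy = str.match(/<a href=".+ data-link-id="[0-9A-Z-a-z]{1,}">(.*?)<\/a>/g);

// Lazy: ".+?" stops at the FIRST ' data-link-id="', one match per anchor.
const lazy = str.match(/<a href=".+? data-link-id="[0-9A-Z-a-z]{1,}">(.*?)<\/a>/g);

// matchAll() yields one match object per anchor, including capture groups:
// group 1 is the href value (hypothetical extra group), group 2 the link text.
const links = Array.from(
  str.matchAll(/<a href="(.+?)" data-link-id="[0-9A-Z-a-z]{1,}">(.*?)<\/a>/g),
  (m) => ({ href: m[1], text: m[2] })
);

console.log(greedy.length); // 1
console.log(lazy.length);   // 2
console.log(links);
```

With the g flag, match() returns only the full matched strings, so matchAll() is the option to reach for when the capture groups themselves are needed.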

- DiegoD
- July 1, 2024 at 8:56 am
- 0 votes
As a further suggestion, if this is your actual use case, you would be better off using a real HTML parser instead of regular expressions. HTML may contain nested elements or special embedded content that a regex cannot match reliably.

https://developer.mozilla.org/en-US/docs/Web/API/DOMParser/parseFromString

The parseFromString() method of the DOMParser interface parses a
string containing either HTML or XML, returning an HTMLDocument or an
XMLDocument.
const str = `This <a href="https://regex101.com/" data-link-id="431ebea7-1426-65a5-8383-55a27313dc51">is a test link</a> which has a hyperlink, and <a href="https://regex102.com/" data-link-id="d62dc3eb-7b3d-953e-4d7a-987448e6928d">this is also</a> a hyperlink.`;
const parser = new DOMParser();
const doc = parser.parseFromString(str, 'text/html');
const anchors = doc.querySelectorAll('a');
console.log(anchors);
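
Once the anchors are parsed, their attributes and text can be read directly, with no pattern matching at all. The sketch below assumes a hypothetical helper named describeAnchors, which is not part of any library. It accepts anything that exposes querySelectorAll(), such as the Document returned by parseFromString(). DOMParser is a browser API, so the usage part is guarded and simply does nothing in environments without it.

```javascript
// Hypothetical helper: collect href, data-link-id and text of every <a>
// found under `root` (any object exposing querySelectorAll()).
function describeAnchors(root) {
  return Array.from(root.querySelectorAll('a'), (a) => ({
    href: a.getAttribute('href'),
    linkId: a.getAttribute('data-link-id'),
    text: a.textContent,
  }));
}

// Browser-only usage: skipped where DOMParser is not defined.
if (typeof DOMParser !== 'undefined') {
  const str = 'This <a href="https://regex101.com/" data-link-id="431ebea7-1426-65a5-8383-55a27313dc51">is a test link</a> which has a hyperlink, and <a href="https://regex102.com/" data-link-id="d62dc3eb-7b3d-953e-4d7a-987448e6928d">this is also</a> a hyperlink.';
  const doc = new DOMParser().parseFromString(str, 'text/html');
  console.log(describeAnchors(doc));
}
```

Reading attributes through getAttribute() returns the values exactly as written in the markup, and it keeps working if the attributes appear in a different order.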