I’m working on a regex to match phrases in an HTML string. For example, I want to find every instance of “artificial intelligence” and return the <span> tag that immediately precedes it.

The trouble I have is that my regex only returns one large match.

Here is a link to an online regex builder I’ve been using: https://regex101.com/r/rK9yO9/1

I am looking to return the following two matches:

<span m='3'>
<span m='13'>

Example string:

<p><span m='2'>of</span> <span m='3'>artificial</span> 
<span m='4'>intelligence.</span><span m='4'>So</span> 
<span m='5'>that</span> <span m='6'>seems</span> 
<span m='9'>good.</span> <span m='10'>The</span> 
<span m='11'>impact</span> <span m='12'>of</span> 
<span m='13'>artificial</span> <span m='14'>intelligence,</span> 
<span m='15'>on</span> </p>

N.B. There are no newlines in the text; I added those for readability.

The regex I have so far is:

(<span.*>)artificial.?</span>.?<span.*>intelligence.?</span>

Which returns the following match:

<span m='2'>of</span> <span m='3'>artificial</span> 
<span m='4'>intelligence.</span><span m='4'>So</span> 
<span m='5'>that</span> <span m='6'>seems</span> 
<span m='9'>good.</span> <span m='10'>The</span> 
<span m='11'>impact</span> <span m='12'>of</span> 
<span m='13'>artificial</span> <span m='14'>intelligence,</span>
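
The single large match can be reproduced with a short Python sketch. It assumes Python's re module treats this pattern the same way as the flavour used in the regex101 link, and the variable names (html, greedy, matches) are only illustrative.

```python
import re

# Example string from the question, joined without newlines.
html = (
    "<p><span m='2'>of</span> <span m='3'>artificial</span> "
    "<span m='4'>intelligence.</span><span m='4'>So</span> "
    "<span m='5'>that</span> <span m='6'>seems</span> "
    "<span m='9'>good.</span> <span m='10'>The</span> "
    "<span m='11'>impact</span> <span m='12'>of</span> "
    "<span m='13'>artificial</span> <span m='14'>intelligence,</span> "
    "<span m='15'>on</span> </p>"
)

# The pattern from the question: both .* are greedy.
greedy = r"(<span.*>)artificial.?</span>.?<span.*>intelligence.?</span>"

matches = [m.group(0) for m in re.finditer(greedy, html)]
print(len(matches))
print(matches[0])
```

The first .* runs as far right as it can while still leaving an "artificial ... intelligence" pair to match, so everything from the first <span> up to the second occurrence ends up in one match.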

2 Answers


  1. Your quantifiers are greedy, so each .* consumes as much text as it can. To make matching stop at the first occurrence, make them lazy by appending ?

    (<span.*?>)artificial.?</span>.?<span.*?>intelligence.?</span>
    

    will match

    '<span m='2'>of</span> <span m='3'>artificial</span> <span m='4'>intelligence.</span>'
    

    You can then read the result from the first capture group.

  2. Try this regex:

     /(<span[^<]+?>(?:artificial|intelligenc.)<\/span>)/gm
    

    It should match only selected tags
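
The two answers can be compared against the example string with a short Python sketch. The names html, lazy, negated and answer2 are illustrative. The negated pattern, which uses [^>]* instead of .*?, is not taken from either answer; it is a common alternative, included here on the assumption that only the single opening tag should be captured. The answer 2 pattern is written without its /.../gm delimiters and flags, since re.findall already returns every match.

```python
import re

# Example string from the question, joined without newlines.
html = (
    "<p><span m='2'>of</span> <span m='3'>artificial</span> "
    "<span m='4'>intelligence.</span><span m='4'>So</span> "
    "<span m='5'>that</span> <span m='6'>seems</span> "
    "<span m='9'>good.</span> <span m='10'>The</span> "
    "<span m='11'>impact</span> <span m='12'>of</span> "
    "<span m='13'>artificial</span> <span m='14'>intelligence,</span> "
    "<span m='15'>on</span> </p>"
)

# Lazy pattern from answer 1.
lazy = r"(<span.*?>)artificial.?</span>.?<span.*?>intelligence.?</span>"

# Hypothetical variant (not from the answers): [^>]* cannot cross a '>',
# so the captured group cannot span more than one tag.
negated = r"(<span[^>]*>)artificial.?</span>.?<span[^>]*>intelligence.?</span>"

# Pattern from answer 2, without delimiters and flags.
answer2 = r"(<span[^<]+?>(?:artificial|intelligenc.)</span>)"

# With a single capture group, findall returns the group-1 text of each match.
lazy_groups = re.findall(lazy, html)
negated_groups = re.findall(negated, html)
answer2_groups = re.findall(answer2, html)

print(lazy_groups[0])    # <span m='2'>of</span> <span m='3'>
print(negated_groups)    # ["<span m='3'>", "<span m='13'>"]
print(answer2_groups)    # ["<span m='3'>artificial</span>", "<span m='13'>artificial</span>"]
```

On this string the lazy pattern finds two matches, but each still begins at the leftmost available <span, so its first group holds more than one tag. The [^>]* variant returns the two tags the question asks for. The answer 2 pattern returns only the "artificial" spans, including their text and closing tag, because "intelligenc." consumes the final "e" and the following punctuation then blocks </span>.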
