$pedit =<<<head
 <div class="item"><div var="i_name_10">white gold</div> <div var="i_price_10">$5.99</div></div> 
head;

  $pedit = preg_replace("/(<.*var=\"i_name_10\".*>)(.*)(<\/.*?>\s*)/","$1"."aaa"."$3",str_replace("> <","><",$pedit));

RESULT:

white gold
$5.99
aaa

Expected result is

aaa
$5.99

When I put a newline after the "white gold" div, or add the U modifier after the pattern (/U), it works as expected. But it seems to be problematic or unclean with those solutions if the string repeats and it is longer. Please help!

In other words, "white gold" should be replaced with aaa, instead of aaa being appended at the end.


2 Answers


  1. Problem explanation

    By default, regular expression quantifiers are greedy: they match as much as possible. This means that the .* within the first capture group will match more than you expect.
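
As a small illustration of greedy versus lazy matching, here is a minimal sketch on a made-up string (`$s` and the result variable names are assumptions for this example, not part of the question):

```php
<?php
// Hypothetical sample string, chosen only to make the difference visible.
$s = '<a><b><c>';

preg_match('/<.*>/', $s, $greedy);    // greedy: .* runs on to the last ">"
preg_match('/<.*?>/', $s, $lazy);     // lazy: .*? stops at the first ">"
preg_match('/<.*>/U', $s, $ungreedy); // U modifier: quantifiers become lazy by default

var_dump($greedy[0], $lazy[0], $ungreedy[0]);
// string(9) "<a><b><c>"
// string(3) "<a>"
// string(3) "<a>"
```

The greedy form is what the pattern in the question uses throughout, which is why its first group swallows more than one element.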

    For <.*var="i_name_10", this matches:

    <div class="item"><div var="i_name_10"
    

    We then have the remaining string:

    >white gold</div> <div var="i_price_10">$5.99</div></div>
    

    The next portion of the regex is .*>, which can slurp up all of:

    >white gold</div> <div var="i_price_10">$5.99</div>
    

    because the next portion, (.*), can match nothing: the quantifier * means 0 or more.

    We now have:

    </div>
    

    Which (<\/.*?>\s*) matches.

    So our capture groups are

    Capture                    Result
    (<.*var="i_name_10".*>)    <div class="item"><div var="i_name_10">white gold</div> <div var="i_price_10">$5.99</div>
    (.*)                       (empty)
    (<\/.*?>\s*)               </div>

    Which is why when you replace with "$1"."aaa"."$3", you get:

    <!-- $1 -->
    <div class="item"><div var="i_name_10">white gold</div> <div var="i_price_10">$5.99</div>
    <!-- /$1 -->
    aaa
    <!-- $3 -->
    </div>
    <!-- /$3 -->
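
To check this breakdown yourself, a minimal sketch like the following dumps the three groups with preg_match. It assumes the backslash escapes shown here (`\/` and `\s`) are what the original pattern used, and it applies the same str_replace as the question, so the space between the two inner divs is gone. The names `$html` and `$m` are illustrative.

```php
<?php
// Same markup as in the question, including the leading and trailing space.
$html = ' <div class="item"><div var="i_name_10">white gold</div> <div var="i_price_10">$5.99</div></div> ';
$html = str_replace('> <', '><', $html);

// The greedy pattern from the question (escaping assumed).
$pattern = '/(<.*var="i_name_10".*>)(.*)(<\/.*?>\s*)/';

preg_match($pattern, $html, $m);
var_dump($m[1], $m[2], $m[3]);
// $m[1] is everything from the outer <div class="item"> up to and including
//       the </div> that closes the price element
// $m[2] is the empty string
// $m[3] is the final </div> plus the trailing space
```

Because the second group is empty, "aaa" ends up between the price element and the last closing tag, which is the output shown in the question.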
    

    Solution

    But it seems to be problematic or unclean with those solutions if the string repeats and it is longer

    It is hard to discern what you mean here by "string repeats", which string repeats? The whole thing? Also for "and it is longer", what do you mean by "it" specifically?

    However, regardless of these caveats, it seems you want to replace the contents of the <div var="i_name_10">white gold</div> element. You could use the U flag you mentioned originally, or you could make the pattern "manually" ungreedy, like:

    "/(<[^>]+var=\"i_name_10\"[^>]*>)(.*?)(<\/[^>]+?>\s*)/"
    

    We use the expression [^>] to state that any character is fine apart from >, so that we match only inside the tag and don’t match "outside" it. We also use the 1 or more quantifier (+) instead of the 0 or more (*), since HTML requires the element name at the start of the opening tag and in the closing tag. The middle group is lazy, (.*?), so that it stops at the first closing tag rather than running on to the last one on the line. This reduces the chance of any surprises.
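
A minimal sketch of this approach applied to the markup from the question. Note one assumption of the sketch: the middle group is written as lazy, `(.*?)`, because a greedy `(.*)` in that position would run on to the last closing tag on the line and swallow the price element as well. The replacement uses `${1}` only to keep the group reference visually separate from the literal text.

```php
<?php
// Same markup as in the question, including the leading and trailing space.
$pedit = ' <div class="item"><div var="i_name_10">white gold</div> <div var="i_price_10">$5.99</div></div> ';

// [^>] keeps the first and third groups inside a single tag;
// the lazy (.*?) stops at the first closing tag that follows.
$pattern = '/(<[^>]+var="i_name_10"[^>]*>)(.*?)(<\/[^>]+?>\s*)/';

$result = preg_replace($pattern, '${1}aaa$3', str_replace('> <', '><', $pedit));
echo $result, PHP_EOL;
// prints: <div class="item"><div var="i_name_10">aaa</div><div var="i_price_10">$5.99</div></div>
// (the leading and trailing spaces of the input are kept)
```

This still assumes the target element contains no nested tags; for anything less regular, a real parser is the safer tool.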

  2. Rather than writing a regular expression that you yourself won’t be able to read or modify after so much as a night’s sleep, let alone anyone else, you should use the tooling that is specifically designed to parse and modify the data you have.

    $pedit =<<<'head'
     <div class="item"><div var="i_name_10">white gold</div> <div var="i_price_10">$5.99</div></div> 
    head;
    
    $d = new DOMDocument();
    $d->loadHTML($pedit, LIBXML_HTML_NOIMPLIED);
    
    // search for elements with the defined attribute, modify results
    $xpath = new DOMXPath($d);
    foreach($xpath->query('//div[@var="i_name_10"]') as $node)  {
        $node->nodeValue = 'aaa';
    }
    
    var_dump($d->saveHTML($d->documentElement));
    

    Output:

    string(88) "<div class="item"><div var="i_name_10">aaa</div> <div var="i_price_10">$5.99</div></div>"
    

    Ref: https://www.php.net/manual/en/book.dom.php
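
If the same replacement is needed for several names, the DOM code can be wrapped in a small helper. This is only a sketch: `replaceDivText` is a hypothetical name, it assumes the fragment has a single root element (as in the question), that `$var` contains no double quotes, and that the dom extension is available.

```php
<?php
// Hypothetical helper: set the text of every <div> whose "var" attribute equals $var.
function replaceDivText(string $html, string $var, string $text): string
{
    $d = new DOMDocument();
    // trim() drops the surrounding whitespace so the fragment starts at its root element.
    $d->loadHTML(trim($html), LIBXML_HTML_NOIMPLIED);

    $xpath = new DOMXPath($d);
    // Assumption: $var has no double quotes, so it is safe inside the XPath literal.
    foreach ($xpath->query(sprintf('//div[@var="%s"]', $var)) as $node) {
        $node->nodeValue = $text;
    }

    // Serialize only the root element, not the whole document.
    return $d->saveHTML($d->documentElement);
}

$pedit = ' <div class="item"><div var="i_name_10">white gold</div> <div var="i_price_10">$5.99</div></div> ';
$out = replaceDivText($pedit, 'i_name_10', 'aaa');
echo $out, PHP_EOL;
```

Unlike the regex, this does not depend on how the markup is laid out across lines, so repeated or longer items need no changes to the matching logic.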
