Weird PHP preg_replace result

kimli
April 14, 2023
2 Answers

$pedit =<<<head
 <div class="item"><div var="i_name_10">white gold</div> <div var="i_price_10">$5.99</div></div> 
head;

$pedit = preg_replace("/(<.*var=\"i_name_10\".*>)(.*)(<\/.*?>\s*)/", "$1" . "aaa" . "$3", str_replace("> <", "><", $pedit));

RESULT:

white gold
$5.99
aaa

Expected result is

aaa
$5.99

When I put a newline after the "white gold" div, or add the U modifier after the pattern (/U), it works as expected. But it seems problematic or unclean with those solutions if the string repeats and it is longer. Please help!

I want "white gold" to be replaced with aaa, instead of aaa being appended at the end.

Tags: php preg-replace

Answers

- Wongjn
- April 14, 2023 at 11:03 pm
- 0 votes
Problem explanation

By default, quantifiers in a regular expression are greedy: they match as much as possible. This means that the .* portions within the first capture group will match more than you expect.
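As a small illustration of greedy versus lazy matching, here is a minimal sketch. The sample string and variable names are made up for this example and are not from the question.

```php
<?php
// Hypothetical sample input, chosen only to illustrate greediness.
$html = '<b>one</b><b>two</b>';

// Greedy: .* runs on to the LAST </b> in the string.
preg_match('/<b>(.*)<\/b>/', $html, $greedy);

// Lazy: .*? stops at the FIRST </b>.
preg_match('/<b>(.*?)<\/b>/', $html, $lazy);

echo $greedy[1], "\n"; // one</b><b>two
echo $lazy[1], "\n";   // one
```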

For <.*var="i_name_10", this matches:
```
<div class="item"><div var="i_name_10"
```
We then have the remaining string:
```
>white gold</div> <div var="i_price_10">$5.99</div></div>
```
The next portion of the regex is .*>, which can slurp up all of:
```
>white gold</div> <div var="i_price_10">$5.99</div>
```
because the next portion, (.*), can match nothing: the quantifier * means 0 or more, so the greedy .*> is free to take everything up to the last closing tag.

We now have:
```
</div>
```
which (<\/.*?>\s*) matches.

So our capture groups are:

| Capture | Result |
| --- | --- |
| `(<.*var="i_name_10".*>)` | `<div class="item"><div var="i_name_10">white gold</div> <div var="i_price_10">$5.99</div>` |
| `(.*)` | (empty) |
| `(<\/.*?>\s*)` | `</div>` |

Which is why when you replace with "$1"."aaa"."$3", you get:
```

<div class="item"><div var="i_name_10">white gold</div> <div var="i_price_10">$5.99</div>

aaa

</div>

```
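To check the groups for yourself, a short sketch along these lines may help. It assumes the input is on a single line with the "> <" replacement already applied, and it writes the pattern in single quotes so the backslashes reach PCRE unchanged.

```php
<?php
// The question's input on one line, after str_replace("> <", "><", ...).
$pedit = '<div class="item"><div var="i_name_10">white gold</div><div var="i_price_10">$5.99</div></div>';

// The question's pattern, single-quoted so \/ and \s are passed through as-is.
$pattern = '/(<.*var="i_name_10".*>)(.*)(<\/.*?>\s*)/';

preg_match($pattern, $pedit, $m);

var_dump($m[1]); // everything up to and including the price div
var_dump($m[2]); // empty string
var_dump($m[3]); // the final closing tag
```

Dumping the groups like this before running preg_replace is a cheap way to see where a greedy quantifier has overreached.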
Solution

But it seems problematic or unclean with those solutions if the string repeats and it is longer

It is hard to discern what you mean here by "string repeats": which string repeats? The whole thing? And in "and it is longer", what does "it" refer to specifically?

However, regardless of these caveats, it seems you want to replace the contents of the <div var="i_name_10">white gold</div> element. So you could use the U flag you mentioned originally, or you could use some sort of "manual" ungreediness in the first capturing group, like:
```
"/(<[^>]+var=\"i_name_10\"[^>]*>)(.*?)(<\/[^>]+?>\s*)/"
```
We use the expression [^>] to state that any character is fine apart from >, so that we match only inside the element and we don’t match "outside" it. We also use the 1 or more quantifier (+) instead of the 0 or more (*) since HTML requires the element name at the start of the element and in the closing tag. The middle group is lazy, (.*?), so that it stops at the first closing tag rather than the last one on the line. This reduces the chance of any surprises.
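Put together, a minimal sketch of this approach could look like the following. It assumes a lazy middle group, (.*?), so that only the text of the matched element is replaced. It uses a nowdoc and single-quoted strings so that neither $5.99 nor the backslashes are touched by PHP.

```php
<?php
$pedit = <<<'head'
 <div class="item"><div var="i_name_10">white gold</div> <div var="i_price_10">$5.99</div></div> 
head;

// [^>] keeps the first group inside a single opening tag.
// (.*?) is lazy (an assumption of this sketch), so it stops at the first closing tag.
$pattern = '/(<[^>]+var="i_name_10"[^>]*>)(.*?)(<\/[^>]+>\s*)/';

// ${1} and ${3} are backreferences; the braces keep them apart from the literal text.
$result = preg_replace($pattern, '${1}aaa${3}', $pedit);

echo $result, "\n";
```

With this pattern the str_replace("> <", "><", ...) step is no longer needed, since the trailing \s* already absorbs whitespace after the closing tag.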

- Sammitch
- April 15, 2023 at 2:44 am
- 0 votes
Rather than writing a regular expression that you yourself won’t be able to read or modify after so much as a night’s sleep, let alone anyone else, you should use the tooling that is specifically designed to parse and modify the data you have.
```
$pedit =<<<'head'
 <div class="item"><div var="i_name_10">white gold</div> <div var="i_price_10">$5.99</div></div> 
head;

$d = new DOMDocument();
$d->loadHTML($pedit, LIBXML_HTML_NOIMPLIED);

// search for elements with the defined attribute, modify results
$xpath = new DOMXPath($d);
foreach($xpath->query('//div[@var="i_name_10"]') as $node)  {
    $node->nodeValue = 'aaa';
}

var_dump($d->saveHTML($d->documentElement));
```
Output:
```
string(88) "<div class="item"><div var="i_name_10">aaa</div> <div var="i_price_10">$5.99</div></div>"
```
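If the markup repeats for several items, the same DOM approach might be extended roughly as below. This is only a sketch: the second item, the wrapping list element, and the $newNames map are hypothetical and are not part of the question.

```php
<?php
// Hypothetical input with two items, wrapped in a single root element.
$html = '<div class="list">'
      . '<div class="item"><div var="i_name_10">white gold</div> <div var="i_price_10">$5.99</div></div>'
      . '<div class="item"><div var="i_name_11">rose gold</div> <div var="i_price_11">$6.99</div></div>'
      . '</div>';

// Hypothetical map of "var" attribute value => replacement text.
$newNames = ['i_name_10' => 'aaa', 'i_name_11' => 'bbb'];

$d = new DOMDocument();
$d->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($d);
foreach ($newNames as $var => $text) {
    // The attribute values here are fixed strings; user-supplied values would need escaping.
    foreach ($xpath->query('//div[@var="' . $var . '"]') as $node) {
        $node->nodeValue = $text;
    }
}

$out = $d->saveHTML($d->documentElement);
echo $out, "\n";
```

Each element is located by its attribute rather than by its position in the string, so the length of the markup and the number of repeats do not affect the lookup.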
Ref: https://www.php.net/manual/en/book.dom.php