I’ve got many of the following type of thing: <span class="c">a+2da + 2d</span>
The first part of the span’s content is a math expression, a+2d (without spaces), and the second part has the same text, but with spaces around the operator: a + 2d. I need to be able to capture the a+2d so I can remove it.
Some examples of the expressions I’ve got: x-y=3x - y = 3, a+b-c=da + b - c = d, and it could involve underscores and brackets, like a_n=a+(n−1)da_n = a + (n - 1)d
I can find the first half (only) of the simpler examples using the regex:
/<span class="c">((\w*[+|-|=]\w*)*)<\/span>/
and the second half (only) with spaces using
/<span class="c">((\w* [+|-|=] \w*)*)<\/span>/
But I have no idea how to match the 2nd half given the first, or vice versa. None of the lookahead or lookbehind examples I came across fit the situation, and they became complicated very quickly. TIA.
3 Answers
Only for that expression. We first match an operator with its preceding content:
(\w*)([+-=])
and then use
\2 \3
to search ahead for the same content, but with a space before the operator. The first half is captured in the 1st group, and the second half is captured in the 4th group.
https://regex101.com/r/AW9Ba5/1
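As a rough illustration of how those two fragments might fit together, here is a minimal Python sketch. The full pattern is an assumption (the one behind the link may differ), `split_halves` and `drop_compact_half` are hypothetical helper names, and the class is written `[-+=]` rather than `[+-=]`, because inside brackets `+-=` is read as a character range.

```python
import re

# Hypothetical assembly of the fragments above; not necessarily the linked pattern.
PATTERN = re.compile(
    r'(?<=<span class="c">)'   # only start right after the opening tag
    r'((\w*)([-+=])\S*?)'      # group 1: compact half; 2: first token; 3: first operator
    r'(\2 \3[^<]*)'            # group 4: spaced half, starting with "token operator"
)

def split_halves(html):
    """Return (compact_half, spaced_half), or None when nothing matches."""
    m = PATTERN.search(html)
    return (m.group(1), m.group(4)) if m else None

def drop_compact_half(html):
    """Keep only the spaced half inside each span."""
    return PATTERN.sub(r'\4', html)

print(split_halves('<span class="c">a+2da + 2d</span>'))
print(drop_compact_half('<span class="c">x-y=3x - y = 3</span>'))
```

The lazy `\S*?` grows the first half one character at a time until the backreferences `\2 \3` find the first token again, followed by a space and the same operator.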
Since each token in the second half is separated by a space, you can identify the beginning of the second half by simply matching the first token, which is repeated before the first space or, if it’s the only token, before </span>. Replace the match with an empty string to remove it.
Demo: https://regex101.com/r/pmfB4f/1
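The idea described above can be sketched in Python roughly as follows. The pattern is an illustrative assumption, not necessarily the one in the demo, and `remove_first_half` is a hypothetical helper name. It also assumes the expression starts with a word-like token such as `a`, `x` or `a_n`.

```python
import re

FIRST_HALF = re.compile(
    r'(<span class="c">)'      # group 1: opening tag, kept in the output
    r'(\w+)[^\s<]*?'           # group 2: first token, then as little as possible
    r'(?=\2(?: |</span>))'     # stop where the token recurs before a space or </span>
)

def remove_first_half(html):
    """Drop the compact copy and keep the spaced copy inside each span."""
    return FIRST_HALF.sub(r'\1', html)

for sample in ('a+2da + 2d', 'a_n=a+(n−1)da_n = a + (n - 1)d', 'xx'):
    print(remove_first_half('<span class="c">' + sample + '</span>'))
```

Because the compact half contains no spaces, the first place where the leading token is followed by a space (or by the closing tag) marks the start of the spaced half.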
Try matching the pattern explained below and replacing with nothing.
See: regex101 (borrowed from blhsing)
Explanation
MATCH:
<span class="c"> : Ensure the formula is preceded by (i.e. is inside of) a span tag
\K : and disregard this.
(?<first>[^-+=\s]+) : capture the first argument
(?: ... )? : and optionally
(?<operation>[-+=]) : the succeeding operation
\S* : as well as the rest of the formula.
(?= ... ) : Now validate
(?P=first) : that the "spacy" equation begins with the first argument
(?(operation) ... ) : and if an operation was found
\s+(?P=operation) : the same operation preceded by a space comes after it.
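For anyone working in Python, whose built-in `re` module has no `\K`, the pieces explained above can be approximated as in the sketch below. The way the pieces are assembled is an assumption based on the explanation, a fixed-width lookbehind stands in for `\K`, and `strip_compact` is a hypothetical helper name.

```python
import re

SPACELESS = re.compile(
    r'(?<=<span class="c">)'            # lookbehind stands in for the tag plus \K
    r'(?P<first>[^-+=\s]+)'             # first argument
    r'(?:(?P<operation>[-+=])\S*)?'     # optional operator and the rest of the compact formula
    r'(?='                              # lookahead: validate what follows
    r'(?P=first)'                       #   the spaced formula starts with the same argument
    r'(?(operation)\s+(?P=operation))'  #   and, if an operator was seen, whitespace plus it
    r')'
)

def strip_compact(html):
    """Replace the compact half with nothing, leaving the spaced half."""
    return SPACELESS.sub('', html)

print(strip_compact('<span class="c">a+b-c=da + b - c = d</span>'))
```

The greedy `\S*` first swallows too much and then backtracks until the lookahead is satisfied, which is what lets the same pattern handle formulas of different lengths.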