I have a giant string (markdown) that contains something like this:
## Header 1
{~1.0} Lorem ipsum dolor sit amet. Sed congue diam
turpis, {~2.0} vitae congue erat accumsan nec. {~3.0}{~4.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~5.0}
vitae congue erat accumsan nec. {~6.0}{~7.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~8.0}
vitae congue erat accumsan nec. {~9.0}## Header 2
{~10.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~11.0}
vitae congue erat accumsan nec. {~12.0}{~13.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~14.0}
vitae congue erat accumsan nec. {~15.0}{~16.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~17.0}
vitae congue erat accumsan nec. {~18.0}## Header 3
{~19.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~20.0}
vitae congue erat accumsan nec. {~21.0}{~22.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~23.0}
vitae congue erat accumsan nec. {~24.0}{~25.0} Lorem ipsum dolor sit amet. Sed congue diam turpis, {~26.0}
vitae congue erat accumsan nec. {~27.0}
This is a marker: {~x.x}
I will call the combination of a header and one or more paragraphs a "section".
I need to match the first and the last marker of every section.
Currently I’m using this regex in JavaScript: /\s?{([^}]*(~\d*(?:\.\d+)?)[^}]*)}\s?/g
I got it from the selected answer of this question, and it captures all the markers. Now I need to modify it so that it captures only the first and the last ones of every ‘section’.
The string comes from user input, so I cannot know in advance how many paragraphs a ‘section’ will have, nor the content of the headers. All I know is that there will be at least one section (one header followed by some number of paragraphs).
Answers
This is possible with lookarounds, which JS supports.
Since we’re reusing the original pattern a lot, let’s store it in a variable:
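A minimal sketch of that first step, assuming the question's pattern with its backslashes restored. The variable name marker is my own choice, not necessarily the answerer's. String.raw matters here: in an ordinary string literal "\d" silently becomes "d" and "\." becomes ".", which is exactly the kind of garbling that breaks the pattern.

```javascript
// The question's marker pattern, stored as regex source text.
// String.raw keeps "\s", "\d" and "\." intact instead of treating
// them as (lossy) string escapes.
const marker = String.raw`\s?{([^}]*(~\d*(?:\.\d+)?)[^}]*)}\s?`;

// Compiled on its own, the stored source still finds every marker.
// trim() removes the optional whitespace the pattern consumes on each side.
const all = "a {~1.0} b {~2.0}"
  .match(new RegExp(marker, "g"))
  .map((m) => m.trim());
console.log(all); // ["{~1.0}", "{~2.0}"]
```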
A string that doesn’t contain the pattern above looks like this, where [^] denotes "any character", similar to . with the s flag:
From that, we construct our lookahead and lookbehind:
Here’s how our final steps go:
Try it:
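A sketch that combines the pieces into one global regex: either a marker preceded by the lookbehind (first of a section) or a marker followed by the lookahead (last of a section). It runs over a shortened, illustrative version of the question's input, not the asker's actual data.

```javascript
const marker = String.raw`\s?{([^}]*(~\d*(?:\.\d+)?)[^}]*)}\s?`;
const noMarker = String.raw`(?:(?!${marker})[^])*`;
const afterHeader = String.raw`(?<=##${noMarker})`;
const beforeHeaderOrEnd = String.raw`(?=${noMarker}(?:##|$))`;

// First marker after a header, OR last marker before the next header / end.
const firstOrLast = new RegExp(
  `${afterHeader}${marker}|${marker}${beforeHeaderOrEnd}`,
  "g"
);

// Shortened sample with two sections.
const text = [
  "## Header 1",
  "{~1.0} Lorem ipsum, {~2.0}",
  "vitae nec. {~3.0}{~4.0} Lorem, {~5.0}",
  "vitae nec. {~6.0}## Header 2",
  "{~7.0} Lorem, {~8.0}",
  "vitae nec. {~9.0}",
].join("\n");

// The pattern may swallow one whitespace character on each side, hence trim().
const found = text.match(firstOrLast).map((m) => m.trim());
console.log(found); // ["{~1.0}", "{~6.0}", "{~7.0}", "{~9.0}"]
```

Each candidate position can make the lookarounds rescan a long stretch of text, so on a very large input this may be slow; it is meant to demonstrate the lookaround technique rather than to be tuned for performance.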
You can achieve the result you want by matching the marker with a tempered greedy token, which ensures that there is either no marker between ## and this one, or no marker or ## between this one and the next ## or the end of the string.
This matches either:
- ## : literal ##
- (?:(?!{~\d+\.\d+}).)* : some number of characters, where no character is the start of a marker expression ({~\d+\.\d+})
- {~ : literal {~
- (\d+\.\d+) : the marker number, captured in group 1
or:
- {~ : literal {~
- (\d+\.\d+) : the marker number, captured in group 2
- (?:(?!{~\d+\.\d+}|##).)* : some number of characters, where no character is the start of a marker expression ({~\d+\.\d+}) or a header (##)
- (?=##|$) : lookahead to assert that what follows is either the start of a header or the end of the string
Demo on regex101
In JavaScript:
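The original snippet was lost, so this is a reconstruction with the regex assembled from the breakdown above. The s flag is my assumption: without it, . does not match newlines, and a section spans several lines. Which group captured the number tells you whether it was the first marker (group 1) or the last one (group 2).

```javascript
// Alternative 1: "##", then anything that is not a marker, then the first marker (group 1).
// Alternative 2: a marker (group 2), then anything that is neither a marker nor "##",
// followed by the next header or the end of the string.
const re = /##(?:(?!{~\d+\.\d+}).)*{~(\d+\.\d+)|{~(\d+\.\d+)(?:(?!{~\d+\.\d+}|##).)*(?=##|$)/gs;

// Shortened, illustrative sample with two sections.
const text = [
  "## Header 1",
  "{~1.0} Lorem ipsum, {~2.0}",
  "vitae nec. {~3.0}{~4.0} Lorem, {~5.0}",
  "vitae nec. {~6.0}## Header 2",
  "{~7.0} Lorem, {~8.0}",
  "vitae nec. {~9.0}",
].join("\n");

const firsts = [];
const lasts = [];
for (const m of text.matchAll(re)) {
  // A group that did not take part in the match is undefined.
  if (m[1] !== undefined) firsts.push(m[1]);
  else lasts.push(m[2]);
}
console.log(firsts); // ["1.0", "7.0"]
console.log(lasts); // ["6.0", "9.0"]
```

Because alternative 1 consumes the first marker's number, a section with only one marker is reported once, as a "first".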
This is my variant. It is perhaps less regex-heavy than most of the others, but it works:
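The answerer's code was not preserved. As a hedged illustration of a lighter-on-regex approach (not necessarily theirs), you can split the text on "##" and take the first and last marker of each piece. The helper name firstAndLastMarkers is hypothetical; it assumes "##" appears only at section headers (a "###" heading would break it), and it ignores any text before the first header.

```javascript
// Hypothetical helper, not the answerer's original code.
function firstAndLastMarkers(text) {
  const markerRe = /{~\d+(?:\.\d+)?}/g;
  return text
    .split("##")
    .slice(1) // the piece before the first "##" belongs to no section
    .map((section) => section.match(markerRe) ?? []) // all markers in this section
    .filter((markers) => markers.length > 0)
    .map((markers) => [markers[0], markers[markers.length - 1]]);
}

const text = [
  "## Header 1",
  "{~1.0} Lorem ipsum, {~2.0}",
  "vitae nec. {~3.0}{~4.0} Lorem, {~5.0}",
  "vitae nec. {~6.0}## Header 2",
  "{~7.0} Lorem, {~8.0}",
  "vitae nec. {~9.0}",
].join("\n");

const pairs = firstAndLastMarkers(text);
console.log(pairs); // [["{~1.0}", "{~6.0}"], ["{~7.0}", "{~9.0}"]]
```

Splitting first keeps each regex small and linear, at the cost of the "##"-only-at-headers assumption.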