I have a block of text in which each paragraph starts with a number, stored as a string. The numbers can be simple or dotted. These are examples of the paragraph starts I have:
1) sovereign control will prevail
1.1. These are the Rules
5.47.1 "Пункт зупинки тролейбуса".
I use the following regular expression to find the number:
([0-9.,\-\s]+)\b")
Now I have a new number format. Source text:
5.24.1 і 5.24.2 "Зміна напрямку руху на дорозі з розділювальною смугою"
I need to get the numbers 5.24.1_ 5.24.2
How can a single regular expression be written to handle all of these cases?
2 Answers
Sometimes more complex code is compensated by greater flexibility and speed. I know it is not what you are asking for, but…
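I suggest matching any of the following patterns at the start of the string only:
one or more digits followed by one or more sequences of a . + one or more digits
one or more alphanumeric chars followed by a ) char
The regex will look like
(?U)\G\s*(?:\sі\s+)?(\d+(?:\.\d+)+|\w+\))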
See the regex demo.
Details:
(?U) – the Pattern.UNICODE_CHARACTER_CLASS inline flag option (I see you may have Cyrillic letters in the expected matches, so it is required for \w to match them)
\G – either the start of the string or the end of the previous match (so only consecutive matches from the start of the string are allowed)
\s* – zero or more whitespaces
(?:\sі\s+)? – an optional sequence of a whitespace, і, and then one or more whitespaces
(\d+(?:\.\d+)+|\w+\)) – Group 1: either
  \d+(?:\.\d+)+ – one or more digits and then one or more sequences of a . + one or more digits
  | – or
  \w+\) – one or more alphanumeric chars and then a ) char.
See the Java testing code online:
Output:
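The following is a minimal sketch of how the pattern described in the Details list might be applied, assuming its pieces are concatenated in the order listed. The class name ParagraphNumbers, the extract helper, the shortened ASCII sample titles, and the "_ " joiner (taken from the expected result 5.24.1_ 5.24.2 in the question) are illustrative assumptions, not the answer's original test code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class ParagraphNumbers {
    // \u0456 is the Cyrillic letter that joins two numbers in the sample text.
    // (?U) turns on UNICODE_CHARACTER_CLASS so that \w also covers Cyrillic letters.
    // \G restricts every match to the start of the input or the end of the previous match.
    private static final Pattern NUMBER = Pattern.compile(
            "(?U)\\G\\s*(?:\\s\u0456\\s+)?(\\d+(?:\\.\\d+)+|\\w+\\))");

    // Collects the consecutive paragraph numbers found at the start of the text.
    static List<String> extract(String paragraph) {
        List<String> numbers = new ArrayList<>();
        Matcher m = NUMBER.matcher(paragraph);
        while (m.find()) {
            numbers.add(m.group(1)); // Group 1 holds the number without separators
        }
        return numbers;
    }

    public static void main(String[] args) {
        String[] samples = {
            "1) sovereign control will prevail",
            "1.1. These are the Rules",
            "5.47.1 \"Trolleybus stop\"",
            "5.24.1 \u0456 5.24.2 \"Change of direction\""
        };
        for (String s : samples) {
            // Join multiple numbers with "_ " as requested in the question.
            System.out.println(String.join("_ ", extract(s)));
        }
        // Prints:
        // 1)
        // 1.1
        // 5.47.1
        // 5.24.1_ 5.24.2
    }
}
```

Because of \G, scanning stops at the first token that is not a number, so digits appearing later in the paragraph are not picked up. Note that under this pattern a bare number with no dot and no closing parenthesis (for example "7 text") is not matched; an extra alternative would be needed if such paragraphs occur.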