I have blocks of text (paragraphs), each of which starts with a number given as a string. The numbers can be simple or dot-separated. Here are some examples of the paragraph numbers:

1) sovereign control will prevail
1.1. These are the Rules
5.47.1 "Пункт зупинки тролейбуса".

I use the following regular expression to find the number:

([0-9.,\-\s]+)\b

Now I have a new number format. Source text:

 5.24.1 і 5.24.2 "Зміна напрямку руху на дорозі з розділювальною смугою"

I need to get the number 5.24.1_ 5.24.2

How can a single regular expression be written to handle all of these cases?

2 Answers


  1. Sometimes the greater complexity of the code is compensated by greater flexibility and speed. I know this is not what you are asking for, but…

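    // Needs java.util.ArrayList and java.util.List.
    List<String> getNumbers( String input ) {
       boolean findNumber = true;          // true while looking for the first digit of a number
       String aux[] = input.split( "" );   // one single-character string per element
       String aux2 = "";                   // the number collected so far
       List<String> out = new ArrayList<>();
       for( int k = 0; k < aux.length; k ++ ) {
          if( findNumber ) {
             if( aux[ k ].matches( "[0-9]" ) ) {
                aux2 += aux[ k ];
                findNumber = false;
             }
          }
          else {
             if( aux[ k ].matches( "[0-9]" )
                     || aux[ k ].matches( "\\." ) ) {
                aux2 += aux[ k ];
             }
             else {
                out.add( aux2 );
                aux2 = "";
                findNumber = true;
             }
          }
       }
       // Flush a number that runs up to the end of the input.
       if( !findNumber ) {
          out.add( aux2 );
       }
       return out;
    }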
    
  2. I suggest matching any of the following patterns at the start of string only:

    • alphanumeric characters followed by a ) char
    • dot-separated digit sequences
    • dot-separated digit sequences separated with conjunctions (you may then extend this with more conjunctions as need be).

    The regex will look like

    String regex = "(?U)\\G\\s*(?:\\sі\\s+)?(\\d+(?:\\.\\d+)+|\\w+\\))";
    

    Details:

    • (?U) – the Pattern.UNICODE_CHARACTER_CLASS inline flag option (I see you may have Cyrillic letters in the expected matches, so it is required for \w to match them)
    • \G – either start of string or the end of the previous match (so we only allow to match consecutive matches from the start of the string)
    • \s* – zero or more whitespaces
    • (?:\sі\s+)? – an optional sequence of a whitespace, і and then one or more whitespaces
    • (\d+(?:\.\d+)+|\w+\)) – Group 1: either
      • \d+(?:\.\d+)+ – one or more digits and then one or more sequences of a . + one or more digits
      • | – or
      • \w+\) – one or more alphanumeric chars and then a ) char.
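
    To make the roles of \G and (?U) more concrete, here is a minimal sketch that runs the pattern with and without each of them. The class name AnchorDemo, the helper findAll and the sample inputs are made up for illustration; the Cyrillic letters are written as Unicode escapes only so that the file compiles regardless of source encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // The pattern from this answer; the escape 0456 is the Cyrillic letter used as "and".
    static final String ANCHORED =
        "(?U)\\G\\s*(?:\\s\u0456\\s+)?(\\d+(?:\\.\\d+)+|\\w+\\))";
    // Same alternatives, but without the G anchor (illustration only).
    static final String UNANCHORED = "(?U)(\\d+(?:\\.\\d+)+|\\w+\\))";

    // Hypothetical helper: collects Group 1 of every match found in the input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up input with extra numbers later in the paragraph.
        String input = "5.47.1 see also 3.2 and 4.1";
        System.out.println(findAll(ANCHORED, input));   // [5.47.1]
        System.out.println(findAll(UNANCHORED, input)); // [5.47.1, 3.2, 4.1]

        // The first character below is the Cyrillic letter "a" (escape 0430).
        String cyrillic = "\u0430) driver licence";
        System.out.println(findAll("(\\w+\\))", cyrillic));     // []
        System.out.println(findAll("(?U)(\\w+\\))", cyrillic)); // one match: the letter plus ")"
    }
}
```

    With \G, matching stops at the first place where the text no longer continues the leading number, so numbers mentioned later in the paragraph are ignored. Without (?U), \w only covers ASCII word characters, so a Cyrillic list marker is not matched.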

    Java test code:

    import java.util.*;
    import java.util.regex.*;
    
    class Ideone
    {
        public static void main (String[] args) throws java.lang.Exception
        {
            String regex = "(?U)\\G\\s*(?:\\sі\\s+)?(\\d+(?:\\.\\d+)+|\\w+\\))";
            Pattern pattern = Pattern.compile(regex);
            List<String> strs = Arrays.asList(
                "а) driver licance",
                "1) sovereign control will prevail",
                "1.1. These are the Rules",
                "5.47.1 \"Пункт зупинки тролейбуса\".",
                "5.24.1 і 5.24.2 \"Зміна напрямку руху на дорозі з розділювальною смугою\"");
            for (String str : strs) {
                System.out.println("Input: \"" + str + "\"");
                Matcher matcher = pattern.matcher(str);
                // Each find() continues from the end of the previous match (\G).
                while (matcher.find()) {
                    System.out.println(matcher.group(1));
                }
                System.out.println("-- END OF STRING PROCESSING --");
            }
        }
    }
    

    Output:

    Input: "а) driver licance"
    а)
    -- END OF STRING PROCESSING --
    Input: "1) sovereign control will prevail"
    1)
    -- END OF STRING PROCESSING --
    Input: "1.1. These are the Rules"
    1.1
    -- END OF STRING PROCESSING --
    Input: "5.47.1 "Пункт зупинки тролейбуса"."
    5.47.1
    -- END OF STRING PROCESSING --
    Input: "5.24.1 і 5.24.2 "Зміна напрямку руху на дорозі з розділювальною смугою""
    5.24.1
    5.24.2
    -- END OF STRING PROCESSING --
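
    The question asks for one value such as 5.24.1_ 5.24.2 rather than separate matches. Assuming that simply joining the consecutive matches with a chosen delimiter is acceptable, a sketch could look like this. The class ParagraphNumber, the method leadingNumber and the "_ " delimiter are illustrative assumptions, not part of the original code; the letter і is written as a Unicode escape so the file compiles regardless of source encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParagraphNumber {
    // Same pattern as above; the escape 0456 is the Cyrillic conjunction letter.
    private static final Pattern NUMBER = Pattern.compile(
        "(?U)\\G\\s*(?:\\s\u0456\\s+)?(\\d+(?:\\.\\d+)+|\\w+\\))");

    // Hypothetical helper: joins all leading numbers of a paragraph with the
    // given delimiter; returns an empty string if the paragraph has none.
    static String leadingNumber(String paragraph, String delimiter) {
        List<String> parts = new ArrayList<>();
        Matcher m = NUMBER.matcher(paragraph);
        while (m.find()) {
            parts.add(m.group(1));
        }
        return String.join(delimiter, parts);
    }

    public static void main(String[] args) {
        // Made-up sign title; only the leading numbers matter here.
        String twoNumbers = "5.24.1 \u0456 5.24.2 \"Sign title\"";
        System.out.println(leadingNumber(twoNumbers, "_ "));                          // 5.24.1_ 5.24.2
        System.out.println(leadingNumber("1) sovereign control will prevail", "_ ")); // 1)
    }
}
```

    Compiling the pattern once as a constant avoids recompiling it for every paragraph.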
    