
To split a string of HTML source code into its parts, I have written a method that takes the next desired part of the string, copies it into a list of strings and then removes that occurrence from the string.
For the removal, I am using text.replaceFirst(s, "").

Unfortunately, the provided HTML-Code contains regex meta chars like ‘|’, resulting in only a partial removal of the characters I have already copied.

I really don’t want to try to prefix all possibly problematic symbols with a backslash for escaping as it is tedious work and prone to mistakes.

Is there a way to replace only the first occurrence of a string in another string without having to worry about such characters?

Example:

String input = "<title>text | more</title>";
String[] expected = {"<title>", "text | more", "</title>"};

I split on >, therefore get "<title>", add this to my list of results and call input.replaceFirst("<title>", ""), leaving my input as "text | more</title>".

I split on <, therefore get "text | more", add this to my list of results and call input.replaceFirst("text | more", ""), leaving my input as "| more</title>", but I wanted my input to be "</title>".
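
The behaviour described above can be reproduced in isolation. The following is only a minimal sketch; the class name ReplaceFirstDemo and the helper removeFirstRegex are made up for illustration. It shows that replaceFirst interprets its first argument as a regular expression, so "text | more" means "either 'text ' or ' more'" and only the first alternative that matches is removed.

```java
final class ReplaceFirstDemo {

    // Hypothetical helper: removes the first REGEX match of `pattern` from `text`.
    // String.replaceFirst always compiles its first argument as a regex.
    static String removeFirstRegex(String text, String pattern) {
        return text.replaceFirst(pattern, "");
    }

    public static void main(String[] args) {
        String input = "text | more</title>";

        // '|' is the regex alternation operator, so the pattern matches
        // "text " OR " more"; the earliest match is "text " at index 0.
        String result = removeFirstRegex(input, "text | more");

        System.out.println(result); // prints "| more</title>"
    }
}
```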

2 Answers


  1. You can use the Pattern.quote() method to escape all regex meta chars in a string, so that the whole string is matched literally.

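    import java.util.Arrays;
    import java.util.regex.Pattern;
    
    String input = "<title>text | more</title>";
    String[] expected = {"<title>", "text | more", "</title>"};
    
    // split() also takes a regex; quoting makes the delimiter a literal
    String[] parts = input.split(Pattern.quote(">"));
    
    for (String part : parts) {
        // the quoted part is matched literally, so '|' is no longer an alternation
        input = input.replaceFirst(Pattern.quote(part), "");
    }
    
    System.out.println(Arrays.toString(parts));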
    
  2. There’s no need for replacements. You can directly split on lookarounds for < and >.

    String[] res = input.split("(?<=>)|(?=<)");
    // [<title>, text | more, </title>]
    

    However, it should be noted that regular expressions are not the best tool for parsing HTML. Consider using an XML parser instead.
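
If the copy-then-remove approach from the question is kept, the literal removal can be sketched in two ways. This is only an illustration under stated assumptions: the class name LiteralRemoval and both helper names are invented here, not part of any library. The first variant quotes the search string with Pattern.quote, the second avoids regular expressions entirely by using indexOf and substring.

```java
import java.util.regex.Pattern;

final class LiteralRemoval {

    // Variant 1 (hypothetical helper): quote the search string so that
    // replaceFirst treats every character in it literally.
    static String removeFirstQuoted(String text, String literal) {
        return text.replaceFirst(Pattern.quote(literal), "");
    }

    // Variant 2 (hypothetical helper): no regex involved at all.
    // Finds the first literal occurrence and cuts it out.
    static String removeFirstIndexOf(String text, String literal) {
        int start = text.indexOf(literal);
        if (start < 0) {
            return text; // nothing to remove
        }
        return text.substring(0, start) + text.substring(start + literal.length());
    }

    public static void main(String[] args) {
        String input = "text | more</title>";

        System.out.println(removeFirstQuoted(input, "text | more"));  // prints "</title>"
        System.out.println(removeFirstIndexOf(input, "text | more")); // prints "</title>"
    }
}
```

Both variants only need care on the search side because the replacement here is the empty string. If a non-empty replacement containing $ or \ were passed to replaceFirst, it would additionally have to be wrapped in Matcher.quoteReplacement.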
