For splitting a string of HTML source code into its parts, I have written a method that takes the next desired part of the string, copies it into a list of strings and removes the occurrence in the string afterwards.
For the removal, I am using text.replaceFirst(s, "").
Unfortunately, the provided HTML code contains regex metacharacters like ‘|’, resulting in only a partial removal of the characters I have already copied.
I really don’t want to try to prefix all possibly problematic symbols with a backslash for escaping as it is tedious work and prone to mistakes.
Is there a way to replace only the first occurrence of a string in another string without having to worry about such characters?
Example:
String input = "<title>text | more</title>";
String[] expected = {"<title>", "text | more", "</title>"};
I split for >, therefore get "<title>", add this to my list of results and call input.replaceFirst("<title>", ""), leaving my input as "text | more</title>".
I split for <, therefore get "text | more", add this to my list of results and call input.replaceFirst("text | more", ""), leaving my input as "| more</title>", but I would have wanted my input as "</title>".
2 Answers
You can use the Pattern.quote() method to escape all regex metacharacters in a string, so that replaceFirst treats it as a literal: text.replaceFirst(Pattern.quote(s), "").
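A minimal sketch of that approach, using the input from the question. The class name QuoteDemo and the helper removeFirstLiteral are made up for illustration; only Pattern.quote() and String.replaceFirst() come from the standard library.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Hypothetical helper: removes the first occurrence of 'part' from 'text',
    // treating 'part' as a literal string rather than as a regex.
    static String removeFirstLiteral(String text, String part) {
        return text.replaceFirst(Pattern.quote(part), "");
    }

    public static void main(String[] args) {
        String input = "text | more</title>";
        // Without quoting, "text | more" would be read as the alternation
        // "text " OR " more", so only "text " would be removed.
        System.out.println(removeFirstLiteral(input, "text | more")); // prints </title>
    }
}
```

The empty replacement string needs no escaping here. If the replacement itself could contain '$' or '\', Matcher.quoteReplacement() would play the same role on that side.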
There’s no need for replacements. You can directly split on lookarounds for < and >.
However, it should be noted that regular expressions are not the best tool for parsing HTML. Consider using an XML parser instead.
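The lookaround split could look roughly like the sketch below. The exact pattern is an assumption, since the answer does not spell it out: it splits at the zero-width position just before each '<' and just after each '>'. The class name SplitDemo and the method splitTags are hypothetical. The sketch also assumes Java 8 or later, where a zero-width match at the start of the input does not produce an empty leading element.

```java
import java.util.Arrays;

class SplitDemo {
    // Hypothetical helper: splits markup into tags and the text between them.
    static String[] splitTags(String html) {
        // (?=<)  matches the empty position right before a '<'
        // (?<=>) matches the empty position right after a '>'
        // Nothing is consumed, so the delimiters stay in the resulting parts.
        return html.split("(?=<)|(?<=>)");
    }

    public static void main(String[] args) {
        String input = "<title>text | more</title>";
        // prints [<title>, text | more, </title>]
        System.out.println(Arrays.toString(splitTags(input)));
    }
}
```

Because the '|' inside the text is never used as a pattern here, no escaping is needed. The caveat above still applies: this only works for simple markup and breaks on things like '>' inside attribute values.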