Using VS Code, I want to use a regular-expression-based find/replace to remove all instances of non-standard text from a set of documents.
The title across the document set should be this:
<title>Alien Encounters 2023</title>
But, there are many pages where there is stray text, like:
<title>Alien Encounters 2023 - Ice Cream Van</title>
or
<title>Alien Encounters 2023 - Probed Again</title>
So, I’d like to match (and remove) any characters between <title>Alien Encounters 2023 and </title>.
But these blocks should not match: <strong>Alien Encounters 2023 - Probed Again</strong> (it’s the wrong element type).
This one should also not match: <title>Alien Encounters 2022 - Probed Again</title> (it’s the wrong leading text).
Any reg-exp experts out there able to help me?
3 Answers
After some strong help from @HaoWu I managed it.
Enable "Use Regular Expression" in the Find field, then search for:
(<title>Alien Encounters 2023[^<>-]*?)\s*-[^<>]*
and replace with:
$1
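To sanity-check this pattern outside the editor, here is a minimal Python sketch that runs it against the sample lines from the question. It reads the whitespace part of the pattern as `\s*`, and it uses Python's `\1` where VS Code uses `$1`. The `clean` helper and the `samples` list are illustrative names, not part of the original answer.

```python
import re

# Pattern from the answer above; the whitespace class is assumed to be \s.
PATTERN = re.compile(r"(<title>Alien Encounters 2023[^<>-]*?)\s*-[^<>]*")

# Sample lines taken from the question.
samples = [
    "<title>Alien Encounters 2023</title>",
    "<title>Alien Encounters 2023 - Ice Cream Van</title>",
    "<title>Alien Encounters 2023 - Probed Again</title>",
    "<strong>Alien Encounters 2023 - Probed Again</strong>",
    "<title>Alien Encounters 2022 - Probed Again</title>",
]


def clean(line):
    """Keep capture group 1 and drop the ' - ...' suffix (hypothetical helper)."""
    return PATTERN.sub(r"\1", line)


if __name__ == "__main__":
    for s in samples:
        print(clean(s))
```

Under these assumptions, only the two 2023 titles with a suffix are rewritten; the clean title, the <strong> line, and the 2022 title pass through unchanged.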
So you want to remove text starting with " - " and ending with (but excluding) "<"?
The following should work for this simple case:
/ - [^<]*/
<
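As a rough check of how far this simpler pattern goes, the sketch below applies it in Python. It assumes the matched text is replaced with an empty string, since the answer describes removing it; `strip_suffix` is a hypothetical helper name.

```python
import re

# The simple pattern from this answer: " - " followed by everything up to "<".
SIMPLE = re.compile(r" - [^<]*")


def strip_suffix(line):
    """Remove ' - ...' up to the next '<' (replacement assumed to be empty)."""
    return SIMPLE.sub("", line)


if __name__ == "__main__":
    print(strip_suffix("<title>Alien Encounters 2023 - Ice Cream Van</title>"))
    # The pattern is not anchored to <title> or to 2023, so these change too:
    print(strip_suffix("<strong>Alien Encounters 2023 - Probed Again</strong>"))
    print(strip_suffix("<title>Alien Encounters 2022 - Probed Again</title>"))
```

Because nothing ties the match to the <title> element or to the 2023 text, this version also strips the suffix from the <strong> and 2022 lines that the question wants left alone, which is why it only suits the simple case.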
Yes, you can try this:
This regular expression will match any sequence of characters that starts with <title>, followed by optional whitespace characters, then "Alien Encounters 2023", followed by optional whitespace characters, and then any character between (), and finally ends with </title>. The "*" quantifier means that the whitespace characters can occur zero or more times, and the "?" quantifier means that the optional character group can occur zero or one time.