Using VS Code, I want to use a regular-expression find/replace to remove all instances of non-standard title text from a set of documents.

The title across the document set should be this:
<title>Alien Encounters 2023</title>

But, there are many pages where there is stray text, like:
<title>Alien Encounters 2023 - Ice Cream Van</title>
or
<title>Alien Encounters 2023 - Probed Again</title>

So, I’d like to match (and remove) any characters between <title>Alien Encounters 2023 and </title>.

But, these blocks should not match: <strong>Alien Encounters 2023 - Probed Again</strong> (it’s the wrong element type). This one should also not match: <title>Alien Encounters 2022 - Probed Again</title> (it’s the wrong leading text).

Any reg-exp experts out there able to help me?

3 Answers


  1. Chosen as BEST ANSWER

    After some strong help from @HaoWu I managed it.

    1. In VS Code, turn on "Use Regular Expression" in the search box
    2. Search for this: (<title>Alien Encounters 2023[^<>-]*?)\s*-[^<>]*
    3. Replace with $1
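
To check the steps above outside the editor, here is a minimal Python sketch that runs the same pattern over the sample lines from the question. It assumes the pattern's whitespace token is `\s*`, and the names `TITLE_SUFFIX` and `clean_title` are made up for illustration. Python's `re` flavour is not identical to VS Code's, so treat it as a sanity check only.

```python
import re

# Pattern from the steps above, with the whitespace token written as \s*.
# Group 1 captures the part of the title that should survive.
TITLE_SUFFIX = re.compile(r"(<title>Alien Encounters 2023[^<>-]*?)\s*-[^<>]*")


def clean_title(line: str) -> str:
    """Drop the ' - stray text' suffix from a matching <title> line."""
    # Python spells the first capture group \1 where VS Code's replace box uses $1.
    return TITLE_SUFFIX.sub(r"\1", line)


samples = [
    "<title>Alien Encounters 2023 - Ice Cream Van</title>",
    "<title>Alien Encounters 2023</title>",
    "<strong>Alien Encounters 2023 - Probed Again</strong>",
    "<title>Alien Encounters 2022 - Probed Again</title>",
]

for line in samples:
    print(clean_title(line))
```

Only the first sample should change; the clean title, the <strong> element and the 2022 title are left alone because the pattern requires the literal prefix <title>Alien Encounters 2023 followed by a hyphen.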

  2. So you want to remove text starting with " - " and ending with (but excluding) "<"?

    The following should work for this simple case: / - [^<]*/

    • Space, hyphen, space
    • Followed by 0 or more characters that are not <
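
One thing to keep in mind: the simple pattern is not tied to the <title> element or to the year, so it also trims the <strong> and 2022 examples that the question wants left alone. The Python sketch below shows this, along with a hypothetical tightened variant that uses a lookbehind. The names `SIMPLE`, `ANCHORED`, `strip_simple` and `strip_anchored` are illustrative, and whether your editor's search accepts lookbehind is an assumption to verify.

```python
import re

# The simple pattern: space, hyphen, space, then anything up to the next "<".
SIMPLE = re.compile(r" - [^<]*")

# Hypothetical variant: only match directly after the expected title prefix.
# The lookbehind text is fixed-width, which Python's re module requires.
ANCHORED = re.compile(r"(?<=<title>Alien Encounters 2023) - [^<]*")


def strip_simple(line: str) -> str:
    """Remove every ' - ...' run, regardless of the surrounding element."""
    return SIMPLE.sub("", line)


def strip_anchored(line: str) -> str:
    """Remove ' - ...' only when it follows <title>Alien Encounters 2023."""
    return ANCHORED.sub("", line)


samples = [
    "<title>Alien Encounters 2023 - Ice Cream Van</title>",
    "<strong>Alien Encounters 2023 - Probed Again</strong>",
    "<title>Alien Encounters 2022 - Probed Again</title>",
]

for line in samples:
    print(strip_simple(line), "|", strip_anchored(line))
```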
  3. Yes, you can try this:

    <title>\s*Alien Encounters 2023\s*(.*?)</title>
    

    This regular expression matches <title>, then optional whitespace, then "Alien Encounters 2023", then optional whitespace, then captures whatever follows in group 1, and finally matches </title>. The "*" quantifier means the preceding item can occur zero or more times, and the "?" after ".*" makes that part lazy, so it stops at the first </title> instead of the last one on the line. Because the pattern matches the whole element, use <title>Alien Encounters 2023</title> as the replacement text.
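
As a rough check of this approach, the Python sketch below applies the pattern with its whitespace tokens written as `\s*`, shows what the capture group picks up, and replaces the whole element with the fixed title. The names `WHOLE_TITLE`, `FIXED_TITLE`, `stray_text` and `normalise_title` are invented for the example.

```python
import re
from typing import Optional

# Matches the whole <title> element; group 1 holds the stray text, if any.
WHOLE_TITLE = re.compile(r"<title>\s*Alien Encounters 2023\s*(.*?)</title>")
FIXED_TITLE = "<title>Alien Encounters 2023</title>"


def stray_text(line: str) -> Optional[str]:
    """Return what group 1 captured, or None when the line does not match."""
    match = WHOLE_TITLE.search(line)
    return match.group(1) if match else None


def normalise_title(line: str) -> str:
    """Replace any matching <title> element with the fixed title."""
    return WHOLE_TITLE.sub(FIXED_TITLE, line)


samples = [
    "<title>Alien Encounters 2023 - Probed Again</title>",
    "<strong>Alien Encounters 2023 - Probed Again</strong>",
    "<title>Alien Encounters 2022 - Probed Again</title>",
]

for line in samples:
    print(repr(stray_text(line)), "|", normalise_title(line))
```

Note that the captured text still includes the leading hyphen, which is why the sketch replaces the whole match rather than reusing the group.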
