
I'm trying to remove accents and special characters, except dash (-) and underscore (_), while preserving the file extension. For example, I want to convert this string:

ÁÉÍÓÚáéíóúâêîôûàèìòùÇãç.,~!@#$%&_-12345.png

to:

AEIOUaeiouaeiouaeiouCac_-12345.png

I came up with the following, but the problem is that it removes all dots. I need to keep only the last occurrence so the file extension is preserved.

"ÁÉÍÓÚáéíóúâêîôûàèìòùÇãç.,~!@#$%&-12345.png".normalize('NFD').replace(/[^a-zA-Z0-9-]/g, "")

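The normalize('NFD') call above is what makes the accent stripping work: canonical decomposition splits each accented letter into its base letter followed by a separate combining mark (for example Ç becomes C plus U+0327 COMBINING CEDILLA), and because the mark falls outside [a-zA-Z0-9-], the replace removes it and leaves the base letter. A small check using only the built-in String.prototype.normalize:

```javascript
// "Ç" written as an escape so it is guaranteed to be the precomposed U+00C7.
const composed = "\u00C7";

// NFD splits it into "C" (U+0043) followed by COMBINING CEDILLA (U+0327).
const decomposed = composed.normalize("NFD");

// Hex code point of each character, for inspection.
const codePoints = [...decomposed].map((c) => c.codePointAt(0).toString(16));

console.log(composed.length, decomposed.length); // 1 2
console.log(codePoints); // [ '43', '327' ]

// The combining mark is outside [a-zA-Z0-9-], so the replace keeps only "C".
console.log(decomposed.replace(/[^a-zA-Z0-9-]/g, "")); // C
```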
I already tried a negative lookbehind like this:

/[^a-zA-Z0-9-]+(?<!\.)/g

following this reference, but without success:

"ÁÉÍÓÚáéíóúâêîôûàèìòùÇãç.,~!@#$%&-12.34.5.png".normalize('NFD').replace(/[^a-zA-Z0-9-]+(?<!\.)/g, '')

When the string has more than one dot, as in this case, it only removes the first one.
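That result can be reproduced with a short sketch on the second test string. The lookbehind is checked where the greedy + run ends: the run that starts with the combining cedilla left over from ç and continues through .,~!@#$%& ends in &, so it is removed together with its dot, while each dot in 12.34.5.png is a one-character run ending right after a dot, so the lookbehind rejects it and the dot is kept:

```javascript
const input = "ÁÉÍÓÚáéíóúâêîôûàèìòùÇãç.,~!@#$%&-12.34.5.png".normalize("NFD");

// The question's pattern: runs of disallowed characters that do not end right after a dot.
const output = input.replace(/[^a-zA-Z0-9-]+(?<!\.)/g, "");

// Only the dot inside ".,~!@#$%&" is removed; the dots in "12.34.5.png" survive.
console.log(output); // AEIOUaeiouaeiouaeiouCac-12.34.5.png
```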


Answers


  1. Instead of checking every character to see whether it is part of the file extension, select the whole extension at once.

    let nfd = "ÁÉÍÓÚáéíóúâêîôûàèìòùÇãç.,~!@#$%&-12.34.5.png".normalize('NFD')
    
    let exttest = nfd.replace(/(?<extension>\.[^.]+$|)(?<badchar>.?)/g, '$1')
    console.log({ exttest }) // { "exttest": ".png" }
    
    let result = nfd.replace(/(?<extension>\.[^.]+$|)(?<badchar>[^a-zA-Z0-9-]?)/g, '$1')
    console.log({ result }) // { "result": "AEIOUaeiouaeiouaeiouCac-12345.png" }
    

    ((?<name>blah) is just a named version of the group (blah); the groups are named only to explain what each one matches.)
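    The same technique can be wrapped in a small helper. This is only a sketch: the function name sanitizeFilename is hypothetical, and _ has been added to the badchar class because the question wants underscores kept (the class above removes them).

    ```javascript
    // Hypothetical helper built on the named-group technique above.
    // At every position the regex either captures the final ".ext" into
    // `extension` or captures nothing; `badchar` then consumes at most one
    // disallowed character. Replacing with "$1" keeps only the extension,
    // so disallowed characters vanish and allowed ones are never consumed.
    function sanitizeFilename(name) {
      return name
        .normalize("NFD")
        .replace(/(?<extension>\.[^.]+$|)(?<badchar>[^a-zA-Z0-9_-]?)/g, "$1");
    }

    console.log(sanitizeFilename("ÁÉÍÓÚáéíóúâêîôûàèìòùÇãç.,~!@#$%&_-12345.png"));
    // AEIOUaeiouaeiouaeiouCac_-12345.png
    console.log(sanitizeFilename("ÁÉÍÓÚáéíóúâêîôûàèìòùÇãç.,~!@#$%&-12.34.5.png"));
    // AEIOUaeiouaeiouaeiouCac-12345.png
    ```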

  2. Combine a negated character class with a negative lookahead that matches any . NOT followed by one or more letters or numbers and then a word boundary (a non-word character or the end of the string). This can work:

    /[^a-zA-Z0-9-._]|\.(?![a-zA-Z0-9]+\b)/g
    

    An alternative to [^a-zA-Z0-9-._] is [^\w.-].

    RegEx101

    Explanation

    Segment                 Description
    [^a-zA-Z0-9-._]         Match any character except a letter, number, hyphen, underscore, or dot
    |                       OR
    \.(?![a-zA-Z0-9]+\b)    Match any dot that is NOT followed by one or more letters and/or numbers and then a word boundary (a non-word character or the end of the string)

    Example

    const rgx = /[^a-zA-Z0-9-._]+|\.(?![a-zA-Z0-9]+\b)/g;
    
    const str = `ÁÉÍÓÚáéíóúâêîôûàèìòùÇãç.,~!@#$%&_-12345.png`;
    
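    // normalize("NFD") splits accented letters into base letter + combining mark,
    // so the base letter is kept and only the mark is removed
    const file = str.normalize("NFD").replace(rgx, "");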
    
    console.log(file);
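    One thing to check before relying on this pattern: the lookahead only asks whether a dot is followed by letters or digits up to a word boundary, not whether it is the last dot in the name. A quick sketch running it on the question's second test string (with normalize("NFD") added so accented letters keep their base letter) shows that the dots inside 12.34.5 are kept as well:

    ```javascript
    const rgx = /[^a-zA-Z0-9-._]+|\.(?![a-zA-Z0-9]+\b)/g;

    const multiDot = "ÁÉÍÓÚáéíóúâêîôûàèìòùÇãç.,~!@#$%&-12.34.5.png".normalize("NFD");

    // Every dot followed by alphanumerics and then a word boundary survives,
    // so "12.34.5.png" keeps all three of its dots.
    const cleaned = multiDot.replace(rgx, "");

    console.log(cleaned); // AEIOUaeiouaeiouaeiouCac-12.34.5.png
    ```

    In other words, it fits names like the question's first example, where every dot except the extension's is followed by a character that is not a letter or digit.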
  3. In your pattern you forgot to add _ to the negated character class, so it gets removed even though you want to keep it in the result.

    You are using a negative lookbehind that asserts that from the current position, there is not a dot directly to the left.

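    The negated character class [^a-zA-Z0-9-]+ can match a dot, but the lookbehind (?<!\.) fails whenever the match ends in a dot. So a dot is removed only when a disallowed character other than a dot follows it in the same run (like the first dot in .,~!@#$%&); a dot standing on its own between allowed characters, as in 12.34.5.png, is never removed.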

    What you could do is match the character class [^a-zA-Z0-9_-]+ and assert, with the positive lookahead (?=.*\.[^\s.]+$), that somewhere to the right of the current position there is a dot followed by one or more characters other than a dot or whitespace, up to the end of the string.

    const result = "ÁÉÍÓÚáéí...óúâêîôûàèìòùÇãç.,~!@#$%&_-12345.png"
      .normalize('NFD')
      .replace(/[^a-zA-Z0-9_-]+(?=.*\.[^\s.]+$)/g, '');
    console.log(result);
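    Because every removal depends on the lookahead finding a later .ext, this pattern also handles several dots, and a name with no extension comes back unchanged (apart from the NFD decomposition). A small sketch checking both cases; the wrapper name stripBeforeExtension is hypothetical:

    ```javascript
    // Hypothetical wrapper around the pattern from this answer.
    function stripBeforeExtension(name) {
      return name.normalize("NFD").replace(/[^a-zA-Z0-9_-]+(?=.*\.[^\s.]+$)/g, "");
    }

    // Several dots: only the last one, which starts the extension, is kept.
    console.log(stripBeforeExtension("ÁÉÍÓÚáéíóúâêîôûàèìòùÇãç.,~!@#$%&-12.34.5.png"));
    // AEIOUaeiouaeiouaeiouCac-12345.png

    // No extension: the lookahead can never succeed, so nothing is removed.
    console.log(stripBeforeExtension("a b!c")); // a b!c
    ```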