
I am trying to display special characters using the function below:

function replaceSpecialChar(text) {
    return text.replace(/[^\x20-\x7E\n\xC0-\xFF\u00C0-\u00FF\u0152\u0153\u0178]+/g, '');
}

However, the following characters are not displayed as expected:
Ÿ, Œ and œ

I tried adding their individual character codes to the regex, but a different value is returned for each of them:

Ÿ is returned as x

Œ is returned as R

œ is returned as S
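
One possible reading of these values, offered only as a guess: x, R and S are the characters whose codes equal the low byte of U+0178, U+0152 and U+0153 respectively. That would be consistent with something truncating each character to 8 bits, although nothing in the function above does that. The sketch below uses a hypothetical helper, `lowByte`, that is not part of the original code, purely to show the arithmetic.

```javascript
// Hypothetical helper (an assumption, not from the original code):
// keep only the low 8 bits of the first UTF-16 code unit of `ch`.
function lowByte(ch) {
  return String.fromCharCode(ch.charCodeAt(0) & 0xff);
}

// U+0178 (Ÿ), U+0152 (Œ), U+0153 (œ), written as escapes to avoid
// depending on the encoding of this file.
const truncated = ['\u0178', '\u0152', '\u0153'].map(lowByte);
console.log(truncated.join(' ')); // x R S
```

If that is what is happening, the place to look would be wherever the text is encoded or decoded as a single-byte character set, rather than the regular expression.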

2 Answers


  1. The issue you’re experiencing might be due to the way JavaScript handles Unicode characters. The range \xC0-\xFF in your regular expression covers only part of the Latin-1 Supplement block (the characters from À to ÿ). It does not include Œ, œ and Ÿ, because they belong to the Latin Extended-A block (U+0152, U+0153 and U+0178).

    To include these characters, you would need to extend your range to cover the necessary Unicode blocks. However, JavaScript’s handling of Unicode can be a bit tricky, especially when dealing with characters outside of the Basic Multilingual Plane (BMP).

    Here’s an alternative version of your function that strips diacritics instead of preserving the accented characters:

    function replaceSpecialChar(text) {
        return text.normalize('NFD').replace(/[\u0300-\u036f]/g, "").replace(/[^a-zA-Z0-9\s]/g, "");
    }
    

    This function first normalizes the input text to its decomposed form (NFD), where precomposed characters like é are split into the base character e and a combining accent mark. It then removes the combining marks (the range \u0300-\u036f is the Combining Diacritical Marks block), and finally removes every character that is not an ASCII letter, a digit or whitespace.

    Please note that this function will remove all special characters, not just the ones you mentioned. In particular, Ÿ is reduced to Y, while Œ and œ have no Unicode decomposition and are removed entirely. If you want to keep certain special characters, you will need to adjust the regular expression in the last replace call accordingly.
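
A minimal sketch of how this approach treats the three characters from the question. The name `stripDiacritics` is made up here for illustration; the body is the same normalize-and-replace chain as in the function above, with the regex escapes written out.

```javascript
// Same steps as the function above, under a hypothetical name.
function stripDiacritics(text) {
  return text
    .normalize('NFD')                 // split base letters from combining marks
    .replace(/[\u0300-\u036f]/g, '')  // drop Combining Diacritical Marks
    .replace(/[^a-zA-Z0-9\s]/g, '');  // keep only ASCII letters, digits, whitespace
}

console.log(stripDiacritics('\u00e9')); // "e"  (é decomposes to e + U+0301)
console.log(stripDiacritics('\u0178')); // "Y"  (Ÿ decomposes to Y + U+0308)
console.log(stripDiacritics('\u0152')); // ""   (Œ has no decomposition)
console.log(stripDiacritics('\u0153')); // ""   (œ has no decomposition)
```

So this variant is only suitable when losing Œ and œ is acceptable; it does not display them.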

    Also, be aware that String.prototype.normalize was introduced in ES2015, so very old JavaScript environments may not support it.

  2. Use the unicode flag (u) on your regular expression.

    It will:

    • Treat any Unicode code point escapes as such instead of identity escapes.
    • Interpret surrogate pairs as a single character.
    • When lastIndex is automatically advanced (such as when calling exec()), unicode regexes advance by Unicode code points instead of UTF-16 code units.
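
The three effects listed above can be checked with a short sketch. The emoji U+1F600 is used here only as a convenient character outside the BMP; it does not come from the original question.

```javascript
const face = '\u{1F600}'; // one code point, two UTF-16 code units

// 1. \u{...} is a code point escape only with the u flag. Without it,
//    \u is an identity escape for "u" and {61} is a quantifier.
const escapeWithFlag = /^\u{61}$/u.test('a');                 // true
const escapeWithoutFlag = /^\u{61}$/.test('a');               // false
const identityEscape = /^\u{61}$/.test('u'.repeat(61));       // true

// 2. A surrogate pair counts as a single character for "." with the u flag.
const dotWithFlag = /^.$/u.test(face);                        // true
const dotWithoutFlag = /^.$/.test(face);                      // false

// 3. Global matching advances by code point with u, by code unit without.
const emptyMatchesWithFlag = face.match(/(?:)/gu).length;     // 2
const emptyMatchesWithoutFlag = face.match(/(?:)/g).length;   // 3

console.log(escapeWithFlag, escapeWithoutFlag, identityEscape);
console.log(dotWithFlag, dotWithoutFlag);
console.log(emptyMatchesWithFlag, emptyMatchesWithoutFlag);
```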

    So your resulting regexp would look like this:

    /[^\x20-\x7E\n\xC0-\xFF\u00C0-\u00FF\u0152\u0153\u0178]+/gu
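
As a rough check, the function from the question can be run with this flag added and the escapes written with their backslashes. This is only a sketch: it shows that the regular expression itself leaves Ÿ, Œ and œ in place, and it assumes the input string and the output are handled as Unicode throughout, which the question does not confirm.

```javascript
// The question's function, with the u flag added.
function replaceSpecialChar(text) {
  return text.replace(/[^\x20-\x7E\n\xC0-\xFF\u00C0-\u00FF\u0152\u0153\u0178]+/gu, '');
}

// Ÿ (U+0178), Œ (U+0152) and œ (U+0153) are in the allowed set and survive.
console.log(replaceSpecialChar('\u0178, \u0152 and \u0153')); // "Ÿ, Œ and œ"

// Characters outside the allowed set are removed: tab, € (U+20AC), emoji.
console.log(replaceSpecialChar('a\tb\u20ACc'));    // "abc"
console.log(replaceSpecialChar('hi \u{1F600}!'));  // "hi !"
```

If the three characters still come out as other letters after this, the regular expression is probably not the cause, and the encoding used to read or write the text would be the next thing to check.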
    