How can I modify this regular expression so that it distinguishes between Latin and Russian letters in links and emails - Reactjs

anonymous
February 16, 2023
233 views
1 vote
2 Answers

At the moment my program looks like this:

let final = text;
  const divElement = document.createElement('div');
  // eslint-disable-next-line max-len
  const linkRegExp = /\b(((http(s)?:\/\/)([\w-]{1,32}(\.|:)[\w-]{1,32}))|([\w-]{1,32}(@)[\w-]{1,32}(\.)[\w-]{1,32})|([\w-]{1,32}(\.)[A-Za-z]{1,32}))\b/gi;

  function replacer(url) {
    if (url.match(/\S+@\S+\.\S+/ig)) {
      const email = document.createElement('a');
      email.innerHTML = url;
      email.href = `mailto:${url}`;
      email.setAttribute('class', 'email');

      return `${email.outerHTML}`;
    }
    const link = document.createElement('span');
    link.innerHTML = url;
    link.setAttribute('class', 'link');
    link.setAttribute('style', 'color: blue; cursor: pointer');

    return `${link.outerHTML}`;
  }

  final = final.replace(RegExp(linkRegExp), replacer);

Unfortunately, replacing [\w-] with [\wА-Яа-я-] or [\p{L}\d_-] did not give positive results, and adding the u flag at the end of the expression causes an error.

Answers

Chosen as BEST ANSWER
- anonymous_user
- February 17, 2023 at 8:13 am
- 0 votes
```
/(?<=^|\W)((http|https):\/\/)?(www\.)?([A-Za-zА-Яа-я0-9]{1}[A-Za-zА-Яа-я0-9\-@]*\.?)*\.{1}[A-Za-zА-Яа-я0-9-]{2,8}(\/([\w#!:.?+=&%@!\-\/])*)?(?=$|\W)/gi
```
I present the ideal solution to this problem, thanks to a neural network!


- dnmeid
- February 16, 2023 at 11:07 pm
- 0 votes
Latin and Cyrillic characters (Russian uses the Cyrillic script) are only two subsets of all characters.
First, you should be precise about whether it is enough to check for, let's say, Cyrillic versus anything else; then it is only a two-way distinction.
Next, you should decide how to proceed with mixed text, i.e. Cyrillic and non-Cyrillic in one text.
I guess it would be best for you to treat any text with at least one Cyrillic character as Cyrillic and anything else as Latin (though it could just as well be, e.g., Arabic).
There are different ways to check strings in JavaScript, but since you asked for a regular expression, here is how to match any Cyrillic character with a regex:

const cyrillicRegex = /\p{Script_Extensions=Cyrillic}/u

Of course you can also use \p{Script_Extensions=Cyrillic} in character classes or with quantifiers; the u flag is required. Unicode property escapes were standardized in ES2018 and are supported by all current major browsers (Firefox added them in version 78, in 2020).
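
As a minimal sketch of the property escape inside a character class, the snippet below finds link-like and email-like tokens and labels each one using the rule suggested above (at least one Cyrillic character means Cyrillic). The names tokenRegex, classify and findTokens are hypothetical, and the token pattern is a deliberate simplification, not a full URL or email grammar. It avoids \b because \b and \w are ASCII-only in JavaScript, so they never see a boundary next to a Cyrillic letter; note also that the range А-Яа-я does not include Ё/ё, which \p{L} does cover.

```javascript
// Matches a single Cyrillic character; the `u` flag is required for \p{...}.
const cyrillicRegex = /\p{Script_Extensions=Cyrillic}/u;

// Simplified token pattern (an assumption, not a full URL/email grammar):
// runs of letters, digits, "_" or "-" joined by "." or "@".
const tokenRegex = /[\p{L}\d_-]+(?:[.@][\p{L}\d_-]+)+/gu;

// Any token containing at least one Cyrillic character counts as Cyrillic.
function classify(token) {
  return cyrillicRegex.test(token) ? 'cyrillic' : 'latin';
}

// Returns every token found in the text together with its script label.
function findTokens(text) {
  return Array.from(text.matchAll(tokenRegex), (m) => ({
    value: m[0],
    script: classify(m[0]),
  }));
}
```

Calling findTokens on a message returns an array of { value, script } objects, which a replacer like the one in the question could use to pick a CSS class per script.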

Btw: the regex in your current program uses many unnecessary capturing groups. E.g. http(s)? can also be written as https?
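
To illustrate the point about capturing groups, here is a sketch of the question's pattern with the capturing groups removed and http(s)? shortened to https?. It assumes the escapes in the original were \b, \w and \.; the name simplifiedLinkRegExp is hypothetical. It is meant to match the same strings as the original pattern and does not add Cyrillic support by itself.

```javascript
// Same three alternatives as the original pattern (URL, email, bare domain),
// grouped once with a non-capturing group instead of nested capturing groups.
const simplifiedLinkRegExp =
  /\b(?:https?:\/\/[\w-]{1,32}[.:][\w-]{1,32}|[\w-]{1,32}@[\w-]{1,32}\.[\w-]{1,32}|[\w-]{1,32}\.[A-Za-z]{1,32})\b/gi;

// Returns all matches in the text, or an empty array when there are none.
function findLinks(text) {
  return text.match(simplifiedLinkRegExp) || [];
}
```

Because String.prototype.replace always passes the full match as the first argument to the callback, the replacer function from the question keeps working unchanged when the capturing groups are dropped.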
