Let's say we have these Arabic strings in JavaScript:
const resultText = 'اَلْيَوْمُ جَمِيلٌ وَالشَّمْسُ';
const searchText = 'والشم';
The searchText variable is dynamic and might have a different value at runtime, so we need a regex that replaces searchText in resultText. The problem is that, besides building the regex from a dynamic variable, we also want to ignore certain characters when matching: the Arabic diacritics. After the replacement, the result should be:
"اَلْيَوْمُ جَمِيلٌ <span class="highlighted">وَالشَّم</span>سُ"
So basically, we want to wrap the searchText match in an HTML span tag while ignoring diacritics when matching, because searchText will be without diacritics and resultText will be with diacritics. If we first removed all the diacritics from resultText, we would easily get a match, but we want to keep the diacritics in resultText and still match successfully, so the diacritics have to be ignored when matching searchText inside it.
So far we have managed to wrap the matched word in HTML, but ignoring the diacritics is still missing:
const searchText = 'والشم';
const resultText = 'اَلْيَوْمُ جَمِيلٌ وَالشَّمْسُ';
const regex1 = new RegExp(searchText, 'gi');
const finalText = resultText.replace(regex1, '<span class="highlighted">$&</span>');
As a hint, the regex below is used to clear all the diacritics from a string:
'وَالشَّمْسُ'.normalize('NFD').replace(/([^\u0621-\u063A\u0641-\u064A\u0660-\u0669a-zA-Z 0-9])/g, '');
The pattern above removes every character that is not a base letter, a digit, or a space, so it covers all the diacritics. How can we use or modify it in the text-replacing regex, together with the dynamic variable, as described above?
2 Answers
To achieve your desired result of wrapping the matched word with HTML tags while ignoring certain characters (the Arabic diacritics in this case), you can modify your regular expression pattern to use a negative lookahead assertion. Here's how you can do it:
Besides the diacritical marks in one of your samples (see comment), there are also characters without such marks that have variations. ا (alef) is a different character from أ (alef with hamza above). To match either, you will need to identify all such characters that can occur and replace each occurrence in your searchText with a character class, for example replace أ with [اأإآ].
To get started, I would do something like this (experimental, no experience with Arabic text).
Note that I used .{0} to add \p{M}* between all characters. If search often contains Latin digits and letters, you could target only the Arabic characters with (?=[ء-ي٠-٩])|(?<=[ء-ي٠-٩]).
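As a rough sketch of the approach described in this answer (not the answerer's original snippet), the pattern can also be built one character at a time instead of with .{0}: each character of the search text is escaped and followed by \p{M}*, and any alef variant is widened to a character class. The helper names escapeRegExp, buildMarkInsensitiveRegex, and highlight are made up for illustration, and the alef class only covers the four variants mentioned above; other letter variations would need their own classes.

```javascript
// Hypothetical helper names; none of them come from the original answer.

// Alef variants written as escapes: U+0622, U+0623, U+0625, U+0627.
const ALEF_TEST = /[\u0622\u0623\u0625\u0627]/;
const ALEF_CLASS = '[\\u0622\\u0623\\u0625\\u0627]';

// Escape characters that are special inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a regex that matches `search` while allowing any number of
// combining marks (Unicode category M, which includes the Arabic
// diacritics) after each character. Assumes `search` is non-empty.
function buildMarkInsensitiveRegex(search) {
  const pattern = Array.from(search, (ch) => {
    // An alef in the search text matches any alef variant in the target.
    const base = ALEF_TEST.test(ch) ? ALEF_CLASS : escapeRegExp(ch);
    return base + '\\p{M}*';
  }).join('');
  // The `u` flag is required for \p{M}.
  return new RegExp(pattern, 'gu');
}

// Wrap every match in a span, leaving the diacritics in `text` untouched.
function highlight(text, search) {
  // Drop any marks typed into the search text itself.
  const bare = search.replace(/\p{M}/gu, '');
  if (bare === '') return text;
  return text.replace(
    buildMarkInsensitiveRegex(bare),
    '<span class="highlighted">$&</span>'
  );
}
```

Calling highlight(resultText, searchText) with the strings from the question wraps the matched letters together with their diacritics, so any marks attached to the last matched letter end up inside the span as well.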