This is my input string:
str = `👍🤌🤔😊😎😂😘👌😒😍❤️🤣`;
// str.length = 24
// str.split("") = [ "\ud83d", "\udc4d", /* 22 more elements */ ],
So, when I call Array.from(str), I expect this would be what happens internally:
arr = Array.from({
length: 24,
0: "\ud83d", 1: "\udc4d" /* ... and so on */
})
And arr should be the same as str.split(""):
["\ud83d", "\udc4d", /* 22 more elements */ ]
But the value of arr is this:
// arr.length = 13 [ "👍", "🤌", "🤔", "😊", "😎", "😂", "😘", "👌", "😒", "😍", "❤", "️", "🤣" ]
For reference, this is equal to what we get if we call str.match(/[\s\S]/gu). Why?
const str = `👍🤌🤔😊😎😂😘👌😒😍❤️🤣`
const arr = Array.from(str)
console.log(arr)
2 Answers
JavaScript strings are stored as a sequence of UTF-16 code units.
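Each code point is encoded as one or two code units. Code points outside the Basic Multilingual Plane, which includes most emojis, take two code units (a surrogate pair).
A string's length is the total number of UTF-16 code units, and str.split("") splits at every code unit.
Array.from uses the string's iterator, which yields whole code points rather than code units, so a surrogate pair comes out as a single element.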
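To make the difference concrete, here is a minimal sketch on a shorter sample string (the variable names are only for illustration). It also shows that Array.from only behaves the way the question expected when it is handed an array-like object that has no iterator.

```javascript
// Sample: thumbs-up (U+1F44D), then a heart (U+2764) followed by
// the variation selector U+FE0F. Escapes are used so the content is unambiguous.
const sample = "\u{1F44D}\u2764\uFE0F";

// length and split("") count UTF-16 code units.
console.log(sample.length);           // 4
console.log(sample.split("").length); // 4

// Array.from uses the string's iterator, which yields whole code points.
const codePoints = Array.from(sample);
console.log(codePoints.length);       // 3

// A regular expression with the u flag also matches one code point at a time.
const viaRegex = sample.match(/[\s\S]/gu);
console.log(viaRegex.length);         // 3

// An array-like object has no iterator, so Array.from falls back to
// reading length and the indexed properties, giving code units again.
const arrayLike = { length: sample.length };
for (let i = 0; i < sample.length; i++) {
  arrayLike[i] = sample[i];
}
const codeUnits = Array.from(arrayLike);
console.log(codeUnits.length);        // 4
```

In other words, strings are iterable, and Array.from prefers the iterator over the array-like path whenever one is available.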
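str.match(/./gu) uses a regular expression with the u flag (Unicode mode), so each match is a whole code point rather than a single code unit. It matches code points, not graphemes, and . does not match line terminators unless the s flag is also set.
Code-point-aware methods like Array.from and str.match(/./gu) handle surrogate pairs correctly, which matters for strings containing emojis or other characters outside the Basic Multilingual Plane. They still split sequences built from several code points, which is why ❤️ shows up as two elements in your result.
Each of these emojis is made up of multiple parts. The grapheme-splitter package came to mind, which could help you quickly overcome the problem.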
You encountered the same issue in your example. When you split the string into code points, you ended up breaking apart emojis that consist of multiple code points, like ❤️.
Using the grapheme-splitter package, you can split the string into an array without breaking such emojis apart.
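As a rough sketch of the same idea without any dependency, assuming your runtime ships the built-in Intl.Segmenter (recent browsers and Node.js do), you can segment by grapheme cluster. This is an alternative to the grapheme-splitter package, not its API, and the variable names are only for illustration.

```javascript
// Sample: thumbs-up, heart + variation selector (U+2764 U+FE0F), and
// rolling-on-the-floor-laughing (U+1F923).
const text = "\u{1F44D}\u2764\uFE0F\u{1F923}";

// Splitting by code point breaks the heart into two elements.
console.log(Array.from(text).length); // 4

// Intl.Segmenter with granularity "grapheme" keeps each
// user-perceived character together.
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const graphemes = Array.from(segmenter.segment(text), (part) => part.segment);

console.log(graphemes.length); // 3
console.log(graphemes);
```

Each item produced by segment() is an object whose segment property holds the text of one grapheme cluster, so the mapping function just extracts that property.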