I have a string:
const str = 'a string, a long string'
I want to break it down into words (no problem here) and then track the index of each word within the original string.
Actual result:
[
  { word: 'a', idx: 0 },
  { word: 'string', idx: 2 },
  { word: 'a', idx: 0 },
  { word: 'long', idx: 12 },
  { word: 'string', idx: 2 }
]
Desired result:
[
  { word: 'a', idx: 0 },
  { word: 'string', idx: 2 },
  { word: 'a', idx: 10 },
  { word: 'long', idx: 12 },
  { word: 'string', idx: 17 }
]
Code so far:
const str = 'a string, a long string'
const segmenter = new Intl.Segmenter([], { granularity: 'word' })
const getWords = str => {
  const segments = segmenter.segment(str)
  return [...segments]
    .filter(s => s.isWordLike)
    .map(s => s.segment)
}
const words = getWords(str)
const result = words.map(word => ({
  word,
  idx: str.indexOf(word)
}))
console.log(result)
3 Answers
I decomposed your string into an array of objects, each containing the word and its index.
If you want punctuation treated as words too, you could use a regex.
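A minimal sketch of one way to build that array, assuming the same `getWords` extraction as in the question: look each word up with `indexOf`, but start the search where the previous word ended, so a repeated word resolves to its own position instead of the first occurrence. The helper name `indexWords` is hypothetical.

```javascript
const segmenter = new Intl.Segmenter([], { granularity: 'word' })

// Same word extraction as in the question.
const getWords = str =>
  [...segmenter.segment(str)]
    .filter(s => s.isWordLike)
    .map(s => s.segment)

// Hypothetical helper: search for each word starting after the previous match.
const indexWords = str => {
  let from = 0
  return getWords(str).map(word => {
    const idx = str.indexOf(word, from)
    from = idx + word.length // the next search begins past this word
    return { word, idx }
  })
}

const indexed = indexWords('a string, a long string')
console.log(indexed)
// The second 'a' is found at 10 and the second 'string' at 17.
```

This works because the words come back in document order, so each search only needs to move forward.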
The objects you're iterating over, which contain the segment and whether or not it isWordLike, also have the index. Here's the type definition:
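Each segment object yielded by `segmenter.segment(str)` carries `segment`, `index`, and `input`, plus `isWordLike` when the granularity is `'word'`. A minimal sketch that reads `index` directly instead of calling `indexOf` afterwards (the name `getWordsWithIndex` is an illustrative assumption):

```javascript
const segmenter = new Intl.Segmenter([], { granularity: 'word' })

// Shape of each segment object, written as a comment for reference:
//   { segment: string, index: number, input: string, isWordLike?: boolean }

// Hypothetical helper: keep word-like segments and carry their index along.
const getWordsWithIndex = str =>
  [...segmenter.segment(str)]
    .filter(s => s.isWordLike)
    .map(s => ({ word: s.segment, idx: s.index }))

const wordsWithIndex = getWordsWithIndex('a string, a long string')
console.log(wordsWithIndex)
```

Since the segmenter already knows where each segment starts, no second lookup in the string is needed, and repeated words are handled for free.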
Maybe an idea to use String.matchAll to retrieve words and indexes. Something like:
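A minimal sketch of the `matchAll` idea. The pattern `/\w+/g` is an assumption about what counts as a word (ASCII letters, digits, and underscore), so it will not split text the same way `Intl.Segmenter` does for other scripts.

```javascript
const input = 'a string, a long string'

// matchAll requires a global regex; every match exposes its start via .index
const matches = [...input.matchAll(/\w+/g)].map(m => ({
  word: m[0],
  idx: m.index
}))

console.log(matches)
```

To treat punctuation as words as well, the pattern could be widened, for example to `/\w+|[^\s\w]+/g`, which is again only one possible choice.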