I’m trying to parse a string similar to the one below. This represents queries for a book. Multiple options are available to look for specific fields, so intitle: looks for something in the title of the book specifically. I have two problems.
- It's not parsing out some of the terms in the third returned element, such as inauthor and inpublisher: 'champ inauthor:"john smith" inpublisher:"the book place" '. Might this have something to do with the double quotes in the string?
- How can I use the double quotes to make the quoted text a single term?
The string:
basketball intitle:champ inauthor:"john smith" inpublisher:"the book place" subject: fiba isbn: 12345 lccn: 689778 oclc: 1234156
My attempt:
let q: string = `basketball intitle:champ inauthor:"john smith" inpublisher:"the book place" subject: fiba isbn: 12345 lccn: 689778 oclc: 1234156`;
console.log(q);
q = q.replaceAll(`: `, `:`);
console.log(q);
let all = q.split(
  /(\bintitle:\b|\binauthor:\b|\binpublisher:\b|\bsubject:\b|\bisbn:\b|\blccn:\b|\boclc:\b)/,
);
console.log(all);
Output:
[
'basketball ',
'intitle:',
'champ inauthor:"john smith" inpublisher:"the book place" ',
'subject:',
'fiba ',
'isbn:',
'12345 ',
'lccn:',
'689778 ',
'oclc:',
'1234156'
]
2 Answers
As mentioned in the comments, :\b will not match a colon followed by a double quote (:"), because there is no word boundary between the colon and the quote. I would suggest using matchAll and explicitly matching the part in quotes.
You can use adaptive word boundaries here if the search words can start or end with different kinds of characters.
In the regex, (?!\B\w) will only require a word boundary if the character at the start / end of the "word" is a word character.
The terms.map(x => x.replace(/[/\-\\^$*+?.()|[\]{}]/g, '\\$&')).join('|') part escapes the search terms in case they contain any special regex metacharacters.
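Putting the two suggestions together, one possible sketch is shown below. It is not the answerers' original code: the names terms, escapeRegex, ParsedQuery and parseQuery, the shape of the result, and the exact pattern are assumptions made for illustration. It builds the key alternation from escaped terms, puts the adaptive boundary (?!\B\w) in front of it, and uses matchAll with an explicit branch for quoted values.

```typescript
// Hypothetical list of field names, taken from the query string in the question.
const terms: string[] = ["intitle", "inauthor", "inpublisher", "subject", "isbn", "lccn", "oclc"];

// Escape regex metacharacters so each term is matched literally.
const escapeRegex = (s: string): string =>
  s.replace(/[\/\-\\^$*+?.()|[\]{}]/g, "\\$&");

// (?!\B\w) is the adaptive boundary: a word boundary is required only
// when the key starts with a word character.
const key = `(?!\\B\\w)(${terms.map(escapeRegex).join("|")})`;

// Alternatives, tried in order at each position:
//   key:"quoted value"  or  key:bareValue   (groups 1, 2, 3)
//   "quoted free text"                      (group 4)
//   bareFreeWord                            (group 5)
const tokenRe = new RegExp(`${key}:\\s*(?:"([^"]*)"|(\\S+))|"([^"]*)"|(\\S+)`, "g");

// Assumed result shape: free-text terms plus a map of field -> value.
interface ParsedQuery {
  free: string[];
  fields: Record<string, string>;
}

function parseQuery(q: string): ParsedQuery {
  const result: ParsedQuery = { free: [], fields: {} };
  for (const m of q.matchAll(tokenRe)) {
    if (m[1] !== undefined) {
      // A field: prefer the quoted value, fall back to the bare one.
      result.fields[m[1]] = m[2] ?? m[3];
    } else {
      // Free text outside any field, quoted or bare.
      result.free.push(m[4] ?? m[5]);
    }
  }
  return result;
}

const q = `basketball intitle:champ inauthor:"john smith" inpublisher:"the book place" subject: fiba isbn: 12345 lccn: 689778 oclc: 1234156`;
console.log(parseQuery(q));
```

Because \s* follows the colon, the replaceAll(": ", ":") step from the question is not needed in this sketch, and a quoted value such as "john smith" is captured as a single term without its quotes. A later occurrence of the same field overwrites the earlier one here; collecting values into arrays would be a straightforward change if repeated fields matter.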