
I’m trying to parse a string similar to the one below. This represents queries for a book. Multiple options are available to look for specific fields, so intitle: looks for something in the title of the book specifically. I have two problems.

  1. It’s not parsing out some of the terms in the third returned element such as inauthor and inpublisher – ‘champ inauthor:"john smith" inpublisher:"the book place" ‘ – this may have something to do with the double quotes in the string?
  2. How can I use the double quotes to make it a single term?


The string:

basketball intitle:champ inauthor:"john smith" inpublisher:"the book place" subject: fiba isbn: 12345 lccn: 689778 oclc: 1234156

My attempt

let q: string = `basketball intitle:champ inauthor:"john smith" inpublisher:"the book place" subject: fiba isbn: 12345 lccn: 689778 oclc: 1234156`;
console.log(q);
q = q.replaceAll(`: `, `:`);
console.log(q);
let all = q.split(
  /(\bintitle:\b|\binauthor:\b|\binpublisher:\b|\bsubject:\b|\bisbn:\b|\blccn:\b|\boclc:\b)/,
);
console.log(all);

Output:

[
  'basketball ',
  'intitle:',
  'champ inauthor:"john smith" inpublisher:"the book place" ',
  'subject:',
  'fiba ',
  'isbn:',
  '12345 ',
  'lccn:',
  '689778 ',
  'oclc:',
  '1234156'
]

2 Answers


  1. As mentioned in the comments, :\b will not match :" because there is no word boundary after the colon: both : and " are non-word characters.

    I would suggest using matchAll and explicitly match the part in quotes. For instance:

    const q = `basketball intitle:champ inauthor:"john smith" inpublisher:"the book place" subject: fiba isbn: 12345 lccn: 689778 oclc: 1234156`;
    
    const matches = q.matchAll(/\s*(?:(\w+):\s*)?(?:"([^"]+)"|(\S+))/g);
    const obj = Object.fromEntries(
        Array.from(matches, ([, key, val1, val2]) => [key ?? "__main", val1 ?? val2])
    );
    console.log(obj);
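
    The word-boundary point above can be checked in isolation. This is only a minimal sketch: the helper name check and the two single-keyword patterns are made up for illustration, with the first pattern mirroring one alternative from the question's regex.

    ```typescript
    // Illustrative patterns (not from the question verbatim): one keyword only.
    const withTrailingBoundary = /\binauthor:\b/;   // as in the question
    const withoutTrailingBoundary = /\binauthor:/;  // trailing \b dropped

    // Hypothetical helper: true if the pattern matches anywhere in the text.
    function check(pattern: RegExp, text: string): boolean {
      return pattern.test(text);
    }

    // ":" is a non-word character and "c" is a word character, so \b holds.
    console.log(check(withTrailingBoundary, 'inauthor:champ'));           // true
    // ":" and '"' are both non-word characters, so \b fails after the colon.
    console.log(check(withTrailingBoundary, 'inauthor:"john smith"'));    // false
    // Without the trailing \b the keyword is found before a quote as well.
    console.log(check(withoutTrailingBoundary, 'inauthor:"john smith"')); // true
    ```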
  2. You can use adaptive word boundaries here, since the search terms can start or end with non-word characters (here, each one ends with a colon):

    let q: string = `basketball intitle:champ inauthor:"john smith" inpublisher:"the book place" subject: fiba isbn: 12345 lccn: 689778 oclc: 1234156`;
    q = q.replaceAll(`: `, `:`);
    
    let terms = ['intitle:', 'inauthor:', 'inpublisher:', 'subject:', 'isbn:', 'lccn:', 'oclc:'];
    let regex: RegExp = new RegExp(String.raw`(?!\B\w)(${terms.map(x => x.replace(/[\/\-\\^$*+?.()|[\]{}]/g, '\\$&')).join('|')})(?!\B\w)`);
    
    let all = q.split(regex);
    console.log(all);

    The regex will look like

    /(?!\B\w)(intitle:|inauthor:|inpublisher:|subject:|isbn:|lccn:|oclc:)(?!\B\w)/
    

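    where (?!\B\w) will only require a word boundary if the characters at the start / end of the "word" are word characters. Every term here ends with a colon, so the trailing check always passes and a double quote may follow the term directly.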

    The terms.map(x => x.replace(/[\/\-\\^$*+?.()|[\]{}]/g, '\\$&')).join('|') part escapes the search terms in case they contain any special regex metacharacters.
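
    The split result still keeps the double quotes and the keyword/value pairs as separate array entries. One possible follow-up, sketched here under assumptions, folds that array into an object and strips the wrapping quotes so each quoted phrase becomes a single term. The function name toFields is hypothetical, the __main key is borrowed from the other answer, and the regex is the printed form shown above.

    ```typescript
    // Hypothetical helper: turn [free text, key, value, key, value, ...] into an object.
    function toFields(parts: string[]): Record<string, string> {
      const fields: Record<string, string> = {};
      // The first entry is the free text before the first keyword.
      const free = (parts[0] ?? "").trim();
      if (free) fields["__main"] = free;
      // The remaining entries alternate between keyword and value.
      for (let i = 1; i + 1 < parts.length; i += 2) {
        const key = parts[i].slice(0, -1); // drop the trailing ":"
        // Trim, then remove one pair of wrapping double quotes if present.
        fields[key] = parts[i + 1].trim().replace(/^"(.*)"$/, "$1");
      }
      return fields;
    }

    const query = `basketball intitle:champ inauthor:"john smith" inpublisher:"the book place" subject: fiba isbn: 12345 lccn: 689778 oclc: 1234156`;
    const splitter = /(?!\B\w)(intitle:|inauthor:|inpublisher:|subject:|isbn:|lccn:|oclc:)(?!\B\w)/;
    // Normalise "key: value" to "key:value" before splitting.
    const parts = query.replace(/:\s+/g, ":").split(splitter);
    const fields = toFields(parts);
    console.log(fields);
    ```

    Note that this sketch still splits on a keyword that appears inside a quoted value, so it only suits queries where that cannot happen.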
