The string is not valid JSON, so I don’t think there’s a simple solution that would let me use JSON.parse, although I may be wrong.


Problem

I have a string of key-value pairs and would like to extract them using regex.

  • The keys are all known
  • The separator is a colon
  • The key may or may not be surrounded by single or double quotes, e.g. key:value, 'key':value, "key":value
  • There may or may not be space between the key and the separator, e.g. key:value, key :value
  • There may or may not be space between the separator and the value, e.g. key:value, key: value
  • The value may or may not be surrounded by single or double quotes, e.g. key:value, key:"value", key:'value'
  • The value may consist of multiline text, e.g.
key: {
       val1: 1,
       val2: 2,
       val3: 3,
     }
key: [
       val1,
       val2,
       val3,
     ]
key: (arg1, arg2) => {
       return {
         arg1,
         arg2
       }
     }

Example

The string:

value1         :        true,
value2 : "something, something-else",
value3: [
  {
    a: 'a',
    b: true,
    c: 3
  }, {
    a: Thing,
    func: () => {
      return new Thing()
    }
  }
],
"value4": [1, 2, 3, 4],
'value5': "['a', 'b', 'c', 'd']",
value6: false

Ultimately I’d like to end up with a two-dimensional array containing the key-value pairs, but I can handle that once the keys and values have been extracted using the regex.

The desired result:

 [
   ['value1', true],
   ['value2', 'something, something-else'],
   ['value3', "{
                 a: 'a',
                 b: true,
                 c: 3
               }, {
                 a: Thing,
                 func: () => {
                   return new Thing()
                 }
               }"],
   ['value4', "[1, 2, 3, 4]"],
   ['value5', "['a', 'b', 'c', 'd']"],
   ['value6', false]
 ]

Attempted solution

This is what I’ve come up with so far:

(?<key>value1|value2|value3|value4|value5|value6)["'\s]*?:\s*(?<value>(?!value1|value2|value3|value4|value5).*)
  1. Use a named capture group to explicitly match the key to the left of the colon, taking into account the optional single or double quotes and whitespace on either side
(?<key>value1|value2|value3|value4|value5|value6)["'\s]*?:
  2. Use a negative lookahead to match the value up to the next key
\s*(?<value>(?!value1|value2|value3|value4|value5).*)

But this doesn’t appear to be doing what I thought it was: if you replace all the key names in the lookahead with something arbitrary, the result is still the same

\s*(?<value>(?!a).*)

I realise that this isn’t actually checking for a newline, but I’m not sure how to incorporate that.

Attempted solution on regex101

Nice to have

For the value, only extract what’s inside the optional single or double quotes, not the quotes or the trailing comma, i.e. something, something-else rather than 'something, something-else',

Note

The regex101 example is set to PCRE so that I can use the regex debugger, but I’m looking for a solution that uses valid JavaScript regex.

2 Answers


  1. Is the order of the keys known? If so, you could slice the source string from one key to the next, and then remove the unwanted bits (spaces, line breaks, commas, quotes) from the start and end of each individual value:

    const str = `
      value1         :        true,
      value2 : "something, something-else",
      value3: [
        {
          a: 'a',
          b: true,
          c: 3
        }, {
          a: Thing,
          func: () => {
            return new Thing()
          }
        }
      ],
      "value4": [1, 2, 3, 4],
      'value5': "['a', 'b', 'c', 'd']",
      value6: false
    `
    
    function clean(dirtyValue) {
      return dirtyValue
        .replace(/^['"]?\s*:\s*/, '')
        .replace(/,?\s*['"]?$/, '')
    }
    
    const keys = ['value1', 'value2', 'value3', 'value4', 'value5', 'value6']
    
    const parsed = keys.reduce((acc, key, i) => {
      const indexOfKey = str.indexOf(key);
      const indexOfNextKey = i < keys.length - 1 ? str.indexOf(keys[i + 1]) : str.length
      
      acc[key] = clean(str.slice(indexOfKey + key.length, indexOfNextKey))
    
      return acc;
    }, {})
    
    Object.entries(parsed).forEach(([key, value]) => console.log(key, '=', value))

    You can also adapt the example above to work with unsorted keys:

    const str = `
      value1         :        true,
      value2 : "something, something-else",
      value3: [
        {
          a: 'a',
          b: true,
          c: 3
        }, {
          a: Thing,
          func: () => {
            return new Thing()
          }
        }
      ],
      "value4": [1, 2, 3, 4],
      'value5': "['a', 'b', 'c', 'd']",
      value6: false
    `
    
    function clean(dirtyValue) {
      return dirtyValue
        .replace(/^['"]?\s*:\s*/, '')
        .replace(/,?\s*['"]?$/, '')
    }
    
    function shuffleArray(arr) {
      const shuffledArray = arr.slice(0)
    
      for (let i = shuffledArray.length - 1; i > 0; i--) {
          const j = Math.floor(Math.random() * (i + 1));    
    
          [shuffledArray[i], shuffledArray[j]] = [shuffledArray[j], shuffledArray[i]]
      }
    
      return shuffledArray;
    }
    
    const keys = shuffleArray(['value1', 'value2', 'value3', 'value4', 'value5', 'value6'])
    
    const parsed = keys.reduce((acc, key) => {
      const indexOfKey = str.indexOf(key);
          
      const closestIndexOfNextKey = keys.map((possibleNextKey) => {
        const possibleNextKeyIndex = str.indexOf(possibleNextKey, indexOfKey + 1)
        
        // indexOf returns -1 when no later key exists
        return possibleNextKeyIndex === -1 ? Infinity : possibleNextKeyIndex
      }).sort((a, b) => a - b)[0]
      
      acc[key] = clean(str.slice(indexOfKey + key.length, closestIndexOfNextKey))
    
      return acc;
    }, {})
    
    Object.entries(parsed).forEach(([key, value]) => console.log(key, '=', value))

    Note: if you have many keys, you might want to optimize this code by removing the keys you have already found from the keys array.
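One way to avoid the repeated indexOf scans is to locate every key once and then slice between neighbouring matches. The following is only a sketch of that idea, not a drop-in replacement: extractPairs is a hypothetical name, and it assumes a known key never appears followed by a colon inside another value. Values come back as raw strings with their quotes intact.

```javascript
// Hypothetical helper: find each known key once, then slice between neighbours.
function extractPairs(str, keys) {
  // Escape regex metacharacters in case a key contains any.
  const escaped = keys.map((k) => k.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))

  // A key only counts when it is optionally quoted (same quote on both
  // sides) and followed by a colon, with optional whitespace around it.
  const keyPattern = new RegExp(
    `(["']?)\\b(${escaped.join('|')})\\b\\1\\s*:\\s*`,
    'g'
  )

  // matchAll returns the matches in document order, so the order of the
  // keys array does not matter.
  const hits = [...str.matchAll(keyPattern)]

  return hits.map((hit, i) => {
    const valueStart = hit.index + hit[0].length
    const valueEnd = i < hits.length - 1 ? hits[i + 1].index : str.length

    // Drop the trailing comma and whitespace that separate this pair
    // from the next one.
    const value = str.slice(valueStart, valueEnd).replace(/\s*,?\s*$/, '')

    return [hit[2], value]
  })
}
```

Calling extractPairs(str, keys) with the str and keys from the examples above should give one [key, value] array per key, in the order the keys appear in the string.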

  2. The data you’re getting is not JSON; it looks like a JavaScript object literal with the opening { and closing } removed.

    As such, you could just parse it using eval, but be aware of the risks of eval; in other words, make sure you trust the source.

    Your source references identifiers such as Thing that are not defined, so they need stubbing. Below I’ve used a little hack that traps the exceptions and adds Thing automatically. It works here for me on Chrome and Firefox, but parsing an error string feels a little hacky, so that’s something to be aware of.

    I’ve noticed your result doesn’t wrap booleans in strings, so I’ve added a test for that. Because I’ve just used JSON to produce the value part, it’s not exactly the same as your output, but that could be replaced with a custom serializer that gets closer to what you’re after.

    For anything more detailed than that, I would suggest using an AST to parse it. It might be possible with regex, but I feel there could be some edge cases that will catch you out. If you do use an AST, don’t forget to add the { and } so that it can parse.

    const src = `value1         :        true,
    value2 : "something, something-else",
    value3: [
      {
        a: 'a',
        b: true,
        c: 3
      }, {
        a: Thing,
        func: () => {
          return new Thing()
        }
      }
    ],
    "value4": [1, 2, 3, 4],
    'value5': "['a', 'b', 'c', 'd']",
    value6: false`;
    
    function keyValue(src) {
      const stubs = [];
      for (let l = 0; l < 100; l ++) {
        try {
          const stubsTxt = stubs.map(m=>`function ${m}(){}`).join(';');
          const p = eval(`${stubsTxt};({${src}})`);
          return Object.entries(p).map(([k, v]) => {
            return [k, typeof v === 'boolean' ? v : JSON.stringify(v)]
          });
        } catch (e) {
          const t = e.toString().split(' ');
          if (t[0] === 'ReferenceError:') {
            stubs.push(t[1]);
          } else break;
        }
      }
    }
    
    document.querySelector('pre').innerText = JSON.stringify(keyValue(src), null, '  ');
    <pre>
    </pre>
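If eval is not an option and a full AST feels like too much, a middle ground is a small scanner that splits only on top-level commas by tracking bracket depth and quotes. This is a different technique from both the eval approach above and a real parser, and it is only a sketch: splitTopLevel is a hypothetical name, and it assumes the values contain no comments or regex literals and that keys contain no colons. Values are returned as raw strings.

```javascript
// Hypothetical helper: split "key: value" pairs on top-level commas only.
function splitTopLevel(src) {
  const pairs = []
  let depth = 0     // current nesting level of (), [] and {}
  let quote = null  // the quote character we are inside, if any
  let start = 0     // index where the current pair begins

  const pushPair = (end) => {
    const chunk = src.slice(start, end).trim()
    const colon = chunk.indexOf(':')
    if (chunk === '' || colon === -1) return

    // Strip matching quotes around the key, e.g. "value4" or 'value5'.
    const key = chunk.slice(0, colon).trim().replace(/^(["'])(.*)\1$/, '$2')
    pairs.push([key, chunk.slice(colon + 1).trim()])
  }

  for (let i = 0; i < src.length; i++) {
    const ch = src[i]

    if (quote) {
      if (ch === '\\') i++                 // skip the escaped character
      else if (ch === quote) quote = null  // closing quote
    } else if (ch === '"' || ch === "'" || ch === '`') {
      quote = ch
    } else if ('([{'.includes(ch)) {
      depth++
    } else if (')]}'.includes(ch)) {
      depth--
    } else if (ch === ',' && depth === 0) {
      // A comma outside every bracket and quote ends the current pair.
      pushPair(i)
      start = i + 1
    }
  }

  pushPair(src.length)
  return pairs
}
```

Unlike the slicing approach, this does not need the keys up front, but it inherits the usual limits of hand-rolled scanners, so a proper parser is still the safer choice for arbitrary input.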