The string is not valid JSON, so I don’t think there’s a simple solution that would allow me to use JSON.parse, although I may be wrong.
Problem
I have a string of key-value pairs and would like to extract them using regex.
- The keys are all known
- The separator is a colon
- The key may or may not be surrounded by single or double quotes, e.g. key:value, 'key':value, "key":value
- There may or may not be a space between the key and the separator, e.g. key:value, key :value
- There may or may not be a space between the separator and the value, e.g. key:value, key: value
- The value may or may not be surrounded by single or double quotes, e.g. key:value, key:"value", key:'value'
- The value may consist of multiline text, e.g.
key: {
val1: 1,
val2: 2,
val3: 3,
}
key: [
val1,
val2,
val3,
]
key: (arg1, arg2) => {
return {
arg1,
arg2
}
}
Example
The string:
value1 : true,
value2 : "something, something-else",
value3: [
{
a: 'a',
b: true,
c: 3
}, {
a: Thing,
func: () => {
return new Thing()
}
}
],
"value4": [1, 2, 3, 4],
'value5': "['a', 'b', 'c', 'd']",
value6: false
Ultimately I’d like to end up with a two-dimensional array containing the key-value pairs, but I can handle that once the keys and values have been extracted using the regex.
The desired result:
[
['value1', true],
['value2', 'something, something-else'],
['value3', "{
a: 'a',
b: true,
c: 3
}, {
a: Thing,
func: () => {
return new Thing()
}
}"],
['value4', "[1, 2, 3, 4]"],
['value5', "['a', 'b', 'c', 'd']"],
['value6', false]
]
Attempted solution
This is what I’ve come up with so far:
(?<key>value1|value2|value3|value4|value5|value6)["'\s]*?:\s*(?<value>(?!value1|value2|value3|value4|value5).*)
- Use a named capture group to explicitly match the key to the left of the colon, taking into account the optional single or double quotes and whitespace on either side
(?<key>value1|value2|value3|value4|value5|value6)["'\s]*?:
- Use a negative lookahead to match the value up to the next key
\s*(?<value>(?!value1|value2|value3|value4|value5).*)
But this doesn’t appear to be doing what I thought it was: if you remove all the words and replace them with something arbitrary, the result is still the same
\s*(?<value>(?!a).*)
I realise that this isn’t actually checking for a newline, but I’m not sure how to incorporate that.
Attempted solution on regex101
Nice to have
For the value, only extract what’s inside the optional single or double quotes, not the quotes or the comma, i.e. this: something, something-else
rather than this: 'something, something-else',
Note
The regex101 example is set to PCRE so that I can use the regex debugger, but I’m looking for a solution that uses valid JavaScript regex.
2 Answers
Is the order of the keys known? If so, you could try to slice the source string from one key to the next, and then remove the unwanted bits (spaces, line breaks, commas, quotes) from the start and end of each individual value:
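The slicing idea could be sketched roughly as follows. This is only an illustration, not necessarily the code originally posted here: the helper names extractPairs and cleanValue are made up, and it assumes that every key is present, that the keys array is in the same order as the source, and that no key name also appears inside an earlier value.

```javascript
const source = `value1 : true,
value2 : "something, something-else",
value3: [
  {
    a: 'a',
    b: true,
    c: 3
  }, {
    a: Thing,
    func: () => {
      return new Thing()
    }
  }
],
"value4": [1, 2, 3, 4],
'value5': "['a', 'b', 'c', 'd']",
value6: false`;

// Assumption: keys are listed in the order they occur in the source.
const keys = ['value1', 'value2', 'value3', 'value4', 'value5', 'value6'];

// Hypothetical helper: removes the separator, the trailing comma and
// surrounding quotes from a raw slice, and converts true/false to booleans.
function cleanValue(raw) {
  const value = raw
    .replace(/^["']?\s*:\s*/, '')   // closing quote of the key, colon, spaces
    .replace(/,\s*["']?\s*$/, '')   // trailing comma, opening quote of the next key
    .trim();
  const quoted = /^(["'])([\s\S]*)\1$/.exec(value);
  if (quoted) return quoted[2];     // keep only what is inside the quotes
  if (value === 'true') return true;
  if (value === 'false') return false;
  return value;
}

// Hypothetical helper: slices the source from the end of one key to the
// start of the next key and cleans up each slice.
function extractPairs(source, keys) {
  const pairs = [];
  let cursor = 0;
  for (let i = 0; i < keys.length; i++) {
    const start = source.indexOf(keys[i], cursor);
    if (start === -1) continue;     // key not found, skip it
    const valueStart = start + keys[i].length;
    const next = i + 1 < keys.length ? source.indexOf(keys[i + 1], valueStart) : -1;
    const end = next === -1 ? source.length : next;
    pairs.push([keys[i], cleanValue(source.slice(valueStart, end))]);
    cursor = valueStart;
  }
  return pairs;
}

const pairs = extractPairs(source, keys);
console.log(pairs);
// pairs[0] is ['value1', true]
// pairs[1] is ['value2', 'something, something-else']
```

Multiline values such as value3 come back as the raw text between the brackets, which is close to what the question asks for.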
You can also adapt the example above to work with unsorted keys:
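A possible sketch for unsorted keys is to locate every key first, sort the hits by position, and then slice between neighbouring hits. Again this is only an illustration under assumptions: extractPairsUnordered and cleanValue are made-up names, no key may appear inside a value, and no key may be a prefix of another key (e.g. value1 and value10).

```javascript
// Shortened sample input with the keys in a different order than the keys array.
const source = `value6: false,
"value4": [1, 2, 3, 4],
value2 : "something, something-else",
value1 : true`;

const keys = ['value1', 'value2', 'value4', 'value6'];

// Hypothetical helper: removes the separator, the trailing comma and
// surrounding quotes from a raw slice, and converts true/false to booleans.
function cleanValue(raw) {
  const value = raw
    .replace(/^["']?\s*:\s*/, '')   // closing quote of the key, colon, spaces
    .replace(/,\s*["']?\s*$/, '')   // trailing comma, opening quote of the next key
    .trim();
  const quoted = /^(["'])([\s\S]*)\1$/.exec(value);
  if (quoted) return quoted[2];
  if (value === 'true') return true;
  if (value === 'false') return false;
  return value;
}

// Hypothetical helper: finds each key, orders the hits by their position
// in the source, then slices from one hit to the next.
function extractPairsUnordered(source, keys) {
  const found = keys
    .map((key) => ({ key, start: source.indexOf(key) }))
    .filter((hit) => hit.start !== -1)
    .sort((a, b) => a.start - b.start);

  return found.map((hit, i) => {
    const valueStart = hit.start + hit.key.length;
    const end = i + 1 < found.length ? found[i + 1].start : source.length;
    return [hit.key, cleanValue(source.slice(valueStart, end))];
  });
}

const pairs = extractPairsUnordered(source, keys);
console.log(pairs);
// The pairs come back in source order: value6, value4, value2, value1
```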
Note: if you have many keys, you might want to optimize this code by removing the keys you have already found from the keys array.
The data you’re getting is not JSON, but it looks like a JavaScript object literal with the starting { and ending } taken off.
As such, you could just parse this using eval, but be aware of the issues with eval. In other words: make sure you trust the source.
Your source has some identifiers like Thing, so these would need stubbing. Below I’ve used a little hack to trap the exceptions and add Thing automatically. It works here for me on Chrome & Firefox, but parsing an error string just feels a little hacky, so that’s something to be aware of.
I’ve noticed your result for booleans is not wrapped in strings, so I’ve done a test for that. Because I’ve just used JSON to do the value part, it’s not exactly the same as your output, but that could be replaced with a custom one that gets closer to what you’re after.
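A rough sketch of that approach is shown here. It is an illustration rather than the exact code originally posted: parseLoose, toPairs and Stub are made-up names, it uses the Function constructor in place of a direct eval call (the same trust caveat applies), and it assumes the engine reports missing identifiers as "X is not defined", which is what Chrome, Node and Firefox do.

```javascript
const source = `value1 : true,
value2 : "something, something-else",
value3: [
  {
    a: 'a',
    b: true,
    c: 3
  }, {
    a: Thing,
    func: () => {
      return new Thing()
    }
  }
],
"value4": [1, 2, 3, 4],
'value5': "['a', 'b', 'c', 'd']",
value6: false`;

// WARNING: this executes the input as code. Only use it on trusted input.
// Hypothetical helper: wraps the source in { } and evaluates it, adding a
// stub for every identifier that turns out to be undefined.
function parseLoose(source) {
  const stubs = {};
  // Arbitrary cap so that unexpected input cannot loop forever.
  for (let attempt = 0; attempt < 20; attempt++) {
    const names = Object.keys(stubs);
    try {
      const build = new Function(...names, `return {${source}\n};`);
      return build(...names.map((name) => stubs[name]));
    } catch (err) {
      // Assumption: the message has the form "Thing is not defined".
      const missing =
        err.name === 'ReferenceError' && /^(\S+) is not defined/.exec(err.message);
      if (!missing) throw err;
      stubs[missing[1]] = function Stub() {};
    }
  }
  throw new Error('Too many undefined identifiers');
}

// Hypothetical helper: booleans stay as they are, everything else is
// serialised with JSON.stringify (functions are dropped by JSON).
function toPairs(obj) {
  return Object.entries(obj).map(([key, value]) => [
    key,
    typeof value === 'boolean' ? value : JSON.stringify(value),
  ]);
}

const result = toPairs(parseLoose(source));
console.log(result);
```

Because the values go through JSON.stringify, value2 keeps its double quotes and the functions inside value3 disappear, so the output is not identical to the desired result in the question.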
For anything more detailed than that, I would suggest using an AST parser. It might be possible with regex, but I feel like there could be some edge cases that will catch you out. If you do use an AST, don’t forget to add the { and } back so that it can parse.