
With a string like {float: 'null', another: 'foo'}, I’d like to grab each key/value pair so that the groups output float and null, then another and foo.
My current regex is /{(?<set>(?<key>\w*)\s*:\s*(?<value>.*)?\s?)*}/g
It grabs the key correctly, but everything after the colon, including the comma and the remaining pairs, ends up in the value. I’m using named groups mainly for clarity. I can’t figure out how to extract each key/value pair, especially when there are several.
Thanks for any help.

I’m currently trying /{(?<set>(?<key>\w*)\s*:\s*(?<value>.*)?\s?)*}/g but the output is:

the group ‘set’: float: 'null', another: 'foo' (correct)

the group ‘key’: float (correct)

the group ‘value’: 'null', another: 'foo' (incorrect, I want just null)

I would like it to capture all key/value pairs, if possible.
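
For reference, a minimal sketch of what goes wrong and one way around it. The question's regex (braces escaped and /g dropped, otherwise unchanged) shows that the greedy .* runs on past the first pair. The hypothetical parsePairs helper instead matches one pair at a time with matchAll, assuming quoted values contain no escaped quotes and bare values (true, 45, ...) contain no comma, closing brace or whitespace.

```javascript
// The question's regex, braces escaped and /g dropped so that match()
// returns the groups of the first (and only) match.
const questionRegex = /\{(?<set>(?<key>\w*)\s*:\s*(?<value>.*)?\s?)*\}/;

const input = "{float: 'null', another: 'foo'}";

// `.*` is greedy, so the value group runs on past the first pair.
console.log(input.match(questionRegex).groups.value);
// 'null', another: 'foo'

// Hypothetical alternative: match one pair at a time. The value is either a
// quoted string without escaped quotes, or a bare token such as true or 45
// that contains no comma, closing brace or whitespace.
const pairRegex = /(?<key>\w+)\s*:\s*(?<value>'[^']*'|"[^"]*"|[^,}\s]+)/g;

function parsePairs(text) {
  const pairs = [];
  for (const m of text.matchAll(pairRegex)) {
    // Drop the surrounding quotes, if any.
    const value = m.groups.value.replace(/^(['"])(.*)\1$/, '$2');
    pairs.push([m.groups.key, value]);
  }
  return pairs;
}

console.log(parsePairs(input));
// [ [ 'float', 'null' ], [ 'another', 'foo' ] ]
```

Values come back as plain strings here; converting 'true' or '45' into real JS values is a separate step.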


Edit for more clarity:

My specific use case is parsing Markdown and plugging it into custom components in Svelte, where I want to read props from the Markdown syntax on an image. From what I’ve gathered online about putting attributes on an image, it should look something like:

![Alt Text](https://<fullurl>.jpg "This is hover text"){prop1: 'foo', prop2: 'bar', float: true}

I’m using a regex because I’m parsing the Markdown string. It’s not JSON, and I don’t really gain anything by following JSON semantics (double quotes around the keys).
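
A sketch of pulling the whole image syntax above apart before looking at the key/value pairs, assuming the form ![alt](url "optional title"){props} and that the props block contains no "}". The helper name findImagesWithProps and the example.com URL (standing in for <fullurl>) are hypothetical.

```javascript
// Assumed syntax: ![alt](url "optional title"){props}
// The props block is captured as raw text and must not contain "}".
const imageWithProps =
  /!\[(?<alt>[^\]]*)\]\((?<src>[^\s)]+)(?:\s+"(?<title>[^"]*)")?\)\{(?<props>[^}]*)\}/g;

function findImagesWithProps(markdown) {
  return [...markdown.matchAll(imageWithProps)].map((m) => ({
    alt: m.groups.alt,
    src: m.groups.src,
    title: m.groups.title, // undefined when the image has no title
    props: m.groups.props.trim(),
  }));
}

// Hypothetical URL standing in for the question's <fullurl>.
const markdown =
  '![Alt Text](https://example.com/image.jpg "This is hover text"){prop1: \'foo\', prop2: \'bar\', float: true}';

console.log(findImagesWithProps(markdown));
```

The props string is then what a key/value regex has to split into pairs.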


Answers


  1. Have a go with this long JavaScript regex:

    /(?<key>\w*)\s*:\s*(?<value>(?<quote>["'])(?:(?=(?<backslash>\\?))\k<backslash>.)*?\k<quote>|(?<number>[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)|(?<constant>true|false|null))/g
    

    In action (use the full-page view, otherwise not everything is visible):

    const regexKeyValue = /(?<key>\w*)\s*:\s*(?<value>(?<quote>["'])(?:(?=(?<backslash>\\?))\k<backslash>.)*?\k<quote>|(?<number>[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)|(?<constant>true|false|null))/g;
    
    document.getElementById('search').addEventListener('click', function () {
      const input = document.getElementById('input').value;
    
      let match,
          i = 1,
          output = [];
    
      while ((match = regexKeyValue.exec(input)) !== null) {
        console.log(`Match n°${i} : ` + match[0]);
        console.log('match.groups =', match.groups);
    
        // If the value starts with a quote, unquote it and also resolve the
        // escape sequences (e.g. the two characters \n become a real newline).
        let value = match.groups.value;
        // If it's double quotes, let's use JSON.parse() as it will handle everything.
        if (value.match(/^"/)) {
          value = JSON.parse(value);
        }
        // If it's single quotes, we can't use JSON.parse() directly, so we have to trick it a bit.
        else if (value.match(/^'/)) {
          // 1) Remove the single quotes around the value.
          // 2) Replace every \' by ' (JSON has no \' escape).
          // 3) Escape every bare double quote (" becomes \").
          // Steps 2 and 3 are done in one pass over backslash sequences and
          // quotes, so an escaped backslash (\\) or an already escaped quote (\")
          // is left intact.
          value = value
            .replace(/^'|'$/g, '')
            .replace(/\\(.)|"/g, function (fullMatch, afterBackslash) {
              if (afterBackslash === undefined) {
                return '\\"'; // a bare double quote
              } else if (afterBackslash === "'") {
                return "'";
              } else {
                return fullMatch;
              }
            });
          console.log(`"${value}"`);
          // Now use JSON.parse();
          value = JSON.parse(`"${value}"`);
        }
        
        // If it's a number or a constant, then convert the string to this real JS value.
        if (typeof match.groups.number !== 'undefined' ||
            typeof match.groups.constant !== 'undefined') {
          value = eval(match.groups.value);
        }
    
        output.push(
          `Match n°${i++} :\n` +
          `  Key   : ${match.groups.key}\n` +
          `  Value : ${value}\n`
        );
      }
    
      document.getElementById('output').innerText = output.join("\n");
      document.getElementById('label').classList.remove('hidden');
    });

    textarea {
      box-sizing: border-box;
      width: 100%;
    }
    
    pre {
      overflow-y: scroll;
    }
    
    .hidden {
      display: none;
    }

    <textarea id="input" rows="10">{
      float: 'null',
      another: "foo",
      age: 45,
      type: '"simple" \' quote',
      comment: "Hello,\nA backslash \\, a tab \t and a \"dummy\" word.\nOk?",
      important: true,
      weight: 69.7,
      negative: -2.5
    }</textarea>
    
    <button id="search">Search for key-value pairs</button>
    
    <p id="label" class="hidden">Matches:</p>
    <pre><code id="output"></code></pre>

    The same regular expression with comments, written for the x (extended)
    flag that PCRE offers; JavaScript does not support this flag:

    /
    (?<key>\w*)
    \s*:\s*
    (?<value>
      # A string value, single or double-quoted:
      (?<quote>["'])
        (?:(?=(?<backslash>\\?))\k<backslash>.)*?
      \k<quote>
    |
      # Int and float numbers:
      (?<number>[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)
    |
      # true, false and null (or other constants):
      (?<constant>true | false | null)
    )
    /gx
    

    Or, better, open it on regex101, which colours each group and explains
    the pattern in the right-hand column: https://regex101.com/r/bBPvUd/1

  2. As mentioned in the comments, eval() is considered "evil", or at least unsafe, because it executes whatever code the string contains; with input from an untrusted source this can lead to cross-site scripting. However, if it is used in a "safe" environment, i.e. for preprocessing input that you have full control over, it might be admissible nonetheless.

    const md=`Some text and now the image: 
    ![Alt Text](https://<fullurl>.jpg "This is hover text"){prop1: 'foo', prop2: 'bar', float: true}
    and some more text.
    
    A new paragraph and yet another picture
    ![Alt Text2](https://<fullerURL>.jpg "This is another hover text"){prop1: 'fool', prop2: 'bart', float: false} and this is the end.`;
    
    function unsafeParse(s){
     // match() returns null when there is no {...} block, so fall back to [].
     return (s.match(/{[^}]+}/g) || []).map(t=>eval(`(${t})`));
    }
    
    // outputs an array of all image property objects:
    console.log(unsafeParse(md));

    Apart from being "unsafe", the above is also not robust: a property value containing a "}" character (e.g. 'a } b') ends the {...} match early, so eval() receives an incomplete object literal and throws a SyntaxError.
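
One way to make the block extraction in the second answer tolerate a "}" inside a quoted value is to let the regex step over quoted strings as a whole. This is a swapped-in pattern, not part of either answer, and a sketch under stated assumptions: no nested braces, and every quote inside a block opens a properly terminated string (backslash escapes are honoured). The helper name findAttributeBlocks is hypothetical.

```javascript
// A {...} block whose content is any mix of: characters other than braces
// and quotes, a single-quoted string, or a double-quoted string.
// Assumes no nested braces and properly terminated quoted strings.
const attributeBlock = /\{(?:[^{}'"]|'(?:\\.|[^'\\])*'|"(?:\\.|[^"\\])*")*\}/g;

function findAttributeBlocks(text) {
  // match() with /g returns null when nothing matches.
  return text.match(attributeBlock) ?? [];
}

const sample =
  "![a](a.jpg){title: 'curly } brace', float: true} and ![b](b.jpg){float: false}";

console.log(findAttributeBlocks(sample));
```

The blocks found this way could then be fed to a key/value regex such as the one in the first answer instead of to eval().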

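To avoid eval() altogether, the raw text captured by the first answer's value group (a quoted string, a number, or true/false/null) can be converted with JSON.parse() and Number() instead. A sketch, with the hypothetical helper name toJsValue, assuming single-quoted strings only use escapes that JSON also understands, plus \' for an embedded single quote.

```javascript
// Hypothetical helper: convert the raw text of a captured value to a JS value
// without eval(). Assumes single-quoted strings only use escapes that JSON
// also understands, plus \' for an embedded single quote.
function toJsValue(raw) {
  if (raw.startsWith('"')) {
    // A double-quoted value is already a valid JSON string.
    return JSON.parse(raw);
  }
  if (raw.startsWith("'")) {
    // Rewrite the body as a JSON string in one pass:
    // \' becomes ', other escapes are kept, a bare " becomes \".
    const body = raw.slice(1, -1).replace(/\\(.)|"/g, (whole, escaped) => {
      if (escaped === undefined) return '\\"';
      return escaped === "'" ? "'" : whole;
    });
    return JSON.parse(`"${body}"`);
  }
  if (raw === 'true') return true;
  if (raw === 'false') return false;
  if (raw === 'null') return null;
  const number = Number(raw);
  if (Number.isNaN(number)) {
    throw new Error(`Unsupported value: ${raw}`);
  }
  return number;
}

console.log(toJsValue("'null'"), toJsValue('"foo"'), toJsValue('-2.5'), toJsValue('true'));
```

In the first answer's loop, value = toJsValue(match.groups.value); could stand in for both the quote handling and the eval() call.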