I have a list of strings

time:1001 name:foo avg:5.7
time:1002 
time:1003 avg:1.2
time:1004 name:f 
time:1005 name:bar avg:2.1

I want to pick out only the strings that match time:value name:value avg:value exactly, and extract:

[time, 1001], [name, foo], [avg, 5.7]
[time, 1005], [name, bar], [avg, 2.1]

using regex: /(\w+):\s*(?:"([^"]*)"|(\S+))/g

code:

const regex = /(\w+):\s*(?:"([^"]*)"|(\S+))/g;
let match = regex.exec(line);

But I get all the lines:

[time, 1001], [name, foo], [avg, 5.7]
[time, 1002], [name, undefined], [avg, undefined]
[time, 1003], [name, undefined], [avg, 1.2]
[time, 1004], [name, f], [avg, undefined]
[time, 1005], [name, bar], [avg, 2.1]

How do I select only the ones with all key values present?

3 Answers


  1. You can create your regexp dynamically to use capture groups:

    const str = `time:1001 name:foo avg:5.7
    time:1002 
    time:1003 avg:1.2
    time:1004 name:f 
    time:1005 name:bar avg:2.1`;
    
    const result = [];
    
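    // Backslashes are doubled because the pattern is a string literal;
    // the m flag lets $ match at the end of every line, not just the last one.
    const regex = new RegExp('(\\w+):([^\\s]+)(?:[^\\n]|$)'.repeat(3), 'gm');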
    let m;
    while (m = regex.exec(str)) {
        result.push(m.slice(1).reduce(
            (arr, item, idx) => (arr[Math.floor(idx / 2)] ??=[]).push(item) && arr
            , []
        ));
    }
    
    console.log(JSON.stringify(result));

    Or use the {3} quantifier to require three key:value pairs in a
    line, then parse each matched line manually:

    const str = `time:1001 name:foo avg:5.7
    time:1002 
    time:1003 avg:1.2
    time:1004 name:f 
    time:1005 name:bar avg:2.1`;
    
    const result = str.match(/(\w+:[^\s]+[^\n]*){3}/g).map(line =>
      // trim() drops trailing spaces so split() yields no empty item
      line.trim().split(/\s+/).map(item => item.split(':'))
    );
    
    console.log(JSON.stringify(result));
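
As a variation on parsing each line manually, here is a minimal sketch that reuses the pair regex from the question and only keeps lines whose keys are exactly the required ones, in order. The names `REQUIRED_KEYS` and `parseLine` are hypothetical, and the input is assumed to be already split into lines.

```javascript
// Hypothetical names; the pair regex is the one from the question.
const REQUIRED_KEYS = ['time', 'name', 'avg'];
const pairRegex = /(\w+):\s*(?:"([^"]*)"|(\S+))/g;

function parseLine(line) {
  // Collect every key:value pair on the line as [key, value].
  const pairs = [...line.matchAll(pairRegex)].map(
    ([, key, quoted, bare]) => [key, quoted ?? bare]
  );
  const keys = pairs.map(([key]) => key);
  // Keep the line only if it has exactly the required keys, in order.
  const complete =
    keys.length === REQUIRED_KEYS.length &&
    REQUIRED_KEYS.every((key, i) => keys[i] === key);
  return complete ? pairs : null;
}

const inputLines = [
  'time:1001 name:foo avg:5.7',
  'time:1002 ',
  'time:1003 avg:1.2',
  'time:1004 name:f ',
  'time:1005 name:bar avg:2.1',
];

// Drop the lines for which parseLine returned null.
const parsed = inputLines.map(parseLine).filter(Boolean);
console.log(JSON.stringify(parsed));
```

The filtering happens after matching, so the regex itself can stay generic and the list of required keys can change without touching it.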
  2. Make your regexp match the keys exactly, instead of using \w+ to match anything.

    const lines = [
      'time:1001 name:foo avg:5.7 ',
      'time:1002 ',
      'time:1003 avg:1.2 ',
      'time:1004 name:f  ',
      'time:1005 name:bar avg:2.1 ',
    ];
    
    const regex = /time:(\d+)\s+name:(\w+)\s+avg:([\d.]+)/;
    
    const result = [];
    lines.forEach(line => {
      const m = line.match(regex);
      if (m) {
        result.push([
          ['time', m[1]],
          ['name', m[2]],
          ['avg', m[3]]
        ]);
      }
    });
    
    console.log(result);
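
If the set of keys may change, the exact-key regex could also be generated from a list. The sketch below is one way to do that; `buildLineRegex` is a hypothetical helper, it assumes every key is a plain identifier (so it is safe both inside the pattern and as a group name), and it anchors the pattern so that lines with missing or extra fields are rejected.

```javascript
// Hypothetical helper: builds a line regex from an ordered list of keys.
// Assumes each key is a plain identifier (letters, digits, underscore).
function buildLineRegex(keys) {
  // Each key becomes `key:(?<key>\S+)`; pairs are separated by whitespace.
  const body = keys.map(key => `${key}:(?<${key}>\\S+)`).join('\\s+');
  // ^ and $ make the whole line match; surrounding spaces are allowed.
  return new RegExp(`^\\s*${body}\\s*$`);
}

const lineRegex = buildLineRegex(['time', 'name', 'avg']);

const sampleLines = [
  'time:1001 name:foo avg:5.7 ',
  'time:1002 ',
  'time:1003 avg:1.2 ',
  'time:1004 name:f  ',
  'time:1005 name:bar avg:2.1 ',
];

const rows = sampleLines
  .map(line => line.match(lineRegex))
  .filter(Boolean)                      // drop lines that did not match
  .map(m => Object.entries(m.groups));  // [[key, value], ...] in pattern order

console.log(JSON.stringify(rows));
```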
  3. One possible approach, which consumes the data-blob entirely as is …

    time:1001 name:foo avg:5.7
    time:1002 
    time:1003 avg:1.2
    time:1004 name:f 
    time:1005 name:bar avg:2.1
    

    … could be based on a globally flagged regex like

    /time\s*:\s*(?<time>\S+)\s*name\s*:\s*(?<name>\S+)\s*avg\s*:\s*(?<avg>\S+)/g
    

    … which utilizes named capturing groups and gets applied via matchAll.

    Since the array built from the resulting iterator contains only the valid matches, one can map that array and render one line of the final result per match, e.g. with a template literal. The mapped array then just needs to be joined with a newline ('\n').

    The following runnable example generates exactly the result the OP was asking for …

    const sampleData =
    `time:1001 name:foo avg:5.7
    time:1002 
    time:1003 avg:1.2
    time:1004 name:f 
    time:1005 name:bar avg:2.1`;
    
    const regXCapture =
      // see ... [https://regex101.com/r/hoiUFH/1]
      /time\s*:\s*(?<time>\S+)\s*name\s*:\s*(?<name>\S+)\s*avg\s*:\s*(?<avg>\S+)/g;
    
    console.log(
      [...sampleData.matchAll(regXCapture)]
        .map(({ groups: { time, name, avg } }) =>
          `[time, ${ time }], [name, ${ name }], [avg, ${ avg }]`
        )
        .join('\n')
    );
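
One caveat when matching against the whole blob: \s also matches newlines, so a pattern built on \s* can stitch a record together from fields that sit on different lines. The sketch below contrasts the regex above with a stricter per-line variant; the `perLine` pattern and the `blob` sample are assumptions made for illustration (plain \n line endings, fields separated by spaces or tabs).

```javascript
// Sample where the first record is split over two lines (made up for this sketch).
const blob = `time:1001 name:foo
avg:5.7
time:1005 name:bar avg:2.1`;

// The regex from the answer above: \s* may cross line breaks.
const loose =
  /time\s*:\s*(?<time>\S+)\s*name\s*:\s*(?<name>\S+)\s*avg\s*:\s*(?<avg>\S+)/g;

// Stricter variant: ^ and $ with the m flag pin the match to one line,
// and [ \t] only allows spaces or tabs between the fields.
const perLine =
  /^[ \t]*time:(?<time>\S+)[ \t]+name:(?<name>\S+)[ \t]+avg:(?<avg>\S+)[ \t]*$/gm;

const looseTimes = [...blob.matchAll(loose)].map(m => m.groups.time);
const perLineTimes = [...blob.matchAll(perLine)].map(m => m.groups.time);

console.log(looseTimes);   // [ '1001', '1005' ]
console.log(perLineTimes); // [ '1005' ]
```

Whether the looser behaviour is acceptable depends on the data; for the sample in the question both patterns return the same two records.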