
I need to extract multiple pieces of information from a returned collection of strings. The strings consist of a number of "fields", for example:

{container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.10:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-xvz7v", service="prometheus-prometheus-node-exporter"}
{container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.11:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-w8cqq", service="prometheus-prometheus-node-exporter"}
{container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.12:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-4wfhq", service="prometheus-prometheus-node-exporter"}
{container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.13:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-zfskj", service="prometheus-prometheus-node-exporter"}
{container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.15:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-cllnv", service="prometheus-prometheus-node-exporter"}

In this case, I want to extract the label and value of the fstype, instance and mountpoint fields. So for the example provided, I’d end up with something like the following:

fstype="ext4" instance="172.20.32.10:9100" mountpoint="/"
fstype="ext4" instance="172.20.32.11:9100" mountpoint="/"
fstype="ext4" instance="172.20.32.12:9100" mountpoint="/"
fstype="ext4" instance="172.20.32.13:9100" mountpoint="/"
fstype="ext4" instance="172.20.32.15:9100" mountpoint="/"

Can this be done with a regular expression? I’ve used REs for a long time, but never anything this complex. If the answer to that question is yes, how do I do it?

4 Answers


  1. This can easily be done with groups and string replace. See this regex:

    .*(fstype="[^"]*").*?(instance="[^"]*").*?(mountpoint="[^"]*").*
    

    with this replacement:

    $1 $2 $3
    

    Explanation:
    We create three groups that match the keys for which you want to have the values. E.g. group 1: (fstype="[^"]*"). The groups can then be referred to by $1 – $n. In order for the resulting string to only contain the groups, we need to make the regex match the whole string. Thus the regex starts and ends with .*. Further explanations:

    • .* – matches zero or more characters other than line breaks, as many as possible (greedy).
    • .*? – same as above, but lazy: the regex matches as few characters as possible.
    • [^"]* – matches zero or more characters other than ".
    • (...) – creates a capture group containing the text matched by the pattern inside it.
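    The greedy/lazy difference described in the bullets above is easiest to see side by side. A minimal sketch using Python's re module (greedy and lazy quantifiers behave the same way in JavaScript), on a sample line shortened to three fields for illustration:

```python
import re

# First sample line, shortened to three fields for readability
line = 'fstype="ext4", instance="172.20.32.10:9100", mountpoint="/"'

# Greedy: .* runs as far as it can, so the match ends at the LAST quote on the line
greedy = re.search(r'fstype=".*"', line).group(0)

# Lazy: .*? stops at the FIRST closing quote that lets the rest of the pattern match
lazy = re.search(r'fstype=".*?"', line).group(0)

# Negated class: [^"]* cannot cross a quote, so it also stops at the first one
negated = re.search(r'fstype="[^"]*"', line).group(0)

print(greedy)   # fstype="ext4", instance="172.20.32.10:9100", mountpoint="/"
print(lazy)     # fstype="ext4"
print(negated)  # fstype="ext4"
```

    This is why the answer uses [^"]* inside each group: it keeps each captured value within its own pair of quotes.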

    A small example in JavaScript:

    let input = `
    {container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.10:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-xvz7v", service="prometheus-prometheus-node-exporter"}
    {container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.11:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-w8cqq", service="prometheus-prometheus-node-exporter"}
    {container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.12:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-4wfhq", service="prometheus-prometheus-node-exporter"}
    {container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.13:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-zfskj", service="prometheus-prometheus-node-exporter"}
    {container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.15:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-cllnv", service="prometheus-prometheus-node-exporter"}
    `;
    
    const regex = /.*(fstype="[^"]*").*?(instance="[^"]*").*?(mountpoint="[^"]*").*/g;
    
    const newStr = input.replaceAll(regex, "$1 $2 $3");
    console.log(newStr);
      

    Notes:

    • The regex needs the ‘global’ flag, which is activated in the example by the g at the end of the regex. Without it, only the first match would be replaced, and replaceAll(...) throws a TypeError when it is given a regex without the g flag. (With the g flag, replace(...) also replaces every match.)
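    For comparison, a sketch of the same replacement with Python's re module (an assumption: the question does not name a language). re.sub replaces every non-overlapping match by default, so no global flag is needed, and backreferences in the replacement are written \1 rather than $1. Only two of the sample lines are used, for brevity:

```python
import re

# Two of the sample lines from the question (the others behave the same way)
text = (
    '{container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.10:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-xvz7v", service="prometheus-prometheus-node-exporter"}\n'
    '{container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.11:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-w8cqq", service="prometheus-prometheus-node-exporter"}\n'
)

regex = r'.*(fstype="[^"]*").*?(instance="[^"]*").*?(mountpoint="[^"]*").*'

# "." does not match "\n" (no re.DOTALL), so each match stays within one line.
# re.sub replaces all matches by default (count=0); groups are referenced as \1..\3.
result = re.sub(regex, r'\1 \2 \3', text)
print(result)
```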

    Remarks

    Depending on the complete task you need to accomplish, it might be more reasonable to use a library for the data format you’re given. From your example, this looks like a Prometheus label set ({key="value", ...}) rather than JSON5, which separates keys and values with a colon. Depending on which programming language you’re using, there are many approaches with different libraries. If you are just using an editor (like VS Code or Notepad++), you can use the regex I provided in the editor’s find-and-replace function.
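    Short of a full library, one middle ground is to parse every key="value" pair into a dictionary and then pick the keys you need, which also works when the field order varies. A minimal Python sketch; the helper names parse_labels and pick_fields are hypothetical, and it assumes no value contains an escaped quote (\"), which real Prometheus label values may:

```python
import re

# One key="value" pair; assumes values never contain an escaped quote
PAIR = re.compile(r'(\w+)="([^"]*)"')

def parse_labels(line):
    """Turn '{k1="v1", k2="v2", ...}' into {'k1': 'v1', 'k2': 'v2', ...}."""
    return dict(PAIR.findall(line))

def pick_fields(line, keys=("fstype", "instance", "mountpoint")):
    """Rebuild the requested fields in the order of `keys`, skipping missing ones."""
    labels = parse_labels(line)
    return " ".join(f'{k}="{labels[k]}"' for k in keys if k in labels)

line = '{container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.10:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-xvz7v", service="prometheus-prometheus-node-exporter"}'
print(pick_fields(line))  # fstype="ext4" instance="172.20.32.10:9100" mountpoint="/"
```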

  2. To extract the desired fields from the collection of strings using regular expressions, you can follow these steps:

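    Construct the regular expression pattern: the pattern should match the desired fields and capture their values. In your example the fields are separated by ", " and other fields (such as job) sit between them, so the pattern has to skip over that text:
    fstype="([^"]+)".*?instance="([^"]+)".*?mountpoint="([^"]+)"

    Here, each ([^"]+) captures one or more non-quote characters (a field value) into its own group, and .*? lazily skips whatever lies between the fields. This assumes the fields always appear in this order.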
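    In the sample strings the three fields are never adjacent: each is followed by ", " and job="node-exporter" sits between instance and mountpoint. A quick Python check compares a pattern that expects single spaces between the fields with one that lets .*? skip the text in between:

```python
import re

line = '{container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.10:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-xvz7v", service="prometheus-prometheus-node-exporter"}'

# Requires exactly one space between the fields: never true in this data
adjacent = r'fstype="([^"]+)" instance="([^"]+)" mountpoint="([^"]+)"'

# Lets .*? lazily skip whatever lies between the fields
with_gaps = r'fstype="([^"]+)".*?instance="([^"]+)".*?mountpoint="([^"]+)"'

print(re.search(adjacent, line))            # None
print(re.search(with_gaps, line).groups())  # ('ext4', '172.20.32.10:9100', '/')
```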

    Compile and apply the regular expression: Use a regular expression library in your programming language of choice (e.g., Python’s re module) to compile the pattern and apply it to each string in the collection.

    Iterate over the collection and extract the fields: For each string, apply the regular expression pattern and retrieve the captured groups. Here’s an example in Python:
    import re

    strings = [
        '{container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.10:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-xvz7v", service="prometheus-prometheus-node-exporter"}',
        # Add the remaining strings here
    ]

    # .*? lazily skips the text between the fields (", ", job="...", and so on)
    pattern = r'fstype="([^"]+)".*?instance="([^"]+)".*?mountpoint="([^"]+)"'

    for string in strings:
        # findall returns one (fstype, instance, mountpoint) tuple per match
        for fstype, instance, mountpoint in re.findall(pattern, string):
            print(f'fstype="{fstype}" instance="{instance}" mountpoint="{mountpoint}"')

    This code will iterate over each string, apply the regular expression pattern, and extract the desired fields. It will then print the extracted fields in the desired format.

    Keep in mind that regular expressions can be powerful but may not be the best choice for all parsing tasks. If the structure of your strings varies significantly or becomes more complex, you may need to consider alternative parsing techniques or libraries that provide more advanced pattern matching and extraction capabilities.

  3. If it is known that the fields of interest are all present but their order in the string is not known, one could match the regular expression

    \b(?<field_name>fstype|instance|mountpoint)="(?<field_value>[^"]+)
    

    For the first example string this returns three matches, with the contents of the two named capture groups, field_name and field_value, as follows.

    field_name     field_value
    --------------------------------
    fstype         ext4
    instance       172.20.32.10:9100
    mountpoint     /
    

    This regular expression has the following elements.

    \b                           # match a word boundary
    (?<field_name>               # begin the named capture group field_name
      fstype|instance|mountpoint # match 'fstype', 'instance' or 'mountpoint'
    )                            # end the capture group field_name
    ="                           # match '="'
    (?<field_value>              # begin the named capture group field_value
      [^"]+                      # match one or more characters other than double
                                 # quotes, as many as possible
    )                            # end the capture group field_value
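    As a sketch of applying this pattern in Python (an assumption: the answer does not name a language), note that Python's re module writes named groups as (?P<name>...) rather than (?<name>...). Collecting the matches into a dictionary gives one entry per field, whatever order the fields appear in:

```python
import re

# Same pattern as above, with Python's (?P<name>...) named-group syntax
pattern = re.compile(r'\b(?P<field_name>fstype|instance|mountpoint)="(?P<field_value>[^"]+)')

line = '{container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.10:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-xvz7v", service="prometheus-prometheus-node-exporter"}'

# finditer yields one match object per field, in the order the fields occur
fields = {m.group("field_name"): m.group("field_value") for m in pattern.finditer(line)}
print(fields)  # {'fstype': 'ext4', 'instance': '172.20.32.10:9100', 'mountpoint': '/'}
```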
    

    One could test that all three fields are present by attempting to match a regular expression with three positive lookaheads:

    ^(?=.*\bfstype=")(?=.*\binstance=")(?=.*\bmountpoint=")
    

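    The lookahead check and the field extraction can be combined, for example in Python. A minimal sketch; the function name extract is hypothetical. Each lookahead scans ahead from the start of the string without consuming input, so all three must succeed for the check to pass:

```python
import re

# Three positive lookaheads, all anchored at the start of the string
has_all = re.compile(r'^(?=.*\bfstype=")(?=.*\binstance=")(?=.*\bmountpoint=")')
field = re.compile(r'\b(?P<field_name>fstype|instance|mountpoint)="(?P<field_value>[^"]+)')

def extract(line):
    """Return the three fields as a dict, or None if any of them is missing."""
    if not has_all.search(line):
        return None
    return {m["field_name"]: m["field_value"] for m in field.finditer(line)}

print(extract('{fstype="ext4", instance="172.20.32.10:9100", mountpoint="/"}'))
print(extract('{fstype="ext4", mountpoint="/"}'))  # None: instance is missing
```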

  4. "Can this be done with a regular expression?"

    Yes, it’s called a "capture".

    "If the answer to that question is yes, how do I do it?"

    Many websites let you test a regular-expression pattern and inspect its captures.

    A popular one among Stack Overflow users is regex101.com.

    You can use the following pattern.

    (fstype|instance|mountpoint)=.(.+?)"
    

    Output

    fstype, ext4
    instance, 172.20.32.10:9100
    mountpoint, /
    fstype, ext4
    instance, 172.20.32.11:9100
    mountpoint, /
    fstype, ext4
    instance, 172.20.32.12:9100
    mountpoint, /
    fstype, ext4
    instance, 172.20.32.13:9100
    mountpoint, /
    fstype, ext4
    instance, 172.20.32.15:9100
    mountpoint, /
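    To produce the output listed above in code rather than on regex101.com, a Python sketch (an assumption: any regex engine with capture groups works similarly) is shown below. The single . after = consumes the opening quote, and (.+?)" lazily captures up to the closing quote, so this relies on every value being non-empty. Two of the sample lines are used for brevity:

```python
import re

lines = [
    '{container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.10:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-xvz7v", service="prometheus-prometheus-node-exporter"}',
    '{container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.11:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-w8cqq", service="prometheus-prometheus-node-exporter"}',
]

pattern = re.compile(r'(fstype|instance|mountpoint)=.(.+?)"')

# findall returns (name, value) tuples because the pattern has two groups
pairs = [pair for line in lines for pair in pattern.findall(line)]

for name, value in pairs:
    print(f"{name}, {value}")
```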
    