I need to extract multiple pieces of information from a returned collection of strings. The strings consist of a number of "fields", for example:
{container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.10:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-xvz7v", service="prometheus-prometheus-node-exporter"}
{container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.11:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-w8cqq", service="prometheus-prometheus-node-exporter"}
{container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.12:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-4wfhq", service="prometheus-prometheus-node-exporter"}
{container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.13:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-zfskj", service="prometheus-prometheus-node-exporter"}
{container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.15:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-cllnv", service="prometheus-prometheus-node-exporter"}
In this case, I want to extract the label and value of the fstype, instance and mountpoint fields. So for the example provided, I’d end up with something like the following:
fstype="ext4" instance="172.20.32.10:9100" mountpoint="/"
fstype="ext4" instance="172.20.32.11:9100" mountpoint="/"
fstype="ext4" instance="172.20.32.12:9100" mountpoint="/"
fstype="ext4" instance="172.20.32.13:9100" mountpoint="/"
fstype="ext4" instance="172.20.32.15:9100" mountpoint="/"
Can this be done with a regular expression? I’ve used REs for a long time, but never anything this complex. If the answer to that question is yes, how do I do it?
4 Answers
This can easily be done with groups and string replace. See this regex:
with this replacement:
Explanation:
We create three groups that match the keys for which you want to have the values, e.g. group 1: (fstype="[^"]*"). The groups can then be referred to by $1 to $n. In order for the resulting string to contain only the groups, we need to make the regex match the whole string. Thus the regex starts and ends with .*
Further explanations:
.* : matches 0 to infinity non-line-breaking characters.
.*? : same as above, with the difference that it matches ‘lazily’, meaning the regex tries to create the smallest possible match.
[^"]* : matches 0 to infinity characters that are not ".
(...) : creates a group with the content resulting from the regex defined inside of the group.
Small example with JavaScript:
Notes:
Use the replaceAll(...) function to replace every match. Otherwise, only the first match of the regex will be replaced. That leads to a minor adaptation of the regex: it needs the ‘global’ flag, which is activated in the example by the g at the end of the regex.
Remarks
Depending on the complete task you need to accomplish it might be more reasonable to use a library for the data structure you’re given. From your example I suspect it to be JSON5. Depending on which programming language you’re using, there are many approaches with different libraries. If you are just using an editor (like VS Code or Notepad++) you can just use the regex I provided in the integrated search function.
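The regex and replacement referred to in this answer are not reproduced above, so here is a minimal Python sketch of the same group-and-replace idea. The pattern is an assumption reconstructed from the explanation (three groups, lazy gaps between them, .* at both ends), and keep_fields is a hypothetical helper name. It relies on the three fields appearing in the order fstype, instance, mountpoint, as they do in the question.

```python
import re

# Assumed reconstruction: three capture groups, lazy gaps between them,
# and .* on both ends so the whole line is consumed by the match.
PATTERN = re.compile(
    r'.*(fstype="[^"]*").*?(instance="[^"]*").*?(mountpoint="[^"]*").*'
)


def keep_fields(text: str) -> str:
    """Replace every matching line with just the three captured fields."""
    # \1 \2 \3 in Python plays the role of $1 $2 $3 in the replacement string.
    return PATTERN.sub(r"\1 \2 \3", text)


line = (
    '{container="node-exporter", fstype="ext4", '
    'instance="172.20.32.10:9100", job="node-exporter", '
    'mountpoint="/", namespace="prometheus"}'
)
print(keep_fields(line))
```

Python's re.sub replaces every match by default, so no separate global flag is needed; because . does not match a newline, multi-line input is handled one line at a time.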
To extract the desired fields from the collection of strings using regular expressions, you can follow these steps:
Construct the regular expression pattern: The pattern should match the desired fields and capture their values. The fields are not adjacent in the input (each is followed by a comma, and job sits between instance and mountpoint), so the pattern has to skip the text between them. Based on your example, the pattern can be constructed as follows:
fstype="([^"]+)".*?instance="([^"]+)".*?mountpoint="([^"]+)"
Here, ([^"]+) captures one or more non-quote characters into the respective group, and the lazy .*? skips whatever lies between two fields of interest.
Compile and apply the regular expression: Use a regular expression library in your programming language of choice (e.g., Python’s re module) to compile the pattern and apply it to each string in the collection.
Iterate over the collection and extract the fields: For each string, apply the regular expression pattern and retrieve the captured groups. Here’s an example in Python:
    import re

    strings = [
        '{container="node-exporter", device="/dev/mapper/ubuntu--vg-ubuntu--lv", endpoint="http-metrics", fstype="ext4", instance="172.20.32.10:9100", job="node-exporter", mountpoint="/", namespace="prometheus", pod="prometheus-prometheus-node-exporter-xvz7v", service="prometheus-prometheus-node-exporter"}',
        # Add the remaining strings here
    ]

    # The lazy .*? skips the fields that sit between the three we want.
    pattern = r'fstype="([^"]+)".*?instance="([^"]+)".*?mountpoint="([^"]+)"'

    for string in strings:
        # findall returns one (fstype, instance, mountpoint) tuple per match.
        matches = re.findall(pattern, string)
        for fstype, instance, mountpoint in matches:
            # Single-quoted f-string so the double quotes print literally.
            print(f'fstype="{fstype}" instance="{instance}" mountpoint="{mountpoint}"')
This code will iterate over each string, apply the regular expression pattern, and extract the desired fields. It will then print the extracted fields in the desired format.
Keep in mind that regular expressions can be powerful but may not be the best choice for all parsing tasks. If the structure of your strings varies significantly or becomes more complex, you may need to consider alternative parsing techniques or libraries that provide more advanced pattern matching and extraction capabilities.
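As one possible alternative of the kind mentioned above, the whole string can be parsed into a dictionary first and the wanted fields picked out afterwards. This is only a sketch: parse_labels and select are hypothetical helper names, and it assumes that values never contain a double quote.

```python
import re

# Matches one key="value" pair; assumes values contain no double quotes.
PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_labels(text):
    """Turn '{a="1", b="2"}' into {'a': '1', 'b': '2'}."""
    return dict(PAIR.findall(text))


def select(text, wanted=("fstype", "instance", "mountpoint")):
    """Format the wanted fields, in the order requested, skipping absent ones."""
    labels = parse_labels(text)
    return " ".join(f'{key}="{labels[key]}"' for key in wanted if key in labels)


line = (
    '{container="node-exporter", fstype="ext4", '
    'instance="172.20.32.10:9100", job="node-exporter", '
    'mountpoint="/", namespace="prometheus"}'
)
print(select(line))
```

Because the lookup goes through a dictionary, this version does not depend on the order in which the fields appear in the input.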
If it is known that the fields of interest are all present but their order in the string is not known, one could match the regular expression
Demo
As shown in the demo, for the example given this returns three matches, with the contents of the two named capture groups, field_name and field_value, as follows.
This regular expression has the following elements.
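The expression itself is not reproduced above. A pattern consistent with the description, given here as an assumption and not as the answerer's exact regex, is an alternation of the three names in the field_name group followed by the quoted value in the field_value group. The sketch below uses Python, which spells named groups (?P<name>...) where many other engines use (?<name>...); find_fields is a hypothetical helper name.

```python
import re

# Assumed pattern: one of the three names, then ="...", with both parts named.
FIELD = re.compile(
    r'\b(?P<field_name>fstype|instance|mountpoint)="(?P<field_value>[^"]*)"'
)


def find_fields(text):
    """Return (name, value) pairs in the order they occur in the string."""
    return [(m["field_name"], m["field_value"]) for m in FIELD.finditer(text)]


line = (
    '{container="node-exporter", fstype="ext4", '
    'instance="172.20.32.10:9100", job="node-exporter", '
    'mountpoint="/", namespace="prometheus"}'
)
for name, value in find_fields(line):
    print(f'{name}="{value}"')
```

Each match is found independently, so the fields are returned whatever their order in the input.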
One could test that all three fields are present by attempting to match a regular expression with three positive lookaheads:
Demo
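The presence check with three positive lookaheads might look like the following in Python. The exact expression is again an assumption, and has_all_fields is a hypothetical helper name. Each lookahead scans the line from the start for one field name without consuming any text, so the order of the fields does not matter.

```python
import re

# Each (?=...) asserts that the named field occurs somewhere on the line.
ALL_PRESENT = re.compile(
    r'^(?=.*\bfstype=")(?=.*\binstance=")(?=.*\bmountpoint=")'
)


def has_all_fields(text):
    """True only if fstype, instance and mountpoint all appear in text."""
    return ALL_PRESENT.search(text) is not None


complete = '{fstype="ext4", instance="172.20.32.10:9100", mountpoint="/"}'
partial = '{fstype="ext4", instance="172.20.32.10:9100"}'
print(has_all_fields(complete), has_all_fields(partial))
```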
Yes, it’s called a "capture".
There are many websites that allow you to evaluate regular-expression patterns and inspect their captures.
A popular one among Stack Overflow users is regex101.com.
You can use the following pattern.
Output