extract text between the two blocks using regex - Nginx

Monk
June 9, 2022
275 views
4 votes
4 Answers

I am trying to extract the text between the two strings using the following regex.

(?s)Non-terminated Pods:.*?in total.\R(.*)(?=Allocated resources)

This regex looks fine in regex101 but somehow does not print the pod details when used with perl or grep -P. The command below produces empty output.

kubectl describe  node |perl -le '/(?s)Non-terminated Pods:.*?in total.\R(.*)(?=Allocated resources)/m; printf "$1"'

Here is the sample input:

PodCIDRs:                     10.233.65.0/24
Non-terminated Pods:          (7 in total)
  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-1                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp8                          100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s
Allocated resources:

Question:

How can I extract the info from the above output so that it looks like the following? What is wrong with the regex or the command that I am using?

Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-1                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp8                          100m (5%)     0 (0%)      70Mi (4%)

Question-2: What if I have two blocks of similar input? How do I extract the pod details from each block?

For example, if the input is:

PodCIDRs:                     10.233.65.0/24
Non-terminated Pods:          (7 in total)
  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-1                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp8                          100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s
Allocated resources:
....some
.......random data...
PodCIDRs:                     10.233.65.0/24
Non-terminated Pods:          (7 in total)
  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo-1                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-2                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp3-2                          100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s
Allocated resources:

Answers

Using GNU grep, you can use your regex with some tweaks:

kubectl describe  node |
grep -zoP '(?s)Non-terminated Pods:.*?in total.\R\K(.*?)(?=Allocated resources)'

  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-1                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp8                          100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s

Used \K (match reset) after \R to remove that line from the output.
Used the -z option to treat input and output data as sequences of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline.

PS: The same regex also works with the second input, with the header line shown before each block.
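
One side effect of -z is that grep also writes a NUL byte after each printed match, which can trip up later pipeline stages. A minimal sketch of the two-block case, assuming GNU grep built with PCRE support (-P); the sample text and the helper name pods_grep are made up for illustration:

```shell
# Hypothetical stand-in for `kubectl describe node` output with two blocks.
sample='PodCIDRs:                     10.233.65.0/24
Non-terminated Pods:          (1 in total)
  Namespace    Name
  default      foo
Allocated resources:
....some random data...
Non-terminated Pods:          (1 in total)
  Namespace    Name
  default      foo-1
Allocated resources:'

# pods_grep (hypothetical name): print every block between the two markers.
# -z reads the whole input as a single record, -o prints only the matched
# part, \K drops the "Non-terminated Pods" line from the match, and tr
# removes the NUL byte that grep -z writes after each match.
pods_grep() {
    grep -zoP '(?s)Non-terminated Pods:.*?in total.\R\K(.*?)(?=Allocated resources)' |
        tr -d '\0'
}

printf '%s\n' "$sample" | pods_grep
```

The lazy (.*?) is what keeps each match inside its own block; a greedy (.*) would run on to the last "Allocated resources".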

Alternatively, you can use any version of sed for this job as well:

kubectl describe  node |
sed -n '/Non-terminated Pods:.*in total.*/,/Allocated resources:/ {//!p;}'

  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-1                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp8                          100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s
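
The empty pattern // in the sed command stands for the last regex that sed applied, so //!p prints only the lines inside the range that match neither marker. A small sketch on made-up input with two blocks (the sample text and the helper name pods_sed are assumptions for illustration):

```shell
# Hypothetical stand-in for `kubectl describe node` output with two blocks.
sample='PodCIDRs:                     10.233.65.0/24
Non-terminated Pods:          (1 in total)
  Namespace    Name
  default      foo
Allocated resources:
....some random data...
Non-terminated Pods:          (1 in total)
  Namespace    Name
  default      foo-1
Allocated resources:'

# pods_sed (hypothetical name): select each start..end range of lines and
# print every line in it except the two marker lines themselves.
pods_sed() {
    sed -n '/Non-terminated Pods:.*in total.*/,/Allocated resources:/ {//!p;}'
}

printf '%s\n' "$sample" | pods_sed
```

Because sed restarts the range at every new start marker, the random data between the blocks is skipped without any extra handling.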

- zdim
- June 9, 2022 at 8:08 pm
- 0 votes
With some obvious assumptions, and keeping it close to the pattern in the question:
```
perl -0777 -wnE'
    @pods = /Non-terminated\s+Pods:\s+\([0-9]+\s+in\s+total\)\n(.*?)\nAllocated resources:/gs;
    say for @pods
' input-file
```
(note modifiers on the regex in this line, which is too wide to fit on screen: /gs)

The regex from the question works on a single block of text when used in place of the one in this answer (and with no /s modifier, as it should). To work with multiple blocks, the (.*) in it needs to be changed to (.*?), so that it doesn’t match all the way to the last Allocated...

The question doesn’t say precisely how that regex is "used with perl", so I can’t say what failed.

Comments on the command-line program above:
- The -0777 switch makes it read the whole file into a string, available in the program in the variable $_, to which the regex is bound by default.

  There is also the switch -g, an alias for -0777, available starting with 5.36.0.
- We still need the -n switch so that the program iterates over the "lines" of input (STDIN or a file). In this case the input record separator is undefined, so it’s all just one "line".
- The regex captures are returned since the match operator is in list context, being assigned to the array @pods.

- RavinderSingh13
- June 10, 2022 at 4:17 am
- 0 votes
With your shown samples, please try the following GNU awk code, written and tested in GNU awk. A simple explanation: RS is set to Non-terminated Pods:.*Allocated resources: for Input_file. In the main program, if RT is not empty, awk's gsub function replaces (^|\n)Non-terminated Pods:[^\n]*\n or \nAllocated resources:\n* with an empty string in the RT variable, and the value of RT is then printed, which gives the output shown in the samples.
```
awk -v RS='Non-terminated Pods:.*Allocated resources:' '
RT{
  gsub(/(^|\n)Non-terminated Pods:[^\n]*\n|\nAllocated resources:\n*/,"",RT)
  print RT
}
'  Input_file
```
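
RT is a GNU awk extension, so the code above will not run in other awk implementations. For those, a different technique can be used: a state flag toggled at the marker lines instead of the RS/RT approach. A minimal sketch, where the sample text, the helper name pods_awk, and the variable name inside are assumptions for illustration:

```shell
# Hypothetical stand-in for `kubectl describe node` output with two blocks.
sample='PodCIDRs:                     10.233.65.0/24
Non-terminated Pods:          (1 in total)
  Namespace    Name
  default      foo
Allocated resources:
....some random data...
Non-terminated Pods:          (1 in total)
  Namespace    Name
  default      foo-1
Allocated resources:'

# pods_awk (hypothetical name): print the lines strictly between the markers.
# Rule order matters: the end marker clears the flag before the print rule
# runs, and the start marker sets it only after the print rule has run, so
# neither marker line is printed.
pods_awk() {
    awk '
        /^Allocated resources:/ { inside = 0 }
        inside                  { print }
        /^Non-terminated Pods:/ { inside = 1 }
    '
}

printf '%s\n' "$sample" | pods_awk
```

This reads the input line by line, so it handles any number of blocks without holding the whole input in memory.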

For very big files, a possible solution is to read the input line by line, as follows.

Select the range of lines of interest and skip the last line of the range, which is not part of the desired output.

use strict;
use warnings;

while(<>) {
    if( /^  Namespace/ .. /^Allocated resources:/ ) {
        print unless /^Allocated resources:/;
    }
}

exit 0;

Output

  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo                                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-1                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp8                          100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s
  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     foo-1                                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         105s
  kube-system                 nginx-proxy-kube-worker-2                   25m (1%)      0 (0%)      32M (1%)         0 (0%)         9m8s
  kube-system                 nodelocaldns-xbjp3-2                        100m (5%)     0 (0%)      70Mi (4%)        170Mi (10%)    7m4s
