Ubuntu - Select multiline text from between two strings using regex or other method

saltedlolly
August 29, 2023

I am trying to read variables from the different sections of a digibyte.conf file. Here is an example file:

```
# Generated by https://jlopp.github.io/bitcoin-core-config-generator/

# This config should be placed in following path:
# ~/.digibyte/digibyte.conf

# [chain]
# Run this node on the DigiByte Test Network. Equivalent to -chain=test
testnet=1
# Test Network.
chain=test

# [rpc]
# Accept command line and JSON-RPC commands.
server=1
# Bind to given address to listen for JSON-RPC connections. This option is ignored unless -rpcallowip is also passed. Port is optional and overrides -rpcport. Use [host]:port notation for IPv6. This option can be specified multiple times. (default: 127.0.0.1 and ::1 i.e., localhost)
rpcbind=127.0.0.1

# [wallet]
# Do not load the wallet and disable wallet RPC calls.
disablewallet=1


# [Sections]
# Most options automatically apply to mainnet, testnet, and regtest networks.
# If you want to confine an option to just one network, you should add it in the relevant section.
# EXCEPTIONS: The options addnode, connect, port, bind, rpcport, rpcbind and wallet
# only apply to mainnet unless they appear in the appropriate section below.

# Options only for mainnet
[main]

# Listen for incoming connections on non-default mainnet port. Mainnet default is 12024.
# Setting the port number here will override the default testnet port numbers.
port=12024

# Bind to given address to listen for JSON-RPC connections. This option is ignored unless
# -rpcallowip is also passed. Port is optional and overrides -rpcport. Use [host]:port notation
# for IPv6. This option can be specified multiple times. (default: 127.0.0.1 and ::1 i.e., localhost)
rpcbind=127.0.0.1

# Listen for JSON-RPC connections on this port. Mainnet default is 14022.
rpcport=14022

# Options only for testnet
[test]

# Listen for incoming connections on non-default testnet port. Testnet default is 12026.
# Setting the port number here will override the default testnet port numbers.
port=12026

# Bind to given address to listen for JSON-RPC connections. This option is ignored unless
# -rpcallowip is also passed. Port is optional and overrides -rpcport. Use [host]:port notation
# for IPv6. This option can be specified multiple times. (default: 127.0.0.1 and ::1 i.e., localhost)
rpcbind=127.0.0.1

# Listen for JSON-RPC connections on this port. Testnet default is 14023.
rpcport=14023

# Options only for regtest
[regtest]

# regtest variables go here
```

I want to be able to grab particular sections from the file, so that I can then check whether they contain certain variables. For example:

- Select everything from the beginning of the file up to the first line that starts with a '[' character. This will select everything above the first section header.
- Select the testnet section, i.e. everything between the line that starts with [test] and the next line below it that starts with a '[' character. Some users may have their sections in a different order, so I can't assume the next section will be [regtest]. This will always select everything specified specifically for testnet.
- Select the regtest section, i.e. everything between the line that starts with [regtest] and the end of the file. Sometimes the section being searched for may be the last section in the file, so there is no ending '[' character to finish on. I need to be able to handle both situations.

This is part of a larger script that helps setup and monitor a DigiByte Node.

So far I have managed to do a basic search for the testnet section using awk with a regex:
```
awk '/^\[test\]/,/^\[regtest\]/' ~/.digibyte/digibyte.conf
```

This pretty much works as intended.

However, if I change this to end on the next line starting with '[' instead of '[regtest]', I get nothing:
```
awk '/^\[test\]/,/^\[/' ~/.digibyte/digibyte.conf
```

I also have no idea how to select from the start of a file to a string, or from a string to the end of the file.

Ideally this should work on an Ubuntu/Debian server without needing to install additional packages.

Any help would be greatly appreciated.

Answers

- PatrickJanser
- August 29, 2023 at 4:11 pm
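I think the problem is simply that ^\[ will match the line "[test]"
itself! In an awk range pattern, the end pattern is also checked
on the line that starts the range, so the range begins and ends
on "[test]".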

So one option would be to use a negative lookahead to say "not
followed by the word test", but awk’s regular expressions don’t
support lookarounds. As you don’t know what the next section will
be, it’s a bit complicated.
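To make the range behaviour concrete, here is a small Python sketch that mimics how awk evaluates a `/start/,/end/` range pattern. `awk_range` is a hypothetical helper written only for this illustration, and Python's `re` stands in for awk's regex dialect, which behaves the same for these simple patterns. The key detail is that the end pattern is also tested on the line that opened the range, so `/^\[/` closes it straight away on `[test]`.

```python
import re


def awk_range(lines, start, end):
    """Hypothetical helper: a sketch of awk's `/start/,/end/` semantics.

    The range opens on a line matching `start`; `end` is then tested on
    every line in the range, including the line that opened it.
    """
    selected = []
    in_range = False
    for line in lines:
        if not in_range and re.search(start, line):
            in_range = True
        if in_range:
            selected.append(line)
            # Checked on the opening line too, which is why /^\[/ ends at [test].
            if re.search(end, line):
                in_range = False
    return selected


conf = ["[main]", "port=12024", "[test]", "port=12026", "rpcport=14023", "[regtest]"]

print(awk_range(conf, r"^\[test\]", r"^\[regtest\]"))
# ['[test]', 'port=12026', 'rpcport=14023', '[regtest]']
print(awk_range(conf, r"^\[test\]", r"^\["))
# ['[test]']
```

awk also prints the line that closes the range, which is why the first call includes `[regtest]`.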

One could do:
```
awk '/^\[test\]/,/^\[[^t]/' ~/.digibyte/digibyte.conf
```
But the problem is that if the next section is named, let’s say,
"[throttle]", then it won’t match because it also starts with the
letter "t".
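One way around this that needs no lookahead is to drop the range pattern and keep a flag that is recomputed on every header line: it turns on at the wanted header, turns off at any other header, and simply stays on at the end of the file. The Python sketch below shows the idea; `section_lines` is a hypothetical helper, and passing `None` selects the lines before the first header.

```python
def section_lines(text, name):
    """Hypothetical helper: return the lines of section `name`, header excluded.

    name=None selects everything before the first header line. A section
    ends at the next line that starts with '[' or at the end of the text.
    """
    selected = []
    inside = name is None  # the part before any header is "open" from line 1
    for line in text.splitlines():
        if line.startswith("["):
            # Every header line re-evaluates the flag.
            inside = name is not None and line.strip() == f"[{name}]"
            continue  # header lines themselves are not returned
        if inside:
            selected.append(line)
    return selected


sample = """# [chain]
testnet=1

[main]
port=12024

[test]
port=12026
rpcport=14023

[regtest]
# regtest variables go here
"""

print(section_lines(sample, None))       # ['# [chain]', 'testnet=1', '']
print(section_lines(sample, "test"))     # ['port=12026', 'rpcport=14023', '']
print(section_lines(sample, "regtest"))  # ['# regtest variables go here']
```

The same flag idea can be written in awk as `awk '/^\[/{f = ($0 == "[test]"); next} f' ~/.digibyte/digibyte.conf`, and `awk '/^\[/{exit} 1' ~/.digibyte/digibyte.conf` prints everything before the first header.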

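I had a try with /\r?\n\[/ to find the next section, but it didn’t
seem to work: awk reads its input one line at a time and drops the
newline, so a pattern containing \n never matches. Maybe you should
use another tool, such as ripgrep or Perl. Both can handle
multi-line searches.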
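Along the same lines, if Python 3 is available, its standard-library `configparser` module already understands `[section]` headers, so the sections do not have to be cut out with a regex at all. configparser rejects options that appear before the first header, so this sketch prepends a made-up `[top]` header to hold them (an assumption of this example, not something DigiByte uses); `load_conf` is likewise a hypothetical name.

```python
import configparser


def load_conf(text):
    """Hypothetical helper: parse digibyte.conf-style text with configparser.

    configparser needs a header before the first option, so a made-up
    "[top]" section is prepended to hold the options outside any section.
    """
    parser = configparser.ConfigParser(
        interpolation=None,  # treat '%' in values (e.g. passwords) literally
        strict=False,        # tolerate repeated keys instead of raising
    )
    parser.read_string("[top]\n" + text)
    return parser


sample = """testnet=1
# [rpc]
server=1

[main]
port=12024

[test]
port=12026
rpcport=14023

[regtest]
"""

conf = load_conf(sample)
print(conf.get("test", "rpcport"))         # 14023
print(conf.has_option("regtest", "port"))  # False
print(conf.get("top", "testnet"))          # 1
```

A caveat of this design: with `strict=False` a key repeated within one section keeps only its last value, so options that may legitimately appear several times (such as `addnode`) are better read with a regex or line-based approach.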

Good old grep to the rescue

ripgrep will almost certainly not be installed, but grep should
be. In recent versions, GNU grep has the PCRE engine available with
the -P option. It’s also possible to enable the s (dot matches all)
flag in the pattern with (?s), and to print only the match with the
-o option. Another trick is the -z option, which makes grep treat a
0 (NUL) byte instead of a newline as the end of a data line. A text
file normally contains no NUL bytes, so the whole file is read as a
single line, and a pattern can match across several lines instead
of one line at a time as grep usually works.

It will even be possible to use lookarounds to avoid matching
the beginning of the next section.

So we can do this:
```
grep \
  -Pzo \
  '(?s)(?<=\n|^)\[test\].*?(?=\r?\n\[)' \
  ~/.digibyte/digibyte.conf
```
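The same idea carries over to Python's `re` module if the whole file is read into one string, which plays the role of `-z`. This sketch (the name `grab_section` is hypothetical) adds `|\Z` to the lookahead so that a section at the end of the file, such as [regtest], is found too. Python only allows fixed-width lookbehinds, so `(?<=\n|^)` is replaced by `^` with the multiline flag, which matches at the start of any line.

```python
import re


def grab_section(text, name):
    """Hypothetical helper: return "[name]" and its body, or None if absent.

    The match stops before the next line that starts with '[' or at the
    end of the text, so it also works for the last section in the file.
    """
    pattern = (
        r"(?sm)^"                 # s: '.' matches newlines; m: '^' matches at line starts
        + re.escape(f"[{name}]")  # the literal header, e.g. \[test\]
        + r".*?(?=\r?\n\[|\Z)"    # shortest body up to "\n[" or the end of the text
    )
    match = re.search(pattern, text)
    return match.group(0) if match else None


sample = (
    "port=1\n"
    "[main]\nport=12024\n"
    "[test]\n# Use [host]:port notation for IPv6\nrpcport=14023\n"
    "[regtest]\nregtest=1\n"
)

print(repr(grab_section(sample, "test")))
# '[test]\n# Use [host]:port notation for IPv6\nrpcport=14023'
print(repr(grab_section(sample, "regtest")))
# '[regtest]\nregtest=1\n'
print(grab_section(sample, "chain"))  # None
```

When the section runs to the end of the file, the match keeps the file's final newline, as the [regtest] output shows; the `[host]` inside the comment does not end the match because it is not at the start of a line.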

You can also use regular expressions in Python to extract the desired sections from digibyte.conf!

```
import re

# Read digibyte.conf
with open('digibyte.conf', 'r') as file:
    content = file.read()

# Select everything from the beginning of the file up to the first line that starts with '['.
# \A anchors at the start of the file; with MULTILINE, ^ requires the '[' to start a line,
# so a '[' inside a comment such as "# [chain]" does not end the match early.
section1 = re.search(r'\A(.*?)^(?=\[)', content, re.MULTILINE | re.DOTALL).group(1)

# Select the testnet section: up to the next line starting with '[' or the end of the file
section2 = re.search(r'^\[test\](.*?)(?=^\[|\Z)', content, re.MULTILINE | re.DOTALL).group(1)

# Select the regtest section: up to the end of the file (\Z)
section3 = re.search(r'^\[regtest\](.*?)\Z', content, re.MULTILINE | re.DOTALL).group(1)

print("Section 1:")
print(section1)
print("")

print("Section 2:")
print(section2)
print("")

print("Section 3:")
print(section3)
```
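Since the end goal is to check whether a section contains certain variables, the extracted text can be turned into a dictionary. The sketch below is one way to do that (`section_vars` is a hypothetical name); it assumes simple `key=value` lines, skips blank lines and `#` comments, and treats a section as ending at the next line that starts with '[' or at the end of the file.

```python
import re


def section_vars(content, name):
    """Hypothetical helper: collect key=value pairs from one section.

    name=None reads the part before the first line that starts with '['.
    Blank lines, '#' comments and lines without '=' are skipped.
    """
    if name is None:
        match = re.search(r"\A(.*?)^(?=\[)", content, re.MULTILINE | re.DOTALL)
        body = match.group(1) if match else content  # no headers at all
    else:
        pattern = r"^\[" + re.escape(name) + r"\](.*?)(?=^\[|\Z)"
        match = re.search(pattern, content, re.MULTILINE | re.DOTALL)
        if match is None:
            return {}  # section not present
        body = match.group(1)

    variables = {}
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        variables[key.strip()] = value.strip()
    return variables


sample = """# [chain]
testnet=1
chain=test

[main]
port=12024

[test]
# Use [host]:port notation for IPv6.
rpcbind=127.0.0.1
rpcport=14023

[regtest]
"""

print(section_vars(sample, None))  # {'testnet': '1', 'chain': 'test'}
test_vars = section_vars(sample, "test")
print("rpcport" in test_vars, test_vars.get("rpcport"))  # True 14023
print(section_vars(sample, "regtest"))  # {}
```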

Or via Perl:

```
#!/usr/bin/perl

use strict;
use warnings;

# Read digibyte.conf into a single string
open(my $fh, '<', 'digibyte.conf') or die "Failed to open file: $!";
my $content = do { local $/; <$fh> };
close($fh);

# Select everything from the beginning of the file up to the first line that starts with '['
# (/m lets ^ match at the start of every line, /s lets . match newlines)
my ($section1) = $content =~ /\A(.*?)^(?=\[)/ms;

# Select the testnet section, up to the next line starting with '[' or the end of the file
my ($section2) = $content =~ /^\[test\](.*?)(?=^\[|\z)/ms;

# Select the regtest section (to the end of the file)
my ($section3) = $content =~ /^\[regtest\](.*)/ms;

print "Section 1:\n";
print "$section1\n\n";

print "Section 2:\n";
print "$section2\n\n";

print "Section 3:\n";
print "$section3\n";
```
