sed remove string until next occurrence - Debian

UlfTietze
November 10, 2021

Imagine I have a chat log in some protocol format. It could look like this:

MSG sender|reciever2: Hello its meCRLF
MSG bob|anna: Hello annaCRLF
MSG bob|anna: How are youCRLF
MSG anna|bob: Im fine, you?CRLF
MSG bob|anna: Same, wanna hang out some time?CRLF
MSG anna|bob: YesCRLF
MSG bob|peter: hey im asking anna to hang out lolCRLF
MSG anna|bob: for sureCRLF
MSG anna|bob: maybe in a few weeks?CRLF

I only want the chat between Anna and Bob, and I want each sender's name printed only once, until the other chat partner starts speaking.

What I've achieved so far is this sed script:

s/^MSG\s+(anna|bob)\|(anna|bob):\s{1}(.+)CRLF$/\1: "\3"/g
t end

/^.*/d

:end

This creates:

bob: "Hello anna"
bob: "How are you"
anna: "Im fine, you?"
bob: "Same, wanna hang out some time?"
anna: "Yes"
anna: "for sure"
anna: "maybe in a few weeks?"

But I want something similar to:

bob: 
  Hello anna
  How are you
anna:
  Im fine, you?
bob: 
  Same, wanna hang out some time?
anna: 
  Yes
  for sure
  maybe in a few weeks?

So, how can I delete, after one bob, all the following bobs until the next anna comes?
Note: I have to use sed for this. It has to run on Ubuntu Linux systems with sed (GNU sed) 4.7, packaged by Debian.

Tags: sed

Answers

The following script:

cat <<EOF |
MSG sender|reciever2: Hello its meCRLF
MSG bob|anna: Hello annaCRLF
MSG bob|anna: How are youCRLF
MSG anna|bob: Im fine, you?CRLF
MSG bob|anna: Same, wanna hang out some time?CRLF
MSG anna|bob: YesCRLF
MSG bob|peter: hey im asking anna to hang out lolCRLF
MSG anna|bob: for sureCRLF
MSG anna|bob: maybe in a few weeks?CRLF
EOF
sed '
  # Preprocess: delete every line that is not part of the anna/bob chat.
  /MSG \(\(anna\)|bob\|\(bob\)|anna\): \(.*\)CRLF/!d
  s//\2\3:\4/

  # Check whether the sender is the same as on the previous line.
  G   # Append the previous name from hold space.
  /^\([^:]*\):\(.*\)\n\1$/{   # The names match?
    s//  \2/p                 # Print only the message.
    d
  }

  h    # Put the whole line into hold space. For later.
  s/^\([^:]*\):\([^\n]*\).*/\1/   # Extract only name from the line.
  x    # Put the name in hold space, and grab the full line from hold space.
  s//\1:\n  \2/     # Print the name with the message.
'

outputs:

bob:
  Hello anna
  How are you
anna:
  Im fine, you?
bob:
  Same, wanna hang out some time?
anna:
  Yes
  for sure
  maybe in a few weeks?
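
Two idioms in the script above may be worth seeing in isolation: an empty pattern (//) reuses the most recently used regular expression, and a group that did not take part in the match expands to an empty string, so \2\3 yields whichever name was the sender. A minimal sketch of just the preprocessing step, assuming GNU sed and a shortened version of the sample input (the variable name names is only for illustration):

```shell
# Keep only anna<->bob lines and reduce them to "sender:message".
# In basic regex syntax, | is a literal pipe and \| is alternation (GNU sed).
names=$(printf '%s\n' \
  'MSG bob|anna: Hello annaCRLF' \
  'MSG anna|bob: YesCRLF' \
  'MSG bob|peter: heyCRLF' |
sed '/MSG \(\(anna\)|bob\|\(bob\)|anna\): \(.*\)CRLF/!d; s//\2\3:\4/')

printf '%s\n' "$names"
```

This should print bob:Hello anna followed by anna:Yes; the bob|peter line is deleted because neither alternative matches it.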

- potong
- November 11, 2021 at 12:36 am
This might work for you (GNU sed):
```
sed -E '/^MSG ((anna)\|bob|(bob)\|anna): (.*)CRLF/{s//\2\3:\4/;H};$!d
       x;s/(\n.*:).*(\1.*)*/\1\n&/mg;s/\n+.*:(\S)/\n  \1/mg;s/.//' file
```
Turn on extended regular expressions with -E.

Gather up the anna and bob conversations in the hold space.

At the end of the file, swap to the hold space, prepend the speaker's name to each run of consecutive lines from the same speaker, remove the now redundant names, and indent each line of conversation under the prepended name.

Finally remove the first newline artefact.
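
To see what the gathering step leaves in the hold space before any reformatting happens, the command can be cut short. This is only a sketch, assuming GNU sed and a shortened sample input; the variable name gathered is illustrative:

```shell
# Collect matching lines as "name:message" in the hold space (H),
# delete everything until the last line ($!d), then swap the hold
# space in (x) and drop the leading newline that H introduced.
gathered=$(printf '%s\n' \
  'MSG sender|reciever2: Hello its meCRLF' \
  'MSG bob|anna: Hello annaCRLF' \
  'MSG bob|anna: How are youCRLF' \
  'MSG anna|bob: Im fine, you?CRLF' |
sed -E '/^MSG ((anna)\|bob|(bob)\|anna): (.*)CRLF/{s//\2\3:\4/;H};$!d;x;s/.//')

printf '%s\n' "$gathered"
```

The result should be three lines: bob:Hello anna, bob:How are you and anna:Im fine, you?. The two multi-line substitutions in the full command then operate on this buffer.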

An alternative solution (similar to KamilCuk's):
```
sed -E '/^MSG ((anna)\|bob|(bob)\|anna): (.*)CRLF/!d;s//\2\3:\4/;G
        /^([^:]*:)(.*)\n\1$/{s//  \2/p;d};h;s/:.*/:/p;x;s/[^:]*:/  /;P;d' file
```

- user14473238
- November 12, 2021 at 10:27 am
This uses POSIX sed syntax.
```
sed '
/^MSG \(anna\)|bob:/!{
  /^MSG \(bob\)|anna:/!d
}
s//\1:\
 /;s/CRLF$//;t t
:t
H;x;s/^\([^:]*:\n\).*\1//;t
g' file
```
It appends the current record to the previous one in the hold space, swaps the two buffers, and removes a duplicated name together with the previous record; if the name is not duplicated, it restores the pattern space to the current record.
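
The combined buffer that the duplicate check operates on can be inspected with the l command, which prints the pattern space with newlines shown as \n and a trailing $. The following is a sketch only, assuming GNU sed and two sample lines; it stops right after the swap instead of running the rest of the script, and the variable name trace is illustrative:

```shell
# -n suppresses normal output, so only the l dumps are shown.
trace=$(printf '%s\n' \
  'MSG bob|anna: Hello annaCRLF' \
  'MSG bob|anna: How are youCRLF' |
sed -n '
/^MSG \(anna\)|bob:/!{
  /^MSG \(bob\)|anna:/!d
}
s//\1:\
 /;s/CRLF$//
H;x;l')

printf '%s\n' "$trace"
```

For the first record this should show \nbob:\n  Hello anna$, and for the second bob:\n  Hello anna\nbob:\n  How are you$. In the second dump the name bob: appears twice, which is the situation the back-reference \1 detects.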

Here’s a more efficient version:
```
sed '
t
/^MSG \(anna\)|bob:/!{
  /^MSG \(bob\)|anna:/!d
}
s//\1:\
 /;s/CRLF$//
H;s/:.*/:/
x;s/^\([^:]*:\n\)\1//p;D' file
```
This avoids the use of .* in the duplicate-detecting regexp by using the hold space to store only the previous name rather than the entire previous record.