Imagine that I have some chat log protocol. It could look like this:
MSG sender|reciever2: Hello its meCRLF
MSG bob|anna: Hello annaCRLF
MSG bob|anna: How are youCRLF
MSG anna|bob: Im fine, you?CRLF
MSG bob|anna: Same, wanna hang out some time?CRLF
MSG anna|bob: YesCRLF
MSG bob|peter: hey im asking anna to hang out lolCRLF
MSG anna|bob: for sureCRLF
MSG anna|bob: maybe in a few weeks?CRLF
I only want the chat between Anna and Bob, and I want each sender's name to appear only once, until the other chat partner starts writing.
What I've already achieved is this sed script:
s/^MSG\s+(anna|bob)\|(anna|bob):\s{1}(.+)CRLF$/\1: "\3"/g
t end
/^.*/d
:end
This creates:
bob: "Hello anna"
bob: "How are you"
anna: "Im fine, you?"
bob: "Same, wanna hang out some time?"
anna: "Yes"
anna: "for sure"
anna: "maybe in a few weeks?"
But I want something similar to:
bob:
Hello anna
How are you
anna:
Im fine, you?
bob:
Same, wanna hang out some time?
anna:
Yes
for sure
maybe in a few weeks?
So, after one bob, how can I delete all the following bobs until the next anna comes?
Note: this is something I have to use sed for. It has to run on Ubuntu Linux systems with sed (GNU sed) 4.7, packaged by Debian.
3 Answers
The following script:
outputs:
This might work for you (GNU sed):
Turn on extended regexps (-E). Gather up the anna and bob conversations in the hold space. At the end of file, swap to the hold space, prepend the name to the following lines of conversation, remove the unwanted names, and space-indent each line of conversation under the prepended name.
Finally, remove the first newline artefact.
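The original script for this answer is not preserved here, so what follows is only a minimal sketch of the steps described above, not the answerer's code. It assumes GNU sed, that "CRLF" is literal text in the log (as the question's regexp treats it), a two-space indent, and the hypothetical shell variables log and out.

```shell
#!/bin/sh
# Sample log from the question; "CRLF" is assumed to be literal text.
log='MSG sender|reciever2: Hello its meCRLF
MSG bob|anna: Hello annaCRLF
MSG bob|anna: How are youCRLF
MSG anna|bob: Im fine, you?CRLF
MSG bob|anna: Same, wanna hang out some time?CRLF
MSG anna|bob: YesCRLF
MSG bob|peter: hey im asking anna to hang out lolCRLF
MSG anna|bob: for sureCRLF
MSG anna|bob: maybe in a few weeks?CRLF'

out=$(printf '%s\n' "$log" | sed -nE '
  # Reduce each anna/bob message to "name: text" and append it to the hold space.
  /^MSG +(anna|bob)\|(anna|bob): (.+)CRLF$/{
    s//\1: \3/
    H
  }
  ${
    # At end of file, work on everything that was gathered.
    x
    # Give every message its own "name:" header line and indent the text.
    s/\n([a-z]+): ([^\n]*)/\n\1:\n  \2/g
    # Drop a header that repeats the header of the block just before it.
    :dedupe
    s/(\n[a-z]+:)((\n  [^\n]*)+)\1\n/\1\2\n/
    t dedupe
    # H always prepends a newline, so remove the very first one.
    s/^\n//
    p
  }
')
printf '%s\n' "$out"
```

Because the whole conversation is held in memory until the last line, this variant suits small logs; the per-line approaches in the other answers print as they go.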
An alternative solution (similar to KamilCuk):
This uses POSIX sed syntax.
It appends the current record to the previous one in the hold space, swaps them, removes duplicate names (along with the previous record), or else reverts the pattern space back to the original current record.
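The script itself is missing from this copy of the answer. As a rough sketch of the append, swap, and revert idea described above (not the answerer's code), the following uses GNU sed with -E rather than strict POSIX syntax, treats "CRLF" as literal text, and uses the hypothetical shell variables log and out.

```shell
#!/bin/sh
# Sample log from the question; "CRLF" is assumed to be literal text.
log='MSG sender|reciever2: Hello its meCRLF
MSG bob|anna: Hello annaCRLF
MSG bob|anna: How are youCRLF
MSG anna|bob: Im fine, you?CRLF
MSG bob|anna: Same, wanna hang out some time?CRLF
MSG anna|bob: YesCRLF
MSG bob|peter: hey im asking anna to hang out lolCRLF
MSG anna|bob: for sureCRLF
MSG anna|bob: maybe in a few weeks?CRLF'

out=$(printf '%s\n' "$log" | sed -E '
  # Turn a matching line into a two-line record: "name:" then the text.
  s/^MSG +(anna|bob)\|(anna|bob): (.+)CRLF$/\1:\n\3/
  t matched
  # Anything else (other chats, malformed lines) is dropped.
  d
  :matched
  # Append the current record to the previous one kept in the hold space,
  # then swap: pattern space = previous + current, hold space = current.
  H
  x
  # Same name twice in a row: remove the previous record and the repeated name.
  s/^([^\n]+)\n[^\n]*\n\1\n//
  t
  # Different name: revert to the untouched current record.
  g
')
printf '%s\n' "$out"
```

After the swap the hold space always contains the full current record, so it is ready to serve as the previous record for the next line regardless of which branch was taken.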
Here’s a more efficient version:
This avoids the use of .* in the duplicate-detecting regexp by using the hold space to store the previous name rather than the entire previous record.
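Since the script is not preserved here, the following is one possible sketch of keeping only the previous name in the hold space. It is not the answerer's code; it assumes GNU sed with -E, literal "CRLF" text in the log, and the hypothetical shell variables log and out.

```shell
#!/bin/sh
# Sample log from the question; "CRLF" is assumed to be literal text.
log='MSG sender|reciever2: Hello its meCRLF
MSG bob|anna: Hello annaCRLF
MSG bob|anna: How are youCRLF
MSG anna|bob: Im fine, you?CRLF
MSG bob|anna: Same, wanna hang out some time?CRLF
MSG anna|bob: YesCRLF
MSG bob|peter: hey im asking anna to hang out lolCRLF
MSG anna|bob: for sureCRLF
MSG anna|bob: maybe in a few weeks?CRLF'

out=$(printf '%s\n' "$log" | sed -E '
  # Turn a matching line into a two-line record: "name:" then the text.
  s/^MSG +(anna|bob)\|(anna|bob): (.+)CRLF$/\1:\n\3/
  t matched
  d
  :matched
  # Append the previous name from the hold space: name, text, previous name.
  G
  # Same sender as before: keep only the text; hold space stays unchanged.
  s/^([^\n]+)\n([^\n]*)\n\1$/\2/
  t
  # New sender: drop the appended previous name again.
  s/\n[^\n]*$//
  # Store just the new "name:" line in the hold space for the next record.
  h
  x
  s/\n.*//
  x
')
printf '%s\n' "$out"
```

The duplicate check only ever compares two short name lines, so the regexp stays anchored at both ends and never has to scan across a whole previous message.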