I have a text file with the following pattern written to it:
TIME[32.468ms] -(3)-............."TEXT I WANT TO KEEP"
I would like to discard the first part of each line containing
TIME[32.468ms] -(3)-.............
To test the regular expression I’ve tried the following:
cat myfile.txt | egrep "^TIME\[.*\]\s\s-\(3\)-\.+"
This correctly identifies the lines I want. Now, to delete the pattern I’ve tried:
cat myfile.txt | sed s/"^TIME\[.*\]\s\s-\(3\)-\.+"//
but it just seems to be doing the cat, since it shows the complete content of the file and no substitution happens.
What am I doing wrong?
OS: CentOS 7
8 Answers
Thanks, all, for your help. In the end, I found a way to make it work:
echo 'TIME[32.468ms] -(3)-.............TEXT I WANT TO KEEP' | grep TIME | sed -r 's/^TIME\[[0-9]+\.[0-9]+ms\]\s\s-\(3\)-\.+//'
More generally,
grep TIME myfile.txt | sed -r 's/^TIME\[[0-9]+\.[0-9]+ms\]\s\s-\(3\)-\.+//'
Cheers, Pedro
With your shown samples, please try the following grep command. Written and tested with GNU grep.
Explanation: Adding detailed explanation for above code.
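A minimal sketch of a grep-based approach, assuming GNU grep built with PCRE support (-P): \K discards everything matched so far, and -o prints only what remains of the match. The function name strip_prefix_grep, the sample input, and the exact pattern are illustrative assumptions, not the answerer's original command.

```shell
#!/bin/sh
# Hypothetical helper; the name and the sample line are illustrative only.
# Requires GNU grep with -P (PCRE). Lines that do not match are dropped.
strip_prefix_grep() {
  # Match the TIME[...] -(N)-.... prefix, forget it with \K,
  # then let -o print only the remainder of the line.
  grep -oP '^TIME\[[0-9]+\.[0-9]+ms\][[:space:]]+-\([0-9]+\)-\.+\K.*'
}

sample='TIME[32.468ms] -(3)-............."TEXT I WANT TO KEEP"'
printf '%s\n' "$sample" 'unrelated line' | strip_prefix_grep
# prints: "TEXT I WANT TO KEEP"
```

Because -o only prints matches, this also filters out lines without the prefix, so no separate grep TIME step is needed.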
2nd solution: Adding awk program here.
Explanation: using the match function of awk to match the regex ^TIME\[[0-9]+\.[0-9]+ms\][[:space:]]+-\([0-9]+\)-\.+ which catches the text we actually want to remove from the lines, then printing the rest of the line after the match, which is what the OP requires.
You may use:
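As a minimal sketch of how the prefix could be stripped with sed, the two functions below express the same pattern once in ERE (sed -E) and once in plain POSIX BRE. The function names strip_prefix_ere and strip_prefix_bre and the sample input are made up for illustration, and the choice of [[:space:]] and [0-9] classes is an assumption about how much the real log lines vary.

```shell
#!/bin/sh
# Hypothetical helpers; names and sample data are illustrative only.

# ERE version: needs sed -E (older GNU sed spells it -r).
# In ERE, \( and \) are literal parentheses and + means "one or more".
# A bare ] outside a bracket expression is an ordinary character.
strip_prefix_ere() {
  sed -E 's/^TIME\[[^][]*][[:space:]]+-\([0-9]+\)-\.+//'
}

# BRE version: plain POSIX sed, no options.
# In BRE, ( and ) are already literal, and "one or more of x" is spelled xx*.
strip_prefix_bre() {
  sed 's/^TIME\[[^][]*][[:space:]][[:space:]]*-([0-9][0-9]*)-\.\.*//'
}

sample='TIME[32.468ms] -(3)-............."TEXT I WANT TO KEEP"'
printf '%s\n' "$sample" | strip_prefix_ere   # prints: "TEXT I WANT TO KEEP"
printf '%s\n' "$sample" | strip_prefix_bre   # prints: "TEXT I WANT TO KEEP"
```

Both versions avoid \s, which is a GNU extension, and quote the whole sed script in single quotes so the shell leaves the backslashes alone. Lines without the prefix pass through unchanged.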
The \s regex extension may not be supported by your sed.
In BRE syntax (which is what sed speaks out of the box) you do not backslash round parentheses; doing that turns them into regex metacharacters which do not match themselves, somewhat unintuitively. Also, + is just a regular character in BRE, not a repetition operator (though you can turn it into one by similarly backslashing it: \+).
You can try adding the -E option to switch from BRE syntax to the perhaps more familiar ERE syntax, but that still won’t enable Perl regex extensions, which are not part of ERE syntax either.
A script written in plain BRE along these lines should work on any reasonably POSIX sed. (Notice also how the minus character does not need to be backslash-escaped, though doing so is harmless per se. Furthermore, I tightened up the regex for the square brackets, to prevent the "match anything" regex you had, .*, from "escaping" past the closing square bracket. In some more detail, [^][] is a negated character class which matches any character which isn’t (a newline or) ] or [; they have to be specified in exactly this order to avoid ambiguity in the character class definition. Finally, notice also how the entire sed script should normally be quoted in single quotes, unless you have specific reasons to use different quoting.)
If you have sed -E or sed -r you can use + instead of * but then this complicates the overall regex, so I won’t suggest that here.
This awk, using its sub() function: the line is printed when sub() returns true.
A simpler one for sed:
If the "text you want to keep" is always surrounded by quotes like this, and those are the only quotes in a line starting with "TIME…", then:
should get the lines starting with "TIME…" and print the text within the quotes.
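The awk-based ideas from the answers above can be sketched as follows. This is a hedged illustration that assumes every line of interest looks like the sample in the question. The function names, the sample input, and the exact patterns are assumptions, not the answerers' original commands.

```shell
#!/bin/sh
# Hypothetical helpers; names and sample data are illustrative only.

# match(): on a hit, RSTART and RLENGTH describe the matched prefix,
# so substr() from RSTART + RLENGTH is the rest of the line.
strip_prefix_match() {
  awk 'match($0, /^TIME\[[0-9]+\.[0-9]+ms\][ \t]+-\([0-9]+\)-\.+/) {
         print substr($0, RSTART + RLENGTH)
       }'
}

# sub(): used as a pattern, so a line is printed only when sub()
# returns true, i.e. when the prefix was found and removed.
strip_prefix_sub() {
  awk 'sub(/^TIME\[[0-9]+\.[0-9]+ms\][ \t]+-\([0-9]+\)-\.+/, "")'
}

# Quote-based: split on double quotes and print the second field.
# Assumes the wanted text is the only quoted part of a TIME line.
keep_quoted() {
  awk -F'"' '/^TIME/ { print $2 }'
}

sample='TIME[32.468ms] -(3)-............."TEXT I WANT TO KEEP"'
printf '%s\n' "$sample" | strip_prefix_match   # prints: "TEXT I WANT TO KEEP"
printf '%s\n' "$sample" | strip_prefix_sub     # prints: "TEXT I WANT TO KEEP"
printf '%s\n' "$sample" | keep_quoted          # prints: TEXT I WANT TO KEEP
```

The first two keep the surrounding quotes because they only remove the prefix. The third drops them because the quote character itself is the field separator.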