I need to print only the 1st match from each line.
My file contains text something like this:
cat t.txt
abcsuahrcb
abscuharcb
bsaucharcb
absuhcrcab
Here is the command I am trying:
cat t.txt | grep -oP 'a.*?c'
It gives:
abc
ahrc
absc
arc
auc
arc
absuhc
I need it to return:
abc
absc
auc
absuhc
These are the 1st possible matches from each line.
Any other alternatives like sed and awk will work, but not something which needs to be installed on Ubuntu.
3 Answers
Perl to the rescue:
-n reads the input line by line, running the code for each;
-l removes newlines from input lines and adds them to output;
a.*?c: if it matches, the result is stored in $1.

Using grep, you could write the pattern as matching from the first a to the first c using a negated character class.

Using -P for Perl-compatible regular expressions, you can make use of \K to forget what is matched so far.

Note that you don’t have to use cat; you can add the filename at the end.

The pattern matches:
^ Start of string
[^a]* Optionally match any char except a
\K Forget what is matched so far
a Match literally
[^c]* Optionally match any char except c
c Match literally

Output
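As a sketch, assuming the command takes the form below (the pattern is simply assembled from the pieces listed above, and t.txt is the sample file from the question), the grep version could look like this:

```shell
# Recreate the sample file from the question.
printf '%s\n' abcsuahrcb abscuharcb bsaucharcb absuhcrcab > t.txt

# ^[^a]*   skip everything before the first "a"
# \K       drop what has been matched so far from the reported match
# a[^c]*c  report from that "a" up to the next "c"
# -o prints only the matched part; -P needs a grep built with PCRE support.
grep -oP '^[^a]*\Ka[^c]*c' t.txt
```

On the sample input this should print abc, absc, auc and absuhc, one per line. The ^ anchor is what stops -o from reporting a second match later in the same line.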
Another option with gnu-awk and the same pattern, only now using and printing the capture group 1 value:

A sed variation on The fourth bird’s answer:

Where:
-En – enable extended regex support, suppress automatic printing of pattern space
^[^a]* – from start of line match all follow-on characters that are not a
(a[^c]*c) – (1st capture group) match letter a plus all follow-on characters that are not c, followed by a c
.* – match rest of line
\1/p – print contents of 1st capture group

One awk idea:

Where:
match() call is non-zero (i.e., ‘true’) so …
substring defined by the RSTART/RLENGTH variables (which are auto-populated by a successful match() call)
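The sed, awk and gnu-awk approaches described above can be sketched as follows. These commands are reconstructions from the explanations, not the answers' exact text: the sed expression is assembled from the listed pieces, while the awk pattern a[^c]*c and the array name m are assumptions.

```shell
# Recreate the sample file from the question.
printf '%s\n' abcsuahrcb abscuharcb bsaucharcb absuhcrcab > t.txt

# sed: skip up to the first "a", capture from that "a" to the next "c",
# discard the rest of the line, and print only if the substitution succeeded.
sed -En 's/^[^a]*(a[^c]*c).*/\1/p' t.txt

# awk: match() returns non-zero on success and sets RSTART and RLENGTH
# for the leftmost match, which substr() then extracts.
awk 'match($0, /a[^c]*c/) { print substr($0, RSTART, RLENGTH) }' t.txt

# GNU awk only: the third argument to match() is a gawk extension that
# stores capture groups in an array (here the hypothetical name m).
if command -v gawk >/dev/null 2>&1; then
    gawk 'match($0, /^[^a]*(a[^c]*c)/, m) { print m[1] }' t.txt
fi
```

Each command should print abc, absc, auc and absuhc, one per line, for the sample file. The plain awk version is the most portable of the three, since match(), RSTART and RLENGTH are part of POSIX awk.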