Lazy Grep -P: How to show only the 1st match from each line - Ubuntu - PhpOut

AjaibSingh
September 6, 2022
150 views
2 votes
3 Answers

I need to print only the 1st match from each line.

My file contains text something like this:

cat t.txt
abcsuahrcb
abscuharcb
bsaucharcb
absuhcrcab

Here is the command I am trying:

cat t.txt | grep -oP 'a.*?c'

It gives:

abc
ahrc
absc
arc
auc
arc
absuhc

I need it to return:

abc
absc
auc
absuhc

These are the 1st possible matches from each line.

Alternatives such as sed or awk are fine, but nothing that has to be installed separately on Ubuntu.

Tags: awk grep pcre regex

Answers

- choroba
- September 6, 2022 at 11:26 pm
- 0 votes
Perl to the rescue:
```
perl -lne 'print $1 if /(a.*?c)/' t.txt
```
- -n reads the input line by line, running the code for each;
- -l removes newlines from input lines and adds them to output;
- The code tries to match a.*?c, if matched, it stores the result in $1;
- As there’s no loop, only one match per line is attempted.

- Thefourthbird
- September 6, 2022 at 11:37 pm
- 0 votes
Using grep you could write the pattern as matching from the first a to the first c using a negated character class.

Using -P for Perl-compatible regular expressions, you can make use of \K to forget what is matched so far.

Note that you don’t have to use cat but you can add the filename at the end.
```
grep -oP '^[^a]*\Ka[^c]*c' t.txt
```
The pattern matches:
- ^ Start of string
- [^a]* Optionally match any char except a
- \K Forget what is matched so far
- a Match literally
- [^c]* Optionally match any char except c
- c Match literally
Output
```
abc
absc
auc
absuhc
```
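As a rough illustration of what the `\K` escape (written with a leading backslash) changes, the sketch below runs the pattern with and without it on one of the sample lines. It assumes a grep built with PCRE support, as GNU grep on Ubuntu normally is; the variable names are made up for the example.

```shell
# One sample line where the first "a" is not at the start.
line='bsaucharcb'

# Without \K the leading non-"a" characters are part of the reported match.
without_k=$(printf '%s\n' "$line" | grep -oP '^[^a]*a[^c]*c')

# With \K everything matched before it is dropped from the reported match.
with_k=$(printf '%s\n' "$line" | grep -oP '^[^a]*\Ka[^c]*c')

printf 'without \\K: %s\n' "$without_k"
printf 'with \\K:    %s\n' "$with_k"
```

Because the pattern is anchored with `^`, it can match at most once per line, which is why `-o` no longer reports later matches.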
Another option uses GNU awk (gawk) with the same pattern, this time printing the value of capture group 1; the three-argument form of match() is a gawk extension:
```
awk 'match($0,/^[^a]*(a[^c]*c)/, a) { print a[1]}' t.txt
```

- markpfuso
- September 6, 2022 at 11:50 pm
- 0 votes
A sed variation on The fourth bird’s answer:
```
$ sed -En 's/^[^a]*(a[^c]*c).*/\1/p' t.txt
abc
absc
auc
absuhc
```
Where:
- -En – enable extended regex support, suppress automatic printing of pattern space
- ^[^a]* – from start of line match all follow-on characters that are not a
- (a[^c]*c) – (1st capture group) match letter a plus all follow-on characters that are not c followed by a c
- .* – match rest of line
- \1/p – replace the line with the contents of the 1st capture group and print it
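A small sketch of why `-n` and the `p` flag go together: with both, lines that contain no match are dropped, and without them such lines pass through unchanged. The extra input line `xyz` and the variable names are invented for this illustration.

```shell
# Sample input: one line with a match, one without (made up for illustration).
input=$(printf '%s\n' bsaucharcb xyz)

# -n plus the p flag: only lines where the substitution succeeded are printed.
only_matches=$(printf '%s\n' "$input" | sed -En 's/^[^a]*(a[^c]*c).*/\1/p')

# Without -n and p: non-matching lines are printed unchanged.
pass_through=$(printf '%s\n' "$input" | sed -E 's/^[^a]*(a[^c]*c).*/\1/')

printf 'with -n and p:\n%s\n\n' "$only_matches"
printf 'without -n and p:\n%s\n' "$pass_through"
```

For the sample file in the question every line has a match, so both forms would give the same result there; the difference only shows up on lines without an `a...c` sequence.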
One awk idea:
```
$ awk 'match($0,/a[^c]*c/) { print substr($0,RSTART,RLENGTH)}' t.txt
abc
absc
auc
absuhc
```
Where:
- if we find a match then the match() call is non-zero (ie, ‘true’) so …
- print the substring defined by the RSTART/RLENGTH variables (which are auto-populated by a successful match() call)
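To make the RSTART/RLENGTH mechanics more concrete, this sketch prints both variables next to the extracted substring for each sample line. It uses only the two-argument match(), so it should behave the same in mawk and gawk; the variable name `report` is just for the example.

```shell
# Print the 1-based start position, the match length, and the substring itself.
report=$(printf '%s\n' abcsuahrcb abscuharcb bsaucharcb absuhcrcab |
  awk 'match($0, /a[^c]*c/) { print RSTART, RLENGTH, substr($0, RSTART, RLENGTH) }')

printf '%s\n' "$report"
```

For the third line, for instance, RSTART is 3 and RLENGTH is 3, which selects `auc` out of `bsaucharcb`.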