I have a Windows text file which contains one line (ending in CRLF):
aline
The following is the output of several commands:
[root@panel ~]# grep aline file.txt
aline
[root@panel ~]# grep aline$'\r' file.txt
[root@panel ~]# grep aline$'\r'$'\n' file.txt
[root@panel ~]# grep aline$'\n' file.txt
aline
The first command’s output is as expected. I’m curious about the second and third: why is the output an empty line? As for the last one, I expected it not to find the string, but it actually finds it. Why? The commands are run on CentOS in bash.
3 Answers
If the input is not well-formed, the behavior is undefined.
In practice, some versions of GNU grep use CR for internal purposes, so attempting to match it does not work at all, or produces really bizarre results.
For not entirely different reasons, passing in a literal newline as part of the regular expression could have some odd interpretations, including, but not limited to, interpreting the argument as two separate patterns. (Look at how grep -f reads patterns from a file, and imagine that at least some implementations use the same logic to parse the command line.)
In the grand scheme of things, the sane solution is to fix the input so it’s a valid text file before attempting to run Unix line-oriented tools on it.
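A minimal sketch of fixing the input first, assuming you work on a throwaway copy (the temporary directory and the file.unix.txt name are made up for illustration). Where dos2unix is installed it does the same job.

```shell
# Create a sample file with a DOS (CRLF) line ending; the path is hypothetical.
tmpdir=$(mktemp -d)
printf 'aline\r\n' > "$tmpdir/file.txt"

# Delete every carriage return so only the Unix LF terminator remains.
tr -d '\r' < "$tmpdir/file.txt" > "$tmpdir/file.unix.txt"

# Inspect the bytes of the converted file.
od -c "$tmpdir/file.unix.txt"

# The end-of-line anchor now matches directly after "aline"; prints the count.
grep -c 'aline$' "$tmpdir/file.unix.txt"
```

Note that tr removes every CR, not only those at line ends, which is normally what you want for a text file.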
For quick and dirty solutions, some tools have well-defined semantics for random binary input. Perl is a model citizen in this respect.
Awk also tends to work amicably, though there are several implementations, so the risk that somebody somewhere has a version which doesn’t behave identically to AT&T Awk is higher.
Maybe notice also how \r is the last character before the end of the line (the DOS line ending is the sequence CR LF, where LF is the standard Unix line terminator for text files).
In this case grep really matches the string "aline\r", but you just don’t see it because it was overwritten by the ANSI sequence that prints color. Pass the output to od -c and you’ll see it.
With --color=never you can see the output string because grep doesn’t print out the color. \r simply resets the cursor to the start of the line and then a new line is printed out, so nothing is overwritten. But by default grep checks whether it’s running on a terminal or its output is being piped, and prints the matched string in color if supported; it seems that resetting the color and then printing \n clears the rest of the line.
To match \n you can use the -z option to make null bytes the line separator.
Your last command, grep aline$'\n' file.txt, works because \n is simply a word separator in bash, so the command is just the same as grep aline file.txt. Exactly the same thing happened in the 3rd line: grep aline$'\r'$'\n' file.txt.
To pass a newline you must quote the argument to prevent word splitting. To demonstrate the effect of the quote with the 3rd line I added another line to the file.
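The color explanation can be checked with a small sketch like the following, assuming GNU grep on Linux (the sample file and the cr variable are made up for illustration; cr holds the same byte as $'\r').

```shell
# Hypothetical sample file with one CRLF-terminated line.
f=$(mktemp)
printf 'aline\r\n' > "$f"

# A carriage return in a variable, equivalent to bash's $'\r'.
cr=$(printf '\r')

# Plain output: the match is there, carriage return included.
grep --color=never "aline$cr" "$f" | od -c

# Forced color: the escape sequences grep wraps around the match become
# visible; the exact bytes depend on the grep version and GREP_COLORS.
grep --color=always "aline$cr" "$f" | od -c
```

On a terminal the second form is what hides the text: the CR moves the cursor back to column 0 before the trailing escape sequences are written.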
At least for me, phuclv’s answer doesn’t completely cover the last case, i.e. grep aline$'\n' file.txt.
Your mileage may vary depending on which shell and which version and implementation of grep you are using, but for me grep -z "aline$(echo $'\n')" and grep -z aline$'\n' both just match the same pattern as grep -z aline.
This becomes more apparent if the -o switch is used, so that grep outputs only the matched string and not the entire line (which is the entire file for most text files when the -z option is used).
If you use the same file.txt as in phuclv’s second example:
To actually match a \n as part of the pattern I had to use the -P switch to turn on "Perl-compatible regular expressions".
For reference: