
I have a Windows text file that contains a single line, terminated by CRLF:

aline

Here is the output of several commands:

[root@panel ~]# grep aline file.txt
aline
[root@panel ~]# grep aline$'\r' file.txt

[root@panel ~]# grep aline$'\r'$'\n' file.txt

[root@panel ~]# grep aline$'\n' file.txt
aline

The first command's output is what I expected. I'm curious about the second and third: why is the output an empty line? And for the last one, I expected grep not to find the string, but it actually finds it. Why? The commands are run in bash on CentOS.


Answers


  1. If the input is not well-formed, the behavior is undefined.

    In practice, some versions of GNU grep use CR for internal purposes, so attempting to match it does not work at all, or produces really bizarre results.

    For not entirely different reasons, passing in a literal newline as part of the regular expression could have some odd interpretations, including, but not limited to, interpreting the argument as two separate patterns. (Look at how grep -F reads from a file, and imagine that at least some implementations use the same logic to parse the command line.)
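    A minimal Python model of the "two separate patterns" reading, assuming (as GNU grep documents for its PATTERNS argument) that newlines separate patterns and that a line is selected if any one of them matches. The helper names are hypothetical, and the model makes no claim about how a particular grep treats an empty trailing piece:

```python
import re


def split_patterns(arg: str) -> list[str]:
    # Treat each newline-separated piece of the argument as its own pattern.
    return arg.split("\n")


def line_matches(line: str, patterns: list[str]) -> bool:
    # A line is selected if any one of the patterns matches somewhere in it.
    return any(re.search(p, line) for p in patterns)


pieces = split_patterns("aline\n")
print(pieces)                                   # ['aline', '']
print(line_matches("aline\r", ["aline"]))       # True
print(line_matches("another line", ["aline"]))  # False
print(line_matches("another line", pieces))     # True: in this model the empty piece matches any line
```

    Under this model the trailing empty piece would select every line; treat that as a property of the sketch, since real implementations may keep, drop, or reject it.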

    In the grand scheme of things, the sane solution is to fix the input so it’s a valid text file before attempting to run Unix line-oriented tools on it.
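    Fixing the input can be as small as rewriting every CR LF pair to a bare LF before the data reaches line-oriented tools (similar in spirit to what dos2unix does). A minimal Python sketch; the function name is hypothetical, and it deliberately leaves a CR that is not followed by LF untouched:

```python
def crlf_to_lf(data: bytes) -> bytes:
    # Replace each CR LF pair with LF; a lone CR is left as it is.
    return data.replace(b"\r\n", b"\n")


fixed = crlf_to_lf(b"aline\r\nanother line\r\n")
print(fixed)  # b'aline\nanother line\n'
```

    For a real file, something like path.write_bytes(crlf_to_lf(path.read_bytes())) with a pathlib.Path would apply it in place.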

    For quick and dirty solutions, some tools have well-defined semantics for random binary input. Perl is a model citizen in this respect.

    bash$ perl -ne 'print if /aline\r$/' <<<$'aline\r'
    aline
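    For comparison, a rough Python analog of the Perl one-liner, reading bytes so that nothing is translated on the way in. Python's $ (without re.MULTILINE) matches at the end of the input or just before a final newline, which is what lets aline\r$ match the line aline\r\n. The function name is hypothetical:

```python
import io
import re


def grep_lines(pattern: bytes, data: bytes) -> list[bytes]:
    # Iterating a binary stream splits on b"\n" only and keeps the terminator,
    # like perl -n, so a CR stays part of the line and "$" may sit before the final LF.
    return [line for line in io.BytesIO(data) if re.search(pattern, line)]


print(grep_lines(rb"aline\r$", b"aline\r\n"))  # [b'aline\r\n']
print(grep_lines(rb"aline$", b"aline\r\n"))    # []: the CR sits between "aline" and the end
```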
    

    Awk also tends to work amicably, though there are several implementations, so the risk that somebody somewhere has a version which doesn’t behave identically to AT&T Awk is higher.

    Notice also that \r is the last character before the end of the line (a DOS line ending is the sequence CR LF, where LF is the standard Unix line terminator for text files).
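    To see what a line-oriented tool is handed, split the raw bytes on LF, the Unix line terminator. In this small Python sketch the CR survives as the final byte of the line's content, while the LF is the separator rather than part of the line:

```python
data = b"aline\r\n"  # the one-line Windows file from the question

# Split on LF only: everything before the LF is the line's content.
lines = data.split(b"\n")
print(lines)                    # [b'aline\r', b'']
content = lines[0]
print(content.endswith(b"\r"))  # True: CR is the last character of the line
print(b"\n" in content)         # False: LF separates lines, it is not content
```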

  2. In this case grep really does match the string "aline\r", but you don't see it because it is erased by the ANSI escape sequences grep uses to print color. Pipe the output to od -c and you'll see it:

    $ grep aline file.txt
    aline
    $ grep aline$'\r' file.txt
    
    $ grep aline$'\r' --color=never file.txt
    aline
    $ grep aline$'\r' --color=never file.txt | od -c
    0000000   a   l   i   n   e  \r  \n
    0000007
    $ grep aline$'\r' --color=always file.txt | od -c
    0000000 033   [   0   1   ;   3   1   m 033   [   K   a   l   i   n   e
    0000020  \r 033   [   m 033   [   K  \n
    0000030
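    When od is not at hand, a short Python function can produce a similar dump. This is a sketch that approximates the od -c layout (octal offsets, 16 bytes per row, 4-character fields, C-style escapes for common control bytes, three octal digits for other non-printable bytes); it is not a byte-exact reimplementation, and the name od_c is hypothetical:

```python
# C-style escapes used for common control bytes in od -c style output.
ESCAPES = {0: "\\0", 7: "\\a", 8: "\\b", 9: "\\t", 10: "\\n", 11: "\\v", 12: "\\f", 13: "\\r"}


def od_c(data: bytes) -> str:
    """Return an od -c style dump of data (an approximation, not byte-exact)."""
    rows = []
    for offset in range(0, len(data), 16):
        fields = []
        for byte in data[offset:offset + 16]:
            if byte in ESCAPES:
                text = ESCAPES[byte]
            elif 32 <= byte < 127:
                text = chr(byte)          # printable ASCII as itself
            else:
                text = f"{byte:03o}"      # anything else as octal, e.g. ESC -> 033
            fields.append(f"{text:>4}")   # right-align each byte in a 4-wide field
        rows.append(f"{offset:07o}" + "".join(fields))
    rows.append(f"{len(data):07o}")       # last row: total length in octal
    return "\n".join(rows)


print(od_c(b"aline\r\n"))
print(od_c(b"\x1b[01;31m\x1b[Kaline\r\x1b[m\x1b[K\n"))
```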
    

    With --color=never you can see the output string because grep doesn't print any color codes: \r simply moves the cursor back to the start of the line, then \n starts a new line, so nothing is overwritten. When color is enabled, however (--color=auto checks whether the output is a terminal and colors the match only if it is), grep wraps the match in ANSI escape sequences. After the \r has moved the cursor back to column 0, grep prints ESC[m (reset the color) followed by ESC[K, which erases everything from the cursor to the end of the line, so the aline already on screen is wiped out before the \n is printed.
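    To check that reasoning without a real terminal, here is a tiny, hypothetical Python model of a terminal screen. It understands only printable characters, \r, \n (treated as CR plus LF, as a typical tty setup does), ESC[K with no parameter or 0, and ignores every other escape sequence such as the color codes; real terminals do much more. Feeding it the two byte sequences from the od -c dumps shows the colored version ending up blank:

```python
import re

# A CSI escape sequence: ESC [ parameters final-letter.
CSI = re.compile(r"\x1b\[([0-9;]*)([A-Za-z])")


def render(output: str) -> list[str]:
    """Return the lines a tiny, hypothetical terminal would leave on screen."""
    screen, line, col, i = [], [], 0, 0
    while i < len(output):
        m = CSI.match(output, i)
        if m:
            if m.group(2) == "K" and m.group(1) in ("", "0"):
                del line[col:]        # Erase in Line: drop everything from the cursor onward
            i = m.end()               # color codes and other sequences change nothing here
            continue
        ch = output[i]
        if ch == "\r":                # carriage return: cursor back to column 0
            col = 0
        elif ch == "\n":              # newline: the current line is finished
            screen.append("".join(line))
            line, col = [], 0
        else:                         # ordinary character: overwrite or append at the cursor
            if col < len(line):
                line[col] = ch
            else:
                line.append(ch)
            col += 1
        i += 1
    if line:
        screen.append("".join(line))
    return screen


print(render("aline\r\n"))                               # ['aline']
print(render("\x1b[01;31m\x1b[Kaline\r\x1b[m\x1b[K\n"))  # ['']
```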

    To match \n, you can use the -z option, which makes the null byte the line separator instead of the newline:

    $ grep -z aline$'\r'$'\n' --color=never file.txt
    aline
    $ grep -z aline$'\r'$'\n' --color=never file.txt | od -c
    0000000   a   l   i   n   e  \r  \n  \0
    0000010
    $ grep -z aline$'\r'$'\n' --color=always file.txt | od -c
    0000000 033   [   0   1   ;   3   1   m 033   [   K   a   l   i   n   e
    0000020  \r 033   [   m 033   [   K  \n  \0
    0000031
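    A Python sketch of what -z changes, assuming, as the GNU grep manual describes, that input and output records are terminated by a NUL byte instead of a newline. With no NUL in the file, the whole file is one record, so a CR LF pair is ordinary content inside it, and each selected record is written out with a trailing NUL, which is why the od -c dumps of -z output are one byte longer. The regex escapes here are resolved by Python's regex engine, and the function name is hypothetical:

```python
import re


def grep_z(pattern: bytes, data: bytes) -> bytes:
    """Model of grep -z: select NUL-terminated records, write each back with a NUL."""
    records = data.split(b"\0")
    if records and records[-1] == b"":  # input that ends with NUL leaves an empty tail
        records.pop()
    return b"".join(r + b"\0" for r in records if re.search(pattern, r))


print(grep_z(rb"aline\r\n", b"aline\r\n"))  # b'aline\r\n\x00'
print(grep_z(rb"aline\r\nanother", b"aline\r\nanother line\n"))
```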
    

    Your last command, grep aline$'\n' file.txt, works because \n is simply a word separator in bash, so the command is just the same as grep aline file.txt. Exactly the same thing happened with the third command, grep aline$'\r'$'\n' file.txt. To pass a newline, you must quote the argument to prevent word splitting:

    $ echo "aline" | grep -z "aline$(echo $'\n')"
    aline
    

    To demonstrate the effect of quoting on the third command, I added another line to the file:

    $ cat file.txt
    aline
    another line
    $ grep -z "aline$(echo $'\n')" file.txt | od -c
    0000000   a   l   i   n   e  \r  \n   a   n   o   t   h   e   r       l
    0000020   i   n   e  \n  \0
    0000025
    $ grep -z "aline$(echo $'\n')" file.txt
    aline
    another line
    $
    
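  3. At least for me, phuclv's answer doesn't completely cover the last case, i.e. grep aline$'\n' file.txt.
    Your mileage may vary depending on which shell, and which version and implementation of grep, you are using, but for me grep -z "aline$(echo $'\n')" and grep -z aline$'\n' both match exactly the same thing as grep -z aline. (Command substitution strips trailing newlines, so the first form passes just aline; in the second form the newline does reach grep, which treats a newline in its pattern argument as a separator between patterns.)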

    This becomes more apparent with the -o switch, which makes grep output only the matched string rather than the entire line (and with -z, the "line" is the entire file for most text files).

    If you use the same file.txt as in phuclv’s second example:

    $ cat file.txt
    aline
    another line
    $ grep -z "aline$(echo $'\n')" file.txt | od -c
    0000000   a   l   i   n   e  \r  \n   a   n   o   t   h   e   r       l
    0000020   i   n   e  \n  \0
    0000025
    $ grep -z -o "aline$(echo $'\n')" file.txt | od -c
    0000000   a   l   i   n   e  \0
    0000006
    $ grep -z -o aline$'\n' file.txt | od -c
    0000000   a   l   i   n   e  \0
    0000006
    $ grep -z -o aline file.txt | od -c
    0000000   a   l   i   n   e  \0
    0000006
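    A Python model of why aline$'\n' and aline give the same -o output here, assuming the argument aline plus a newline is read as the two patterns aline and an empty pattern, and using the documented rule that -o prints only the non-empty matched parts. Python's leftmost-first alternation stands in for grep's matcher, which is a simplification, and the function name is hypothetical:

```python
import re


def grep_zo(patterns: list[bytes], record: bytes) -> list[bytes]:
    """Model of grep -z -o: non-empty matched parts of one record, NUL-terminated."""
    # Any one of the newline-separated patterns may match at a given position.
    combined = re.compile(b"|".join(b"(?:" + p + b")" for p in patterns))
    return [m.group() + b"\0" for m in combined.finditer(record) if m.group()]


record = b"aline\r\nanother line\n"  # the two-line file as a single -z record
print(grep_zo(b"aline\n".split(b"\n"), record))  # patterns [b'aline', b'']
print(grep_zo([b"aline"], record))
```

    Both calls report only aline followed by a NUL: the empty pattern matches everywhere, but those matches are empty and therefore never printed.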
    
    

    To actually match a \n as part of the pattern, I had to use the -P switch to turn on Perl-compatible regular expressions:

    $ grep -z -o -P 'aline\r\n' file.txt | od -c
    0000000   a   l   i   n   e  \r  \n  \0
    0000010
    $ grep -z -o -P 'aline\r\nanother' file.txt | od -c
    0000000   a   l   i   n   e  \r  \n   a   n   o   t   h   e   r  \0
    0000017
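    The difference with -P is where the escapes are resolved. Inside single quotes the shell passes \r\n through as four ordinary characters (backslash, r, backslash, n), so the argument contains no newline to be split on, and the regex engine itself turns the escapes into CR and LF. Python raw strings behave the same way, which this sketch uses as an analogy (it does not run grep):

```python
import re

pattern = rb"aline\r\n"  # raw bytes: the backslash escapes are left for the regex engine
print(len(pattern))      # 9: each escape is still two characters
print(b"\n" in pattern)  # False: no literal newline, so nothing to split on

record = b"aline\r\nanother line\n"  # the two-line file, read as a single -z record
print(re.search(pattern, record).group())               # b'aline\r\n'
print(re.search(rb"aline\r\nanother", record).group())  # b'aline\r\nanother'
```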
    
    

    For reference:

    grep --version|head -n1
    grep (GNU grep) 3.1
    
    bash --version|head -n1
    GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)
    