
I’m migrating many bash shell scripts from old versions of raspbian and ubuntu to the current raspbian version. I’ve made a brand new system installation, including various configuration (text) files that I’ve created for these scripts. I found to my horror that awk-print and awk-printf APPEAR to have changed in the latest version, as evidenced by bash variable-type errors when the values are used. What’s going on?

Now that I know the answer, I can explain what happened so others can avoid it. That’s why I said, awk-print APPEARS to have changed. It didn’t, as I discovered when I checked the version of awk on all three machines. Running:

awk -W version

on all three systems gave the same version, mawk 1.3.3 Nov 1996.

When a text file is small, I find it simplest to cat the file to a variable, grep that variable for a keyword that identifies a particular line (and by extension a particular setting), and use ‘tr’ and ‘awk print’ to split the line and assign the value to a variable. Here’s an example line from which I want to assign ‘5’ to a variable:

"keyword=5"<line terminator>

That line is one of several read from a text file, so there’s at least one line terminator after each line. That line terminator is the key to the problem.

I execute the following commands to read the file, find the line with ‘keyword’, split the line at ‘=’, and assign the value from that line to bar:

file_contents="$(cat "$filename")"

bar="$(echo -e "$file_contents" | grep "keyword" | tr "=" " " | awk '{print $2}')"

Here’s the subtle part. Unbeknownst to me, while I was setting up the new system, the line terminators in some of my text files changed from Linux format, with a single line terminator (\n), to DOS format, with two (\r\n), on each line. When I grepped the text file from the keyboard to get the desired line, the value that awk-print assigned to ‘bar’ ended with a stray carriage return (\r). That character does NOT appear on screen, because the terminal treats it as a cursor movement rather than a printable character. It’s only evident if one executes:

echo ${#bar}

to get the length of the string, or does:

echo -e "$bar"

The hidden terminator shows up as one additional character.
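The effect can be reproduced in isolation. The following is a minimal sketch, not the original script: the temp file and its single ‘keyword=5’ line are made up for the demo, and ‘printf’ stands in for ‘echo -e’ so the result does not depend on the shell’s echo. Piping the value through ‘od -c’ makes the carriage return visible.

```shell
#!/bin/bash
# Hypothetical demo file: one line with a DOS (CR LF) line ending.
demo_file="$(mktemp)"
printf 'keyword=5\r\n' > "$demo_file"

# Same pipeline as in the post, with printf in place of echo -e.
file_contents="$(cat "$demo_file")"
bar="$(printf '%s\n' "$file_contents" | grep "keyword" | tr "=" " " | awk '{print $2}')"

# The value is the digit plus a carriage return, so the length is 2, not 1.
echo "length: ${#bar}"

# od -c prints non-printing characters by name; the CR shows up as \r.
printf '%s' "$bar" | od -c

rm -f "$demo_file"
```

Command substitution strips the trailing newline but leaves the carriage return alone, which is why it survives into the variable.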

So, the solution to the problem was either to use ‘fromdos’ to remove the extra line terminator before processing the files, or to remove the unwanted ‘\r’ that was being assigned to each variable. One helpful comment noted that ‘cat -vE $file’ would show every character in the file. Sure enough, the dual terminators were present.
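Where ‘fromdos’ is not installed, ‘tr -d '\r'’ is a standard substitute for the first approach. A sketch under the same assumptions as before (made-up temp file and keys); note that it deletes every carriage return in the input, not only those at line ends.

```shell
#!/bin/bash
# Hypothetical demo file with DOS (CR LF) line endings.
demo_file="$(mktemp)"
printf 'keyword=5\r\nother=7\r\n' > "$demo_file"

# With GNU cat, -v shows each CR as ^M and -E marks each line end with $.
cat -vE "$demo_file"

# Strip the carriage returns while reading the file, then parse as usual.
file_contents="$(tr -d '\r' < "$demo_file")"
bar="$(printf '%s\n' "$file_contents" | grep "keyword" | tr "=" " " | awk '{print $2}')"

echo "length: ${#bar}"   # 1: only the digit remains

rm -f "$demo_file"
```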

Another helpful comment noted that I was causing multiple sub-processes to run when I parsed each line, slowing execution time, and that a bashism:

${foo//*=/}

could avoid it. That bashism helped parse each line but did not remove the offending ‘\r’. A second bashism:

${foo//$'\r'/}

removed that ‘\r’.
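Put together, the two expansions allow the whole lookup to run inside bash with no sub-processes. This is one possible arrangement, assuming a made-up temp file and the loop variable name ‘line’; it is not the script from the post.

```shell
#!/bin/bash
# Hypothetical demo file with DOS (CR LF) line endings.
demo_file="$(mktemp)"
printf 'keyword=5\r\nother=7\r\n' > "$demo_file"

bar=""
while IFS= read -r line; do
    line="${line//$'\r'/}"        # drop any carriage return
    if [[ $line == keyword=* ]]; then
        bar="${line//*=/}"        # keep only what follows the last '='
    fi
done < "$demo_file"

echo "bar=$bar length=${#bar}"

rm -f "$demo_file"
```

Matching on ‘keyword=*’ rather than a bare substring also avoids picking up lines that merely contain the keyword somewhere else.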

CASE SOLVED

3 Answers


  1. Chosen as BEST ANSWER

    I found the problem thanks to several of the responses. It's rudimentary, I know, but I grepped a text file to extract a line with a keyword. I used tr to split the line and awk print to extract one argument, a numeric value, from that. That text file, once copied to the new machine, had a CR LF at the end of each line. Originally, it just had a newline character, which worked fine. But with the CR LF pair, every numeric value that I assigned to a variable using awk print ended with a carriage return. This was not obvious onscreen, but it caused every arithmetic statement and numeric IF statement using the value to fail, and it caused the issues I reported about awk print.


  2. #!/bin/sh -x
    
    echo "value=5" | tr "=" "\n" > temp
    echo "1,2p" | ed -s temp
    

    I have come to view Ed as UNIX’s answer to the lightsaber.

  3. I found a format string, ‘"%c", $2’ to use with printf in the current
    awk, but I have to use ‘"%s", $2’ in the old version. Note ‘%c’ vs
    ‘%s’.

    The behavior of %c depends on the type of argument you feed it: if it is numeric you get the character corresponding to the given ASCII code, and if it is a string you get its first character. For example,

    mawk 'BEGIN{printf "%c", 42}' emptyfile
    

    does give output

    *
    

    and

    mawk 'BEGIN{printf "%c", "HelloWorld"}' emptyfile
    

    does give output

    H
    

    Apparently your 2nd field is a digit followed by some junk characters, which is treated as a string, so the second behavior applies. But is taking the first character the correct action in all your use cases? Does that behavior meet the requirement for multi-digit numbers, e.g. 555?

    (tested in mawk 1.3.3)
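For multi-digit values, one alternative to ‘%c’ is to let awk strip the carriage return itself and print the whole field. This is a sketch only: the temp file and the ‘other=555’ line are made up for illustration, and ‘-F=’ replaces the tr step by splitting on ‘=’ directly.

```shell
#!/bin/sh
# Hypothetical demo file with DOS (CR LF) line endings.
demo_file="$(mktemp)"
printf 'keyword=5\r\nother=555\r\n' > "$demo_file"

# -F= splits each line at '='; sub() removes a trailing CR from the value.
value="$(awk -F= '$1 == "other" { sub(/\r$/, "", $2); print $2 }' "$demo_file")"

echo "value=$value length=${#value}"

rm -f "$demo_file"
```

Comparing ‘$1’ exactly, rather than grepping for the keyword, keeps the match to the intended line.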
