I am working on a Bash scripting project in which I need to delete one of two files if they have identical content. I should delete the one which comes last in an alphabetical sort and in the example output my professor has provided, apple.dat is deleted when the choices are apple.dat and Apple.dat.

if [[ "apple" > "Apple" ]]; then
    echo apple
else
    echo Apple
fi

prints Apple

echo $(echo -e "Apple\napple" | sort | tail -n1)

prints Apple

The ASCII value of a is 97 and A is 65, why is the test saying A is greater?

The weird thing is that I get opposite results with the older syntax:

if [ "apple" \> "Apple" ]; then
    echo apple
else
    echo Apple
fi

prints apple

and if we try to use the \> in the [[ ]] syntax, it is a syntax error.

How can we correct this for the double bracket syntax? I have tested this on the school Debian server, my local machine, and my Digital Ocean droplet server. On my local Ubuntu 20.04 and on the school server I get the output described above. Interestingly, on my Digital Ocean droplet which is an Ubuntu 20.04 server, I get "apple" with both double and single bracket syntax. We are allowed to use either syntax, double bracket or the single bracket actual test call, however I prefer using the newer double bracket syntax and would rather learn how to make this work than to convert my mostly finished script to the older more POSIX compliant syntax.

3 Answers


  1. Chosen as BEST ANSWER

    I have come up with my own solution to the problem, however I must first thank @GordonDavisson and @LéaGris for their help and for what I have learned from them as that is invaluable to me.

    No matter if computer or human locale is used, if, in an alphabetical sort, apple comes after Apple, then it also comes after Banana and if Banana comes after apple, then Apple comes after apple. So I have come up with the following:

    # A function which sorts two words alphabetically with lower case coming after upper case.
    # The last word in the sort will be printed twice to demonstrate that this works for both
    # the POSIX compliant single bracket test call and the newer double bracket condition
    # syntax.
    # arg 1: One of two words to sort
    # arg 2: One of two words to sort
    # Return: 0 upon completion, 1 if incorrect number of args is given
    sort_alphabetically() {
        [ $# -ne 2 ] && return 1
    
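        # Keep the working variables local to the function.
        local word_1_val=0
        local word_2_val=0
        local letter

        # Sum the character codes of each word, one character at a time.
        # IFS= and -r keep spaces and backslashes intact.
        while IFS= read -r -n1 letter; do
            (( word_1_val += $(printf '%d' "'$letter") ))
        done < <(echo -n "$1")

        while IFS= read -r -n1 letter; do
            (( word_2_val += $(printf '%d' "'$letter") ))
        done < <(echo -n "$2")

        if [ "$word_1_val" -gt "$word_2_val" ]; then
            echo "$1"
        else
            echo "$2"
        fi

        if [[ $word_1_val -gt $word_2_val ]]; then
            echo "$1"
        else
            echo "$2"
        fi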
    
        return 0
    }
    
    sort_alphabetically "apple" "Apple"
    sort_alphabetically "Banana" "apple"
    sort_alphabetically "aPPle" "applE"
    

    prints:

    apple
    apple
    Banana
    Banana
    applE
    applE
    

    This works by using process substitution to redirect the output of echo into the while loop, which reads one character at a time; printf then gives the decimal ASCII value of each character. It is like creating a temporary file from the string, which is automatically destroyed, and then reading it one character at a time. The -n option tells echo not to append a trailing newline (\n), so no newline character is fed to the loop.

    From bash man pages:

    Process Substitution

    Process substitution allows a process's input or output to be referred to using a filename. It takes the form of <(list) or >(list). The process list is run asynchronously, and its input or output appears as a filename. This filename is passed as an argument to the current command as the result of the expansion. If the >(list) form is used, writing to the file will provide input for list. If the <(list) form is used, the file passed as an argument should be read to obtain the output of list. Process substitution is supported on systems that support named pipes (FIFOs) or the /dev/fd method of naming open files.

    When available, process substitution is performed simultaneously with parameter and variable expansion, command substitution, and arithmetic expansion.
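
As a minimal sketch of the <(list) form described above, the loop below reads the output of a command one character at a time without creating a temporary file. The function name count_chars is hypothetical and only serves as an illustration; it is not part of the original solution.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count the characters of a string by feeding it to a
# while loop through process substitution.
count_chars() {
    local ch count=0
    # IFS= and -r keep spaces and backslashes intact; -n1 reads one character.
    while IFS= read -r -n1 ch; do
        count=$(( count + 1 ))
    done < <(printf '%s' "$1")
    printf '%d\n' "$count"
}

count_chars "apple"   # prints 5
```

Because the loop runs in the current shell rather than in a pipeline subshell, the count variable is still set after the loop ends.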

    From a Stack Overflow post about printf:

    If the leading character is a single-quote or double-quote, the value shall be the numeric value in the underlying codeset of the character following the single-quote or double-quote.
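
For illustration, the quoted rule can be seen directly with a tiny wrapper. The name char_code is hypothetical, and the values shown assume an ASCII-compatible codeset.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the numeric code of the single character in $1.
char_code() {
    # The leading single quote inside the argument asks printf for the
    # character's numeric value instead of parsing the argument as a number.
    printf '%d\n' "'$1"
}

char_code a   # prints 97
char_code A   # prints 65
```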

    Note: process substitution is not POSIX compliant, but it is supported by Bash in the way stated in the bash man page.


    UPDATE: The above does not work in all cases!


    The above solution works in many cases; however, we get some anomalies:

    first word   second word   last alphabetically   result
    apple        Apple         apple                 correct
    Apple        apple         apple                 correct
    apPLE        Apple         Apple                 incorrect
    apple        Banana        Banana                correct
    apple        BANANA        apple                 incorrect
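
The anomalies arise because a sum of character codes ignores the position of each character and mostly reflects how many lower-case letters a word contains. A small sketch that reproduces the sums for the words in the table (char_sum is a hypothetical helper, and the values assume ASCII):

```shell
#!/usr/bin/env bash

# Hypothetical helper: add up the character codes of every character in $1.
char_sum() {
    local word=$1 sum=0 i
    for (( i = 0; i < ${#word}; i++ )); do
        sum=$(( sum + $(printf '%d' "'${word:i:1}") ))
    done
    printf '%d\n' "$sum"
}

char_sum "apPLE"    # prints 434
char_sum "Apple"    # prints 498, so Apple wrongly wins over apPLE
char_sum "apple"    # prints 530
char_sum "BANANA"   # prints 417, so apple wrongly wins over BANANA
```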

    The following solution gets the results that are needed:

    #!/bin/bash
    
    sort_alphabetically() {
        [ $# -ne 2 ] && return 1
    
        local WORD_1="$1"
        local WORD_2="$2"
        local WORD_1_LOWERED="$(echo -n "$1" | tr '[:upper:]' '[:lower:]')"
        local WORD_2_LOWERED="$(echo -n "$2" | tr '[:upper:]' '[:lower:]')"
    
        if [ "$(echo -e "$WORD_1\n$WORD_2" | sort | tail -n1)" = "$WORD_1" ] ||
           [ "$(echo -e "$WORD_1_LOWERED\n$WORD_2_LOWERED" | sort | tail -n1)" =
             "$WORD_1_LOWERED" ]; then
    
            if [ "$WORD_1_LOWERED" = "$WORD_2_LOWERED" ]; then
    
                ASCII_VAL_WORD_1=0
                ASCII_VAL_WORD_2=0
                read -n1 FIRST_CHAR_1 < <(echo -n "$WORD_1")
                read -n1 FIRST_CHAR_2 < <(echo -n "$WORD_2")
    
                while read -n1 character; do
                    (( ASCII_VAL_WORD_1 += $(printf '%d' "'$character") ))
                done < <(echo -n "$WORD_1")
                
                while read -n1 character; do
                    (( ASCII_VAL_WORD_2 += $(printf '%d' "'$character") ))
                done < <(echo -n "$WORD_2")
                
                if [ $ASCII_VAL_WORD_1 -gt $ASCII_VAL_WORD_2 ] &&
                   [ "$FIRST_CHAR_1" \> "$FIRST_CHAR_2" ]; then
    
                    echo "$WORD_1"
                elif [ $ASCII_VAL_WORD_2 -gt $ASCII_VAL_WORD_1 ] &&
                     [ "$FIRST_CHAR_2" \> "$FIRST_CHAR_1" ]; then
    
                    echo "$WORD_2"
                elif [ "$FIRST_CHAR_1" \> "$FIRST_CHAR_2" ]; then
                    echo "$WORD_1"
                else
                    echo "$WORD_2"
                fi
            else
                echo "$WORD_1"
            fi
        else
            echo "$WORD_2"
        fi
    
        return 0
    }
    
    sort_alphabetically "apple" "Apple"
    sort_alphabetically "Apple" "apple"
    sort_alphabetically "apPLE" "Apple"
    sort_alphabetically "Apple" "apPLE"
    sort_alphabetically "apple" "Banana"
    sort_alphabetically "apple" "BANANA"
    
    exit 0
    

    prints:

    apple
    apple
    apPLE
    apPLE
    Banana
    BANANA
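
The six results above are consistent with a simpler rule, sketched here as a possible alternative rather than as the author's method. The assumed rule is: compare the words case-insensitively, and on a tie fall back to byte order so that lower case sorts after upper case. The function name last_alphabetically is hypothetical, and the ${var,,} expansion requires Bash 4 or later.

```shell
#!/usr/bin/env bash

# Hypothetical alternative, not the author's code. Assumed rule: compare
# case-insensitively first; if the words differ only in case, use byte order.
last_alphabetically() {
    [ $# -ne 2 ] && return 1
    # Pin byte-order collation for the [[ ]] comparisons in this function.
    local LC_ALL=C
    local lower_1=${1,,} lower_2=${2,,}   # lower-cased copies (Bash 4+)

    if [[ "$lower_1" > "$lower_2" ]]; then
        printf '%s\n' "$1"
    elif [[ "$lower_1" < "$lower_2" ]]; then
        printf '%s\n' "$2"
    elif [[ "$1" > "$2" ]]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$2"
    fi
}

last_alphabetically "apple" "Apple"    # prints apple
last_alphabetically "apPLE" "Apple"    # prints apPLE
last_alphabetically "apple" "BANANA"   # prints BANANA
```

This avoids the external sort and tr processes, at the cost of relying on Bash-only features.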
    

  2. Hints:

    $ (LC_COLLATE=C; if [ "apple" \> "Apple" ]; then echo apple; else echo Apple; fi)
    apple
    $ (LC_COLLATE=en_US; if [ "apple" \> "Apple" ]; then echo apple; else echo Apple; fi)
    apple
    

    but:

    $ (LC_COLLATE=C; if [[ "apple" > "Apple" ]]; then echo apple; else echo Apple; fi)
    apple
    $ (LC_COLLATE=en_US; if [[ "apple" > "Apple" ]]; then echo apple; else echo Apple; fi)
    Apple
    

    The difference is that the Bash-specific test [[ ]] uses the collation rules of the current locale to compare strings, whereas the POSIX test [ ] uses ASCII values.

    From bash man page:

    When used with [[, the < and > operators sort lexicographically using the current locale.

    When used with test or [, the < and > operators sort lexicographically using ASCII ordering.
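
Given that behaviour, one way to make [[ ]] order strings by byte value regardless of the caller's locale is to pin the collation inside a function. This is a minimal sketch: last_in_byte_order is a hypothetical name, and LC_ALL is set rather than LC_COLLATE because LC_ALL, when it is set in the environment, takes precedence over LC_COLLATE.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print whichever of two words sorts last in byte order,
# independent of the locale the script was started with.
last_in_byte_order() {
    # The assignment is local, so the caller's locale is left alone.
    local LC_ALL=C
    if [[ "$1" > "$2" ]]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$2"
    fi
}

last_in_byte_order "apple" "Apple"    # prints apple
last_in_byte_order "Banana" "apple"   # prints apple
```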

  3. Change your syntax. if [[ "Apple" -gt "apple" ]] works as expected.
