skip to Main Content

My trivial test shows that sort can sort on $-values (like $2,734, $278,280, etc), the US formatted numbers with commas as the thousands separator, just fine:

$ echo -e '$2,734n$278,280n$26,845' | sort
$2,734
$26,845
$278,280

$ echo -e '$5,088n$26n$8,714' | sort -r
$8,714
$5,088
$26

But why it does not work for me any more when I’m sorting them when they’re in fields?

$ cat test.md
9.|||85,000|$5,088|NEW|2.99%
8.|||6,086|$348|NEW|0.1%
8.|||566|$26|NEW|0.02%
8.|||492,900|$8,714|NEW|0.71%
7.|||25,737|$180|NEW|0.14%
6.|||2,821|$511|NEW|0.4%
5.|||14,980|$2,734|NEW|2.17%
4.|||85,578|$376|NEW|0.02%
4.|||173,750|$18,369|NEW|10.79%
4.|||1,019,133|$40,388|NEW|7.3%
3.|||3,692,661|$25,244|NEW|20.08%
3.|||15,303,963|$202,471|NEW|14.83%
29.|||1,000|$32|NEW|0%
27.|||100,000|$1,000|NEW|0.18%
26.|||27,752,808|$490,971|NEW|0.65%

The cat test.md | sort -t'|' -k5 -r -g gives exact output as above. And even sorting on -k4 doesn’t work for me:

$ cat test.md | sort -t'|' -k4 -r -g
8.|||566|$26|NEW|0.02%
8.|||492,900|$8,714|NEW|0.71%
4.|||173,750|$18,369|NEW|10.79%
27.|||100,000|$1,000|NEW|0.18%
9.|||85,000|$5,088|NEW|2.99%
4.|||85,578|$376|NEW|0.02%
26.|||27,752,808|$490,971|NEW|0.65%
7.|||25,737|$180|NEW|0.14%
3.|||15,303,963|$202,471|NEW|14.83%
5.|||14,980|$2,734|NEW|2.17%
8.|||6,086|$348|NEW|0.1%
3.|||3,692,661|$25,244|NEW|20.08%
6.|||2,821|$511|NEW|0.4%
4.|||1,019,133|$40,388|NEW|7.3%
29.|||1,000|$32|NEW|0%

And cat test.md | sort -t'|' -k4,4 -r -g gives exact output as above as well.

All I want to do is to sort that US formatted numbers with commas as the thousands separator field in reverse numeric order.

There must be something obvious that I’m missing, but my mind is blocked now, and I have no further ideas. Please help.

This is under Debian WSL:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 10 (buster)
Release:        10
Codename:       buster

$ uname -rm
4.4.0-19041-Microsoft x86_64

$ sort --version
sort (GNU coreutils) 8.30

2

Answers


  1. Chosen as BEST ANSWER

    As you may be able to tell, the phrase "US formatted numbers with commas as the thousands separator" comes from https://unix.stackexchange.com/questions/507678/, which is what I was trying to duplicate. Apparently there are some minutia details that I've overlooked.

    Here is how to properly sort -k4:

    $ cat test.md | LC_ALL=en_US.utf8 sort -t'|' -k4 -r -h
    26.|||27,752,808|$490,971|NEW|0.65%
    3.|||15,303,963|$202,471|NEW|14.83%
    3.|||3,692,661|$25,244|NEW|20.08%
    4.|||1,019,133|$40,388|NEW|7.3%
    8.|||492,900|$8,714|NEW|0.71%
    4.|||173,750|$18,369|NEW|10.79%
    27.|||100,000|$1,000|NEW|0.18%
    4.|||85,578|$376|NEW|0.02%
    9.|||85,000|$5,088|NEW|2.99%
    7.|||25,737|$180|NEW|0.14%
    5.|||14,980|$2,734|NEW|2.17%
    8.|||6,086|$348|NEW|0.1%
    6.|||2,821|$511|NEW|0.4%
    29.|||1,000|$32|NEW|0%
    8.|||566|$26|NEW|0.02%
    

    And for sorting on $-values, just like @markp-fuso pointed out, it doesn't work. But for my specific case, it's an easy fix for me as I can remove that "$", and no need to keep it in my file.


  2. One idea for sorting on data with multiple delimiters (in this case |, $ and ,) is to pull the ‘sort’ column, strip out all secondary delimiters, append said data to front of line, sort numerically, then strip off the first column.

    One idea using awk to pull 5th field to front of line minus the $ and ,:

    $ awk -F'|' '{new1=$5;gsub("[,$]","",new1);print new1"|"$0}' test.md
    5088|9.|||85,000|$5,088|NEW|2.99%
    348|8.|||6,086|$348|NEW|0.1%
    26|8.|||566|$26|NEW|0.02%
    8714|8.|||492,900|$8,714|NEW|0.71%
    180|7.|||25,737|$180|NEW|0.14%
    511|6.|||2,821|$511|NEW|0.4%
    2734|5.|||14,980|$2,734|NEW|2.17%
    376|4.|||85,578|$376|NEW|0.02%
    18369|4.|||173,750|$18,369|NEW|10.79%
    40388|4.|||1,019,133|$40,388|NEW|7.3%
    25244|3.|||3,692,661|$25,244|NEW|20.08%
    202471|3.|||15,303,963|$202,471|NEW|14.83%
    32|29.|||1,000|$32|NEW|0%
    1000|27.|||100,000|$1,000|NEW|0.18%
    490971|26.|||27,752,808|$490,971|NEW|0.65%
    

    We now pipe this to sort:

    $ awk -F'|' '{new1=$5;gsub("[,$]","",new1);print new1"|"$0}' test.md | sort -t'|' -k1,1nr
    490971|26.|||27,752,808|$490,971|NEW|0.65%
    202471|3.|||15,303,963|$202,471|NEW|14.83%
    40388|4.|||1,019,133|$40,388|NEW|7.3%
    25244|3.|||3,692,661|$25,244|NEW|20.08%
    18369|4.|||173,750|$18,369|NEW|10.79%
    8714|8.|||492,900|$8,714|NEW|0.71%
    5088|9.|||85,000|$5,088|NEW|2.99%
    2734|5.|||14,980|$2,734|NEW|2.17%
    1000|27.|||100,000|$1,000|NEW|0.18%
    511|6.|||2,821|$511|NEW|0.4%
    376|4.|||85,578|$376|NEW|0.02%
    348|8.|||6,086|$348|NEW|0.1%
    180|7.|||25,737|$180|NEW|0.14%
    32|29.|||1,000|$32|NEW|0%
    26|8.|||566|$26|NEW|0.02%
    

    Now pass through cut (or sed or another awk) to strip off first field:

    $ awk -F'|' '{new1=$5;gsub("[,$]","",new1);print new1"|"$0}' test.md | sort -t'|' -k1,1nr | cut -d'|' -f2-
    26.|||27,752,808|$490,971|NEW|0.65%
    3.|||15,303,963|$202,471|NEW|14.83%
    4.|||1,019,133|$40,388|NEW|7.3%
    3.|||3,692,661|$25,244|NEW|20.08%
    4.|||173,750|$18,369|NEW|10.79%
    8.|||492,900|$8,714|NEW|0.71%
    9.|||85,000|$5,088|NEW|2.99%
    5.|||14,980|$2,734|NEW|2.17%
    27.|||100,000|$1,000|NEW|0.18%
    6.|||2,821|$511|NEW|0.4%
    4.|||85,578|$376|NEW|0.02%
    8.|||6,086|$348|NEW|0.1%
    7.|||25,737|$180|NEW|0.14%
    29.|||1,000|$32|NEW|0%
    8.|||566|$26|NEW|0.02%
    

    Which leaves us with the initial data sorted by the 5th field.

    NOTE: with a bit more coding could do sort|cut inside the first awk

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search