
We use the awk command below to split an alphanumeric string into its numeric and alphabetic parts.

echo "1.5GB" | awk '{ gsub(/([[:alpha:]]+|[[:digit:].-]+|[^[:alnum:].-]+)/,"&\n",$0) ; print "size="$1"\nsymbol="$2}'

This command gives the desired result on Ubuntu 20.04:

size=1.5
symbol=GB

But on Ubuntu 18.04 it gives the result below, which is not what we want:

size=1.5GB
symbol=

3 Answers


  1. I can't replicate the issue. The output of every awk I tried hashes to the same value:

    % echo "1.5GB" | nawk '{ print NR,NF,$0,$1,$NF; gsub(/[[:alpha:]]+|[[:digit:].-]+|[^[:alnum:].-]+/,"&\n",$0) ; print NR,NF,$0,$1,$NF }' | xxh128sum
    1b0095d0c4c02859a61a0ab5a3253b58  stdin

    % echo "1.5GB" | mawk '{ print NR,NF,$0,$1,$NF; gsub(/[[:alpha:]]+|[[:digit:].-]+|[^[:alnum:].-]+/,"&\n",$0) ; print NR,NF,$0,$1,$NF }' | xxh128sum
    1b0095d0c4c02859a61a0ab5a3253b58  stdin

    % echo "1.5GB" | mawk2 '{ print NR,NF,$0,$1,$NF; gsub(/[[:alpha:]]+|[[:digit:].-]+|[^[:alnum:].-]+/,"&\n",$0) ; print NR,NF,$0,$1,$NF }' | xxh128sum
    1b0095d0c4c02859a61a0ab5a3253b58  stdin

    % echo "1.5GB" | gawk -be '{ print NR,NF,$0,$1,$NF; gsub(/[[:alpha:]]+|[[:digit:].-]+|[^[:alnum:].-]+/,"&\n",$0) ; print NR,NF,$0,$1,$NF }' | xxh128sum
    1b0095d0c4c02859a61a0ab5a3253b58  stdin

    % echo "1.5GB" | gawk -ne '{ print NR,NF,$0,$1,$NF; gsub(/[[:alpha:]]+|[[:digit:].-]+|[^[:alnum:].-]+/,"&\n",$0) ; print NR,NF,$0,$1,$NF }' | xxh128sum
    1b0095d0c4c02859a61a0ab5a3253b58  stdin

    % echo "1.5GB" | gawk -ce '{ print NR,NF,$0,$1,$NF; gsub(/[[:alpha:]]+|[[:digit:].-]+|[^[:alnum:].-]+/,"&\n",$0) ; print NR,NF,$0,$1,$NF }' | xxh128sum
    1b0095d0c4c02859a61a0ab5a3253b58  stdin

    % echo "1.5GB" | gawk -Pe '{ print NR,NF,$0,$1,$NF; gsub(/[[:alpha:]]+|[[:digit:].-]+|[^[:alnum:].-]+/,"&\n",$0) ; print NR,NF,$0,$1,$NF }' | xxh128sum
    1b0095d0c4c02859a61a0ab5a3253b58  stdin
    
  2. It is unclear what change between mawk 1.3.3 and 1.3.4 made your code work, but the code is logically flawed to begin with if the intent is to display the numeric portion of the input as size and the alphabetical portion as symbol even when one of the two components is missing. The call to gsub makes whichever run of alphabetical or numeric characters comes first the first field. For example, if the input is just GB, your code will output:

    size=GB
    symbol=
    

    which I don’t think is desired.

    A better approach is to remove the alphabetical portion from the input to make it size, and remove the numeric portion from the input to make it symbol:

    awk '{s=$0;sub(/[[:alpha:]]+/,"",s);sub(/[[:digit:].-]+/,"");print"size="s"\nsymbol="$0}'
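
    As a rough sketch of how this behaves when one component is missing, the snippet below wraps the same awk program in a shell function and runs it on three inputs. The name `split_size` is made up for illustration, and the sketch assumes an awk that supports POSIX character classes (for example gawk or mawk 1.3.4).

    ```shell
    # split_size is a hypothetical helper name, not part of any tool.
    split_size() {
        printf '%s\n' "$1" | awk '{
            s = $0
            sub(/[[:alpha:]]+/, "", s)   # drop the letters from the copy: the rest is the size
            sub(/[[:digit:].-]+/, "")    # drop digits, dots and dashes from the record: the rest is the symbol
            print "size=" s "\nsymbol=" $0
        }'
    }

    split_size 1.5GB
    split_size GB
    split_size 1.5
    ```

    With such an awk, the input GB should give an empty size and symbol=GB, and the input 1.5 should give size=1.5 and an empty symbol.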
    
  3. That mawk from 1996 (version 1.3.3) is a minimal-featured version of awk designed for speed of execution. It is not POSIX compliant, so it shouldn't be expected to support POSIX character classes. Get a newer version if at all possible, or change this:

    /([[:alpha:]]+|[[:digit:].-]+|[^[:alnum:].-]+)/
    

    to this:

    /([a-zA-Z]+|[0-9.-]+|[^a-zA-Z0-9.-]+)/
    

    e.g.:

    echo "1.5GB" | awk '{ gsub(/([a-zA-Z]+|[0-9.-]+|[^a-zA-Z0-9.-]+)/,"&\n",$0) ; print "size="$1"\nsymbol="$2}'
    size=1.5
    symbol=GB
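
    To find out whether a given awk understands POSIX character classes before choosing between the two patterns, a small probe along these lines may help. The function name `awk_has_posix_classes` is hypothetical. The probe relies on the assumption that an awk without class support typically reads `[[:alpha:]]` as an ordinary bracket expression followed by a literal `]`, so the single letter a does not match.

    ```shell
    # awk_has_posix_classes is a hypothetical helper name.
    # It returns 0 when the given awk (default: awk) matches "a" against [[:alpha:]],
    # and 1 when the pattern does not match.
    awk_has_posix_classes() {
        printf 'a\n' | "${1:-awk}" '{ exit !($0 ~ /^[[:alpha:]]$/) }'
    }

    if awk_has_posix_classes awk; then
        echo "POSIX character classes supported"
    else
        echo "no POSIX character classes; use ranges such as [a-zA-Z] and [0-9]"
    fi
    ```

    Passing another command name, for example `awk_has_posix_classes mawk`, probes that implementation instead of the default awk.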
    