skip to Main Content

I have data from ‘speedtest-cli’ that get gathered in a cronjob.

12818 "Ridge Wireless" "Cupertino, CA" 2024-08-17T17:15:02.811911Z 142.96369289624104 33.358 322418432.2265161 23600938.32538198  67.160.240.135
18531 Wave "San Francisco, CA" 2024-08-17T18:15:01.957127Z 83.30302174397045 28.576 514447518.90742075 24265080.17742009  67.160.240.135
18531 Wave "San Francisco, CA" 2024-08-17T19:15:02.006891Z 83.30302174397045 33.523 498012209.6915029 23804897.964512113  67.160.240.135
18531 Wave "San Francisco, CA" 2024-08-17T20:15:02.800014Z 83.30302174397045 28.459 411379941.90105027 23681434.10062648  67.160.240.135
32408 "EGI Hosting" "Santa Clara, CA" 2024-08-17T21:15:02.543215Z 143.01655083159983 28.8 446156606.22138727 23902144.606594093  67.160.240.135
32408 "EGI Hosting" "Santa Clara, CA" 2024-08-17T22:15:04.113615Z 143.01655083159983 27.703 344460192.3618878 23487079.94843895  67.160.240.135
32408 "EGI Hosting" "Santa Clara, CA" 2024-08-17T23:15:02.831512Z 143.01655083159983 28.307 478754656.0807323 23643468.646293264  67.160.240.135
12818 "Ridge Wireless" "Cupertino, CA" 2024-08-18T00:15:03.190852Z 142.96369289624104 30.545 284195879.4387585 18459056.85763892  67.160.240.135
12818 "Ridge Wireless" "Cupertino, CA" 2024-08-18T01:15:01.977997Z 142.96369289624104 31.603 352557160.9015237 24035464.022335984  67.160.240.135
12818 "Ridge Wireless" "Cupertino, CA" 2024-08-18T02:15:02.302942Z 142.96369289624104 30.178 411204274.5646509 23825237.50161968  67.160.240.135

Note the use of quotation. The rule is that if a token is a multi word string it is enclosed in quotes. Let’s make the assumption that zero, one ore more tokens could fall under that rule. Download speed is column 7, upload speed is column 8.

Trying to get the average of the last 10 measurements using awk gives undesired result.

tail -10 speed.dat | awk '{sum+=$7;n++} END {print sum/n;}'
689.376

Obviously awk fails to treat quotation properly.

What’s the easiest solution? Teach awk handling of the quotation? Or use other program to do the job? (within usual linux install)

Edit per request:

awk --version
awk: not an option: --version

It’s from Raspberry PI linux:

cat /etc/os-release 
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"

Expected results : 406358687.2

3

Answers


  1. OP doesn’t appear to have access to GNU awk (aka gawk) so here’s one approach that should work with most (all?) flavors of awk

    NOTE: see 2nd half of this answer for my original response based on GNU awk / FPAT

    Since OP is only interested in the numeric fields we can replace the quoted strings with a dummy string (that contains no white space); this should allow us to reference the numeric fields based on standard white space delimiters.

    We’ll start by replacing all "strings with spaces" with __XXX__:

    awk '{ gsub(/"[^"]+"/,"__XXX__"); print }' speed.dat
    

    This generates:

    12818 __XXX__ __XXX__ 2024-08-17T17:15:02.811911Z 142.96369289624104 33.358 322418432.2265161 23600938.32538198  67.160.240.135
    18531 Wave __XXX__ 2024-08-17T18:15:01.957127Z 83.30302174397045 28.576 514447518.90742075 24265080.17742009  67.160.240.135
    18531 Wave __XXX__ 2024-08-17T19:15:02.006891Z 83.30302174397045 33.523 498012209.6915029 23804897.964512113  67.160.240.135
    18531 Wave __XXX__ 2024-08-17T20:15:02.800014Z 83.30302174397045 28.459 411379941.90105027 23681434.10062648  67.160.240.135
    32408 __XXX__ __XXX__ 2024-08-17T21:15:02.543215Z 143.01655083159983 28.8 446156606.22138727 23902144.606594093  67.160.240.135
    32408 __XXX__ __XXX__ 2024-08-17T22:15:04.113615Z 143.01655083159983 27.703 344460192.3618878 23487079.94843895  67.160.240.135
    32408 __XXX__ __XXX__ 2024-08-17T23:15:02.831512Z 143.01655083159983 28.307 478754656.0807323 23643468.646293264  67.160.240.135
    12818 __XXX__ __XXX__ 2024-08-18T00:15:03.190852Z 142.96369289624104 30.545 284195879.4387585 18459056.85763892  67.160.240.135
    12818 __XXX__ __XXX__ 2024-08-18T01:15:01.977997Z 142.96369289624104 31.603 352557160.9015237 24035464.022335984  67.160.240.135
    12818 __XXX__ __XXX__ 2024-08-18T02:15:02.302942Z 142.96369289624104 30.178 411204274.5646509 23825237.50161968  67.160.240.135
    

    At this point we can reference the 7th and 8th fields as normal:

    awk '
          { gsub(/"[^"]+"/,"__XXX__")
            sum += $7
            n++
          }
    END   { printf "%.2fn", (n==0 ? 0 : sum/n) }
    ' < <(tail -10 speed.dat)
    

    NOTE: (n==0 ? 0 : sum/n) => if n==0 we print 0 else we print the result of sum/n; this keeps us from generating a ‘divide by 0’ error if n==0

    This generates:

    406358687.23
    

    OP has stated the expected result should be 23270480.22 which appears to be related to the 8th column; modifying the code to replace $7 with $8 generates:

    23270480.22
    

    If OP wants to print both averages:

    awk '
          { gsub(/"[^"]+"/,"__XXX__")
            sum7 += $7
            sum8 += $8
            n++
          }
    END   { printf "%-10s%15.2fn%-10s%15.2fn",
                  "download", (n==0 ? 0 : sum7/n),
                  "upload",   (n==0 ? 0 : sum8/n)
          }
    ' < <(tail -10 speed.dat)
    

    This generates:

    download     406358687.23
    upload        23270480.22
    

    Original answer …

    One idea using GNU awk (for FPAT support):

    awk '
    BEGIN { FPAT = ""[^"]+"|[^[:space:]]+" }
          { sum += $7; n++ }
    END   { printf "%.2fn", (n==0 ? 0 : sum/n) }
    ' < <(tail -10 speed.dat)
    

    Where:

    • FPAT = "<fmt#1>|<fmt#2>" – define field format(s); in this case we have 2 different field formats:
    • "[^"]*" – 1st field format: field starts with a double quote, followed by zero or more characters that are not a double quote, ending with a double quote
    • [^[:space:]]+ – 2nd field format: field is a string of one or more characters that does not include white space

    This generates:

    406358687.23           # for $7
     23270480.22           # for $8
    

    Printing both averages:

    awk '
    BEGIN { FPAT = ""[^"]+"|[^[:space:]]+" }
          { sum7 += $7
            sum8 += $8
            n++
          }
    END   { printf "%-10s%15.2fn%-10s%15.2fn",
                  "download", (n==0 ? 0 : sum7/n),
                  "upload",   (n==0 ? 0 : sum8/n)
          }
    ' < <(tail -10 speed.dat)
    

    This generates:

    download     406358687.23
    upload        23270480.22
    
    Login or Signup to reply.
  2. As long as there are no quoted fields to the right of the ones of interest, counting backwards is possible:

    $ tail -10 log |
      awk '{ down+=$(NF-2) } END{ OFMT="%.1f"; if(NR) print down/NR }'
    406358687.2
    
    Login or Signup to reply.
  3. With GNU awk:

    awk 'BEGIN {FPAT = @/s+[0-9]+.[0-9]+s+/} {s += $1} END {print s/NR}' speed.dat
    

    With any POSIX awk:

    awk '{for(i = 1; i <= NF; i++) if($i ~ /^[0-9]+.[0-9]+$/) {s + = $i; next}}
         END {print s/NR}' speed.dat
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search