I have data from ‘speedtest-cli’ that get gathered in a cronjob.
12818 "Ridge Wireless" "Cupertino, CA" 2024-08-17T17:15:02.811911Z 142.96369289624104 33.358 322418432.2265161 23600938.32538198 67.160.240.135
18531 Wave "San Francisco, CA" 2024-08-17T18:15:01.957127Z 83.30302174397045 28.576 514447518.90742075 24265080.17742009 67.160.240.135
18531 Wave "San Francisco, CA" 2024-08-17T19:15:02.006891Z 83.30302174397045 33.523 498012209.6915029 23804897.964512113 67.160.240.135
18531 Wave "San Francisco, CA" 2024-08-17T20:15:02.800014Z 83.30302174397045 28.459 411379941.90105027 23681434.10062648 67.160.240.135
32408 "EGI Hosting" "Santa Clara, CA" 2024-08-17T21:15:02.543215Z 143.01655083159983 28.8 446156606.22138727 23902144.606594093 67.160.240.135
32408 "EGI Hosting" "Santa Clara, CA" 2024-08-17T22:15:04.113615Z 143.01655083159983 27.703 344460192.3618878 23487079.94843895 67.160.240.135
32408 "EGI Hosting" "Santa Clara, CA" 2024-08-17T23:15:02.831512Z 143.01655083159983 28.307 478754656.0807323 23643468.646293264 67.160.240.135
12818 "Ridge Wireless" "Cupertino, CA" 2024-08-18T00:15:03.190852Z 142.96369289624104 30.545 284195879.4387585 18459056.85763892 67.160.240.135
12818 "Ridge Wireless" "Cupertino, CA" 2024-08-18T01:15:01.977997Z 142.96369289624104 31.603 352557160.9015237 24035464.022335984 67.160.240.135
12818 "Ridge Wireless" "Cupertino, CA" 2024-08-18T02:15:02.302942Z 142.96369289624104 30.178 411204274.5646509 23825237.50161968 67.160.240.135
Note the use of quotation. The rule is that if a token is a multi word string it is enclosed in quotes. Let’s make the assumption that zero, one ore more tokens could fall under that rule. Download speed is column 7, upload speed is column 8.
Trying to get the average of the last 10 measurements using awk gives undesired result.
tail -10 speed.dat | awk '{sum+=$7;n++} END {print sum/n;}'
689.376
Obviously awk fails to treat quotation properly.
What’s the easiest solution? Teach awk handling of the quotation? Or use other program to do the job? (within usual linux install)
Edit per request:
awk --version
awk: not an option: --version
It’s from Raspberry PI linux:
cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
Expected results : 406358687.2
3
Answers
OP doesn’t appear to have access to
GNU awk
(akagawk
) so here’s one approach that should work with most (all?) flavors ofawk
…NOTE: see 2nd half of this answer for my original response based on
GNU awk / FPAT
Since OP is only interested in the numeric fields we can replace the quoted strings with a dummy string (that contains no white space); this should allow us to reference the numeric fields based on standard white space delimiters.
We’ll start by replacing all
"strings with spaces"
with__XXX__
:This generates:
At this point we can reference the 7th and 8th fields as normal:
NOTE:
(n==0 ? 0 : sum/n)
=> ifn==0
we print0
else we print the result ofsum/n
; this keeps us from generating a ‘divide by 0’ error ifn==0
This generates:
OP has stated the expected result should be
23270480.22
which appears to be related to the 8th column; modifying the code to replace$7
with$8
generates:If OP wants to print both averages:
This generates:
Original answer …
One idea using
GNU awk
(forFPAT
support):Where:
FPAT = "<fmt#1>|<fmt#2>"
– define field format(s); in this case we have 2 different field formats:"[^"]*"
– 1st field format: field starts with a double quote, followed by zero or more characters that are not a double quote, ending with a double quote[^[:space:]]+
– 2nd field format: field is a string of one or more characters that does not include white spaceThis generates:
Printing both averages:
This generates:
As long as there are no quoted fields to the right of the ones of interest, counting backwards is possible:
With GNU
awk
:With any POSIX
awk
: