I am trying to write a Bash script that processes log files and extracts specific values based on given conditions. The logs are located at a provided URL and contain real web server log data. Each line in the logs starts with a date and looks like this:
Apr 10 11:17:35 coderbyte app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET
/backend/requests/editor/placeholder?shareLinkId=69dff0hba0nv HTTP/1.1" 200 148 "https://coderbyte.com"
"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0"
Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
key=s2fwad25E" host=coderbyte.com request_id=b19a87a1-1bbb-46e7-b207-bd9f23d46afa fwd="108.31.000.000"
dyno=web.3 connect=0ms service=92ms status=200 bytes=3194 protocol=https
Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=910b07d1-3f71-4347-a1a7-bfa20384ef65
fwd="108.31.000.000" dyno=web.2 connect=1ms service=17ms status=200 bytes=4435 protocol=https
Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=097bf65e-e189-4f9f-9dfb-4758cff411b2
fwd="108.31.000.000" dyno=web.3 connect=1ms service=10ms status=200 bytes=4435 protocol=https
Apr 10 11:17:35 coderbyte app/web.2: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET
/backend/requests/editor/placeholder?key=s2fwad25E HTTP/1.1" 200 4263 "https://coderbyte.com"
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163
Safari/537.36"
Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
shareLinkId=4eiramcayuo" host=coderbyte.com request_id=d48278c2-5731-464e-be38-ab9ad84ca4a8
fwd="108.31.000.000" dyno=web.4 connect=1ms service=7ms status=200 bytes=3194 protocol=https
Here is the script I tried:
awk '/coderbyte heroku/router/ {
match($0, /request_id=([^ ]+)/, req);
match($0, /fwd="([^"]+)"/, fwd);
req_value = req[1];
fwd_value = fwd[1];
if (fwd_value == "MASKED") {
print req_value " [M]";
} else {
print req_value " [" fwd_value "]";
}
}' web-logs-raw
But it keeps saying:
awk: line 2: syntax error at or near ,
awk: line 3: syntax error at or near ,
What am I getting wrong?
2 Answers
With sample input in web-logs-raw, your script runs and produces output under GNU Awk 5.3.1, so you are apparently not using GNU AWK. Check whether you have it; if you do (or you are allowed to install gawk), change awk to gawk to resolve the syntax error.

Observe that there is another problem: request_id= and fwd= are not always on the same line as heroku/router. This is easy to counteract in GNU AWK: it is enough to engage paragraph mode by setting RS to the empty string.
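The paragraph-mode approach with gawk might look like the sketch below. It is not the answerer's original code. The sample file is an assumption: two records copied from the question, with a blank line between them. The variable names sample and gawk_out are hypothetical. Because a record now spans several lines, the request_id pattern excludes newlines as well as spaces; otherwise the capture could run on into the next line.

```shell
# Assumed layout: records from the question, separated by a blank line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
key=s2fwad25E" host=coderbyte.com request_id=b19a87a1-1bbb-46e7-b207-bd9f23d46afa fwd="108.31.000.000"
dyno=web.3 connect=0ms service=92ms status=200 bytes=3194 protocol=https

Apr 10 11:17:35 coderbyte app/web.2: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET
/backend/requests/editor/placeholder?key=s2fwad25E HTTP/1.1" 200 4263 "https://coderbyte.com"
EOF

if command -v gawk >/dev/null 2>&1; then
    # RS = "" turns on paragraph mode: each blank-line-separated block is one record.
    # The three-argument match() used here is a gawk extension.
    gawk_out=$(gawk 'BEGIN { RS = "" }
    /coderbyte heroku\/router/ {
        match($0, /request_id=([^ \n]+)/, req)
        match($0, /fwd="([^"]+)"/, fwd)
        if (fwd[1] == "MASKED") {
            print req[1] " [M]"
        } else {
            print req[1] " [" fwd[1] "]"
        }
    }' "$sample")
else
    gawk_out="gawk not found"
fi
printf '%s\n' "$gawk_out"
rm -f "$sample"
```

With gawk installed, only the heroku/router record should be printed, as its request ID followed by the forwarded address in brackets.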
Explanation: when RS is set to the empty string, GNU AWK assumes records are separated by blank lines.

As others have mentioned, you aren’t using GNU awk, so match() doesn’t have a 3rd argument, hence the syntax errors. Fortunately you don’t need it for what you’re trying to do.

If your input is actually 6 single lines, one per log entry, then you can do this with any POSIX awk.
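One way to do the single-line case without the third argument is sketched here. It is not necessarily the answerer's code. It relies on the two-argument match(), which sets RSTART and RLENGTH, and on substr() to cut out the value. The sample line is an assumption: one router entry from the question joined onto a single line. The names line and posix_out are hypothetical.

```shell
# Assumed input: one router entry from the question, joined onto a single line.
line='Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?key=s2fwad25E" host=coderbyte.com request_id=b19a87a1-1bbb-46e7-b207-bd9f23d46afa fwd="108.31.000.000" dyno=web.3 connect=0ms service=92ms status=200 bytes=3194 protocol=https'

posix_out=$(printf '%s\n' "$line" | awk '
/coderbyte heroku\/router/ {
    req = fwd = ""
    # POSIX match() takes two arguments and sets RSTART and RLENGTH.
    if (match($0, /request_id=[^ ]+/))
        req = substr($0, RSTART + 11, RLENGTH - 11)   # drop the 11-character request_id= prefix
    if (match($0, /fwd="[^"]+"/))
        fwd = substr($0, RSTART + 5, RLENGTH - 6)     # drop the 5-character prefix and the closing quote
    print req " [" (fwd == "MASKED" ? "M" : fwd) "]"
}')
printf '%s\n' "$posix_out"
```

The offsets 11 and 5 are simply the lengths of the literal prefixes, so the approach should behave the same in gawk, mawk, and other POSIX awks.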
Or, if the input truly is multi-line records separated by blank lines, as shown in the question, then just set RS to null to use paragraph mode, again using any awk.
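A sketch of the paragraph-mode variant, again using only POSIX features and not necessarily matching the answerer's code. The sample file is an assumption: two router records copied from the question, separated by a blank line. The names sample and para_out are hypothetical. In the second record request_id= ends its line, so the pattern has to stop at a newline as well as at a space.

```shell
# Assumed layout: multi-line records from the question, separated by a blank line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
key=s2fwad25E" host=coderbyte.com request_id=b19a87a1-1bbb-46e7-b207-bd9f23d46afa fwd="108.31.000.000"
dyno=web.3 connect=0ms service=92ms status=200 bytes=3194 protocol=https

Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=910b07d1-3f71-4347-a1a7-bfa20384ef65
fwd="108.31.000.000" dyno=web.2 connect=1ms service=17ms status=200 bytes=4435 protocol=https
EOF

para_out=$(awk '
BEGIN { RS = "" }                       # paragraph mode: blank lines separate records
/coderbyte heroku\/router/ {
    req = fwd = ""
    if (match($0, /request_id=[^ \n]+/))
        req = substr($0, RSTART + 11, RLENGTH - 11)
    if (match($0, /fwd="[^"]+"/))
        fwd = substr($0, RSTART + 5, RLENGTH - 6)
    print req " [" (fwd == "MASKED" ? "M" : fwd) "]"
}' "$sample")
printf '%s\n' "$para_out"
rm -f "$sample"
```

Each record should yield one output line, regardless of which physical line inside the record holds request_id= or fwd=.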