
I am trying to write a Bash script that processes log files and extracts specific values based on given conditions. The logs are located at a provided URL and contain real web server log data. Each line in the logs starts with a date and looks like this:

Apr 10 11:17:35 coderbyte app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET
/backend/requests/editor/placeholder?shareLinkId=69dff0hba0nv HTTP/1.1" 200 148 "https://coderbyte.com"
"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0"

Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
key=s2fwad25E" host=coderbyte.com request_id=b19a87a1-1bbb-46e7-b207-bd9f23d46afa fwd="108.31.000.000"
dyno=web.3 connect=0ms service=92ms status=200 bytes=3194 protocol=https

Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=910b07d1-3f71-4347-a1a7-bfa20384ef65
fwd="108.31.000.000" dyno=web.2 connect=1ms service=17ms status=200 bytes=4435 protocol=https

Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=097bf65e-e189-4f9f-9dfb-4758cff411b2
fwd="108.31.000.000" dyno=web.3 connect=1ms service=10ms status=200 bytes=4435 protocol=https

Apr 10 11:17:35 coderbyte app/web.2: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET
/backend/requests/editor/placeholder?key=s2fwad25E HTTP/1.1" 200 4263 "https://coderbyte.com"
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163
Safari/537.36"

Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
shareLinkId=4eiramcayuo" host=coderbyte.com request_id=d48278c2-5731-464e-be38-ab9ad84ca4a8
fwd="108.31.000.000" dyno=web.4 connect=1ms service=7ms status=200 bytes=3194 protocol=https

Here is the script I tried:

awk '/coderbyte heroku\/router/ {
    match($0, /request_id=([^ ]+)/, req);
    match($0, /fwd="([^"]+)"/, fwd);
    req_value = req[1];
    fwd_value = fwd[1];

    if (fwd_value == "MASKED") {
        print req_value " [M]";
    } else {
        print req_value " [" fwd_value "]";
    }
}' web-logs-raw

But it keeps saying:

awk: line 2: syntax error at or near ,
awk: line 3: syntax error at or near ,

What am I getting wrong?

2 Answers


  1. Let the content of web-logs-raw be

    Apr 10 11:17:35 coderbyte app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET
    /backend/requests/editor/placeholder?shareLinkId=69dff0hba0nv HTTP/1.1" 200 148 "https://coderbyte.com"
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0"
    
    Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
    key=s2fwad25E" host=coderbyte.com request_id=b19a87a1-1bbb-46e7-b207-bd9f23d46afa fwd="108.31.000.000"
    dyno=web.3 connect=0ms service=92ms status=200 bytes=3194 protocol=https
    
    Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
    shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=910b07d1-3f71-4347-a1a7-bfa20384ef65
    fwd="108.31.000.000" dyno=web.2 connect=1ms service=17ms status=200 bytes=4435 protocol=https
    
    Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
    shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=097bf65e-e189-4f9f-9dfb-4758cff411b2
    fwd="108.31.000.000" dyno=web.3 connect=1ms service=10ms status=200 bytes=4435 protocol=https
    
    Apr 10 11:17:35 coderbyte app/web.2: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET
    /backend/requests/editor/placeholder?key=s2fwad25E HTTP/1.1" 200 4263 "https://coderbyte.com"
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163
    Safari/537.36"
    
    Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
    shareLinkId=4eiramcayuo" host=coderbyte.com request_id=d48278c2-5731-464e-be38-ab9ad84ca4a8
    fwd="108.31.000.000" dyno=web.4 connect=1ms service=7ms status=200 bytes=3194 protocol=https
    

    then

    awk '/coderbyte heroku\/router/ {
        match($0, /request_id=([^ ]+)/, req);
        match($0, /fwd="([^"]+)"/, fwd);
        req_value = req[1];
        fwd_value = fwd[1];
    
        if (fwd_value == "MASKED") {
            print req_value " [M]";
        } else {
            print req_value " [" fwd_value "]";
        }
    }' web-logs-raw
    

    gives output

     []
     []
     []
     []
    

    when using GNU Awk 5.3.1, so you are apparently not using GNU AWK. Check whether gawk is installed; if it is (or you are allowed to install it), change awk to gawk to resolve the syntax errors. Note that there is a second problem: request_id= and fwd= are not on the same line as heroku/router, so matching line by line finds nothing to extract. This is easy to handle in GNU AWK: it is enough to enable paragraph mode by setting RS to the empty string, namely

    gawk 'BEGIN{RS=""}/coderbyte heroku\/router/ {
        match($0, /request_id=([^ ]+)/, req);
        match($0, /fwd="([^"]+)"/, fwd);
        req_value = req[1];
        fwd_value = fwd[1];
    
        if (fwd_value == "MASKED") {
            print req_value " [M]";
        } else {
            print req_value " [" fwd_value "]";
        }
    }' web-logs-raw
    

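    will give output (the extra fwd= lines appear because [^ ]+ also matches the newline inside a record whenever request_id= is the last item on its line)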

    b19a87a1-1bbb-46e7-b207-bd9f23d46afa [108.31.000.000]
    910b07d1-3f71-4347-a1a7-bfa20384ef65
    fwd="108.31.000.000" [108.31.000.000]
    097bf65e-e189-4f9f-9dfb-4758cff411b2
    fwd="108.31.000.000" [108.31.000.000]
    d48278c2-5731-464e-be38-ab9ad84ca4a8
    fwd="108.31.000.000" [108.31.000.000]
    

    Explanation: when RS is set to the empty string, awk treats blank lines as record separators (paragraph mode), so each multi-line log entry is read as a single record.
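
To see paragraph mode in isolation, here is a minimal sketch that uses only POSIX awk features. The two-record sample and the shell names (sample, record_count, loose, strict) are invented for illustration and are not taken from the log. It counts the records awk sees with RS set to the empty string, and measures how far `[^ ]+` and `[^[:space:]]+` match when a record wraps right after the id.

```shell
# Hypothetical two-record sample; the second record wraps right after the id.
sample() {
    printf 'x request_id=aaa fwd="1.1.1.1"\n\n'
    printf 'y request_id=bbb\nfwd="2.2.2.2"\n'
}

# RS="" turns on paragraph mode: blank lines separate records,
# and a newline inside a record is ordinary data.
record_count=$(sample | awk -v RS= 'END { print NR }')

# Length of the request_id match in record 2 with each bracket expression.
loose=$(sample | awk -v RS= 'NR == 2 { match($0, /request_id=[^ ]+/); print RLENGTH }')
strict=$(sample | awk -v RS= 'NR == 2 { match($0, /request_id=[^[:space:]]+/); print RLENGTH }')

echo "records=$record_count loose=$loose strict=$strict"
```

Swapping `[^ ]+` for `[^[:space:]]+` in the gawk script above should therefore keep each id on a single output line.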

  2. As others have mentioned, you aren’t using GNU awk so match() doesn’t have a 3rd argument, hence the syntax errors. Fortunately you don’t need it for what you’re trying to do.
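
One way to tell which situation applies is to probe the installed awk directly rather than rely on its name. The sketch below assumes a POSIX shell. `has_match_array` is a hypothetical helper name, and it relies on awks without the gawk extension rejecting the three-argument call with a syntax error and a non-zero exit status.

```shell
# has_match_array CMD: succeed if CMD accepts gawk's 3-argument match().
# (Helper name is illustrative; it is not part of any awk distribution.)
has_match_array() {
    "$1" 'BEGIN { if (!match("k=v", /k=(.)/, m) || m[1] != "v") exit 1 }' >/dev/null 2>&1
}

if has_match_array awk; then
    awk_flavor="capture-array match() available"
else
    awk_flavor="POSIX match() only; use RSTART/RLENGTH with substr()"
fi
echo "$awk_flavor"
```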

    If your input is actually 6 single lines like this:

    $ cat web-logs-raw-1
    Apr 10 11:17:35 coderbyte app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?shareLinkId=69dff0hba0nv HTTP/1.1" 200 148 "https://coderbyte.com" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0"
    Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder? key=s2fwad25E" host=coderbyte.com request_id=b19a87a1-1bbb-46e7-b207-bd9f23d46afa fwd="108.31.000.000" dyno=web.3 connect=0ms service=92ms status=200 bytes=3194 protocol=https
    Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder? shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=910b07d1-3f71-4347-a1a7-bfa20384ef65 fwd="108.31.000.000" dyno=web.2 connect=1ms service=17ms status=200 bytes=4435 protocol=https
    Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder? shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=097bf65e-e189-4f9f-9dfb-4758cff411b2 fwd="108.31.000.000" dyno=web.3 connect=1ms service=10ms status=200 bytes=4435 protocol=https
    Apr 10 11:17:35 coderbyte app/web.2: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET /backend/requests/editor/placeholder?key=s2fwad25E HTTP/1.1" 200 4263 "https://coderbyte.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"
    Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder? shareLinkId=4eiramcayuo" host=coderbyte.com request_id=d48278c2-5731-464e-be38-ab9ad84ca4a8 fwd="108.31.000.000" dyno=web.4 connect=1ms service=7ms status=200 bytes=3194 protocol=https
    

    then you can do this with any POSIX awk:

    $ awk '/coderbyte heroku\/router/ {
        match($0, /request_id=[^[:space:]]+/)
        req_value = substr($0, RSTART+11, RLENGTH-11)
        match($0, /fwd="[^"]+"/)
        fwd_value = substr($0, RSTART+5, RLENGTH-6)
    
        if (fwd_value == "MASKED") {
            print req_value " [M]";
        } else {
            print req_value " [" fwd_value "]";
        }
    }' web-logs-raw-1
    b19a87a1-1bbb-46e7-b207-bd9f23d46afa [108.31.000.000]
    910b07d1-3f71-4347-a1a7-bfa20384ef65 [108.31.000.000]
    097bf65e-e189-4f9f-9dfb-4758cff411b2 [108.31.000.000]
    d48278c2-5731-464e-be38-ab9ad84ca4a8 [108.31.000.000]
    

    or if the input truly is multi-line records separated by blank lines as shown in the question:

    $ cat web-logs-raw-2
    Apr 10 11:17:35 coderbyte app/web.3: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET
    /backend/requests/editor/placeholder?shareLinkId=69dff0hba0nv HTTP/1.1" 200 148 "https://coderbyte.com"
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0"
    
    Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
    key=s2fwad25E" host=coderbyte.com request_id=b19a87a1-1bbb-46e7-b207-bd9f23d46afa fwd="108.31.000.000"
    dyno=web.3 connect=0ms service=92ms status=200 bytes=3194 protocol=https
    
    Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
    shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=910b07d1-3f71-4347-a1a7-bfa20384ef65
    fwd="108.31.000.000" dyno=web.2 connect=1ms service=17ms status=200 bytes=4435 protocol=https
    
    Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
    shareLinkId=tosrve4v8q8q" host=coderbyte.com request_id=097bf65e-e189-4f9f-9dfb-4758cff411b2
    fwd="108.31.000.000" dyno=web.3 connect=1ms service=10ms status=200 bytes=4435 protocol=https
    
    Apr 10 11:17:35 coderbyte app/web.2: IP_MASKED - - [10/Apr/2020:18:17:35 +0000] "GET
    /backend/requests/editor/placeholder?key=s2fwad25E HTTP/1.1" 200 4263 "https://coderbyte.com"
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163
    Safari/537.36"
    
    Apr 10 11:17:35 coderbyte heroku/router: at=info method=GET path="/backend/requests/editor/placeholder?
    shareLinkId=4eiramcayuo" host=coderbyte.com request_id=d48278c2-5731-464e-be38-ab9ad84ca4a8
    fwd="108.31.000.000" dyno=web.4 connect=1ms service=7ms status=200 bytes=3194 protocol=https
    

    then just set RS to null to use paragraph mode, again using any awk:

    $ awk -v RS= '/coderbyte heroku\/router/ {
        match($0, /request_id=[^[:space:]]+/)
        req_value = substr($0, RSTART+11, RLENGTH-11)
        match($0, /fwd="[^"]+"/)
        fwd_value = substr($0, RSTART+5, RLENGTH-6)
    
        if (fwd_value == "MASKED") {
            print req_value " [M]";
        } else {
            print req_value " [" fwd_value "]";
        }
    }' web-logs-raw-2
    b19a87a1-1bbb-46e7-b207-bd9f23d46afa [108.31.000.000]
    910b07d1-3f71-4347-a1a7-bfa20384ef65 [108.31.000.000]
    097bf65e-e189-4f9f-9dfb-4758cff411b2 [108.31.000.000]
    d48278c2-5731-464e-be38-ab9ad84ca4a8 [108.31.000.000]
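
The offsets in the substr() calls come from the literal text around each value: `request_id=` is 11 characters long, `fwd="` is 5, and the closing quote accounts for the extra 1 in `RLENGTH-6`. As a sketch of the same technique without hard-coded numbers, the hypothetical `value_of()` function below derives the offset from length(). The function name and the one-record sample are assumptions for illustration. It handles only `key=value` pairs whose key contains no regex metacharacters, and a key such as `id` would also match inside `request_id`.

```shell
# Hypothetical one-record sample in the same key=value style as the log.
sample='at=info request_id=abc-123 fwd="108.31.000.000" dyno=web.3'

result=$(printf '%s\n' "$sample" | awk '
# value_of(rec, key): text after "key=" up to the next whitespace,
# or "" when the key is absent. Uses only POSIX match() and substr().
function value_of(rec, key,    prefix) {
    prefix = key "="
    if (!match(rec, prefix "[^[:space:]]+")) return ""
    # Skip the prefix; the remainder of the match is the value.
    return substr(rec, RSTART + length(prefix), RLENGTH - length(prefix))
}
{
    req = value_of($0, "request_id")
    fwd = value_of($0, "fwd")
    gsub(/"/, "", fwd)          # drop the surrounding double quotes
    print req " [" fwd "]"
}')

echo "$result"
```

The same function body can be pasted into the paragraph-mode command (`awk -v RS= ...`), since `[^[:space:]]+` also stops at the newlines inside a record.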
    