I am trying to find a way to parse a single Apache log line into blocks.
I know I can change the Apache config to emit JSON instead, but I believe this awk knowledge will help me in the future.

So I have this:

127.0.1.1:80 187.207.66.53 - - [18/Jan/2021:18:28:22 +0100] "GET / HTTP/1.1" 200 2352 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"

And I want to turn it into this:

127.0.1.1:80
187.207.66.53
-
-
[18/Jan/2021:18:28:22 +0100]
"GET / HTTP/1.1"
200
2352
[...]

So I believe I need to set up several different field separators. Am I right?

awk -F '[<fieldSeparator1>|<fieldSeparator2> ]' '{
    for (i = 1; i <= NF; i++)
        print $i
}'

2 Answers


  1. With GNU awk and a regex. Tested only with your example.

    awk '{$1=$1; print}' OFS='\n' FPAT='"[^"]*"|\\[[^]]*]|[0-9:.]+|-' file
    

    FPAT: A regular expression describing the contents of the fields in a record. When set, gawk parses the
    input into fields, where the fields match the regular expression, instead of using the value of FS
    as the field separator.

    Output:

    127.0.1.1:80
    187.207.66.53
    -
    -
    [18/Jan/2021:18:28:22 +0100]
    "GET / HTTP/1.1"
    200
    2352
    "-"
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
    

    See: man awk and The Stack Overflow Regular Expressions FAQ
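
Once FPAT has split the record, each block can be addressed by number like any other awk field. The sketch below assumes GNU awk is installed as `gawk` and that the lines follow the layout from the question; `log_summary` is a hypothetical helper name, and the field positions were checked only against that one sample line.

```shell
# Sketch only: needs GNU awk, because FPAT is a gawk extension.
# log_summary is a hypothetical helper name used for illustration.
log_summary() {
    # -v processes escape sequences, so \\[ reaches the regex as \[
    gawk -v FPAT='"[^"]*"|\\[[^]]*]|[0-9:.]+|-' '
        # With this FPAT, $2 is the client address, $6 the quoted request,
        # $7 the status code and $8 the response size.
        { printf "%s %s %s %s\n", $2, $7, $8, $6 }
    ' "$@"
}
```

Running `log_summary file` should then print one short summary line per log record.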

  2. With GNU awk for the 3rd arg to match():

    $ awk '
        match($0,/(\S+) (\S+) (\S+) (\S+) (\[[^]]*]) ("[^"]*") (\S+) (\S+) ("[^"]*") ("[^"]*")/,f) {
            for (i=1; i in f; i++) {
                print f[i]
            }
        }
    ' file
    127.0.1.1:80
    187.207.66.53
    -
    -
    [18/Jan/2021:18:28:22 +0100]
    "GET / HTTP/1.1"
    200
    2352
    "-"
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
    
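
Both answers rely on gawk extensions (FPAT and the third argument to match()). Where only a POSIX awk is available, a loop over the two-argument match() with RSTART and RLENGTH might give the same split. This is a sketch rather than a tested replacement: `split_log_line` is a hypothetical helper name, it was checked only against the layout in the question, and it relies on the leftmost-longest matching that POSIX specifies for regular expressions.

```shell
# Sketch only: split_log_line is a hypothetical helper name.
split_log_line() {
    awk '
    {
        line = $0
        # A token is a "quoted string", a [bracketed block],
        # or a run of non-space characters.
        while (match(line, /"[^"]*"|\[[^]]*\]|[^ ]+/)) {
            print substr(line, RSTART, RLENGTH)
            # Drop everything up to the end of this token and continue.
            line = substr(line, RSTART + RLENGTH)
        }
    }' "$@"
}

# Usage with a shortened copy of the line from the question:
printf '%s\n' '127.0.1.1:80 187.207.66.53 - - [18/Jan/2021:18:28:22 +0100] "GET / HTTP/1.1" 200 2352 "-"' |
    split_log_line
```

Unlike the match() answer above, this loop does not check that a line has exactly ten blocks, so malformed lines are split as far as possible instead of being skipped.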