
I am trying to count the hits in an access_log by hour, but the log contains some lines that I want to ignore (css/js/etc.).

If I run:

grep "31/Mar" access_log | cut -d[ -f2 | cut -d] -f1 | awk  -F: '{print $2}' | sort -n | uniq -c

I get the expected result, like:

105 03
177 04
153 05
144 06    

But if I add the filter:

grep "31/Mar" access_log | cut -d[ -f2 | cut -d] -f1 | awk  -F '!/.pdf|.css|.png|.jpg|.js/': '{print $2}' | sort -n | uniq -c

The result is a single line:

7496

What am I doing wrong?

3 Answers


  1. Chosen as BEST ANSWER

    My mistake. After some testing, I noticed the problem was in the grep stage. If I refine the grep, I can drop the unwanted lines there and then apply awk to the result correctly.

    grep -Ev "\.(js|css|jpg|png|pdf)" access_log | cut -d[ -f2 | cut -d] -f1 | awk  -F : '{print $2}' | sort -n | uniq -c
    

  2. You probably meant to write:

    awk  -F':' '!/\.(pdf|css|png|jpg|js)$/{print $2}'
    

    but there are other issues in your script that we could help you with, given an MCVE.
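The single-line result in the question can be reproduced on one timestamp. In the original command the quoted filter sits right after `-F` and is joined with the trailing `:`, so awk receives the whole string as its field separator instead of as a pattern. The sketch below assumes a sample timestamp (the year and time are made up for illustration) shaped like the output of the two `cut` stages.

```shell
# Hypothetical sample: one timestamp as it looks after the two cut stages.
ts='31/Mar/2020:03:12:45 +0000'

# Form from the question: the quoted string plus ":" becomes the field
# separator, nothing in the timestamp matches it, and $2 is empty.
broken=$(printf '%s\n' "$ts" | awk -F '!/.pdf|.css|.png|.jpg|.js/': '{print $2}')

# Intended form: ":" is the separator, so $2 is the hour.
fixed=$(printf '%s\n' "$ts" | awk -F: '{print $2}')

printf 'broken=[%s] fixed=[%s]\n' "$broken" "$fixed"
```

With every line yielding an empty `$2`, `uniq -c` collapses the whole input into one count, which matches the lone `7496` in the question. Note also that by the time awk runs, the `cut` stages have already discarded the request path, so a filename filter placed there has nothing to match against.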

  3. All this long pipeline can be done in a single awk as well like this:

    awk -F: '!/\.(pdf|css|png|jpg|js)$/ && /31\/Mar/ {++freq[$4]}
    END {for (f in freq) print f, freq[f]}' access_log
    
    12 8
    13 2
    14 1
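Which field holds the hour, and where the file extension appears in the line, both depend on the log layout. As a rough sketch only, the variant below assumes the Common/Combined Log Format, where the timestamp is the fourth whitespace-separated field and the request path is the seventh. The function name `hits_by_hour` and its arguments are hypothetical, not something from the answers above.

```shell
# Hypothetical helper: count hits per hour for one day, skipping static assets.
# Assumes Common/Combined Log Format lines such as:
#   1.2.3.4 - - [31/Mar/2020:03:12:45 +0000] "GET /a.css HTTP/1.1" 200 512
hits_by_hour() {
    day=$1    # day to keep, e.g. 31/Mar
    log=$2    # path to the access log
    awk -v day="$day" '
        # Keep only lines whose timestamp starts with the requested day.
        index($0, "[" day "/") == 0 { next }
        # Skip requests whose path ends in a static-asset extension,
        # optionally followed by a query string.
        $7 ~ /\.(pdf|css|png|jpg|js)(\?.*)?$/ { next }
        {
            # $4 looks like "[31/Mar/2020:03:12:45"; the hour follows the first colon.
            split($4, t, ":")
            freq[t[2]]++
        }
        END { for (h in freq) print freq[h], h }
    ' "$log" | sort -k2,2n
}
```

Called as `hits_by_hour 31/Mar access_log`, it prints count and hour in the same order as the `uniq -c` output in the question. Testing the extension against the request path rather than the end of the line is a deliberate choice here, since in this format a log line ends with the status and size (or the user agent), not the filename.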
    