
I want to parse an Apache log file such as:

1.1.1.1 - - [12/Dec/2019:18:25:11 +0100] "GET /endpoint1/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
1.1.1.1 - - [13/Dec/2019:18:25:11 +0100] "GET /endpoint1/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
2.2.2.2 - - [13/Dec/2019:18:27:11 +0100] "GET /endpoint1/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
2.2.2.2 - - [13/Jan/2020:17:15:13 +0100] "GET /endpoint2/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
3.3.3.3 - - [13/Jan/2020:17:15:13 +0100] "GET /endpoint2/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
1.1.1.1 - - [13/Feb/2020:17:15:13 +0100] "GET /endpoint2/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
4.4.4.4 - - [13/Feb/2020:17:15:13 +0100] "GET /endpoint2/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
4.4.4.4 - - [13/Feb/2020:17:15:13 +0100] "GET /endpoint2/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
4.4.4.4 - - [13/Feb/2020:17:15:13 +0100] "GET /endpoint2/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"

I need to get the list of client IPs that visited in each month, with a visit count for each. I have something like this:

awk '{print $1,$4}' access.log | grep Dec | cut -d" " -f1 | uniq -c

but this is wrong because it counts visits per IP per day.

The expected result should look like this (indentation doesn't matter):

Dec 2019
1.1.1.1 2
2.2.2.2 1
Jan 2020
2.2.2.2 1
3.3.3.3 1
Feb 2020
4.4.4.4 3
1.1.1.1 1

where 2 is the total number of visits from IP 1.1.1.1 in Dec 2019.

Could you suggest an approach?


Answers


  1. Your expected output doesn't quite seem to match your sample input. Based on your sample output and description, could you please try the following? Since this is a log file whose lines follow a fixed pattern, I will split them using awk field separators.

    awk -F':| |-|/+|]' '
    {
      ind[$7 OFS $8 OFS $1]++
      value[$7 OFS $8 OFS $1]=$1
    }
    END{
      for(i in value){
        split(i,arr," ")
        print arr[1],arr[2] ORS value[i],ind[i]
      }
    }' Input_file
    

    Explanation: here is the same program with detailed comments.

    awk -F':| |-|/+|]' '                             ##Start the awk program; fields are separated by ":", space, "-", one or more "/", or "]".
    {
      ind[$7 OFS $8 OFS $1]++                        ##Count lines per key "month year IP" (7th, 8th and 1st fields).
      value[$7 OFS $8 OFS $1]=$1                     ##Store the IP (1st field) under the same key.
    }
    END{                                             ##The END block runs after all lines have been read.
      for(i in value){                               ##Loop over every key (awk does not specify the traversal order).
        split(i,arr," ")                             ##Split the key on spaces: arr[1] is the month, arr[2] the year.
        print arr[1],arr[2] ORS value[i],ind[i]      ##Print "month year", a newline (ORS), then the IP and its count; the month line repeats for each IP.
      }
    }' Input_file                                    ##Name of the input file.
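
    The program above keys on $7, $8 and $1. The small debugging sketch below shows why. It prints the first nine fields produced by the same -F separator set (the function name show_fields is hypothetical). For the question's log lines, $1 should be the IP, $7 the month and $8 the year. Fields $2 to $5 come out empty because separators such as a space followed by "-" are adjacent.

    ```shell
    # Hypothetical helper: print the first nine fields awk produces with
    # the separator set ":", space, "-", one or more "/", or "]".
    show_fields() {
        awk -F':| |-|/+|]' '{
            for (i = 1; i <= 9; i++)
                printf "$%d=[%s]\n", i, $i   # brackets make empty fields visible
            print "--"                       # separator between input lines
        }' "$@"
    }
    ```

    Usage: head -n 1 access.log | show_fields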
    
  2. Try this:

    shell:

    #!/usr/bin/env bash
    LOG_FILE=$1

    #regex to find mmm/yyyy (GNU grep -P is needed for \d)
    dateUniq=$(grep -oP '(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/\d{4}' "$LOG_FILE" | sort -u)


    for i in $dateUniq
    do
        #output mmm yyyy
        echo "$i" | sed 's/\// /g'

        #regex to find the client ip at the start of each line from this month
        ipUniq=$(grep -F "$i" "$LOG_FILE" | grep -oP '^(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)' | sort -u)

        for x in $ipUniq
        do
            #count lines from this month whose first field is exactly this ip
            count=$(grep -F "$i" "$LOG_FILE" | cut -d' ' -f1 | grep -cxF "$x")
            #output count ip
            echo "$count $x"
        done
        echo
    done
    

    output:

    Dec 2019
    2 1.1.1.1
    1 2.2.2.2
    
    Feb 2020
    1 1.1.1.1
    3 4.4.4.4
    
    Jan 2020
    1 2.2.2.2
    1 3.3.3.3
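
    The script above reads the log once per month and again for every IP in that month, which can get slow on a large log. Below is a single-pass sketch that produces the same "count ip" output. It assumes the timestamp is the 4th space-separated field, as in the question's sample, and the name ip_counts_by_month is hypothetical. Like the script, it lists months in plain sort order (Dec, Feb, Jan) rather than chronologically.

    ```shell
    # Hypothetical helper: one pass over the log files given as arguments (or stdin).
    ip_counts_by_month() {
        # $4 looks like "[12/Dec/2019:18:25:11"; splitting on "/" gives t[2]="Dec"
        # and t[3]="2019:18:25:11", whose first 4 characters are the year.
        awk '{ split($4, t, "/"); print t[2], substr(t[3], 1, 4), $1 }' "$@" |
            LC_ALL=C sort | uniq -c |
            awk '{
                # uniq -c lines are "count Mon YYYY IP"; print a header when the month changes
                my = $2 " " $3
                if (my != prev) { if (NR > 1) print ""; print my; prev = my }
                print $1, $4
            }'
    }
    ```

    Usage: ip_counts_by_month access.log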
    
  3. One for GNU awk that outputs the months in the order they appear in the input (for chronological data such as log records, that means chronological order):

    $ gawk '                     # using GNU awk
    BEGIN {
        a[""][""]                # initialize a 2D array
    }
    {
        split($4,t,/[/:]/)       # split datetime 
        my=t[2] OFS t[3]         # my=month year
        if(!(my in mye)) {       # if current my unseen
            mye[my]=++myi        # update month year exists array with new index
            mya[myi]=my          # chronology is made
        }
        a[mye[my]][$1]++         # count this IP's visits in this month
    }
    END {                        # in the end
        # PROCINFO["sorted_in"]="@val_num_desc"  # this may work for ordering visits
        for(i=1;i<=myi;i++) {    # in fed order 
            print mya[i]         # print month year
            for(j in a[i])       # then related ips in no particular order
                print j,a[i][j]  # output ip and count
        }
    }' file
    

    Output:

    Dec 2019
    1.1.1.1 2
    2.2.2.2 1
    Jan 2020
    2.2.2.2 1
    3.3.3.3 1
    Feb 2020
    1.1.1.1 1
    4.4.4.4 3
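
    Arrays of arrays (a[i][j]) are a GNU awk extension. In mawk, BusyBox awk or another POSIX awk, the same first-seen month ordering can be sketched with composite SUBSEP keys instead. The name ip_counts_in_log_order is hypothetical. This variant also lists the IPs within each month in the order they first appear, rather than in no particular order.

    ```shell
    # Hypothetical helper: POSIX awk only, no gawk extensions.
    # Months and IPs are printed in the order they first appear in the input.
    ip_counts_in_log_order() {
        awk '{
            split($4, t, "/")                  # "[12/Dec/2019:18:25:11" -> t[2]="Dec"
            my = t[2] " " substr(t[3], 1, 4)   # month year, e.g. "Dec 2019"
            if (!(my in seen)) {               # first line from this month
                seen[my] = ++n
                order[n] = my
            }
            key = my SUBSEP $1                 # composite key instead of a[i][j]
            if (!(key in cnt))                 # first visit from this IP this month
                ips[my] = ips[my] " " $1
            cnt[key]++
        }
        END {
            for (i = 1; i <= n; i++) {
                print order[i]
                m = split(ips[order[i]], list, " ")
                for (j = 1; j <= m; j++)
                    print list[j], cnt[order[i] SUBSEP list[j]]
            }
        }' "$@"
    }
    ```

    Usage: ip_counts_in_log_order access.log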
    
  4. For a quick summary of an access log (the top 25 client IPs over the whole file, not split by month), run the command below:

    awk '{print $1}' /var/log/apache2/access.log | sort | uniq -c | sort -nr | head -n 25
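
    The command above counts over the whole file. To get the same top-25 list for a single month, one option is to filter on the timestamp field first. This is a sketch: top_ips_for_month is a hypothetical name, and the month is passed exactly as it appears in the log, e.g. Dec/2019.

    ```shell
    # Hypothetical helper: top 25 client IPs for one month.
    # $1 = month as it appears in the log (e.g. Dec/2019); remaining args = log files (default: stdin).
    top_ips_for_month() {
        month=$1
        shift
        # $4 looks like "[12/Dec/2019:18:25:11", so keep lines containing "/Dec/2019:" there
        awk -v m="/$month:" 'index($4, m) { print $1 }' "$@" |
            sort | uniq -c | sort -nr | head -n 25
    }
    ```

    Usage: top_ips_for_month Dec/2019 /var/log/apache2/access.log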
    