
I want to extract the number of requests for each domain separately from an Apache vhost access.log,
to get the following result:

# domain10.com  20-11-2020
   560  22:00
   550  22:01
   620  22:02
# test.domain20.com
number request       time
   550              22:01
   620               22:02

I use grep to extract all requests per hour and minute for one domain:

grep 'domain\.com' /root/eslam33/test/access.log.7 |
cut -d[ -f2 | cut -d] -f1 | awk -F: '{print $2":"$3}' |
sort -nk1 -nk2 | uniq -c | awk '{ if ($1 > 10) print $0}'

A sample line from access.log:

105.181.206.150 - - [30/Nov/2020:06:37:03 +0200] "POST /store/web/app.php/api/v3/WEB/products/filter?_locale=en_US HTTP/1.1" 200 19002 "https://from-egypt.com/en_US/collection?taxons=Fashion&sort=date&order=asc&page=1"

but I want to run one command or script to give each domain requests separately. How can I do that?

2 Answers


  1. You can do this with awk and a standard apache log file:

    awk '{
            split($4,map1,"[:/]");   # split the timestamp field on ":" and "/"
            split($11,map2,"/");     # split the referrer URL to get the domain name
            if (map2[3] == "") {
              next
            }
            map[map2[3]" "substr(map1[1],2)" "map1[2]" "map1[3]" "map1[4]":"map1[5]]+=1
         }
     END {
            for (i in map) {
               print i" - "map[i]
            }
         }' access_log
    

    One liner:

    awk '{ split($4,map1,"[:/]");split($11,map2,"/");if (map2[3]=="") { next } map[map2[3]" "substr(map1[1],2)" "map1[2]" "map1[3]" "map1[4]":"map1[5]]+=1 } END { for (i in map) { print i" - "map[i]} }' access_log
    

    Split the 4th space-delimited field using : and / into an array called map1. Then use the day, month, year, hour and minutes (different indexes of map1), together with the domain name obtained by splitting the 11th field, to create an index for another array, map. This entry is incremented every time a request is encountered for the same domain, day, month, year, hour and minute. At the end, the data from the array is printed.
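    Note that `for (i in map)` visits keys in an arbitrary, implementation-defined order, so the output lines may come out unsorted. A minimal sketch of running the same counting logic and sorting its output (the two-line access_log below is hypothetical sample data following the layout from the question):

```shell
# Build a tiny hypothetical access_log: two requests in the same minute
printf '%s\n' \
  '1.2.3.4 - - [30/Nov/2020:06:37:03 +0200] "POST /a HTTP/1.1" 200 10 "https://from-egypt.com/en/a"' \
  '1.2.3.4 - - [30/Nov/2020:06:37:45 +0200] "GET /b HTTP/1.1" 200 10 "https://from-egypt.com/en/b"' \
  > access_log

# Same counting logic as above, piped through sort for stable, grouped output
awk '{ split($4,map1,"[:/]"); split($11,map2,"/");
       if (map2[3] == "") { next }
       map[map2[3]" "substr(map1[1],2)" "map1[2]" "map1[3]" "map1[4]":"map1[5]]+=1 }
 END { for (i in map) print i" - "map[i] }' access_log | sort
# prints: from-egypt.com 30 Nov 2020 06:37 - 2
```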

    In order to search on a specific domain, simply add a pattern match on field 11, like so:

    awk '$11 ~ /from-egypt.com/ {
            split($4,map1,"[:/]");
            split($11,map2,"/");
            if (map2[3] == "") {
              next
            }
            map[substr(map1[1],2)" "map1[2]" "map1[3]" "map1[4]":"map1[5]]+=1
         }
     END {
            for (i in map) {
               print i" - "map[i]
            }
         }' access_log
    
  2. If you want to extract the text in the last quoted field between the second and third slashes (the domain in the referrer URL), try

    sed 's%.*"https*://\([^/]*\)/[^"]*"$%\1%' apache.log
    

    If you want the minute and the hour of the visit as a prefix, that’s doable too:

    sed 's%[^[]*\[[^:]*:\([0-9]*:[0-9]*\):[^]]*\].*"https*://\([^/]*\)/[^"]*"$%\1 \2%' apache.log
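    For example, applied to the sample line from the question, this prints the hour:minute followed by the domain (a sketch; lines without a quoted referrer at the end will not match):

```shell
# Feed the question's sample log line through the time+domain extraction
printf '%s\n' '105.181.206.150 - - [30/Nov/2020:06:37:03 +0200] "POST /store/web/app.php/api/v3/WEB/products/filter?_locale=en_US HTTP/1.1" 200 19002 "https://from-egypt.com/en_US/collection?taxons=Fashion&sort=date&order=asc&page=1"' |
sed 's%[^[]*\[[^:]*:\([0-9]*:[0-9]*\):[^]]*\].*"https*://\([^/]*\)/[^"]*"$%\1 \2%'
# prints: 06:37 from-egypt.com
```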
    

    Your count/filter idea works the same way; put the domain first in the sed replacement so that sorting groups the requests per domain, then add a bit of postprocessing to format it to your liking.

    sed 's%[^[]*\[[^:]*:\([0-9]*:[0-9]*\):[^]]*\].*"https*://\([^/]*\)/[^"]*"$%\2 \1%' apache.log |
    sort | uniq -c |
    awk '$2 != prev { print "# " $2; prev = $2 }
         $1 > 10 { print $1, $3 }'
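    A quick end-to-end sketch on synthetic data (12 hypothetical identical requests, so the count clears the `> 10` threshold):

```shell
# Generate 12 hypothetical requests from the same domain in the same minute
for i in 1 2 3 4 5 6 7 8 9 10 11 12; do
  echo '1.2.3.4 - - [20/Nov/2020:22:01:05 +0200] "GET /x HTTP/1.1" 200 10 "https://domain10.com/en/x"'
done > apache.log

# Extract "domain hh:mm", count duplicates, then print a header per domain
sed 's%[^[]*\[[^:]*:\([0-9]*:[0-9]*\):[^]]*\].*"https*://\([^/]*\)/[^"]*"$%\2 \1%' apache.log |
sort | uniq -c |
awk '$2 != prev { print "# " $2; prev = $2 }
     $1 > 10 { print $1, $3 }'
# prints:
# # domain10.com
# 12 22:01
```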
    