
I want to extract the number of requests for each domain separately from an Apache vhost access.log,
to get the following result:

# domain10.com  20-11-2020
   560  22:00
   550  22:01
   620  22:02
# test.domain20.com
number request       time
   550              22:01
   620               22:02

I use grep to extract all requests per hour and minute for one domain:

grep 'domain\.com' /root/eslam33/test/access.log.7 |
cut -d[ -f2 | cut -d] -f1 | awk -F: '{print $2":"$3}' |
sort -nk1 -nk2 | uniq -c | awk '{ if ($1 > 10) print $0}'

A sample line from access.log:

105.181.206.150 - - [30/Nov/2020:06:37:03 +0200] "POST /store/web/app.php/api/v3/WEB/products/filter?_locale=en_US HTTP/1.1" 200 19002 "https://from-egypt.com/en_US/collection?taxons=Fashion&sort=date&order=asc&page=1"

but I want to run one command or script to give each domain requests separately. How can I do that?

2 Answers


  1. You can do this with awk and a standard apache log file:

    awk '{
            split($4,map1,"[:/]");   # split the timestamp field on ":" and "/"
            split($11,map2,"/");     # split the referrer URL to get the domain name
            if (map2[3] == "") {
              next
            }
            map[map2[3]" "substr(map1[1],2)" "map1[2]" "map1[3]" "map1[4]":"map1[5]]+=1
         }
     END {
            for (i in map) {
               print i" - "map[i]
            }
         }' access_log
    

    One liner:

    awk '{ split($4,map1,"[:/]");split($11,map2,"/");if (map2[3]=="") { next } map[map2[3]" "substr(map1[1],2)" "map1[2]" "map1[3]" "map1[4]":"map1[5]]+=1 } END { for (i in map) { print i" - "map[i]} }' access_log
    

    Split the 4th space-delimited field using : and / into an array called map1. Then use the day, month, year, hour and minutes (different indexes of map1), together with the domain name obtained by splitting the 11th field, to create an index for another array, map. This entry is incremented every time a request is encountered for the same domain, day, month, year, hour and minute. At the end, the data from the array is printed.
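    Note that `for (i in map)` visits keys in an arbitrary, implementation-defined order, so the output lines may come out unsorted. A minimal sketch of running the same counting logic and sorting its output (the two-line access_log below is hypothetical sample data following the layout from the question):

```shell
# Build a tiny hypothetical access_log: two requests in the same minute
printf '%s\n' \
  '1.2.3.4 - - [30/Nov/2020:06:37:03 +0200] "POST /a HTTP/1.1" 200 10 "https://from-egypt.com/en/a"' \
  '1.2.3.4 - - [30/Nov/2020:06:37:45 +0200] "GET /b HTTP/1.1" 200 10 "https://from-egypt.com/en/b"' \
  > access_log

# Same counting logic as above, piped through sort for stable, grouped output
awk '{ split($4,map1,"[:/]"); split($11,map2,"/");
       if (map2[3] == "") { next }
       map[map2[3]" "substr(map1[1],2)" "map1[2]" "map1[3]" "map1[4]":"map1[5]]+=1 }
 END { for (i in map) print i" - "map[i] }' access_log | sort
# prints: from-egypt.com 30 Nov 2020 06:37 - 2
```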

    In order to search on a specific domain, simply add a pattern match on field 11, like so:

    awk '$11 ~ /from-egypt.com/ {
            split($4,map1,"[:/]");
            split($11,map2,"/");
            if (map2[3] == "") {
              next
            }
            map[substr(map1[1],2)" "map1[2]" "map1[3]" "map1[4]":"map1[5]]+=1
         }
     END {
            for (i in map) {
               print i" - "map[i]
            }
         }' access_log
    
  2. If you want to extract the text in the last quoted field between the second and third slashes (the domain in the referrer URL), try

    sed 's%.*"https*://\([^/]*\)/[^"]*"$%\1%' apache.log
    

    If you want the minute and the hour of the visit as a prefix, that’s doable too:

    sed 's%[^[]*\[[^:]*:\([0-9]*:[0-9]*\):[^]]*\].*"https*://\([^/]*\)/[^"]*"$%\1 \2%' apache.log
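    For example, applied to the sample line from the question, this prints the hour:minute followed by the domain (a sketch; lines without a quoted referrer at the end will not match):

```shell
# Feed the question's sample log line through the time+domain extraction
printf '%s\n' '105.181.206.150 - - [30/Nov/2020:06:37:03 +0200] "POST /store/web/app.php/api/v3/WEB/products/filter?_locale=en_US HTTP/1.1" 200 19002 "https://from-egypt.com/en_US/collection?taxons=Fashion&sort=date&order=asc&page=1"' |
sed 's%[^[]*\[[^:]*:\([0-9]*:[0-9]*\):[^]]*\].*"https*://\([^/]*\)/[^"]*"$%\1 \2%'
# prints: 06:37 from-egypt.com
```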
    

    Your count/filter idea works the same way; put the domain first in the sed replacement so that sorting groups the requests per domain, then add a bit of postprocessing to format it to your liking.

    sed 's%[^[]*\[[^:]*:\([0-9]*:[0-9]*\):[^]]*\].*"https*://\([^/]*\)/[^"]*"$%\2 \1%' apache.log |
    sort | uniq -c |
    awk '$2 != prev { print "# " $2; prev = $2 }
         $1 > 10 { print $1, $3 }'
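    A quick end-to-end sketch on synthetic data (12 hypothetical identical requests, so the count clears the `> 10` threshold):

```shell
# Generate 12 hypothetical requests from the same domain in the same minute
for i in 1 2 3 4 5 6 7 8 9 10 11 12; do
  echo '1.2.3.4 - - [20/Nov/2020:22:01:05 +0200] "GET /x HTTP/1.1" 200 10 "https://domain10.com/en/x"'
done > apache.log

# Extract "domain hh:mm", count duplicates, then print a header per domain
sed 's%[^[]*\[[^:]*:\([0-9]*:[0-9]*\):[^]]*\].*"https*://\([^/]*\)/[^"]*"$%\2 \1%' apache.log |
sort | uniq -c |
awk '$2 != prev { print "# " $2; prev = $2 }
     $1 > 10 { print $1, $3 }'
# prints:
# # domain10.com
# 12 22:01
```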
    