I need to print the unique URLs from an Apache access log file, with a count for each URL, and I need to do it for specific date ranges.
We have a logging URL that receives its parameters via GET, so it is more efficient to go through the access log, find the unique URLs and count them within the date range, then insert the counts into the database, rather than inserting a row for every connection as it happens.
The access log is in this format:
11.111.11.111 - - [03/Apr/2019:11:43:11 +0300] "GET /url.php?parameter=&2nd_parameter=15&mana=587&something_else=mana HTTP/1.1" 200 5316 "something:something" "Mozilla/5.0 (Android; U; en-GB) AppleWebKit/533.19.4 (KHTML, like Gecko) AdobeAIR/29.0" 1152 [url.url.com]
I need to do it in time ranges so I get at least some time resolution, because the files are pretty big: a day's access log can be over 10 GB. The results of the grep will be parsed with PHP.
awk '{print $7}' access_ssl.log | sort | uniq -c
prints the unique URLs and their counts. I also need to restrict the output to a specific time range.
I expect to input a specific time range, like 11:00:00,12:00:00 (an hour, for example), and to get the grouped, counted URLs as output:
20 /url.php?parameter=&2nd_parameter=15&mana=587&something_else=mana
15 /url.php?parameter=&2nd_parameter=15&mana=577&something_else=something_else
2
Answers
I did manage to get a working bash script. A PHP script exec()s it with two parameters for the date/hour range, waits for the output file, and then parses that file.
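A minimal sketch of such a script, assuming the two parameters are a start and an end time in HH:MM:SS form; the input name access_ssl.log and the output name urlcount.out are placeholders, not taken from the original setup:

#!/usr/bin/env bash
# Sketch: count unique URLs requested between two times of day.
# Usage: ./urlcount.sh 11:00:00 12:00:00
start="$1"
end="$2"

awk -v start="$start" -v end="$end" '
{
    # $4 looks like [03/Apr/2019:11:43:11 - characters 14-21 hold HH:MM:SS,
    # which compares correctly as a fixed-width string.
    time = substr($4, 14, 8)
    if (time >= start && time <= end) print $7
}' access_ssl.log | sort | uniq -c > urlcount.out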
I hope someone makes use of this.
If you are ok with awk, could you please try the following. Adding a non-one-liner form of the solution now.
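A sketch of what a non-one-liner awk for this could look like, collecting the counts in an associative array instead of piping through sort and uniq; the hardcoded time range and the log name are assumptions for illustration:

awk -v start="11:00:00" -v end="12:00:00" '
{
    # Field 4 is the bracketed timestamp, e.g. [03/Apr/2019:11:43:11;
    # splitting on ":" leaves HH, MM and SS in parts 2 to 4.
    split($4, parts, ":")
    time = parts[2] ":" parts[3] ":" parts[4]
}
time >= start && time <= end {
    hits[$7]++
}
END {
    # Print each URL once with its count, like uniq -c would.
    for (url in hits) print hits[url], url
}
' access_ssl.log

Pipe the output through sort -rn to see the most-requested URLs first.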