I need to print the unique URLs from an Apache access log file, with a count for each URL, and I need to do it for specific date ranges.
We have a logging URL that receives its parameters via GET, so it is more efficient to go through the access log, find the unique URLs and count them within the date range, then insert the counts into the database, rather than inserting a row for every connection as it happens.
The access log is in this format:
11.111.11.111 - - [03/Apr/2019:11:43:11 +0300] "GET /url.php?parameter=&2nd_parameter=15&mana=587&something_else=mana HTTP/1.1" 200 5316 "something:something" "Mozilla/5.0 (Android; U; en-GB) AppleWebKit/533.19.4 (KHTML, like Gecko) AdobeAIR/29.0" 1152 [url.url.com]
I need to do it in time ranges so I get at least some time resolution, because the files are pretty big: a day's access log can be over 10 GB. The results of the grep will be parsed with PHP.
awk '{print $7}' access_ssl.log | sort | uniq -c
prints the unique URLs and their counts. I also need to restrict the output to a specific time range.
I expect to input a specific time range, like 11:00:00,12:00:00 (an hour, for example), and to get the grouped, counted URLs as output:
20 /url.php?parameter=&2nd_parameter=15&mana=587&something_else=mana
15 /url.php?parameter=&2nd_parameter=15&mana=577&something_else=something_else
2
Answers
I did manage to get a working bash script. A PHP script exec()s it with two parameters for the date/hour range, waits for the output file, and then parses that file.
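A minimal sketch of such a script, assuming the two parameters are a start and an end time in HH:MM:SS form; the input name access_ssl.log and the output name urlcount.out are placeholders, not taken from the original setup:

#!/usr/bin/env bash
# Sketch: count unique URLs requested between two times of day.
# Usage: ./urlcount.sh 11:00:00 12:00:00
start="$1"
end="$2"

awk -v start="$start" -v end="$end" '
{
    # $4 looks like [03/Apr/2019:11:43:11 - characters 14-21 hold HH:MM:SS,
    # which compares correctly as a fixed-width string.
    time = substr($4, 14, 8)
    if (time >= start && time <= end) print $7
}' access_ssl.log | sort | uniq -c > urlcount.out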
I hope someone makes use of this.
If you are ok with awk, could you please try the following. Adding a non-one-liner form of the solution now.
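A sketch of what a non-one-liner awk for this could look like, collecting the counts in an associative array instead of piping through sort and uniq; the hardcoded time range and the log name are assumptions for illustration:

awk -v start="11:00:00" -v end="12:00:00" '
{
    # Field 4 is the bracketed timestamp, e.g. [03/Apr/2019:11:43:11;
    # splitting on ":" leaves HH, MM and SS in parts 2 to 4.
    split($4, parts, ":")
    time = parts[2] ":" parts[3] ":" parts[4]
}
time >= start && time <= end {
    hits[$7]++
}
END {
    # Print each URL once with its count, like uniq -c would.
    for (url in hits) print hits[url], url
}
' access_ssl.log

Pipe the output through sort -rn to see the most-requested URLs first.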