
I can print the IP and URL from a massive log file, but I need to list how many times each IP has visited each URL. I have done some research about loading the log into a database, but I specifically need to do all of this in Python. Any help is very much appreciated.

My code so far:

#!/usr/bin/python3
count = 0
arr = []
frequency_array = []

with open("access.log-20201019", "r") as log:
    for i in log:
        # IP: first 14 characters, cut at the first space
        ip3 = i[0:14].split(' ')[0]
        # URL: characters 53-86, cut at the first whitespace
        url3 = i[53:87].split()[0]
        print(ip3, url3)

Snippet of the log file:

66.177.237.17 - - [18/Oct/2020:03:06:07 -0400] "GET /webcam/1/latest.jpeg HTTP/2.0" 304 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36" "-"
158.136.64.65 - - [18/Oct/2020:03:06:07 -0400] "GET /webcam/rwis/littlebay/latest.jpeg HTTP/1.1" 301 169 "-" "curl/7.46.0" "-"
158.136.64.65 - - [18/Oct/2020:03:06:07 -0400] "GET /webcam/rwis/littlebay/latest.jpeg HTTP/1.1" 200 37145 "-" "curl/7.46.0" "-"
112.198.71.230 - - [18/Oct/2020:03:06:09 -0400] "GET /precip/raingauge2.gif HTTP/2.0" 200 10078 "https://www.google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36" "-"
173.9.45.97 - - [18/Oct/2020:03:06:10 -0400] "GET /NHPR/NHPR_rad_an.gif HTTP/2.0" 200 587317 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36" "-"
173.9.45.97 - - [18/Oct/2020:03:06:11 -0400] "GET /favicon.ico HTTP/2.0" 200 27877 "https://vortex.plymouth.edu/NHPR/NHPR_rad_an.gif" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36" "-"
158.136.64.65 - - [18/Oct/2020:03:06:11 -0400] "GET /webcam/1/nograph.1.jpeg HTTP/1.1" 301 169 "-" "curl/7.46.0" "-"
158.136.64.65 - - [18/Oct/2020:03:06:11 -0400] "GET /webcam/1/nograph.1.jpeg HTTP/1.1" 200 242804 "-" "curl/7.46.0" "-"
158.136.64.65 - - [18/Oct/2020:03:06:12 -0400] "GET /webcam/rwis/echolake/latest.jpeg HTTP/1.1" 301 169 "-" "curl/7.46.0" "-"
158.136.64.65 - - [18/Oct/2020:03:06:12 -0400] "GET /webcam/rwis/echolake/latest.jpeg HTTP/1.1" 404 2256 "-" "curl/7.46.0" "-"
158.136.64.65 - - [18/Oct/2020:03:06:14 -0400] "GET /webcam/rwis/lafeyette/latest.jpeg HTTP/1.1" 301 169 "-" "curl/7.46.0" "-"
158.136.64.65 - - [18/Oct/2020:03:06:14 -0400] "GET /webcam/rwis/lafeyette/latest.jpeg HTTP/1.1" 200 36974 "-" "curl/7.46.0" "-"

I am able to run my current code, but it outputs the IP and URL multiple times for the same IP. I just want the number of times an IP visited a certain URL.


Answers


  1. I hope I’ve understood your question right:

    text = """
    66.177.237.17 - - [18/Oct/2020:03:06:07 -0400] "GET /webcam/1/latest.jpeg HTTP/2.0" 304 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36" "-"
    158.136.64.65 - - [18/Oct/2020:03:06:07 -0400] "GET /webcam/rwis/littlebay/latest.jpeg HTTP/1.1" 301 169 "-" "curl/7.46.0" "-"
    158.136.64.65 - - [18/Oct/2020:03:06:07 -0400] "GET /webcam/rwis/littlebay/latest.jpeg HTTP/1.1" 200 37145 "-" "curl/7.46.0" "-"
    112.198.71.230 - - [18/Oct/2020:03:06:09 -0400] "GET /precip/raingauge2.gif HTTP/2.0" 200 10078 "https://www.google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36" "-"
    173.9.45.97 - - [18/Oct/2020:03:06:10 -0400] "GET /NHPR/NHPR_rad_an.gif HTTP/2.0" 200 587317 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36" "-"
    173.9.45.97 - - [18/Oct/2020:03:06:11 -0400] "GET /favicon.ico HTTP/2.0" 200 27877 "https://vortex.plymouth.edu/NHPR/NHPR_rad_an.gif" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36" "-"
    158.136.64.65 - - [18/Oct/2020:03:06:11 -0400] "GET /webcam/1/nograph.1.jpeg HTTP/1.1" 301 169 "-" "curl/7.46.0" "-"
    158.136.64.65 - - [18/Oct/2020:03:06:11 -0400] "GET /webcam/1/nograph.1.jpeg HTTP/1.1" 200 242804 "-" "curl/7.46.0" "-"
    158.136.64.65 - - [18/Oct/2020:03:06:12 -0400] "GET /webcam/rwis/echolake/latest.jpeg HTTP/1.1" 301 169 "-" "curl/7.46.0" "-"
    158.136.64.65 - - [18/Oct/2020:03:06:12 -0400] "GET /webcam/rwis/echolake/latest.jpeg HTTP/1.1" 404 2256 "-" "curl/7.46.0" "-"
    158.136.64.65 - - [18/Oct/2020:03:06:14 -0400] "GET /webcam/rwis/lafeyette/latest.jpeg HTTP/1.1" 301 169 "-" "curl/7.46.0" "-"
    158.136.64.65 - - [18/Oct/2020:03:06:14 -0400] "GET /webcam/rwis/lafeyette/latest.jpeg HTTP/1.1" 200 36974 "-" "curl/7.46.0" "-"
    """
    
    import re
    from collections import Counter
    
    pat = re.compile(r"([\d.]+).*?(?:GET|POST|PUT|PATCH|DELETE|OPTIONS|HEAD) (\S+)")
    
    cnt = Counter()
    for line in text.splitlines():
        m = pat.match(line)
        if m:
            cnt.update([m.groups()])
    
    for (ip, url), how_many_times in cnt.items():
        print(f"{ip} has visited [{url}] {how_many_times} time(s)")
    

    Prints:

    66.177.237.17 has visited [/webcam/1/latest.jpeg] 1 time(s)
    158.136.64.65 has visited [/webcam/rwis/littlebay/latest.jpeg] 2 time(s)
    112.198.71.230 has visited [/precip/raingauge2.gif] 1 time(s)
    173.9.45.97 has visited [/NHPR/NHPR_rad_an.gif] 1 time(s)
    173.9.45.97 has visited [/favicon.ico] 1 time(s)
    158.136.64.65 has visited [/webcam/1/nograph.1.jpeg] 2 time(s)
    158.136.64.65 has visited [/webcam/rwis/echolake/latest.jpeg] 2 time(s)
    158.136.64.65 has visited [/webcam/rwis/lafeyette/latest.jpeg] 2 time(s)
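
    To see what the two capture groups pick up, here is the same pattern applied to a single line of the sample log. Written out in full, it uses [\d.]+ (digits and dots) for the client address and \S+ (a run of non-whitespace) for the requested path, so it assumes IPv4 addresses. The variable names are only for illustration.

```python
import re

# Capture 1: the client address ([\d.]+ assumes IPv4: digits and dots only).
# Capture 2: the request path, i.e. the non-whitespace run after the method.
pat = re.compile(r"([\d.]+).*?(?:GET|POST|PUT|PATCH|DELETE|OPTIONS|HEAD) (\S+)")

# One line copied from the sample log in the question.
line = ('158.136.64.65 - - [18/Oct/2020:03:06:12 -0400] '
        '"GET /webcam/rwis/echolake/latest.jpeg HTTP/1.1" 404 2256 "-" "curl/7.46.0" "-"')

m = pat.match(line)  # match() only tries at the start of the line
if m:
    ip, url = m.groups()
    print(ip, url)  # → 158.136.64.65 /webcam/rwis/echolake/latest.jpeg
```

    The lazy .*? skips the dashes and the timestamp up to the first HTTP method it finds, which is why the path lands in the second group.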
    

    EDIT: To read the data from a file, you can try:

    import re
    from collections import Counter
    
    pat = re.compile(r"([\d.]+).*?(?:GET|POST|PUT|PATCH|DELETE|OPTIONS|HEAD) (\S+)")
    
    cnt = Counter()
    with open("your_log_file.txt", "r") as f_in:
        for line in f_in:
            m = pat.match(line)
            if m:
                cnt.update([m.groups()])
    
    for (ip, url), how_many_times in cnt.items():
        print(f"{ip} has visited [{url}] {how_many_times} time(s)")
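
    If the log is large and you mainly care about the most frequent visitors, a minimal sketch (assuming the same regex; count_visits and sample are hypothetical names for illustration) wraps the loop in a function that accepts any iterable of lines, including an open file object, and then uses Counter.most_common() to list the busiest (IP, URL) pairs first.

```python
import re
from collections import Counter
from typing import Iterable

# Same pattern as above: group 1 is the IPv4 address, group 2 the request path.
PAT = re.compile(r"([\d.]+).*?(?:GET|POST|PUT|PATCH|DELETE|OPTIONS|HEAD) (\S+)")


def count_visits(lines: Iterable[str]) -> Counter:
    """Count (ip, url) pairs in any iterable of log lines (hypothetical helper).

    A file object also works, since iterating over it yields one line at a
    time instead of reading the whole file into memory.
    """
    cnt = Counter()
    for line in lines:
        m = PAT.match(line)
        if m:
            cnt[m.groups()] += 1  # key is the (ip, url) tuple
    return cnt


# Three lines copied from the sample log in the question.
sample = [
    '158.136.64.65 - - [18/Oct/2020:03:06:07 -0400] "GET /webcam/rwis/littlebay/latest.jpeg HTTP/1.1" 301 169 "-" "curl/7.46.0" "-"',
    '158.136.64.65 - - [18/Oct/2020:03:06:07 -0400] "GET /webcam/rwis/littlebay/latest.jpeg HTTP/1.1" 200 37145 "-" "curl/7.46.0" "-"',
    '158.136.64.65 - - [18/Oct/2020:03:06:11 -0400] "GET /webcam/1/nograph.1.jpeg HTTP/1.1" 301 169 "-" "curl/7.46.0" "-"',
]

counts = count_visits(sample)
# most_common(n) returns the n pairs with the highest counts, largest first.
for (ip, url), n in counts.most_common(1):
    print(f"{ip} has visited [{url}] {n} time(s)")
```

    On the real log you would call count_visits(f) inside a with open("access.log-20201019") as f: block. Pairs with equal counts come out of most_common() in the order they were first seen.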
    
  2. I would use a Python dictionary with the IP address as the key (or an (ip, url) tuple as the key, if you want the count per URL). Every time you come across an IP, check whether it is already a key. It will be something like:

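    counts = {}
    for k in ["158.136.64.65", "173.9.45.97", "158.136.64.65"]:  # example keys
        if k in counts:  # `in` checks the keys directly; no need for .keys()
            counts[k] += 1
        else:
            counts[k] = 1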
    

    Or you could use dict.setdefault() to give each new IP a starting count of 0 and then add 1 every time you find that IP.
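
    A minimal sketch of the dictionary approach with setdefault(), assuming the combined log format shown in the question. It keys on an (ip, url) tuple so visits are counted per URL, and it swaps the regex for plain str.split() parsing, assuming the request is the first double-quoted field. parse() and lines are hypothetical names for illustration.

```python
def parse(line):
    """Return (ip, url) for one log line, or None if it doesn't look like a request.

    Assumption: the IP is the first whitespace-separated field and the request
    ("GET /path HTTP/1.1") is the first double-quoted field.
    """
    parts = line.split('"')
    if len(parts) < 2:
        return None
    request = parts[1].split()  # e.g. ['GET', '/favicon.ico', 'HTTP/2.0']
    if len(request) < 2:
        return None
    ip = line.split(None, 1)[0]
    return ip, request[1]


# Three lines copied from the sample log in the question.
lines = [
    '158.136.64.65 - - [18/Oct/2020:03:06:12 -0400] "GET /webcam/rwis/echolake/latest.jpeg HTTP/1.1" 301 169 "-" "curl/7.46.0" "-"',
    '158.136.64.65 - - [18/Oct/2020:03:06:12 -0400] "GET /webcam/rwis/echolake/latest.jpeg HTTP/1.1" 404 2256 "-" "curl/7.46.0" "-"',
    '158.136.64.65 - - [18/Oct/2020:03:06:14 -0400] "GET /webcam/rwis/lafeyette/latest.jpeg HTTP/1.1" 301 169 "-" "curl/7.46.0" "-"',
]

counts = {}
for line in lines:
    key = parse(line)
    if key is None:
        continue  # skip lines that don't look like a request
    counts.setdefault(key, 0)  # start a new (ip, url) pair at 0
    counts[key] += 1

for (ip, url), n in counts.items():
    print(f"{ip} has visited [{url}] {n} time(s)")
```

    Splitting on double quotes avoids regex syntax, but it relies on the first quoted field being the request line; a malformed line is skipped rather than miscounted.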
