
The output I have is the following:

T 2020/03/05 16:06:41.565817 193.126.13.199:80 -> 10.8.0.4:55639 [AP] HTTP/1.1 200 OK..Date: Thu, 05 Mar 2020 16:06:41 GMT..Server: Apache/2.2.3 (CentOS)..Expires: Thu, 19 Nov 1981 08:52:00 GMT..Cache-Control: no-store, no-cache, 
T 2020/03/05 16:06:46.727199 10.8.0.4:55642 -> 193.126.13.199:80 [AP] GET / HTTP/1.1..Host: www.radionova.fm..User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:72.0) Gecko/xml;q=0.9,image/webp,*/*;q=0.8..Accept-Langu
T 2020/03/05 16:06:47.174078 193.126.13.199:80 -> 10.8.0.4:55642 [A] HTTP/1.1 200 OK..Date: Thu, 05 Mar 2020 16:06:46 GMT..Server: Apache/2.2.3 (CentOS)..Expires: Thu, 19 Nov 1981 08:52:00 GMT..Cache-Control: no-store, no-cache

How can I write a regex pattern that matches only the [AP] rows?

Something like:

T 2020/03/05 16:06:46.727199 10.8.0.4:55642 -> 193.126.13.199:80 [AP] GET / HTTP/1.1..Host: www.radionova.fm..User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:72.0)

So, the first group: 2020/03/05

Second group: 16:06:46.727199

Third group: 10.8.0.4:55642

Fourth group: GET / HTTP/1.1..Host: www.radionova.fm..User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:72.0)

I have the following Python regex:

pattern = r'''T\s([^ ]+)\s([^ ]+)\s([^ ]+).*?[.]{2,}(.*?)[.]{2,}'''

It doesn't work the way I want.

2 Answers


  1. You could add two extra parts, each matching a whitespace character followed by non-whitespace characters (for the -> and the destination address), and then match the literal [AP] part:

     T\s(\S+)\s(\S+)\s(\S+)\s\S+\s\S+\s\[AP\].*?\.{2}(.*?)\.{2}.*
    


    import re
    
    # Match only rows flagged [AP]; groups 1-3 are date, time and source address,
    # group 4 is the text between the first two ".." separators.
    regex = r"T\s(\S+)\s(\S+)\s(\S+)\s\S+\s\S+\s\[AP\].*?\.{2}(.*?)\.{2}.*"
    
    test_str = ("T 2020/03/05 16:06:41.565817 193.126.13.199:80 -> 10.8.0.4:55639 [AP] HTTP/1.1 200 OK..Date: Thu, 05 Mar 2020 16:06:41 GMT..Server: Apache/2.2.3 (CentOS)..Expires: Thu, 19 Nov 1981 08:52:00 GMT..Cache-Control: no-store, no-cache, \n"
        "T 2020/03/05 16:06:46.727199 10.8.0.4:55642 -> 193.126.13.199:80 [AP] GET / HTTP/1.1..Host: www.radionova.fm..User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:72.0) Gecko/xml;q=0.9,image/webp,*/*;q=0.8..Accept-Langu\n"
        "T 2020/03/05 16:06:47.174078 193.126.13.199:80 -> 10.8.0.4:55642 [A] HTTP/1.1 200 OK..Date: Thu, 05 Mar 2020 16:06:46 GMT..Server: Apache/2.2.3 (CentOS)..Expires: Thu, 19 Nov 1981 08:52:00 GMT..Cache-Control: no-store, no-cache")
    
    matches = re.finditer(regex, test_str)
    
    # "." does not match a newline by default, so each match stays within one row
    for match in matches:
        print(match.group())
    

    Output

    T 2020/03/05 16:06:41.565817 193.126.13.199:80 -> 10.8.0.4:55639 [AP] HTTP/1.1 200 OK..Date: Thu, 05 Mar 2020 16:06:41 GMT..Server: Apache/2.2.3 (CentOS)..Expires: Thu, 19 Nov 1981 08:52:00 GMT..Cache-Control: no-store, no-cache, 
    T 2020/03/05 16:06:46.727199 10.8.0.4:55642 -> 193.126.13.199:80 [AP] GET / HTTP/1.1..Host: www.radionova.fm..User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:72.0) Gecko/xml;q=0.9,image/webp,*/*;q=0.8..Accept-Langu
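
A variant of the pattern above, as a rough sketch: it anchors each match to the start of a row with re.MULTILINE and uses named groups, so the pieces the question asks for can be read off directly. The group names (date, time, src, payload) are illustrative assumptions, the sample rows are shortened from the question, and the fourth group here simply takes everything after [AP] up to the end of the row rather than stopping at a particular header.

```python
import re

# Hypothetical group names; the row layout follows the sample in the question.
ROW = re.compile(
    r"^T\s(?P<date>\S+)\s(?P<time>\S+)\s(?P<src>\S+)"  # date, time, source address
    r"\s->\s\S+"                                       # arrow and destination (not captured)
    r"\s\[AP\]\s(?P<payload>.*)$",                     # literal [AP], then the rest of the row
    re.MULTILINE,
)

# Shortened copies of two rows from the question: one [AP], one [A].
sample = (
    "T 2020/03/05 16:06:46.727199 10.8.0.4:55642 -> 193.126.13.199:80 [AP] GET / HTTP/1.1..Host: www.radionova.fm\n"
    "T 2020/03/05 16:06:47.174078 193.126.13.199:80 -> 10.8.0.4:55642 [A] HTTP/1.1 200 OK..Date: Thu, 05 Mar 2020 16:06:46 GMT\n"
)

# Only the [AP] row matches; each match becomes a dict keyed by group name.
rows = [m.groupdict() for m in ROW.finditer(sample)]
for row in rows:
    print(row["date"], row["time"], row["src"], row["payload"])
```

If the payload should stop at a specific header instead, the final group would need its own delimiter; that choice depends on the data and is left open here.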
    
  2. Why not use the obvious "in" operator?

    data = """
    T 2020/03/05 16:06:41.565817 193.126.13.199:80 -> 10.8.0.4:55639 [AP] HTTP/1.1 200 OK..Date: Thu, 05 Mar 2020 16:06:41 GMT..Server: Apache/2.2.3 (CentOS)..Expires: Thu, 19 Nov 1981 08:52:00 GMT..Cache-Control: no-store, no-cache, 
    T 2020/03/05 16:06:46.727199 10.8.0.4:55642 -> 193.126.13.199:80 [AP] GET / HTTP/1.1..Host: www.radionova.fm..User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:72.0) Gecko/xml;q=0.9,image/webp,*/*;q=0.8..Accept-Langu
    T 2020/03/05 16:06:47.174078 193.126.13.199:80 -> 10.8.0.4:55642 [A] HTTP/1.1 200 OK..Date: Thu, 05 Mar 2020 16:06:46 GMT..Server: Apache/2.2.3 (CentOS)..Expires: Thu, 19 Nov 1981 08:52:00 GMT..Cache-Control: no-store, no-cache
    """
    
    # Keep date, time, source address and payload for every row containing [AP]
    rows_ap = [(splitted[1], splitted[2], splitted[3], " ".join(splitted[7:]))
               for line in data.split("\n")
               if line and "[AP]" in line
               for splitted in [line.split(" ")]]
    
    print(rows_ap)
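
The same idea can be made a little stricter, as a sketch: splitting each row at most seven times keeps the payload in one piece, and comparing the seventh field to [AP] avoids accepting a row that merely contains the text [AP] somewhere in its payload. The function name parse_ap_rows is an assumption for illustration, the first sample row is shortened from the question, and the second row is contrived to show the difference.

```python
def parse_ap_rows(text):
    """Return (date, time, source, payload) tuples for rows flagged [AP]."""
    rows = []
    for line in text.splitlines():
        # Fields: T, date, time, source, ->, destination, flags, payload
        parts = line.split(" ", 7)
        if len(parts) == 8 and parts[0] == "T" and parts[6] == "[AP]":
            rows.append((parts[1], parts[2], parts[3], parts[7]))
    return rows


# The second row is a contrived [A] row whose payload happens to contain "[AP]".
sample = """
T 2020/03/05 16:06:46.727199 10.8.0.4:55642 -> 193.126.13.199:80 [AP] GET / HTTP/1.1..Host: www.radionova.fm
T 2020/03/05 16:06:47.174078 193.126.13.199:80 -> 10.8.0.4:55642 [A] HTTP/1.1 200 OK..X-Note: [AP]
"""

print(parse_ap_rows(sample))
```

A plain substring test would also keep the second row; checking the flags field by position does not.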
    