
I’m working on a Bash script to parse Postgres error logs and pull out log entries between certain dates/times. The complicating factor is that entries can be multiline, and only the first line includes the time stamp. Log entries look like this:

YYYY-MM-DD hh:mm:ss UTC:<IP address>(<port>):<user>@<host>:[pid]:<msgtype>:<message>

where <message> can be 1 or more lines. <msgtype> is ERROR, STATEMENT, DETAIL, etc. Almost any <msgtype> can have any number of lines.

Awk will be processing multiple files on the command line, and while the lines in each file are in timestamp order, the files themselves aren’t necessarily. Currently I’m just doing a simple comparison (awk '{if ($1 >= "$first") { print } }')*, where $first is set to the beginning timestamp. Adding a check for $last is trivial; the problem is getting the lines that don’t start with a timestamp, and only those that follow a matching one.

Can someone point me in the right direction for this?

*It just occurred to me that this will only compare the date and not the time, so can someone help with this part as well? Can I do awk '{if ( ($1" "$2) >= "$first") { print } }'?

ETA: sample log entry:

 2023-11-07 07:01:25 UTC::@:[605]:ERROR: could not connect to the publisher: connection to server at "<ip addr>", port 5432 failed: FATAL: pg_hba.conf rejects replication connection for host "<ip addr>", user "<userid>", SSL on
 connection to server at "<ip addr>", port 5432 failed: FATAL: pg_hba.conf rejects replication connection for host "<ip addr>", user "<userid>", SSL off

2 Answers


  1. It’s possible that this is what you’re trying to do, but without useful sample input/output that demonstrates all your requirements, it’s a guess based on multiple assumptions:

    $ awk -v beg='2023-11-07 05:00:00' -v end='2024-12-01 07:00:00' '
        match($0,/^[0-9]{4}([-: ][0-9]{2}){5}/) { cur = substr($0,RSTART,RLENGTH) }
        (beg <= cur) && (cur <= end)
    ' file
    2023-11-07 07:01:25 UTC::@:[605]:ERROR: could not connect to the publisher: connection to server at "<ip addr>", port 5432 failed: FATAL: pg_hba.conf rejects replication connection for host "<ip addr>", user "<userid>", SSL on
     connection to server at "<ip addr>", port 5432 failed: FATAL: pg_hba.conf rejects replication connection for host "<ip addr>", user "<userid>", SSL off
    

    You may or may not want to tighten the date+time regexp I’m using depending on what else can exist at the start of a line in your log file.
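
    A minimal sketch of one way to tighten it, assuming every entry header starts with a 19-character timestamp followed by " UTC:" as in the sample entry. The bounds are passed in from shell variables with -v (shell variables are not expanded inside a single-quoted awk program), and cur is reset at the start of each file because several files are processed. The names first, last, log, and result, and the sample log lines, are hypothetical. The regexp is written without {n} intervals so it should also work in awks that lack interval support:

    ```shell
    first='2023-11-07 07:00:00'   # hypothetical lower bound
    last='2023-11-07 08:00:00'    # hypothetical upper bound

    # Hypothetical sample log: one entry before, one inside, one after the range.
    log=$(mktemp)
    printf '%s\n' \
        '2023-11-07 06:59:59 UTC::@:[605]:LOG: too early' \
        ' continuation of the early entry' \
        '2023-11-07 07:01:25 UTC::@:[605]:ERROR: could not connect' \
        ' connection to server failed' \
        '2023-11-07 09:00:00 UTC::@:[605]:LOG: too late' > "$log"

    result=$(awk -v beg="$first" -v end="$last" '
        # Start of a new file: forget the timestamp carried over from the previous one.
        FNR == 1 { cur = "" }
        # Header line: full timestamp followed by " UTC:"; remember the timestamp.
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] UTC:/ {
            cur = substr($0, 1, 19)
        }
        # Print header and continuation lines while the current entry is in range.
        (beg <= cur) && (cur <= end)
    ' "$log")

    printf '%s\n' "$result"
    rm -f "$log"
    ```

    With this sample, only the two lines of the 07:01:25 entry are printed. The comparison is a plain string comparison, which orders correctly here because the timestamp format is fixed-width and most-significant-first.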

  2. “problem is getting those lines that don’t start with a timestamp, and
    only those following a matching one.”

    You might use RS (the record separator) in GNU AWK in the following way. Consider the following simple example; let file.txt be

    2023-01-01 01:01:01
    UNO
    2023-03-03 03:03:03
    TRES
    TRES
    TRES
    2023-02-02 02:02:02
    DOS
    DOS
    

    and suppose the task is to extract the entries for the 2nd month of 2023; then

    awk 'BEGIN{RS="2023-[0-1][0-9]-[0-3][0-9]"}dt>="2023-02-01"&&dt<="2023-02-31"{printf("%s%s",dt,$0)}{dt=RT}' file.txt
    

    gives output

    2023-02-02 02:02:02
    DOS
    DOS
    

    Explanation: I use an RS that should match entry dates, and only entry dates, in the file. It assumes the log pertains to year 2023 only and restricts the digits allowed in the following positions, though it might still match some nonsense dates, e.g. 2023-01-37. Then I filter for records where dt is in the required range and printf the date together with the record content ($0). Independently of that, I store RT (the record terminator, i.e. the text that matched RS) in the variable dt, because filtering each record requires the terminator of the previous record.

    (tested in GNU Awk 5.1.0)
