I’m working on a Bash script to parse Postgres error logs and pull out log entries between certain dates/times. The complicating factor is that entries can be multiline, and only the first line includes the time stamp. Log entries look like this:
YYYY-MM-DD hh:mm:ss UTC:<IP address>(<port>):<user>@<host>:[pid]:<msgtype>:<message>
where <message> can be 1 or more lines. <msgtype> is ERROR, STATEMENT, DETAIL, etc. Almost any <msgtype> can have any number of lines.
Awk will be processing multiple files on the command line, and while the lines in each file are in timestamp order, the files aren’t necessarily so. Currently I’m just doing a simple compare (awk '{if ($1 >= "$first") { print } }')* where $first is set to the beginning time stamp. Adding a check for a $last is trivial; the problem is getting those lines that don’t start with a timestamp, and only those following a matching one.
Can someone point me in the right direction for this?
*It just occurred to me that this will only compare the date and not the time, so can someone help with this part as well? Can I do awk '{if ( ($1" "$2) >= "$first") { print } }'?
ETA: sample log entry:
2023-11-07 07:01:25 UTC::@:[605]:ERROR: could not connect to the publisher: connection to server at "<ip addr>", port 5432 failed: FATAL: pg_hba.conf rejects replication connection for host "<ip addr>", user "<userid>", SSL on
connection to server at "<ip addr>", port 5432 failed: FATAL: pg_hba.conf rejects replication connection for host "<ip addr>", user "<userid>", SSL off
2 Answers
It’s possible that this might be what you’re trying to do, but without useful sample input/output that demonstrates all your requirements, it’s a guess based on multiple assumptions:
You may or may not want to tighten the date+time regexp I’m using depending on what else can exist at the start of a line in your log file.
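A minimal sketch of one way to do this with a keep/skip flag, assuming every entry starts with a YYYY-MM-DD hh:mm:ss timestamp and continuation lines never do. The file name sample.log, its contents, and the variable names first, last, ts and keep are made up for illustration. Note that the bounds are passed with -v, because "$first" inside a single-quoted awk program is never expanded by the shell.

```shell
# Hypothetical sample log; the file name and contents are invented for this sketch.
cat > sample.log <<'EOF'
2023-11-07 07:01:25 UTC::@:[605]:ERROR: could not connect to the publisher
continuation line of the 07:01:25 entry
2023-11-07 08:30:00 UTC::@:[606]:STATEMENT: select 1
2023-11-07 09:15:42 UTC::@:[607]:ERROR: out of range
continuation line of the 09:15:42 entry
EOF

first='2023-11-07 07:00:00'
last='2023-11-07 08:59:59'

result=$(awk -v first="$first" -v last="$last" '
    # Do not carry the decision over from the previous file.
    FNR == 1 { keep = 0 }

    # A line starting with a date and a time begins a new entry:
    # compare "date time" as a string (ISO timestamps sort lexically).
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] / {
        ts = $1 " " $2
        keep = (ts >= first && ts <= last)
    }

    # Continuation lines inherit the decision made for their entry.
    keep
' sample.log)

printf '%s\n' "$result"
```

With the bounds above, this should print the 07:01:25 entry together with its continuation line and the 08:30:00 entry, and skip the 09:15:42 entry. Several log files can be listed in place of sample.log.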
You might use RS (record separator) in GNU AWK in the following way. Consider the following simple example: let file.txt be
and your task is to extract entries for the 2nd month of 2023, then
gives output
Explanation: I use RS which should match entry dates and only entry dates in the file. It assumes the log pertains to year 2023 only and is limited in the digits allowed in subsequent places, though it still might match some nonsense dates, e.g. 2023-01-37. Then I filter for dates where dt is in the required range and printf the given date with content ($0). Independently from that, I store RT (record terminator) in variable dt, as I need to consider the record terminator of the previous record during filtering.
(tested in GNU Awk 5.1.0)