How do I read PostgreSQL error logs using AWK?

Swechsler
November 9, 2023
2 Answers

I’m working on a Bash script to parse Postgres error logs and pull out log entries between certain dates/times. The complicating factor is that entries can be multiline, and only the first line includes the time stamp. Log entries look like this:

YYYY-MM-DD hh:mm:ss UTC:<IP address>(<port>):<user>@<host>:[pid]:<msgtype>:<message>

where <message> can be 1 or more lines. <msgtype> is ERROR, STATEMENT, DETAIL, etc. Almost any <msgtype> can have any number of lines.

Awk will be processing multiple files on the command line, and while the lines in the file are in timestamp order, the files aren’t necessarily so. Currently I’m just doing a simple compare (awk '{if ($1 >= "$first") { print } }')* where $first is set to the beginning time stamp. Adding a check for a $last is trivial, the problem is getting those lines that don’t start with a timestamp, and only those following a matching one.

Can someone point me in the right direction for this?

*It just occurred to me that this will only compare the date and not the time, so can someone help with this part as well? Can I do awk '{if ( ($1" "$2) >= "$first") { print } }'?

ETA: sample log entry:

 2023-11-07 07:01:25 UTC::@:[605]:ERROR: could not connect to the publisher: connection to server at "<ip addr>", port 5432 failed: FATAL: pg_hba.conf rejects replication connection for host "<ip addr>", user "<userid>", SSL on
 connection to server at "<ip addr>", port 5432 failed: FATAL: pg_hba.conf rejects replication connection for host "<ip addr>", user "<userid>", SSL off

Answers

- EdMorton
- November 9, 2023 at 6:09 pm
- 0 votes
This might be what you’re trying to do, but without useful sample input/output that demonstrates all your requirements, it’s a guess based on multiple assumptions:
```
$ awk -v beg='2023-11-07 05:00:00' -v end='2024-12-01 07:00:00' '
    match($0,/^[0-9]{4}([-: ][0-9]{2}){5}/) { cur = substr($0,RSTART,RLENGTH) }
    (beg <= cur) && (cur <= end)
' file
2023-11-07 07:01:25 UTC::@:[605]:ERROR: could not connect to the publisher: connection to server at "<ip addr>", port 5432 failed: FATAL: pg_hba.conf rejects replication connection for host "<ip addr>", user "<userid>", SSL on
 connection to server at "<ip addr>", port 5432 failed: FATAL: pg_hba.conf rejects replication connection for host "<ip addr>", user "<userid>", SSL off
```
You may or may not want to tighten the date+time regexp I’m using depending on what else can exist at the start of a line in your log file.
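Building on the idea above, here is a minimal sketch of how it might be wrapped in a Bash function for the situation in the question: several log files on the command line and shell variables holding the range. The function name `pg_log_range` and its argument order are hypothetical. It rests on two assumptions. First, the bounds are passed with `-v`, because a shell variable inside a single-quoted awk program (such as `"$first"` in the question) is not expanded by the shell. Second, `cur` is reset at the start of each file, so continuation lines at the top of one file are not attributed to the last entry of the previous file. The regexp is spelled out without `{n}` intervals so that it should also run on awks that lack them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print log entries whose timestamp lies in [first, last],
# including continuation lines that follow a matching timestamped line.
pg_log_range() {
    local first=$1 last=$2
    shift 2
    awk -v beg="$first" -v end="$last" '
        # New file: forget the timestamp carried over from the previous file.
        FNR == 1 { cur = "" }
        # A line starting with "YYYY-MM-DD hh:mm:ss" opens a new entry.
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ {
            cur = substr($0, 1, 19)
        }
        # Fixed-width timestamps compare correctly as plain strings.
        (cur != "") && (beg <= cur) && (cur <= end)
    ' "$@"
}
```

It could then be called as `pg_log_range '2023-11-07 05:00:00' '2023-11-07 08:00:00' file1 file2`, where the file names are placeholders. Comparing `$1" "$2` as in the question would also work for the timestamped lines, since the date and time are the first two space-separated fields, but it would not pick up the continuation lines.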

- Daweo
- November 9, 2023 at 7:51 pm
- 0 votes
> problem is getting those lines that don’t start with a timestamp, and only those following a matching one.

You might use RS (record separator) in GNU AWK in the following way. Consider the following simple example; let file.txt be
```
2023-01-01 01:01:01
UNO
2023-03-03 03:03:03
TRES
TRES
TRES
2023-02-02 02:02:02
DOS
DOS
```
and suppose your task is to extract the entries for the 2nd month of 2023; then
```
awk 'BEGIN{RS="2023-[0-1][0-9]-[0-3][0-9]"}dt>="2023-02-01"&&dt<="2023-02-31"{printf("%s%s",dt,$0)}{dt=RT}' file.txt
```
gives output
```
2023-02-02 02:02:02
DOS
DOS
```
Explanation: I set RS to a pattern that should match entry dates, and only entry dates, in the file. It assumes the log pertains to the year 2023 only and restricts the digits allowed in the subsequent places, though it might still match some nonsense dates, e.g. 2023-01-37. Then I filter for records whose dt is in the required range and printf that date together with the record content ($0). Independently of that, I store RT (record terminator) in the variable dt, because the date that belongs to a record is the terminator of the previous record.

(tested in GNU Awk 5.1.0)
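As a hedged variation on the one-liner above, the sketch below drops the hard-coded year and takes the bounds from `-v` variables. It still relies on GNU AWK (`RT` and a regexp-valued `RS` are gawk extensions), and the wrapper name `entries_between` is hypothetical. Like the original, the pattern is not anchored to the start of a line, so a date appearing inside a message would also split a record, and it compares dates only, not times.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the RS/RT approach; requires GNU AWK (gawk).
entries_between() {
    local beg=$1 end=$2
    shift 2
    gawk -v beg="$beg" -v end="$end" '
        # Any YYYY-MM-DD date ends the current record and is stored in RT.
        BEGIN { RS = "[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]" }
        # dt holds the date that preceded this record (the previous RT).
        dt >= beg && dt <= end { printf "%s%s", dt, $0 }
        { dt = RT }
    ' "$@"
}
```

For the file.txt shown earlier, `entries_between 2023-02-01 2023-02-28 file.txt` should select the same 2023-02-02 entry as the original command.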