`pgtoolkit.log`

Postgres logs are still the most comprehensive source of information on what’s going on in a cluster. pgtoolkit.log provides a parser to process Postgres log records efficiently from Python.

Parsing logs is tricky because the format varies across configurations. Performance also matters, since logs can contain thousands of records.

Configuration

Postgres log records have a prefix, configured with the log_line_prefix cluster setting. When analyzing a log file, you must know the log_line_prefix value used to generate the records.
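
To illustrate why the exact log_line_prefix value matters, here is a minimal sketch that turns a prefix format into a regular expression. It is not pgtoolkit's implementation: the ESCAPES table and the prefix_regex function are hypothetical, only two escapes are handled, and the sample line is made up for the example.

```python
import re

# Illustrative subset only: two of the many log_line_prefix escapes.
ESCAPES = {
    "%m": r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ \S+)",
    "%p": r"(?P<pid>\d+)",
}


def prefix_regex(log_line_prefix):
    """Build a regex matching prefixes generated with `log_line_prefix`."""
    pieces = []
    for part in re.split(r"(%[a-z])", log_line_prefix):
        if part in ESCAPES:
            pieces.append(ESCAPES[part])
        elif re.fullmatch(r"%[a-z]", part):
            raise ValueError(f"escape not handled by this sketch: {part}")
        else:
            pieces.append(re.escape(part))  # literal text between escapes
    return re.compile("".join(pieces))


# Made-up sample line, written as if generated with log_line_prefix = '%m [%p] '.
line = "2018-06-15 10:49:31.144 CEST [8423] LOG:  connection received: host=[local]"
match = prefix_regex("%m [%p] ").match(line)
print(match.group("timestamp"), match.group("pid"))
```

With a format that differs from the one used to write the file, for instance '[%p] ' here, the expression does not match at all, which is why the value has to be known up front.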

Postgres can emit more messages to suit your needs. See the Error Reporting and Logging section of the PostgreSQL documentation for details on logging fields and message types.

Performance

The fastest code is a no-op. Thus, the parser lets you filter out records as early as possible. The parser has several distinct stages. After each stage, the parser calls a filter to determine whether to stop processing the record. Here are the stages, in processing order:

Split prefix, severity and message, determine message type.
Extract and decode prefix data.
Extract and decode message data.
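
The early-exit idea behind these stages can be sketched with plain functions. This is a toy model, not pgtoolkit's code: run_stages, the three decode functions and the sample lines are all hypothetical, and the decoding is deliberately naive.

```python
def split_line(record):
    # Stage 1: split prefix, severity and message.
    prefix, _, rest = record["raw"].partition("] ")
    record["prefix"] = prefix + "]"
    record["severity"], _, record["message"] = rest.partition(":  ")


def decode_prefix(record):
    # Stage 2: decode prefix data (here, only the PID between brackets).
    record["pid"] = int(record["prefix"].rsplit("[", 1)[1].rstrip("]"))


def decode_message(record):
    # Stage 3: decode message data (here, a naive word split).
    record["words"] = record["message"].split()


def run_stages(line, stages):
    """Run (decode, should_drop) pairs in order; stop at the first drop."""
    record = {"raw": line}
    for decode, should_drop in stages:
        decode(record)
        if should_drop(record):
            return None  # later, costlier stages never run
    return record


def keep_all(record):
    return False


# Drop everything but ERROR records right after the cheapest stage.
stages = [
    (split_line, lambda r: r["severity"] != "ERROR"),
    (decode_prefix, keep_all),
    (decode_message, keep_all),
]
# Made-up sample lines for the sketch.
lines = [
    "2018-06-15 10:49:31.144 CEST [8423] LOG:  connection received: host=[local]",
    '2018-06-15 10:49:32.001 CEST [8424] ERROR:  relation "foo" does not exist',
]
kept = [r for r in (run_stages(line, stages) for line in lines) if r is not None]
```

The LOG line is rejected after the first stage, so its prefix and message are never decoded.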

Limitations

pgtoolkit.log does not manage opening and decompressing logs. It only accepts an iterator over log lines. Likewise, pgtoolkit.log cannot start the analysis at a specific position in a file.

pgtoolkit.log does not group related records, such as an ERROR record and the HINT record that follows it. It’s up to the application to make sense of record sequences.
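
Since opening files is left to the application, a small helper built on the standard library may be enough. The sketch below is an assumption about how an application could do it; open_log and lines_from are hypothetical names, and detecting compression from the .gz suffix is a simplification.

```python
import gzip
import itertools


def open_log(path):
    """Open a plain or gzip-compressed log file as a text line iterator."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, encoding="utf-8")


def lines_from(fo, start_line=0):
    """Skip the first `start_line` lines, then yield the remaining ones."""
    return itertools.islice(fo, start_line, None)
```

The resulting iterator can be handed to the parser as its line iterator. Note that skipping lines may land in the middle of a multi-line record, so the first lines read may not form a complete record.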
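
As an illustration of what an application could do on its own, the sketch below groups each primary record with the continuation records that follow it. group_records and CONTINUATIONS are hypothetical names; the only assumption about records is a severity attribute, as documented for Record.

```python
# Severities that PostgreSQL uses for continuation messages.
CONTINUATIONS = {"DETAIL", "HINT", "QUERY", "CONTEXT", "STATEMENT", "LOCATION"}


def group_records(records):
    """Yield lists: a primary record followed by its continuation records."""
    group = []
    for record in records:
        # Objects without a severity (e.g. unparsable data) start a new group.
        severity = getattr(record, "severity", None)
        if severity not in CONTINUATIONS and group:
            yield group
            group = []
        group.append(record)
    if group:
        yield group
```

This relies on continuation records immediately following their primary record. If records from several sessions interleave in the same file, an application would probably need to key the groups on a prefix field such as pid.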

pgtoolkit.log does not analyze log records. It’s just a parser, a building block to write a log analyzer in your app.

API Reference

Here are the few functions and classes used to parse and access log records.

pgtoolkit.log.parse(fo: Iterable[str], prefix_fmt: str, filters: NoopFilters | None = None) → Iterator[Record | UnknownData][source]

Parses log lines and yields Record or UnknownData objects.

This is a helper around LogParser and PrefixParser.

Parameters:

fo – A line iterator such as a file-like object.
prefix_fmt – The exact value of the log_line_prefix PostgreSQL setting.
filters – An object like a NoopFilters instance.

See Example section for usage.

class pgtoolkit.log.LogParser(prefix_parser: PrefixParser, filters: NoopFilters | None = None)[source]

Log parsing manager

This object gathers parsing parameters and triggers the parsing logic. When parsing multiple files with the same parameters, or multiple sets of lines, a LogParser object simplifies initializing and preserving the parsing parameters.

When parsing a single set of lines, one can use the parse() helper instead.

Parameters:

prefix_parser – An instance of PrefixParser.
filters – An instance of NoopFilters.

class pgtoolkit.log.PrefixParser(re_: Pattern[str], prefix_fmt: str | None = None)[source]

Extract record metadata from PostgreSQL log line prefix.

classmethod from_configuration(log_line_prefix: str) → PrefixParser[source]

Factory from log_line_prefix

Parses log_line_prefix and builds a prefix parser from it.

Parameters:: log_line_prefix – log_line_prefix PostgreSQL setting.
Returns:: A PrefixParser instance.

class pgtoolkit.log.Record(prefix: str, severity: str, message_type: str = 'unknown', message_lines: list[str] | None = None, raw_lines: list[str] | None = None, **fields: str)[source]

Log record object.

Record object stores record fields and implements the different parse stages.

A record is primarily composed of a prefix, a severity and a message. Actually, severity is mixed with message type. For example, a HINT: message has the same severity as LOG: and is actually a continuation message (see csvlog output to compare). Thus the message type can easily be determined at this stage. pgtoolkit.log does not rewrite message severity.

Once prefix, severity and message are split, the parser analyzes the prefix according to the log_line_prefix parameter. The prefix can give a lot of information for filtering, but costs some CPU cycles to process.

Finally, the parser analyzes the message to extract information such as statement, hint, duration, execution plan, etc. depending on the message type.

These stages are separated so that the parser can apply a filter after each stage.

as_dict() → dict[str, str | object | datetime][source]: Returns record fields as a dict.

Each record field is accessible as an attribute:

prefix: Raw prefix line.

severity: One of DEBUG1 to DEBUG5, CONTEXT, DETAIL, ERROR, etc.

message_type: A string identifying message type. One of unknown, duration, connection, analyze, checkpoint.

raw_lines: A record can span multiple lines. This attribute keep a reference on raw record lines of the record.

message_lines: Just like raw_lines, but the first line only include message, without prefix nor severity.

The following attributes correspond to prefix fields. See log_line_prefix documentation for details.

application_name

command_tag

database

epoch

Type:: datetime.datetime

error

line_num

Type:: int

pid

Type:: int

remote_host

remote_port

Type:: int

session

start

Type:: datetime.datetime

timestamp

Type:: datetime.datetime

user

virtual_xid

xid

Type:: int

If the log lines lack a field, the record won’t have the attribute. Use hasattr() to check whether a record has a specific attribute.
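
A short sketch of defensive attribute access follows. The describe function is hypothetical, and SimpleNamespace objects stand in for real records so that the snippet runs on its own.

```python
from types import SimpleNamespace


def describe(record):
    """Format a record summary, tolerating missing prefix fields."""
    # getattr() with a default covers fields absent from log_line_prefix.
    database = getattr(record, "database", "<no database>")
    if hasattr(record, "pid"):
        return f"pid={record.pid} db={database}"
    return f"db={database}"


# Stand-ins for records parsed with and without the %d escape.
with_db = SimpleNamespace(pid=8423, database="postgres")
without_db = SimpleNamespace(pid=8424)
summaries = [describe(with_db), describe(without_db)]
```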

class pgtoolkit.log.UnknownData(lines: Sequence[str])[source]

Represents unparsable data.

UnknownData is throwable: you can raise it.

lines: The list of unparsable strings.

class pgtoolkit.log.NoopFilters[source]

Basic filter doing nothing.

Filters are grouped in an object to simplify the definition of a filtering policy. By subclassing NoopFilters, you can implement anything from simple filtering to a heavily parameterized filtering policy with this API.

If a filter method returns True, the record processing stops and the record is dropped.

stage1(record: Record) → None[source]

First stage filter.

Parameters:: record (Record) – A new record.
Returns:: True if record must be dropped.

record has only prefix, severity and message_type attributes.

stage2(record: Record) → None[source]

Second stage filter.

Parameters:: record (Record) – A new record.
Returns:: True if record must be dropped.

record has attributes from stage 1 plus attributes from prefix analysis. See Record for details.

stage3(record: Record) → None[source]

Third stage filter.

Parameters:: record (Record) – A new record.
Returns:: True if record must be dropped.

record has attributes from stage 2 plus attributes from message analysis, depending on message type.
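
The three filter methods above can be combined into a parameterized policy. The sketch below is written as a standalone, duck-typed class so that it runs without pgtoolkit installed; in an application it would presumably subclass NoopFilters instead. DurationFilters and its databases parameter are hypothetical.

```python
class DurationFilters:
    """Keep only duration records from selected databases."""

    def __init__(self, databases):
        self.databases = set(databases)

    def stage1(self, record):
        # Only prefix, severity and message_type are available here.
        return record.message_type != "duration"

    def stage2(self, record):
        # Prefix fields are decoded now; `database` may still be absent.
        return getattr(record, "database", None) not in self.databases

    def stage3(self, record):
        # Nothing to check on message data: never drop at this stage.
        return False
```

An instance such as DurationFilters({'postgres'}) could then be passed as the filters argument of parse() or LogParser, provided the log_line_prefix in use includes the database field.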

Example

Here is a sample structure of code parsing a plain log file.

from pgtoolkit.log import UnknownData, parse

with open('postgresql.log') as fo:
    for r in parse(fo, prefix_fmt='%m [%p]'):
        if isinstance(r, UnknownData):
            # Process unknown data, e.g. inspect r.lines.
            pass
        else:
            # Process the record.
            pass

Using `pgtoolkit.log` as a script

You can use this module to dump logs as JSON with the following command:

python -m pgtoolkit.log <log_line_prefix> [<filename>]

pgtoolkit.log serializes each record as a JSON object on a single line.

$ python -m pgtoolkit.log '%m [%p]: [%l-1] app=%a,db=%d%q,client=%h,user=%u ' data/postgresql.log
{"severity": "LOG", "timestamp": "2018-06-15T10:49:31.000144", "message_type": "connection", "line_num": 2, "remote_host": "[local]", "application": "[unknown]", "user": "postgres", "message": "connection authorized: user=postgres database=postgres", "database": "postgres", "pid": 8423}
{"severity": "LOG", "timestamp": "2018-06-15T10:49:34.000172", "message_type": "connection", "line_num": 1, "remote_host": "[local]", "application": "[unknown]", "user": "[unknown]", "message": "connection received: host=[local]", "database": "[unknown]", "pid": 8424}
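
Because each record is one JSON object per line, the output is easy to post-process. The sketch below counts records per message type; count_message_types is a hypothetical helper, and the sample lines are shortened versions of the output shown above.

```python
import json
from collections import Counter


def count_message_types(lines):
    """Count JSON-serialized records by their message_type field."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the stream
        record = json.loads(line)
        counts[record.get("message_type", "unknown")] += 1
    return counts


# Shortened sample of the script output shown above.
sample = [
    '{"severity": "LOG", "message_type": "connection", "pid": 8423}',
    '{"severity": "LOG", "message_type": "connection", "pid": 8424}',
]
counts = count_message_types(sample)
```

In a pipeline, the same function could be fed sys.stdin instead of the sample list.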
