pgtoolkit.log
Postgres logs are still the most comprehensive source of information on what’s
going on in a cluster. pgtoolkit.log provides a parser to process Postgres log
records efficiently from Python.
Parsing logs is tricky because the format varies across configurations. Performance also matters, since logs can contain thousands of records.
Configuration
Postgres log records have a prefix, configured with the log_line_prefix cluster
setting. When analyzing a log file, you must know the log_line_prefix
value used to generate the records.
Postgres can emit more messages if you need them. See the Error Reporting and Logging section of the PostgreSQL documentation for details on logging fields and message types.
Performance
The fastest code is NOOP. Thus, the parser lets you filter records as early as possible. The parser has several distinct stages. After each stage, the parser calls a filter to determine whether to stop processing the record. Here are the stages, in processing order:
Split prefix, severity and message, and determine the message type.
Extract and decode prefix data.
Extract and decode message data.
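The staged processing described above can be sketched with the standard library alone. This is not the pgtoolkit.log implementation: the regular expressions, the parse_staged function and the drop_debug filter are hypothetical, and the sketch assumes a prefix shaped like '%m [%p]: '. It only shows why a filter placed after the first stage saves the cost of the later ones.

```python
import re

# Hypothetical patterns, assuming a prefix shaped like '%m [%p]: '.
SPLIT_RE = re.compile(
    r"^(?P<prefix>.*?\]: )(?P<severity>[A-Z0-9]+):\s+(?P<message>.*)$"
)
PREFIX_RE = re.compile(r"^(?P<timestamp>.+) \[(?P<pid>\d+)\]: $")
DURATION_RE = re.compile(r"^duration: (?P<duration>[\d.]+) ms")


def parse_staged(lines, stage1=None, stage2=None):
    """Yield dicts for the lines that survive each filter.

    A filter returns True to drop the record, which skips the later stages.
    """
    for line in lines:
        # Stage 1: split prefix, severity and message (cheap).
        m = SPLIT_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # Unparsable line; a real parser would report it.
        record = m.groupdict()
        record["message_type"] = (
            "duration" if record["message"].startswith("duration: ") else "unknown"
        )
        if stage1 and stage1(record):
            continue
        # Stage 2: decode prefix fields (more expensive).
        pm = PREFIX_RE.match(record["prefix"])
        if pm:
            record["timestamp"] = pm.group("timestamp")
            record["pid"] = int(pm.group("pid"))
        if stage2 and stage2(record):
            continue
        # Stage 3: decode message data depending on the message type.
        dm = DURATION_RE.match(record["message"])
        if dm:
            record["duration"] = float(dm.group("duration"))
        yield record


def drop_debug(record):
    """Stage 1 filter: drop DEBUG records before any prefix decoding."""
    return record["severity"].startswith("DEBUG")
```

With drop_debug passed as stage1, a DEBUG line is discarded right after the split, so neither its prefix nor its message is ever decoded.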
Limitations
pgtoolkit.log does not manage opening and uncompressing logs. It only
accepts an iterator over log lines. Likewise, pgtoolkit.log does not
handle starting the analysis at a specific position in a file.
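Since the parser only takes an iterator of lines, opening, uncompressing and seeking are left to the application. A minimal sketch of such a line source follows; open_log and lines_from are hypothetical helpers, not part of pgtoolkit, and the '.gz' suffix check is an assumption about how files are named.

```python
import gzip
import itertools


def open_log(path):
    """Open a plain or gzip-compressed log file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, encoding="utf-8")


def lines_from(path, skip=0):
    """Yield the lines of ``path``, skipping the first ``skip`` lines."""
    with open_log(path) as fo:
        yield from itertools.islice(fo, skip, None)
```

The generator returned by lines_from could then be given as the fo argument of parse(). Note that a skip count landing inside a multi-line record hands the parser a partial record.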
pgtoolkit.log does not group related records, such as an ERROR record and
the HINT record that follows it. It’s up to the application to make sense
of record sequences.
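One way an application might group a record with the continuation records that follow it is sketched below. The group_records function and the CONTINUATION set are assumptions made for illustration, not part of pgtoolkit. The sketch relies only on each record having a severity attribute, and assumes UnknownData objects were filtered out beforehand.

```python
from types import SimpleNamespace

# Severities assumed here to continue the preceding record; adjust as needed.
CONTINUATION = {"DETAIL", "HINT", "CONTEXT", "STATEMENT", "QUERY", "LOCATION"}


def group_records(records):
    """Yield lists of records: a leading record followed by its continuations."""
    group = []
    for record in records:
        # A non-continuation record closes the current group and opens a new one.
        if record.severity not in CONTINUATION and group:
            yield group
            group = []
        group.append(record)
    if group:
        yield group


# Stand-in records; real ones would come from the parser.
records = [SimpleNamespace(severity=s) for s in ("LOG", "ERROR", "HINT", "LOG")]
for group in group_records(records):
    print([r.severity for r in group])
```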
pgtoolkit.log does not analyze log records. It’s just a parser, a
building block to write a log analyzer in your app.
API Reference
Here are the few functions and classes used to parse and access log records.
- pgtoolkit.log.parse(fo: Iterable[str], prefix_fmt: str, filters: NoopFilters | None = None) -> Iterator[Record | UnknownData]
Parses log lines and yields Record or UnknownData objects.
This is a helper around LogParser and PrefixParser.
- Parameters:
fo – A line iterator such as a file-like object.
prefix_fmt – Exactly the value of the log_line_prefix PostgreSQL setting.
filters – An object like a NoopFilters instance.
See the Example section for usage.
- class pgtoolkit.log.LogParser(prefix_parser: PrefixParser, filters: NoopFilters | None = None)
Log parsing manager
This object gathers parsing parameters and triggers the parsing logic. When parsing multiple files with the same parameters, or multiple sets of lines, a LogParser object eases the initialization and preservation of parsing parameters.
When parsing a single set of lines, one can use the parse() helper instead.
- Parameters:
prefix_parser – An instance of PrefixParser.
filters – An instance of NoopFilters.
- class pgtoolkit.log.PrefixParser(re_: Pattern[str], prefix_fmt: str | None = None)
Extract record metadata from PostgreSQL log line prefix.
- classmethod from_configuration(log_line_prefix: str) -> PrefixParser
Factory from log_line_prefix
Parses log_line_prefix and builds a prefix parser from it.
- Parameters:
log_line_prefix – The log_line_prefix PostgreSQL setting.
- Returns:
A PrefixParser instance.
- class pgtoolkit.log.Record(prefix: str, severity: str, message_type: str = 'unknown', message_lines: list[str] | None = None, raw_lines: list[str] | None = None, **fields: str)
Log record object.
Record object stores record fields and implements the different parse stages.
A record is primarily composed of a prefix, a severity and a message. Actually, severity is mixed with message type. For example, a HINT: message has the same severity as LOG: and is actually a continuation message (see csvlog output to compare). Thus the message type is easy to determine at this stage. pgtoolkit.log does not rewrite message severity.
Once prefix, severity and message are split, the parser analyzes the prefix according to the log_line_prefix parameter. The prefix can give a lot of information for filtering, but costs some CPU cycles to process.
Finally, the parser analyzes the message to extract information such as statement, hint, duration, execution plan, etc. depending on the message type.
These stages are separated so that marshalling can apply a filter between each stage.
Each record field is accessible as an attribute:
- prefix
Raw prefix line.
- severity
One of DEBUG1 to DEBUG5, CONTEXT, DETAIL, ERROR, etc.
- message_type
A string identifying the message type. One of unknown, duration, connection, analyze, checkpoint.
- raw_lines
A record can span multiple lines. This attribute keeps a reference to the raw lines of the record.
- message_lines
Just like raw_lines, but the first line only includes the message, without prefix or severity.
The following attributes correspond to prefix fields. See log_line_prefix documentation for details.
- application_name
- command_tag
- database
- epoch
- error
- remote_host
- session
- start
- timestamp
- user
- virtual_xid
If the log lines lack a field, the record won’t have the corresponding attribute. Use hasattr() to check whether a record has a specific attribute.
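Because prefix attributes may be absent, code that aggregates on them needs to guard each access. The sketch below counts records per value of a field and ignores records that lack it. The count_by helper is hypothetical, and SimpleNamespace objects stand in for records so that the example runs without a log file.

```python
from collections import Counter
from types import SimpleNamespace


def count_by(records, name):
    """Count records per value of a prefix field, ignoring records without it."""
    return Counter(getattr(r, name) for r in records if hasattr(r, name))


# Stand-in records; the second one has no database attribute.
records = [
    SimpleNamespace(severity="LOG", database="app"),
    SimpleNamespace(severity="LOG", user="postgres"),
    SimpleNamespace(severity="ERROR", database="app"),
]
per_database = count_by(records, "database")
```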
- class pgtoolkit.log.UnknownData(lines: Sequence[str])
Represents unparsable data.
UnknownData is throwable: you can raise it.
- lines
The list of unparsable strings.
- class pgtoolkit.log.NoopFilters
Basic filter doing nothing.
Filters are grouped in an object to simplify the definition of a filtering policy. By subclassing NoopFilters, you can implement anything from simple filtering to a heavily parameterized filtering policy with this API.
If a filter method returns True, processing of the record stops and the record is dropped.
- stage1(record: Record) -> None
First stage filter.
- Parameters:
record (Record) – A new record.
- Returns:
True if the record must be dropped.
At this stage, record has only the prefix, severity and message_type attributes.
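A filtering policy based on the first stage could look like the following sketch. SeverityFilters and its keep parameter are hypothetical names. The fallback NoopFilters class exists only so that the sketch runs where pgtoolkit is not installed, and it mirrors only the stage1 method documented above.

```python
try:
    from pgtoolkit.log import NoopFilters
except ImportError:
    # Stand-in so the sketch runs without pgtoolkit installed.
    class NoopFilters:
        def stage1(self, record):
            return None


class SeverityFilters(NoopFilters):
    """Drop records whose severity is not in ``keep`` (hypothetical policy)."""

    def __init__(self, keep=("ERROR", "FATAL", "PANIC")):
        self.keep = frozenset(keep)

    def stage1(self, record):
        # Returning True drops the record before its prefix is decoded.
        return record.severity not in self.keep
```

An instance such as SeverityFilters() could then be passed as the filters argument of parse() or LogParser.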
Example
Here is a sample structure of code parsing a plain log file.
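    from pgtoolkit.log import UnknownData, parse

    with open('postgresql.log') as fo:
        # prefix_fmt must match the log_line_prefix setting of the cluster.
        for r in parse(fo, prefix_fmt='%m [%p]'):
            if isinstance(r, UnknownData):
                "Process unknown data"
            else:
                "Process record"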
Using pgtoolkit.log as a script
You can run this module as a script to dump logs as JSON:
python -m pgtoolkit.log <log_line_prefix> [<filename>]
pgtoolkit.log serializes each record as a JSON object on a single line.
$ python -m pgtoolkit.log '%m [%p]: [%l-1] app=%a,db=%d%q,client=%h,user=%u ' data/postgresql.log
{"severity": "LOG", "timestamp": "2018-06-15T10:49:31.000144", "message_type": "connection", "line_num": 2, "remote_host": "[local]", "application": "[unknown]", "user": "postgres", "message": "connection authorized: user=postgres database=postgres", "database": "postgres", "pid": 8423}
{"severity": "LOG", "timestamp": "2018-06-15T10:49:34.000172", "message_type": "connection", "line_num": 1, "remote_host": "[local]", "application": "[unknown]", "user": "[unknown]", "message": "connection received: host=[local]", "database": "[unknown]", "pid": 8424}
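Since each record is one JSON object per line, the output is easy to post-process. The sketch below counts records per message_type. The count_message_types function is a hypothetical helper, and the fallback to 'unknown' for objects without a message_type key is an assumption.

```python
import json
from collections import Counter


def count_message_types(lines):
    """Count records per message_type from one-JSON-object-per-line output."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Tolerate blank lines between records.
        record = json.loads(line)
        counts[record.get("message_type", "unknown")] += 1
    return counts
```

Any iterable of lines works as input, for example sys.stdin when the script output is piped into another Python process.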