pgtoolkit.log
Postgres logs are still the most comprehensive source of information on what’s going on in a cluster. pgtoolkit.log provides a parser to exploit Postgres log records efficiently from Python.
Parsing logs is tricky because the format varies across configurations. Performance also matters, since logs can contain thousands of records.
Configuration
Postgres log records have a prefix, configured with the log_line_prefix cluster setting. When analyzing a log file, you must know the log_line_prefix value used to generate the records.
Postgres can emit more messages than you need. See the Error Reporting and Logging section of the PostgreSQL documentation for details on logging fields and message types.
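For instance, the prefix used by the script example at the end of this document corresponds to the following postgresql.conf setting (a sketch; use whatever value your cluster is actually configured with):

```ini
# postgresql.conf -- the log_line_prefix must match the one that
# produced the log files you intend to parse.
log_line_prefix = '%m [%p]: [%l-1] app=%a,db=%d%q,client=%h,user=%u '
```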
Performance
The fastest code is a NOOP. Thus, the parser allows you to filter records as early as possible. The parser has several distinct stages. After each stage, the parser calls a filter to determine whether to stop processing the record. Here are the stages in processing order:
1. Split prefix, severity and message; determine message type.
2. Extract and decode prefix data.
3. Extract and decode message data.
Limitations
pgtoolkit.log does not manage opening and uncompressing logs. It only accepts a line reader iterator that loops over log lines. Likewise, pgtoolkit.log does not manage starting analysis at a specific position in a file.
pgtoolkit.log does not gather record sets such as an ERROR record and its following HINT record. It’s up to the application to make sense of record sequences.
pgtoolkit.log does not analyze log records. It’s just a parser, a building block for writing a log analyzer in your app.
API Reference
Here are the few functions and classes used to parse and access log records.
- pgtoolkit.log.parse(fo: Iterable[str], prefix_fmt: str, filters: NoopFilters | None = None) → Iterator[Record | UnknownData] [source]
  Parses log lines and yields Record or UnknownData objects. This is a helper around LogParser and PrefixParser.
  - Parameters:
    fo – A line iterator such as a file-like object.
    prefix_fmt – Exactly the value of the log_line_prefix PostgreSQL setting.
    filters – An object like a NoopFilters instance.
  See the Example section for usage.
- class pgtoolkit.log.LogParser(prefix_parser: PrefixParser, filters: NoopFilters | None = None)[source]
Log parsing manager.
This object gathers parsing parameters and triggers the parsing logic. When parsing multiple files with the same parameters, or multiple sets of lines, a LogParser object eases the initialization and reuse of parsing parameters. When parsing a single set of lines, one can use the parse() helper instead.
- Parameters:
  prefix_parser – An instance of PrefixParser.
  filters – An instance of NoopFilters.
- class pgtoolkit.log.PrefixParser(re_: Pattern[str], prefix_fmt: str | None = None)[source]
Extract record metadata from PostgreSQL log line prefix.
- classmethod from_configuration(log_line_prefix: str) → PrefixParser [source]
  Factory from log_line_prefix.
  Parses log_line_prefix and builds a prefix parser from it.
  - Parameters:
    log_line_prefix – The log_line_prefix PostgreSQL setting.
  - Returns:
    A PrefixParser instance.
- class pgtoolkit.log.Record(prefix: str, severity: str, message_type: str = 'unknown', message_lines: list[str] | None = None, raw_lines: list[str] | None = None, **fields: str)[source]
Log record object.
A Record object stores record fields and implements the different parse stages.
A record is primarily composed of a prefix, a severity and a message. Actually, severity is mixed with the message type. For example, a HINT: message has the same severity as LOG: and is actually a continuation message (see csvlog output to compare). Thus the message type can easily be determined at this stage. pgtoolkit.log does not rewrite message severity.
Once prefix, severity and message are split, the parser analyzes the prefix according to the log_line_prefix parameter. The prefix can provide a lot of information for filtering, but costs some CPU cycles to process.
Finally, the parser analyzes the message to extract information such as statement, hint, duration, execution plan, etc., depending on the message type.
These stages are separated so that filters can be applied between each stage.
Each record field is accessible as an attribute:
- prefix
  Raw prefix line.
- severity
  One of DEBUG1 to DEBUG5, CONTEXT, DETAIL, ERROR, etc.
- message_type
  A string identifying the message type. One of unknown, duration, connection, analyze, checkpoint.
- raw_lines
  A record can span multiple lines. This attribute keeps a reference to the raw lines of the record.
- message_lines
  Just like raw_lines, but the first line only includes the message, without prefix or severity.
The following attributes correspond to prefix fields. See the log_line_prefix documentation for details.
- application_name
- command_tag
- database
- epoch
- error
- remote_host
- session
- start
- timestamp
- user
- virtual_xid
If the log lines miss a field, the record won’t have the corresponding attribute. Use hasattr() to check whether a record has a specific attribute.
- class pgtoolkit.log.UnknownData(lines: Sequence[str])[source]
Represents unparseable data.
UnknownData is throwable; you can raise it.
- lines
  The list of unparseable strings.
- class pgtoolkit.log.NoopFilters[source]
Basic filter doing nothing.
Filters are grouped in an object to simplify the definition of a filtering policy. By subclassing NoopFilters, you can implement anything from simple filtering to a heavily parameterized filtering policy on top of this API. If a filter method returns True, record processing stops and the record is dropped.
- stage1(record: Record) → None [source]
  First stage filter.
  - Parameters:
    record (Record) – A new record.
  - Returns:
    True if the record must be dropped.
  record has only prefix, severity and message_type attributes.
Example
Here is a sample structure of code parsing a plain log file.
from pgtoolkit.log import UnknownData, parse

with open('postgresql.log') as fo:
    for r in parse(fo, prefix_fmt='%m [%p]'):
        if isinstance(r, UnknownData):
            ...  # process unknown data
        else:
            ...  # process the record
Using pgtoolkit.log as a script
You can use this module to dump logs as JSON with the following invocation:
python -m pgtoolkit.log <log_line_prefix> [<filename>]
pgtoolkit.log serializes each record as a JSON object on a single line.
$ python -m pgtoolkit.log '%m [%p]: [%l-1] app=%a,db=%d%q,client=%h,user=%u ' data/postgresql.log
{"severity": "LOG", "timestamp": "2018-06-15T10:49:31.000144", "message_type": "connection", "line_num": 2, "remote_host": "[local]", "application": "[unknown]", "user": "postgres", "message": "connection authorized: user=postgres database=postgres", "database": "postgres", "pid": 8423}
{"severity": "LOG", "timestamp": "2018-06-15T10:49:34.000172", "message_type": "connection", "line_num": 1, "remote_host": "[local]", "application": "[unknown]", "user": "[unknown]", "message": "connection received: host=[local]", "database": "[unknown]", "pid": 8424}