WL#9651: Logging services: log filter (configuration engine)

Affects: Server-8.0   —   Status: Complete   —   Priority: Medium

Provide a configuration language for WL#9342, "logging services: log filter 
(filtering engine)". This language should expose all features of the filtering 
engine and allow the user to leverage its full power using a simple "if 
<condition> then <action>" syntax.
Func-Req 1  Configuration

Func-Req 1.1  Filtering language

The scope of this WL comprises the implementation the configuration language 
specified in detail below, as to allow the user to leverage the versatility and 
power of the new built-in filtering 
engine (cf. WL#9342); this engine shall set up the rules from user input, but 
call upon the built-in engine to do the actual filtering.

Func-Req 1.2  Mechanism

The filter language shall receive its rules in a new system variable, 
@@global.log_error_filter_rules. This variable may be replaced with a per-
component one once those are supported by the component framework, or by other 
suitable mechanisms.


Func-Req 2  Loadable service

The configuration engine shall be dynamically loadable service that 
allows the user to replace the 5.7 compatibilty mode with a rule-based
setup.
"Compatibility mode" within the context of this WL denotes the
support for system variables/start-up options such as --log-error-verbosity
that were re-implemented on top of the new filtering engine as
part of WL#9342.

Func-Req 2.1  Loading

When this engine is loaded and added to the error log stack 
(@@global.log_error_services), it shall parse its configuration and add the 
resulting rules to the filtering engine specified in WL#9342.

Func-Req 2.2  Unloading

When this engine is unloaded or removed from the error log stack 
(@@global.log_error_services), its rules shall no longer be executed by the 
filtering engine specified in WL#9342.

Func-Req 2.3  Configuration changes

When this engine is loaded and configured as part of the error log stack, it 
should update its ruleset in the filtering engine (WL#9342) if its configuration 
is changed (for now, if @@global.log_error_filter_rules is changed).
As the component framework evolves, this mechanism is likely to be migrated (to 
read its data from e.g. a per component instance system variable, a table, or a 
nested configuration of @@global.log_error_services -- for obvious reasons, this 
is beyond the scope of this WL).
I-1  Semantics

NO CHANGE

I-2  Instrumentation

NO CHANGE

I-3  Error and Warnings

NO CHANGE

I-4  Install/Upgrade

NO CHANGE

I-5  Commercial plugins

NO CHANGE.

I-6  Replication

NO CHANGE

I-7  XProtocols

NO CHANGE

I-8  Protocols

NO CHANGE

I-9  Security

NO CHANGE
Future filters may elect to obfuscate parts of plain-text messages however, etc.

I-10  Log Files

YES, unsurprisingly. :)  See I-3

I-11  MySQL clients

NO CHANGE

I-12  Auth plugins

NO CHANGE

I-13  Globalization

NO CHANGE

I-14  Configuration

YES. The filter engine shall use a new system variable for its configuration; 
initially, this system variable shall reside in the MySQL 
server namespace; once the component configuration framework is finalized, it 
may be moved into a more appropriate namespace.

I-15  File formats

NO CHANGE (see log writers for that)

I-16  Plugins/APIs

NO CHANGE. (Will use the new APIs introduced by the service/component framework 
(WL#4102) and "logging services" (WL#9323) and "log filters" (WL#9342) WLs 
however.)

I-17  Other programs

NO CHANGE

I-18  Storage engines

NO CHANGE

I-19  Data types

NO CHANGE
1  SYNTAX

The following details a configuration language for the filter service (WL#9342) 
working on the structured log-info used by the structured logging services 
(WL#9323) that define key/value pairs of various types, and log lines that are 
a collection of these pairs.

The internal model follows a condition/verb/object structure.
Keywords are case-insensitive.

Key:
[foo]     optional particle, may appear 0 or 1 times
{bar}     optional particle, may appear 0, 1, or many times
'a'       character literal (where required for clarity)
"abc"     string literal    (where required for clarity)
'a'..'z'  any one character from the range from 'a' to 'z'
<baz>     metasyntactic variable defined below
x | y     either one of x and y is valid here


<statement>            ::= IF <conditions> THEN <action>
                           {ELSEIF <condition> THEN <action>}
                           [ELSE <action>]
                           '.'

<conditions>           ::= <condition> { AND <condition> | OR <condition> }

<condition>            ::= <field> <comparator> <value> |
                           <prefix_operator> <field>

<field>                ::= <wellknown_field_name> |
                           <ad_hoc_field_name>

<wellknown_field_name> ::= <core_field>  | <optional_field>

<core_field>           ::= "err_code"    | "err_symbol"  |
                           "SQL_state"   |
                           "prio"        | "label"       |
                           "time"        |
                           "msg"         | "subsystem"

<optional_field>       ::= "OS_errno"    | "OS_errmsg"   |
                           "source_file" | "source_line" | "function" |
                           "user"        | "host"        |
                           "thread"      | "query_id"

<ad_hoc_field_name>    ::= { 'a'..'z' | 'A'..'Z' | '0'..'9' | '_' }
                           Any name conforming to the above formation
                           rule that does not duplicate a well-known
                           field name.

<comparator>           ::= "==" | "!=" | ">=" | "<=" | "<" | ">" | "<>"
                           (compare a field and a value to test
                           for equality, etc.:
                          "prio>WARNING": "If severity is INFO or lower ..."
                           As a courtesy, we also convert '=<' into '<='
                           and so on. Since this happens at parse-time,
                           it incurs no run-time overhead, and should
                           prevent user-frustration.)

<value>                ::= <string_literal> | <numeric_literal> |
                           <error_symbol> | <severity>

<numeric_literal>      ::= <integer_literal> | <float_literal>

<digit>                ::= '0'..'9'

<sign>                 ::= '+' | '-' | ""

<integer_literal>      ::= <sign> <digit> { <digit> }

<float_literal>        ::= <sign> <digit> { <digit> } 
                             [ '.' <digit> { <digit> } ]

<string_literal>       ::= '"' { ^'"' } '"' | '\'' { ^'\'' } '\''

<error_symbol>         ::= a valid MySQL error symbol, e.g. ER_STARTUP

<severity>             ::= WARNING | ERROR | INFORMATION
                           (for comparison to prio only)

<prefix_operator>      ::= NOT EXISTS | EXISTS
                           (test for existence / absence of a field)

<action>               ::= drop | 
                           throttle <integer_literal>['/' <integer_literal>] |
                           set <assignment> |
                           unset <field>

<assignment>           ::= <field> ":=" <value> |
                           <field>  '=" <value> |
                           <field>  " " <value>

                    ...

that can easily model prior behaviour:


1.1  Verbs

1.1.1  DROP

"DROP" discards the log event in its entirety.
It is used for example to re-implement --log-error-verbosity=...
(discarding events of a certain verbosity):

--log_error_verbosity=2 can thus be noted as

  IF prio >= INFORMATION THEN drop.

"If the priority/severity is >= INFORMATION_LEVEL, discard the event."


1.1.2  THROTTLE

"THROTTLE" applies a rate-limiter to a class of error-events (e.g. events with a 
certain priority/severity, with a certain subsystem, or a certain error number). 
This generalizes the existing Error_log_throttle class:

  IF err_code==1408 THEN throttle 100.

Our condition selects the start-up message (error number 1408 -- "IF 
err_symbol==ER_STARTUP" would have the same effect). "throttle 100" then allows 
100 of the selected message class to pass per minute; any further messages will 
be summarized as a digest.

Other equivalence classes are conceivable, for instance we could allow only a 
limited number of warnings per minute:

  IF prio==2 THEN throttle 500.

Finally, other window-sizes can be specified (in seconds), e.g.

  IF prio==2 THEN throttle 1000/3600.

to allow 1000 warnings per hour (3600: 60 minutes * 60 seconds).
The denominator, if it exists, must be satisfy 0<integer_seconds<=604800.

NB  WL#11047, "logging services: review throttle class," will review the 
possibility of changing over the remaining use of the legacy class; this should 
be done in tandem with the code owners to make sure the intention is preserved; 
rpl have agreed to review.


1.1.3 SET

"SET" assigns a value to a key, or updates the value should the key/value pair 
exist. In this example, we're debugging our replication set-up. We generally 
discard note/information type events to de-spam, but no replication-related info 
may be lost. To achieve that, we reclassify any log-events originating in a 
given source-file or subsystem as errors by updating their priority 
to 0, ERROR_LEVEL:

  IF source_file=="rpl_slave.cc" THEN set prio:=ERROR.


1.1.4  UNSET

"UNSET" discards individual key/value pairs rather than the entire event. This 
enables the DBA to keep the log-files lean and uncluttered, and helps the Oracle 
engineer create deterministic .result files.

In this example, we discard unwanted log-items ("fields", so to say):

  IF EXISTS subsystem THEN unset subsystem.

This should be invaluable for .test cases, where we can now simply filter out 
non-deterministic or oft-changing fields, saving us much work with regular 
expressions or updates of .result files.


1.2  Well-known fields

While the following fields are available, few calls would set all or even just 
most of them. The most common subset would be { time, err_code, msg }.

The distinction however is a guide-line, not a guarantee. For the purpose of 
this WL, "core" means that the field will be set up automatically on error 
events; its presence is expected, but not guaranteed (e.g. when a filter has 
deleted it etc.). A service must handle the absence of such a field gracefully, 
log-writers for instance could do so by omitting the field, by indicating the 
absence of the value, by using a sensible default (where possible/appropriate), 
or in extreme cases (perhaps when the 'msg' field is not set), by not processing 
the event at all.

Conversely, for an "optional" field the expectation is absence; some events may 
use some of these fields, most probably won't. If present, these fields contain 
additional info that an event-creator may have elected to furnish us with where 
appropriate (and where such information is available to the event-creator in the 
first place).


1.2.1  Core fields

time:        timestamp. usually auto-generated.

msg:         error message. usually inferred from err_code

prio:        priority ("severity" in syslog):
             (numeric; error, warning, or note/info)

label:       derived from prio. can be overriden in filter
             for log-writers that support custom-labels.

err_code:    error-ID (numeric, e.g.  1234)
err_symbol:  error_ID (symbolic, e.g. ER_STARTUP -- this is
                       supported for convenience and portability;
                       internally, the numeric format is used)

subsystem:   what sub-system the error was thrown in.
             (subsystems can be defined by code-owners to comprise
              several source-files, e.g. "replication")

SQL_state:   SQL-state (e.g. "HY000")

1.2.2  Optional fields

1.2.2.1  Errors signaled by the operating system

OS_errno:    operating system's error number
OS_errmsg:   operating system's error message / strerror

1.2.2.2  Identifying the client on whose behalf we were working

user/host:   user/host, where known/applicable
thread,
query_id:    thread-ID ("connection_ID()") and query ID

1.2.2.3  Debug information

source_file: debugging: what source file the error was thrown in
source_line: debugging: what source line the error was thrown on
function:    debugging: what function    the error was thrown in


2  CONFIGURATION

2.1  "New-style configuration"

A new system variable created that will allow the DBA access to all of the 
filter's features.

WL#9424 aims to add per-component system variables for configuration; the new 
variable may move into an appropriate namespace (built-in or external service) 
once 9424 becomes available. For the time being, to facilitate testing, we 
provide a variable with the same semantics for its content in the server 
namespace (the only namespace existing at present, and therefore the default). 
The proposed name for this interim global variable is "log_error_filter_rules". 
It is of the string type and may hold 0, 1, or many statements following the 
syntax described above ("SYNTAX").  A reasonable default value would be assumed 
to not be dissimilar to e.g.:

IF prio>=3 THEN drop. IF EXISTS source_line THEN unset.

2.2  "Old-style configuration"

The docs should strongly impress upon the user the suggestion to make a choice 
between old-style (--log-error-verbosity etc.) OR new-style configuration to 
avoid confusion.

2.3  Configuration parsing stage

As a courtesy, during parsing (e.g. "IF prio>=3 THEN drop."),the filter 
configuration engine checks whether it knows the field ("prio"), and if so, 
whether the storage class it expects for the field (integer) matches that of the 
argument (3).  In our example, it does; if it didn't, the configuration engine 
would throw an error.

The same applies if a well-known field appears in the action (e.g. the action 
'set label:="HELO".' in the rule 'IF MySQL_error==1408 THEN set 
label:="HELO".')  label is a well-known field here, its well-known storage class 
is string, and since "HELO" is a string, all's well.  (Otherwise, again, we'd 
throw an error.)


3  OTHER

- Config variable should be NOT_IN_BINLOG, in alignment with --log-error-
verbosity, --log-warnings, --log-syslog-*, and the new --log-error-services.

- if a log-event has LOG_ITEM_LOG_SUPPRESSED set on it (to indicate "we threw 
away n events in the last window"), further rate-limiters should not throw this 
item away, so this information is not lost. no such guarantees are made for the 
drop verb.

- Mostly due to current limitations of our ability to configure
  multiple instances of a given log-service,

  - We do not currently support multiple service pipelines / tee-ing,
    as in

    log-event created
            |
           tee
           /  \
      filter  filter
         |      |
       sink   sink

  - Likewise, multiple instances of the filter language
    being loaded is not a supported configuration, as
    they'd currently both use the same rule-set (derived
    from the value in @@global.log_error_filter_rules),
    which seems without merit:

    log-event created
            |
    filter-language
            |
    filter-language
            |
          sink