WL#9651: Logging services: log filter (configuration engine)

Affects: Server-8.0   —   Status: Complete

Provide a configuration language for WL#9342, "logging services: log filter 
(filtering engine)". This language should expose all features of the filtering 
engine and allow the user to leverage its full power using a simple "if 
 then " syntax.
Func-Req 1  Configuration

Func-Req 1.1  Filtering language

The scope of this WL comprises the implementation the configuration language 
specified in detail below, as to allow the user to leverage the versatility and 
power of the new built-in filtering 
engine (cf. WL#9342); this engine shall set up the rules from user input, but 
call upon the built-in engine to do the actual filtering.

Func-Req 1.2  Mechanism

The filter language shall receive its rules in a new system variable, 
@@global.log_error_filter_rules. This variable may be replaced with a per-
component one once those are supported by the component framework, or by other 
suitable mechanisms.


Func-Req 2  Loadable service

The configuration engine shall be dynamically loadable service that 
allows the user to replace the 5.7 compatibilty mode with a rule-based
setup.
"Compatibility mode" within the context of this WL denotes the
support for system variables/start-up options such as --log-error-verbosity
that were re-implemented on top of the new filtering engine as
part of WL#9342.

Func-Req 2.1  Loading

When this engine is loaded and added to the error log stack 
(@@global.log_error_services), it shall parse its configuration and add the 
resulting rules to the filtering engine specified in WL#9342.

Func-Req 2.2  Unloading

When this engine is unloaded or removed from the error log stack 
(@@global.log_error_services), its rules shall no longer be executed by the 
filtering engine specified in WL#9342.

Func-Req 2.3  Configuration changes

When this engine is loaded and configured as part of the error log stack, it 
should update its ruleset in the filtering engine (WL#9342) if its configuration 
is changed (for now, if @@global.log_error_filter_rules is changed).
As the component framework evolves, this mechanism is likely to be migrated (to 
read its data from e.g. a per component instance system variable, a table, or a 
nested configuration of @@global.log_error_services -- for obvious reasons, this 
is beyond the scope of this WL).
I-1  Semantics

NO CHANGE

I-2  Instrumentation

NO CHANGE

I-3  Error and Warnings

NO CHANGE

I-4  Install/Upgrade

NO CHANGE

I-5  Commercial plugins

NO CHANGE.

I-6  Replication

NO CHANGE

I-7  XProtocols

NO CHANGE

I-8  Protocols

NO CHANGE

I-9  Security

NO CHANGE
Future filters may elect to obfuscate parts of plain-text messages however, etc.

I-10  Log Files

YES, unsurprisingly. :)  See I-3

I-11  MySQL clients

NO CHANGE

I-12  Auth plugins

NO CHANGE

I-13  Globalization

NO CHANGE

I-14  Configuration

YES. The filter engine shall use a new system variable for its configuration; 
initially, this system variable shall reside in the MySQL 
server namespace; once the component configuration framework is finalized, it 
may be moved into a more appropriate namespace.

I-15  File formats

NO CHANGE (see log writers for that)

I-16  Plugins/APIs

NO CHANGE. (Will use the new APIs introduced by the service/component framework 
(WL#4102) and "logging services" (WL#9323) and "log filters" (WL#9342) WLs 
however.)

I-17  Other programs

NO CHANGE

I-18  Storage engines

NO CHANGE

I-19  Data types

NO CHANGE
1  SYNTAX

The following details a configuration language for the filter service (WL#9342) 
working on the structured log-info used by the structured logging services 
(WL#9323) that define key/value pairs of various types, and log lines that are 
a collection of these pairs.

The internal model follows a condition/verb/object structure.
Keywords are case-insensitive.

Key:
[foo]     optional particle, may appear 0 or 1 times
{bar}     optional particle, may appear 0, 1, or many times
'a'       character literal (where required for clarity)
"abc"     string literal    (where required for clarity)
'a'..'z'  any one character from the range from 'a' to 'z'
     metasyntactic variable defined below
x | y     either one of x and y is valid here


            ::= IF  THEN 
                           {ELSEIF  THEN }
                           [ELSE ]
                           '.'

           ::=  { AND  | OR  }

            ::=    |
                            

                ::=  |
                           

 ::=   | 

           ::= "err_code"    | "err_symbol"  |
                           "SQL_state"   |
                           "prio"        | "label"       |
                           "time"        |
                           "msg"         | "subsystem"

       ::= "OS_errno"    | "OS_errmsg"   |
                           "source_file" | "source_line" | "function" |
                           "user"        | "host"        |
                           "thread"      | "query_id"

    ::= { 'a'..'z' | 'A'..'Z' | '0'..'9' | '_' }
                           Any name conforming to the above formation
                           rule that does not duplicate a well-known
                           field name.

           ::= "==" | "!=" | ">=" | "<=" | "<" | ">" | "<>"
                           (compare a field and a value to test
                           for equality, etc.:
                          "prio>WARNING": "If severity is INFO or lower ..."
                           As a courtesy, we also convert '=<' into '<='
                           and so on. Since this happens at parse-time,
                           it incurs no run-time overhead, and should
                           prevent user-frustration.)

                ::=  |  |
                            | 

      ::=  | 

                ::= '0'..'9'

                 ::= '+' | '-' | ""

      ::=   {  }

        ::=   {  } 
                             [ '.'  {  } ]

       ::= '"' { ^'"' } '"' | '\'' { ^'\'' } '\''

         ::= a valid MySQL error symbol, e.g. ER_STARTUP

             ::= WARNING | ERROR | INFORMATION
                           (for comparison to prio only)

      ::= NOT EXISTS | EXISTS
                           (test for existence / absence of a field)

               ::= drop | 
                           throttle ['/' ] |
                           set  |
                           unset 

           ::=  ":="  |
                             '="  |
                             " " 

                    ...

that can easily model prior behaviour:


1.1  Verbs

1.1.1  DROP

"DROP" discards the log event in its entirety.
It is used for example to re-implement --log-error-verbosity=...
(discarding events of a certain verbosity):

--log_error_verbosity=2 can thus be noted as

  IF prio >= INFORMATION THEN drop.

"If the priority/severity is >= INFORMATION_LEVEL, discard the event."


1.1.2  THROTTLE

"THROTTLE" applies a rate-limiter to a class of error-events (e.g. events with a 
certain priority/severity, with a certain subsystem, or a certain error number). 
This generalizes the existing Error_log_throttle class:

  IF err_code==1408 THEN throttle 100.

Our condition selects the start-up message (error number 1408 -- "IF 
err_symbol==ER_STARTUP" would have the same effect). "throttle 100" then allows 
100 of the selected message class to pass per minute; any further messages will 
be summarized as a digest.

Other equivalence classes are conceivable, for instance we could allow only a 
limited number of warnings per minute:

  IF prio==2 THEN throttle 500.

Finally, other window-sizes can be specified (in seconds), e.g.

  IF prio==2 THEN throttle 1000/3600.

to allow 1000 warnings per hour (3600: 60 minutes * 60 seconds).
The denominator, if it exists, must be satisfy 0WL#9424 aims to add per-component system variables for configuration; the new 
variable may move into an appropriate namespace (built-in or external service) 
once 9424 becomes available. For the time being, to facilitate testing, we 
provide a variable with the same semantics for its content in the server 
namespace (the only namespace existing at present, and therefore the default). 
The proposed name for this interim global variable is "log_error_filter_rules". 
It is of the string type and may hold 0, 1, or many statements following the 
syntax described above ("SYNTAX").  A reasonable default value would be assumed 
to not be dissimilar to e.g.:

IF prio>=3 THEN drop. IF EXISTS source_line THEN unset.

2.2  "Old-style configuration"

The docs should strongly impress upon the user the suggestion to make a choice 
between old-style (--log-error-verbosity etc.) OR new-style configuration to 
avoid confusion.

2.3  Configuration parsing stage

As a courtesy, during parsing (e.g. "IF prio>=3 THEN drop."),the filter 
configuration engine checks whether it knows the field ("prio"), and if so, 
whether the storage class it expects for the field (integer) matches that of the 
argument (3).  In our example, it does; if it didn't, the configuration engine 
would throw an error.

The same applies if a well-known field appears in the action (e.g. the action 
'set label:="HELO".' in the rule 'IF MySQL_error==1408 THEN set 
label:="HELO".')  label is a well-known field here, its well-known storage class 
is string, and since "HELO" is a string, all's well.  (Otherwise, again, we'd 
throw an error.)


3  OTHER

- Config variable should be NOT_IN_BINLOG, in alignment with --log-error-
verbosity, --log-warnings, --log-syslog-*, and the new --log-error-services.

- if a log-event has LOG_ITEM_LOG_SUPPRESSED set on it (to indicate "we threw 
away n events in the last window"), further rate-limiters should not throw this 
item away, so this information is not lost. no such guarantees are made for the 
drop verb.

- Mostly due to current limitations of our ability to configure
  multiple instances of a given log-service,

  - We do not currently support multiple service pipelines / tee-ing,
    as in

    log-event created
            |
           tee
           /  \
      filter  filter
         |      |
       sink   sink

  - Likewise, multiple instances of the filter language
    being loaded is not a supported configuration, as
    they'd currently both use the same rule-set (derived
    from the value in @@global.log_error_filter_rules),
    which seems without merit:

    log-event created
            |
    filter-language
            |
    filter-language
            |
          sink