WL#9342: Logging services: log filter (filtering engine)
Affects: Server-8.0 — Status: Complete
LOG FILTER/S

The umbrella WL#9323 defines "logging TNG", which has "structured logging" as one of its main goals (i.e. log entries that carry data beyond a single plain-text log message, such as a separate error number). It stands to reason that once we have this rich data, filtering on it becomes both easy and desirable. This implies the following points of order:

- Implement a new filtering engine that can handle the structured log events defined in WL#9323. The first implementation aims to maintain compatibility with the features it replaces; its primary goal is to change the code over to the new model.
- That filtering engine will be the built-in default. The user cannot accidentally misconfigure their system to have no filters at all.
- By default, the engine tries to emulate 5.7 behavior (heeding the current configuration variable --log-error-verbosity, setting up the same rate-limiting for selected "spammy" messages, and so on).
- Throttles for binlog.cc, connection_handler_per_thread.cc, and log_event.cc.
- The logging framework shall expose the necessary calls for plug-in services to implement a filter. It should be possible to run such a filter instead of the built-in default. This makes it possible to offer alternative, more powerful filtering, lets power users create custom filter plug-in services, and so on.
Preamble

A filter service has two major functional parts: it has to identify the log messages it needs to act on (e.g. select "informational" type messages when a low verbosity is configured), and then apply an action to them (in this case, suppress the line entirely). In a manner of speaking, the baseline implementation supports selection and projection.

Selection: A primary concern of a filter is the log message (aka "log line"). The filter will usually use certain fields (such as "error code" or "priority") to decide whether an entire line should appear (or be gagged, throttled, etc.). This functionality can be used to emulate previous behaviour, and more.

Projection: Now that a log entry may have multiple fields (such as error code, message, etc.), the filter service may discard individual fields as it sees fit. Operating at field level constitutes new behaviour (previously, there was only one data item, a plain-text error message). Finally, an "action" may also generate synthetic new fields.

The filter service is provided with key-value pairs describing the fields, a count of the items in the collection, and a bit vector of the types seen in this collection. It may modify these according to the following rules:

Non-Func-Req 1 Actions

Actions are applied to entire log lines (e.g. "suppress"), or to individual key/value pairs within a log line (e.g. "delete field"). Filter services may implement different types of actions; for instance, a service may offer to rewrite messages of the string type. The "stock" filter service specified in this WL MUST implement the actions needed to replicate current behaviour ("suppress line" for rate-limiting and for verbosity-filtering based on error severity). A service MAY implement actions beyond that (e.g. string editing, such as applying a 'basename' type operation to a string containing a file path and name).
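As an illustration of the data a filter works on, the following is a minimal Python sketch of a log line held as key-value items together with an item count and a bit vector of seen types. It shows one selection action (suppress the line) and one projection action (delete a field). All names here (LogLine, delete_type, filter_line, the type constants and their bit values) are hypothetical and chosen for the example only; they are not the server's API.

```python
from dataclasses import dataclass, field

# Hypothetical item types, one bit each so they can be OR-ed into a bit vector.
PRIORITY, ERROR_CODE, MESSAGE, LABEL = 1, 2, 4, 8


@dataclass
class LogLine:
    items: list = field(default_factory=list)  # (type, key, value) tuples
    count: int = 0   # number of items in the collection
    seen: int = 0    # bit vector of the item types present

    def add(self, item_type, key, value):
        self.items.append((item_type, key, value))
        self.count += 1
        self.seen |= item_type

    def delete_type(self, item_type):
        """Projection: remove all fields of one type and fix the bookkeeping."""
        self.items = [i for i in self.items if i[0] != item_type]
        self.count = len(self.items)
        self.seen &= ~item_type

    def get(self, item_type):
        """Return the most recently added value of a type, or None."""
        for t, _key, value in reversed(self.items):
            if t == item_type:
                return value
        return None


def filter_line(line, max_priority):
    """Selection: return None to suppress the whole line, else the line."""
    prio = line.get(PRIORITY)
    if prio is not None and prio > max_priority:
        return None
    return line


line = LogLine()
line.add(PRIORITY, "prio", 2)
line.add(ERROR_CODE, "err_code", 15)
line.add(MESSAGE, "msg", "disk almost full")
line.delete_type(ERROR_CODE)              # drop one field, keep the line
suppressed = filter_line(line, max_priority=1) is None
```

In this sketch the bit vector lets the filter skip a rule cheaply when no item of the relevant type is present, which is why the delete operation clears the bit once the last item of that type is gone.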
Non-Func-Req 1.1 Suppression of log lines

A filter MAY elect to discard a prospective log line in its entirety.

Func-Req 1.1 Suppression of log lines

Suppression of entire lines SHALL be available to the user as an emulation of the previous log_error_verbosity and error rate throttling functionality.

Non-Func-Req 1.2 Suppression of key/value pairs

A filter MAY remove individual key/value pairs from the collection. (It MUST adjust the item count if it does so. It SHOULD adjust the bit vector of seen types if it removes the last item of a given type.)

Func-Req 1.2 Suppression of key/value pairs

Where 1.1 deals with suppression of entire "log lines", an implementation MAY also suppress individual key/value pairs within those lines, i.e. it lets us gag individual fields within a line.

Non-Func-Req 1.3 Creation of key/value pairs

A filter MAY add key/value pairs to a log line. Since "newer" items override older ones, this lets us override defaults generated by log item sources.

Func-Req 1.3 Creation of key/value pairs

A simple example of this would be to select messages with a specific error code and to override their label. One issue with the previous logic was that very important informational messages, such as "server ready on port ...", would either get suppressed by commonly used verbosity settings because of their "informational" classification, or would have to be tagged with ERROR_LEVEL. The latter guaranteed their appearance in the log, but forced the misleading label of "ERROR" when no error had occurred.

NB: Some log writer plugins may not support separate labels and severities.

Non-Func-Req 1.4 Throttling/rate-limiting

The default filter should offer rate-limiting similar to that of the existing Log_throttle class as one of its functionalities. Rate-limiting should be available as an ACTION-VERB for any CONDITION.
That is to say, the CONDITION may create various equivalence classes, such as:

- "allow only a limited number of messages with error code 15 per minute (but leave messages with other error codes unaffected)"
- "allow only a limited number of messages of the information type per minute (but leave messages with higher priorities, such as errors, unaffected)"

Func-Req 1.4 Throttling/rate-limiting

In the initial implementation, the default filter's rate-limiting shall replace/emulate the previous use of the Log_throttle class for error messages. Summary messages will be standardized; this may require updates to the .result files of certain tests.

Section 2 Filters

Non-Func-Req 2.1 Conditions

A condition consists of a comparator ("equals" etc.) and a reference item to compare against. For example, a condition with a comparator of "greater than" and a reference item that has a type of LOG_PRIORITY and a value of 0 will match all log lines containing a LOG_PRIORITY item with a value of 1 or 2. When testing just for the presence or absence of an item with a given key, no value needs to be set on the reference item.

Non-Func-Req 2.1.1 Data types in conditions

The default plugin must at minimum support comparison of integer-form data, as is required to filter based on log line severity/priority and to throttle based on error number. A log filter service may implement further comparisons, e.g. for string-type data.

Func-Req 2.2 Conditions

The default filter service MUST be able to model at least the cases required to emulate pre-patch behaviour, i.e. it must be able to select lines for throttling (based on error code) and for suppression (based on log_error_verbosity). It MAY implement further comparators.
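To make the relationship between a CONDITION and the rate-limiting verb concrete, here is a minimal Python sketch. It assumes events are plain dictionaries and uses a fixed window per rule. The names Condition, ThrottleRule, allow and the comparator strings are hypothetical; this is not the Log_throttle class or the server's rule format, only an illustration of "throttle one equivalence class, leave the rest alone".

```python
import operator
import time

COMPARATORS = {"eq": operator.eq, "ne": operator.ne,
               "gt": operator.gt, "lt": operator.lt}


class Condition:
    """A comparator plus a reference item (key and, optionally, a value)."""

    def __init__(self, key, comparator=None, value=None):
        self.key, self.comparator, self.value = key, comparator, value

    def matches(self, event):
        if self.key not in event:
            return False
        if self.comparator is None:       # presence test: no value needed
            return True
        return COMPARATORS[self.comparator](event[self.key], self.value)


class ThrottleRule:
    """Let at most `limit` matching events through per `window` seconds."""

    def __init__(self, condition, limit, window=60.0, clock=time.monotonic):
        self.condition, self.limit, self.window = condition, limit, window
        self.clock = clock                # injectable so tests can fake time
        self.window_end = None
        self.seen = 0

    def allow(self, event):
        if not self.condition.matches(event):
            return True                   # other equivalence classes are unaffected
        now = self.clock()
        if self.window_end is None or now >= self.window_end:
            self.window_end = now + self.window   # start a new window
            self.seen = 0
        self.seen += 1
        return self.seen <= self.limit


# "greater than 0" on the priority matches priorities 1 and 2, but not 0.
info_or_warning = Condition("prio", "gt", 0)

# Allow two messages with error code 15 per minute; other codes pass freely.
rule = ThrottleRule(Condition("err_code", "eq", 15), limit=2)
decisions = [rule.allow({"err_code": 15}) for _ in range(3)]
```

The per-rule counters (`seen`, `window_end`) are the internal state that a shared rule-set has to protect when several threads log concurrently; the sketch leaves locking out for brevity.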
I-1 Semantics: NO CHANGE, until service configuration is more clearly defined. The first implementation of the filter will initially support the current UI (the log_error_verbosity system variable) for configuration, making it a drop-in replacement for the current technology. The server will set up rules to replicate the current throttling behaviour etc.
I-2 Instrumentation: NO CHANGE
I-3 Error and Warnings: YES. Summaries of rate-limiting should be uniform across throttled error messages.
I-4 Install/Upgrade: NO CHANGE
I-5 Commercial plugins: NO CHANGE. Changing filtering to be a service opens the way for commercial plugins, but creating one is beyond the scope of this WL entry.
I-6 Replication: NO CHANGE
I-7 Xprotocol: NO CHANGE
I-8 Protocols: NO CHANGE
I-9 Security: NO CHANGE. Future filters may, however, elect to obfuscate parts of plain-text messages, etc.
I-10 Log Files: YES, unsurprisingly. :) See I-3.
I-11 MySQL clients: NO CHANGE
I-12 Auth plugins: NO CHANGE
I-13 Globalization: NO CHANGE
I-14 Configuration: NO CHANGE. Compatibility with the current method of configuration is a goal. While the filter engine described in this WL SHALL heed the 5.7 variables, other filter services NEED NOT.
I-15 File formats: NO CHANGE (see log writers for that)
I-16 Plugins/APIs: NO CHANGE. (Will, however, use the new APIs introduced by the "services -- the next generation" and "logging -- the next generation" WLs.)
I-17 Other programs: NO CHANGE
I-18 Storage engines: NO CHANGE
I-19 Data types: NO CHANGE
1 CONFIGURATION (compatibility with --log_error_verbosity)

Some 5.7 behaviour is to be emulated in the filtering component. This can be done with relative ease by setting up the appropriate filter rules.

2 CONCURRENCY

While each error has its own grab bag of fields and its own string buffer (and therefore requires no locking), the built-in filter's rule-set will be shared among concurrent calls. Therefore, the following cases are expected:

- change (clear/append/modify) rule-set => exclusive lock
- apply filters (i.e. check conditions; apply action/verb on match) => shared lock
- a rule has internal state (e.g. for throttling: how many of the same message have we seen in this window? when will the window end?) and an update of this state is required: the shared lock must be upgraded => exclusive lock

3 FILTERING STAGE: MATCHING STAGE

At run time, the filter iterates over its rule-set. For each rule, if the condition contains a well-known item, the filter looks for an item of that type in the event. If the condition contains an ad hoc item, it looks for an item of any ad hoc type with the given key in the event. If there is a match, the filter verifies that the storage classes of the value in the event and of the value in the condition are either both strings or both non-strings. If that is not the case, it flags an error. Otherwise, it compares both values using the requested comparator and reports the result.
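The matching stage described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the type tags, the Item class, the StorageClassMismatch exception, and the functions find_item and match are all hypothetical names, and raising an exception merely stands in for "flags an error". It is not the server's implementation.

```python
import operator
from dataclasses import dataclass
from typing import Union

# Hypothetical type tags: a few well-known types plus a set of ad hoc types.
PRIORITY, ERROR_CODE, MESSAGE = "priority", "error_code", "message"
AD_HOC_TYPES = {"ad_hoc_int", "ad_hoc_str"}

COMPARATORS = {"eq": operator.eq, "ne": operator.ne,
               "lt": operator.lt, "gt": operator.gt}


@dataclass
class Item:
    type: str
    key: str
    value: Union[int, float, str]


class StorageClassMismatch(Exception):
    """One value is a string and the other is not."""


def find_item(event, ref):
    """Locate the event item that a condition's reference item refers to."""
    for item in event:
        if ref.type in AD_HOC_TYPES:
            # Ad hoc: any ad hoc type is acceptable, the key must match.
            if item.type in AD_HOC_TYPES and item.key == ref.key:
                return item
        elif item.type == ref.type:
            # Well-known: the type alone identifies the item.
            return item
    return None


def match(event, ref, comparator):
    """Return True if the event satisfies the condition, False otherwise."""
    item = find_item(event, ref)
    if item is None:
        return False
    # Both values must be strings, or both must be non-strings.
    if isinstance(item.value, str) != isinstance(ref.value, str):
        raise StorageClassMismatch(ref.key)
    return COMPARATORS[comparator](item.value, ref.value)


event = [Item(PRIORITY, "prio", 2), Item("ad_hoc_str", "component", "innodb")]
is_low_priority = match(event, Item(PRIORITY, "prio", 0), "gt")
is_innodb = match(event, Item("ad_hoc_str", "component", "innodb"), "eq")
```

Checking the storage class before comparing avoids comparing a string with a number, which would otherwise give a meaningless result or fail at the comparator.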
Copyright (c) 2000, 2021, Oracle Corporation and/or its affiliates. All rights reserved.