WL#11875: update early "buffered" logging to make full use of the new error logging system
Affects: Server-8.0 — Status: Complete — Priority: Medium
Information passed to the logger before it is told what to log and what file to log to is collected in a buffer, and printed once we know where to. This system predates the new rich logging system and is somewhat simple. On one hand, it uses the value of log_error_verbosity active when the error was thrown, not when it is later printed. This means that for events occurring before arguments from the command-line or my.cnf (or SET PERSIST) are processed, the default value of log_error_verbosity will apply: mysqld ... --log-error-verbosity=1 for instance will show some warnings generated before the verbosity was set to "errors only", which leads to surprising behaviour and bug tickets. Similarly, the buffering mechanism, not being aware of the capabilities of the new logging, will only buffer information used in the default error-logging format. This prevents some of the richer filtering (e.g. --log-error-suppression-list) and logging that should be available when more powerful log services are used, and thus once more produces slightly surprising behaviour. Buffered logging should buffer all information passed to it, not just a subset, and then flush it to the appropriate logs using the settings and services requested by the user once those settings have been obtained. This will provide richer logging and better filtering even for early messages, bring the behaviour more in line with later logging, and make overall server behaviour less surprising.
Func-Req 1 - log-service pipeline Use the user-configured log-services (filters and sinks) when the buffered log- events are flushed. (This can be shown using a non-standard sink such as the JSON-writer, or by showing that log_error_suppression_list now also applies to the very earliest items.) Func-Req 2 - full "structured" buffering Buffering of early log-items shall preserve the entirety of each log-event, not just the information needed for traditional logging. (This can be shown by dumping the early items with a structured log-writer such as the JSON one.) Non-func req 1 - fall back on "traditional logging" If loadable logging components are configured, but can not be activated, or if in the short window after command-line arguments / my.cnf are read, but before the InnoDB storage engine (which holds the table containing the configured components) is fully available, a fatal issue occurs (e.g., the InnoDB engine decides we must shut down), log-events up to that point must still be dumped; the default pipeline set-up of "internal filter + internal writer" shall then be used, producing traditional error log format. This is still an improvement over the current behaviour as log_error_verbosity and log_error_suppression_list would be correctly applied to all events up to that point. (This can be shown by certain InnoDB test cases still passing correctly, which they wouldn't if we omitted to flush certain error information before shutting down. The required behaviour can be achieved by flushing the error log at strategic points before calling exit() in the InnoDB engine.) In a similar vein, if said window (InnoDB initialization) takes unusually long, the error logging subsystem will periodically flush updates to the traditional log (as that, being the built-in log-writer, is already available) to keep the user updated and show that the server is responsive. If and when InnoDB and subsequently external components become available, all events up to that point shall be written to the user-selected log-services (with early intermediate updates not being emitted to the traditional log a second time).
I-1 Semantics NO CHANGE I-2 Instrumentation NO CHANGE I-3 Error and Warnings NO CHANGE I-4 Install/Upgrade NO CHANGE I-5 Commercial plugins NO CHANGE. I-6 Replication NO CHANGE I-7 XProtocols NO CHANGE I-8 Protocols NO CHANGE I-9 Security NO CHANGE I-10 Log Files YES. [LOG01] Buffered messages from early in the start-up now leverage the full power of the structured logging system, that is, they are subject to filtering (using log_error_verbosity and log_error_suppression_list, or other mechanisms if a filter service other than the default is used), and, being written in all requested formats, may now feature all non-filtered key/value pairs in writers that support this, rather than just the basic "traditional error log" information. I-11 MySQL clients NO CHANGE I-12 Auth plugins NO CHANGE I-13 Globalization NO CHANGE I-14 Configuration NO CHANGE I-15 File formats NO CHANGE (see log files for that) I-16 Plugins/APIs NO CHANGE I-17 Other programs NO CHANGE I-18 Storage engines NO CHANGE I-19 Data types NO CHANGE
PREAMBLE To log error events, we need to know what to log, where to log it to, and so on. As this behaviour is user-configurable, we don't have that necessary information until the options (start-up parameters, configuration file assignments, and so on) have been received, and persisted system variables restored. Various parts of the server are initialized before this happens. If they try to log information to the error-log, we hold on to their information until log- writing is ready; then we write all these buffered messages at once. This is a legacy system that is very basic and predates the new rich error logger; it therefore does not save as much of the information submitted to it as it could. In this WL, we will try to rectify it by providing richer logging, and better filtering, to these "early messages." Specifically, "buffered logging" turns its arguments into a "classic error-log"- style plaintext log-line, and appends this line to a special buffer. Once an error-logging destination is set, this buffer is historically written there without further processing. Early logging should instead buffer its input as key/value pairs so they can be replayed as rich output to log-writers that support this, and so log-filtering can be applied as needed. PROPOSED SOLUTION Rather than producing the log-line when the event is submitted, buffer the whole structured event, and only filter and format the information at the time of printing (when we know how to filter it, how to format it, and where to print it). 1 - BUFFERING log_builtins.cc, knowing the internal representation of the structured event, shall buffer these events fully during the "buffered logging" phase. The events shall be received by a new internal function, log_sink_buffer() with the same signature as all other log-sinks. This "sink" receives the events, puts them in an internal list, and deep-copies all keys (in case they were dynamically generated) as well as all string values (in case they used local buffers in the caller). All other (that is, numeric) values can be shallow-copied (i.e. it will suffice to copy the log-item, which in this case will not point to an external- to-it allocation). This log_sink will be configured as the only item in the logging pipeline during start-up (as filtering is deferred until later). The special sink will be identified by a new log-service characteristics flag ("chistic"), LOG_SERVICE_BUFFER. The special sink is enabled automatically during the phase it is needed, and then disabled; the user can not explicitly configure it to be part of the logging pipeline later. As an internal component, we do not need to look it up in the component framework, enabling us to set up buffered writing before initialization of the component framework. 2 - FLUSHING A new function, log_sink_buffer_flush(), will release all buffered events. Callers can ask it to simply discard the buffered information, as in the case of "daemonize" when discard_error_log_messages() is used. In the vastly more common case however (when it is called from flush_error_log_messages()), it will first seek to print the information. This can happen at two points: a) once the logging pipe-line as configured by the user is fully set up (both the sequence of services and their parameters); b) before that. The second case will arise if the server needs to shutdown during start-up, before the component infrastructure is fully initialized (and external logging components are therefore unavailable). This can be detected by the flush function as in this case, the logging pipeline still contains the buffer-sink as its only element. In this case, the buffered information will be flushed through a default pipeline of "internal filter + internal writer", resulting in a "traditional error log". log_error_verbosity and log_error_suppressing_list will apply. (Certain exit-points in InnoDB will be amended to flush-before-exit.) In the case of a), "flushing after full, correct initialization of the component system", the buffered information will simply be filtered and written using the user-set configuration, as one might expect. 3 - "TIMEOUT" As described above in "Non-func req 1", buffering has a pseudo-timeout. That is to say, if start-up takes unexpectedly long (e.g. because InnoDB cannot acquire a lock on its database), we may not have all configured external log-writers yet that we would need to log as configured. A conflicting goal to deferring the error-logging until we have all the requested components however is to let the user know the server isn't just unresponsive or broken. In such cases, we flush updates to the "traditional log" periodically (using the built-in, already available log-writer), and flush the complete backlog of events to external writers later as they become available. The time-out for this behavior is a trade-off between wishing to log exactly as configured, and not leaving the DBA in the dark in case a problem that needs attention arises before full logging is available. To this end, two constants are defined, LOG_BUFFERING_TIMEOUT_AFTER, currently set to 60 seconds / 1 minute, and LOG_BUFFERING_TIMEOUT_EVERY, currently set to 10 seconds. With these settings, we'd wait for 60 seconds hoping start-up will complete as expected; if that proves not to be the case, we'd flush the error events so far to the traditional log (once the first event after that initial time-out is added), and then every 10 seconds (once the first event after that subsequent time-out is added). DBAs should understand those values to be guide-lines, not guarantees, that may be fine-tuned by Oracle engineers in subsequent releases.
Copyright (c) 2000, 2019, Oracle Corporation and/or its affiliates. All rights reserved.