WL#11875: update early "buffered" logging to make full use of the new error logging system

Affects: Server-8.0 — Status: Complete

Information passed to the logger before it is told what to log and what file to 
log to is collected in a buffer, and printed once we know where to. This system 
predates the new rich logging system and is somewhat simple. On one hand, it uses 
the value of log_error_verbosity active when the error was thrown, not when it is 
later printed. This means that for events occurring before arguments from the 
command-line or my.cnf (or SET PERSIST) are processed, the default value of 
log_error_verbosity will apply:

mysqld ... --log-error-verbosity=1

for instance will show some warnings generated before the verbosity was set to 
"errors only", which leads to surprising behaviour and bug tickets. Similarly, 
the buffering mechanism, not being aware of the capabilities of the new logging, 
will only buffer information used in the default error-logging format. This 
prevents some of the richer filtering (e.g. --log-error-suppression-list) and 
logging that should be available when more powerful log services are used, and 
thus once more produces slightly surprising behaviour.


Buffered logging should buffer all information passed to it, not just a subset, 
and then flush it to the appropriate logs using the settings and services 
requested by the user once those settings have been obtained. This will provide 
richer logging and better filtering even for early messages, bring the behaviour 
more in line with later logging, and make overall server behaviour less 
surprising.

Func-Req 1 - log-service pipeline

Use the user-configured log-services (filters and sinks) when the buffered log-
events are flushed.
(This can be shown using a non-standard sink such as the JSON-writer, or by 
showing that log_error_suppression_list now also applies to the very earliest 
items.)

Func-Req 2 - full "structured" buffering

Buffering of early log-items shall preserve the entirety of each log-event, not 
just the information needed for traditional logging.
(This can be shown by dumping the early items with a structured log-writer such 
as the JSON one.)

Non-func req 1 - fall back on "traditional logging"

If loadable logging components are configured, but can not be activated, or if in 
the short window after command-line arguments / my.cnf are read, but before the 
InnoDB storage engine (which holds the table containing the configured 
components) is fully available, a fatal issue occurs (e.g., the InnoDB engine 
decides we must shut down), log-events up to that point must still be dumped; the 
default pipeline set-up of "internal filter + internal writer" shall then be 
used, producing traditional error log format. This is still an improvement over 
the current behaviour as log_error_verbosity and log_error_suppression_list would 
be correctly applied to all events up to that point.
(This can be shown by certain InnoDB test cases still passing correctly, which 
they wouldn't if we omitted to flush certain error information before shutting 
down. The required behaviour can be achieved by flushing the error log at 
strategic points before calling exit() in the InnoDB engine.)

In a similar vein, if said window (InnoDB initialization) takes unusually long, 
the error logging subsystem will periodically flush updates to the traditional 
log (as that, being the built-in log-writer, is already available) to keep the 
user updated and show that the server is responsive. If and when InnoDB and 
subsequently external components become available, all events up to that point 
shall be written to the user-selected log-services (with early intermediate 
updates not being emitted to the traditional log a second time).

I-1  Semantics

NO CHANGE

I-2  Instrumentation

NO CHANGE

I-3  Error and Warnings

NO CHANGE

I-4  Install/Upgrade

NO CHANGE

I-5  Commercial plugins

NO CHANGE.

I-6  Replication

NO CHANGE

I-7  XProtocols

NO CHANGE

I-8  Protocols

NO CHANGE

I-9  Security

NO CHANGE

I-10  Log Files

YES.
[LOG01] Buffered messages from early in the start-up now leverage the full power 
of the structured logging system, that is, they are subject to filtering (using 
log_error_verbosity and log_error_suppression_list, or other mechanisms if a 
filter service other than the default is used), and, being written in all 
requested formats, may now feature all non-filtered key/value pairs in writers 
that support this, rather than just the basic "traditional error log" 
information.

I-11  MySQL clients

NO CHANGE

I-12  Auth plugins

NO CHANGE

I-13  Globalization

NO CHANGE

I-14  Configuration

NO CHANGE

I-15  File formats

NO CHANGE (see log files for that)

I-16  Plugins/APIs

NO CHANGE

I-17  Other programs

NO CHANGE

I-18  Storage engines

NO CHANGE

I-19  Data types

NO CHANGE

PREAMBLE

To log error events, we need to know what to log, where to log it to, and so on. 
As this behaviour is user-configurable, we don't have that necessary information 
until the options (start-up parameters, configuration file assignments, and so 
on) have been received, and persisted system variables restored.

Various parts of the server are initialized before this happens. If they try to 
log information to the error-log, we hold on to their information until log-
writing is ready; then we write all these buffered messages at once.

This is a legacy system that is very basic and predates the new rich error 
logger; it therefore does not save as much of the information submitted to it as 
it could. In this WL, we will try to rectify it by providing richer logging, and 
better filtering, to these "early messages."

Specifically, "buffered logging" turns its arguments into a "classic error-log"-
style plaintext log-line, and appends this line to a special buffer. Once an 
error-logging destination is set, this buffer is historically written there 
without further processing. Early logging should instead buffer its input as 
key/value pairs so they can be replayed as rich output to log-writers that 
support this, and so log-filtering can be applied as needed.

PROPOSED SOLUTION

Rather than producing the log-line when the event is submitted, buffer the whole 
structured event, and only filter and format the information at the time of 
printing (when we know how to filter it, how to format it, and where to print 
it).

1 - BUFFERING

log_builtins.cc, knowing the internal representation of the structured event, 
shall buffer these events fully during the "buffered logging" phase. The events 
shall be received by a new internal function, log_sink_buffer() with the same 
signature as all other log-sinks. This "sink" receives the events, puts them in 
an internal list, and deep-copies all keys (in case they were dynamically 
generated) as well as all string values (in case they used local buffers in the 
caller). All other (that is, numeric) values can be shallow-copied (i.e. it will 
suffice to copy the log-item, which in this case will not point to an external-
to-it allocation). This log_sink will be configured as the only item in the 
logging pipeline during start-up (as filtering is deferred until later). The 
special sink will be identified by a new log-service characteristics flag 
("chistic"), LOG_SERVICE_BUFFER. The special sink is enabled automatically 
during the phase it is needed, and then disabled; the user can not explicitly 
configure it to be part of the logging pipeline later. As an internal component, 
we do not need to look it up in the component framework, enabling us to set up 
buffered writing before initialization of the component framework.

2 - FLUSHING

A new function, log_sink_buffer_flush(), will release all buffered events.
Callers can ask it to simply discard the buffered information, as in the case of 
"daemonize" when discard_error_log_messages() is used. In the vastly more common 
case however (when it is called from flush_error_log_messages()), it will first 
seek to print the information. This can happen at two points:

a) once the logging pipe-line as configured by the user is fully set up (both  
the sequence of services and their parameters);

b) before that.

The second case will arise if the server needs to shutdown during start-up, 
before the component infrastructure is fully initialized (and external logging 
components are therefore unavailable). This can be detected by the flush 
function as in this case, the logging pipeline still contains the buffer-sink as 
its only element. In this case, the buffered information will be flushed through 
a default pipeline of "internal filter + internal writer", resulting in a 
"traditional error log". log_error_verbosity and log_error_suppressing_list will 
apply.
(Certain exit-points in InnoDB will be amended to flush-before-exit.)

In the case of a), "flushing after full, correct initialization of the component 
system", the buffered information will simply be filtered and written using the 
user-set configuration, as one might expect.

3 - "TIMEOUT"

As described above in "Non-func req 1", buffering has a pseudo-timeout. That is 
to say, if start-up takes unexpectedly long (e.g. because InnoDB cannot acquire 
a lock on its database), we may not have all configured external log-writers yet 
that we would need to log as configured. A conflicting goal to deferring the 
error-logging until we have all the requested components however is to let the 
user know the server isn't just unresponsive or broken. In such cases, we flush 
updates to the "traditional log" periodically (using the built-in, already 
available log-writer), and flush the complete backlog of events to external 
writers later as they become available.
The time-out for this behavior is a trade-off between wishing to log exactly as 
configured, and not leaving the DBA in the dark in case a problem that needs 
attention arises before full logging is available. To this end, two constants 
are defined, LOG_BUFFERING_TIMEOUT_AFTER, currently set to 60 seconds / 1 
minute, and LOG_BUFFERING_TIMEOUT_EVERY, currently set to 10 seconds. With these 
settings, we'd wait for 60 seconds hoping start-up will complete as expected; if 
that proves not to be the case, we'd flush the error events so far to the 
traditional log (once the first event after that initial time-out is added), and 
then every 10 seconds (once the first event after that subsequent time-out is 
added). DBAs should understand those values to be guide-lines, not guarantees, 
that may be fine-tuned by Oracle engineers in subsequent releases.