WL#13681: Make error-log available for queries from a connection

Affects: Server-8.0   —   Status: Complete

The error-log will be made available via a 
performance_schema table. This feature is 
enabled by default and uses the traditional 
error log; the JSON log can be used alternatively.

To include the log-events written during the 
previous shutdown into the information the DBA 
may review, the server will read the previous 
run's error log file at start-up.

This way access to this information can be 
granted using MySQL server means only.
Creating an account for them on operating 
system level and granting them access via 
SSH / read privileges on file-system level, 
or setting up monitoring tools for them, 
will not be necessary.

Additionally, this change should make writing 
certain test cases simpler for MySQL engineers, 
as a test case may now simply query a log table 
rather than having to parse a log file.


# User Story 1

As a user I want to be able to see the contents
of the error log over a remote MySQL connection
without having to login on the system itself to
see the error log.

# User Story 2

A MySQL Shell or other tool/script wants to
read the error log to:
* gather error conditions during automation
* automate resolving of common error conditions
  during this automation
* give the end user a better error message 
  (instead of 'check the error log for an error')
  this without having to run on the local system
  itself but over a remote MySQL Connection
Func-Req 1 - Table

FR 1.1 - Table name

A new table, error_log, will be created in 
performance-schema.

FR 1.2 - Columns and names

The column format follows that of the traditional 
error log file, as laid out in WL#11009/WL#11150:

2017-06-30T08:33:37.659435Z 1 [Note] [MY-003629] [Server] Found data dictionary 
------------1-------------- 2 ---3-- -----4----- ---5---- ----6---...

1 - "LOGGED"        (timestamp with microsecond precision)
2 - "THREAD_ID"     (MySQL thread ID (not OS'), cf. CONNECTION_ID())
3 - "PRIO"          (System/Error/Warning/Note)
4 - "ERROR_CODE"    (cf. WL#11009, prefixed with "MY-" for googleability)
5 - "SUBSYSTEM"     (cf. WL#11150 Server/Repl/InnoDB ...)
6 - "DATA"          (Textual error message / JSON-record)

Note that the above is quoted from earlier WLs
pertaining to the traditional log.
The field delimiters "[" and "]" that are used 
in the traditional log are not considered part
of the value they enclose (e.g. the value is
"MY-003629", not "[MY-003629]"), and the
delimiters are not used in the error_log table.


FR 1.3 - Field values

FR 1.3.2 - Traditional log mode

For traditional logging, the rightmost field shall 
contain only the message.
For the purposes of parsing, the message is 
currently defined as "all characters after the 
subsystem field and its trailing delimiter (' '), 
until the row delimiter ('\n')."

FR 1.3.3 - JSON log mode

For JSON logging, the final field will contain a
JSON representation of the log-event (same as in 
the log file, space permitting).

FR 1.4 - Rows

FR 1.4.1 - Information from current run

Verbosity settings and other filtering that 
selects/modifies what is written to an error 
log file also applies to the values shown in 
the table. Thus, log file and log table should 
contain the same rows, fields, and values, 
for as many rows from the latest row backwards 
as fit in the ring-buffer (see below, 1.4.3).

FR 1.4.2 - Information from previous runs

The MySQL server applies verbosity and other 
filter settings before writing entries to its 
error log files.

With the proposed implementation, the server 
will try to restore error logging data from 
previous runs at start-up. Rows are restored 
as present in the log file:

FR 1.4.2.1 - Absent rows

Rows that were filtered out and therefore not 
written to the log are not restored at start-up.

This specifically means that when the server 
has previously run with low verbosity, re-starting 
it with higher verbosity won't be able to restore 
and make available additional log-events in the log 
table.

FR 1.4.2.2 - Modified rows

Rows that were modified and logged as such 
will be restored with the same modifications.

FR 1.4.3 - Number of rows / Ring-buffer

Error log information is kept in a ring-buffer 
of fixed size. Once the buffer is full, one 
or several entries may be discarded to make room 
for every new event that is added. Discarded events
thus become unavailable for inspection in the
table (but not in other logs such as the file log).

FR 2 - Loading entries from previous runs

At start-up, the MySQL server will try to restore 
error log events from the preceding run by reading 
the error log file (as buffer space allows).

FR 2.1 - Supported log file formats

The log-core is extended to support a parse-function
in log-services. This WL adds parsing for the
"traditional" log format, and the JSON-formatted log.

FR 2.2 - Activation and format selection

The server will consider the first (leftmost)
log-sink in @@global.log_error_services that
provides a parse-function the "authoritative
sink." This sink will govern the format
(e.g. traditional or JSON) that is shown in 
performance_schema.error_log and that is 
read at start-up.

2.2.1 - Default behavior

That variable's default value activates 
the built-in filter and (traditional) sink.

The server will therefore by default try 
to restore information from the traditional 
log and show error messages in the rightmost 
column of the table, "data".

2.2.2 - Restoring from the traditional log

The server will likewise try to restore
from the traditional log if the user has
changed log_error_services, as long as
the leftmost parsing sink is log_sink_internal.

2.2.3 - Restoring from the JSON log

If the user changes log_error_services so that
the leftmost or parsing sink is log_sink_json, 
the server will attempt to restore information 
from a JSON-formatted log. In this case, the 
rightmost column of the table, "data", shows 
the event in JSON-formatting.

2.2.4 - Deactivating the restore function

If the user changes log_error_services so that
no parsing sink is active (i.e. neither 
log_sink_internal nor log_sink_json are listed
in the variable), the server will not attempt 
to restore a log.

(This is not hard-coded, but a result of no 
other parsers being currently available. 
We are free to add further log-writers with 
parsers later. For instance, an XML log-sink 
with a basic parser could add the capability 
to  display XML formatted entries in the 
performance_schema.error_log table.)

FR 2.3 - Changes to logging subsystem

FR 2.3.1 - Traditional log writer

The traditional log writer log_sink_trad 
will be prevented from outputting '\n' except 
as a record separator identifying the end of 
line. This should not raise problems with UTF-8.
If a subsystem submits arguments in a character 
set whose multi-byte characters may contain a 
'\n' however, these newlines will be modified 
in the interest of preserving the log line. 
This will change (and may invalidate) the 
multi-byte character, but it is considered 
preferable to render a character invalid to 
rendering a line invalid.
I-1  SQL Semantics

NO CHANGE

I-2  Instrumentation

INSTR01
A new performance_schema table, "error_log", is added.

INSTR02
New status variables are added:

Error_log_buffered_bytes:
The number of bytes currently used in the ring-buffer.
This is of interest while the this used size is smaller than the allocated size, 
i.e. while the ring-buffer is still filling up.

Error_log_buffered_events:
The number of events currently in the ring-buffer.
Note that this value can decrease when several small events are expired to make 
room for one larger event.

Error_log_expired_events:
The number of events we had to drop to make room for more.
Clarification: This counts only events that actually previously were buffered.
When a log that is larger than the ring-buffer is loaded at start-up, log-events
that wouldn't fit are not loaded in the first place, and therefore are not
reflected in the value of this counter.

Error_log_latest_write:
Microsecond precision time-since-epoch of last write to the ring-buffer.

I-3  Error and Warnings

NO CHANGE

I-4  Install/Upgrade

NO CHANGE

I-5  Commercial plugins

NO CHANGE.

I-6  Replication

NO CHANGE

I-7  X-Protocol

NO CHANGE

I-8  Protocols

NO CHANGE

I-9  Security

NO CHANGE

Online log is a performance_schema table and requires appropriate privileges to 
read.

I-10  Log Files

LOG01
The traditional log writer should enforce the absence of '\n' in 
its output (other than a single '\n' as separator between log events).
This promotes previous Best Practice to a guarantee.

I-11  MySQL clients

NO CHANGE

I-12  Auth plugins

NO CHANGE

I-13  Globalization

NO CHANGE

I-14  Configuration

NO CHANGE

The length of the backlog could be explicitly configured, but this is not 
currently planned. Instead, the length of the backlog will be indirectly 
dependent on what filtering is configured.

I-15  File formats

NO CHANGE

I-16  Plugins/APIs

NO CHANGE

I-17  Other programs

NO CHANGE

I-18  Storage engines

NO CHANGE

I-19  Data types

NO CHANGE
GENERAL MECHANISM

- We grab a mem blob of the desired size.
- We load min(blob_size, log_size) to the buffer.
  - This specifically means we don't care about what's in the events.
    We're not looking to replay from a specific point (e.g. start of
    last shutdown). We take all the events we can get!
  - We try to access the log specified by --log-error=... (plus any
    required suffixes like ".00.json", using a mechanism for
    constructing the log-file name to read from analog to that for
    constructing the log-file to write to). This specifically means 
    that any data that has been log-rotated away becomes inaccessible
    to us, as we don't know the file's new name, location, compression,
    and so on:
    The "horizon" of the oldest data we can see is bounded by 
    ring-buffer size, contents of the log-file / when log-rotation
    last happened, and by whether the configured primary log-sink
    has a log-parser.

REPLAYING THE JSON LOG

- The JSON log is well-formed, and characters should be escaped as per RfC.
  This should aid in replay.
- On read, the first full event (and all subsequent events) will be
  identifiable as a line beginning with "\n{".
- Lines that do not begin with that magic will be discarded.
- Timestamps will be restored from the ISO8601 representation.

REPLAYING THE TRAD LOG

- The traditional log writes out items as-is. While the messages (i.e. the
  message template strings that come with the MySQL server) themselves are
  in UTF-8, the arguments to substitutions might not be (i.e. they may in
  theory be binary that has no charset at all, or may be in the charset of
  a connection if arguments from the command-line are included).

  On writing an event, the trad log sink (not the logger core) must convert
  '\n' to a replacement character.

  - This will not affect UTF-8.
  - It may break wide-chars in a non-UTF-8 encoding.
    This is considered acceptable.

  A traditional log thus sanitized should be safe to read in, with
  events being stored as individual lines separated by '\n'.

- On read, the first full event (and all subsequent events) will be
  identifiable as a line beginning with "\n20".

- Lines that do not begin with that magic will be discarded.

- Timestamps will be restored from the ISO-8601 representation (as nothing
  else is available).

  - This needs to correctly restore the timezone information, as the
    traditional log may be in UTC or local mode.

    - Fortunately, the information is self-descriptive and complete,
      i.e. it will be parsed correctly no matter whether the settings
      at the time of writing the log and the settings at the time of
      restoring the log match.

INTERACTION WITH BUFFERED LOGGING.

a) When logging to a log-file

At start-up, we flush the buffered log-events.
Then, the log file is read and added to the error-log 
ring-buffer.
This adds error-logging information from the previous run 
as well as the information from this run's start-up.

b) When logging to stderr

This is a common scenario when using mysql-test-run.
In this case, information from the previous run is unavailable,
and there is no log-file to read.
To provide at least the current start-up's log-events,
we write them to the ring-buffer as the buffered log-items are flushed.

NOTES

=> This WL might make writing some (but not all, as we'll still have to verify 
that the sinks correctly write their respective formats) tests that check log 
output easier.


The log-services class is extended to support two optional methods:

- parse_log_line: parses a log-line written by that log-service,
  and on success adds the event to the ring-buffer.
  Services that support this should return the characteristic
  LOG_SERVICE_LOG_PARSER when queried.
  This is an optional feature; log-sinks are not required to support it.

- get_log_name: Return the log-name of a specific instance of
  the log-service in question (when a pointer to that instance is
  provided by the caller), or the default log-name otherwise.
  This can be used by the server to determine the default log-name
  (based on --log-error=...) for that log-service so that log can
  be opened at start-up.


NOTE ON TIMESTAMPS

Timestamps in our logs are in ISO8601 format and 
include timezone information. --log-timestamps=... 
select which timezone (UTC or "SYSTEM") we write 
timestamps for; either selection should generally 
restore correctly, even if settings (system timezone 
or log-timestamps) change between runs. The common 
issues that all logs with non-UTC timestamps tend 
to share (non-monotonic anomalies with DST etc.) 
apply; the server may in certain cases (such as 
the clock being turned back) modify timestamps 
to ensure timestamps are strictly increasing.

log-timestamps affects how we write timestamps to 
our log files. performance_schema.error_log, being 
a table, ignores @@global.log_timestamps and 
observes @@session.time_zone instead. It is suggested 
that sessions reading performance_schema.error_log 
execute

  SET @@session.time_zone=@@global.log_timestamps;

so that timestamps in the error log file and the 
error log table agree and need to be correlated.