WL#6661: Error logging: Allow control of verbosity

Affects: Server-5.7 — Status: Complete

Description
Dependent Tasks
High Level Architecture
Low Level Design

We want to give the DBA control of how "verbose" the MySQL Server should be
(error/warning/note) when writing to the error log. This work is partly
motivated by
http://datacharmer.blogspot.jp/2012/03/mysql-56-too-verbose-when-creating-
data.html

A possible design could be three levels of verbosity:
1) Write only "Errors"
2) Write "Errors + Warnings"
3) Write "Errors + Warnings + Notes" (default)

Two other requests we have gotten related to error logging from MEM:
- Change the timestamp format printed to a more standard format (syslog)
- Convert existing fprintf(stderr, ...) in the server layer to use
  error logging API whenever possible.
These should also be handled in the scope of this worklog.

User Documentation
==================

http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-2.html
http://dev.mysql.com/doc/refman/5.7/en/server-system-
variables.html#sysvar_log_error_verbosity
http://dev.mysql.com/doc/refman/5.7/en/server-system-
variables.html#sysvar_log_timestamps
http://dev.mysql.com/doc/refman/5.7/en/server-system-
variables.html#sysvar_log_warnings
http://dev.mysql.com/doc/refman/5.7/en/error-log.html (last parts about verbosity 
and format)

WL#6613: Refactor logging code - split logging and binlogging code

SCOPE

#0a Certain warnings may be outside the immediate scope of this
    worklog, such as warnings printed in emergency states where
    the log system is considered non-functional or uninitialized.

  b Not all of InnoDB uses the mysqld-proper logging facilities
    at this point. WL#6755 "InnoDB: Use of error logging API in
    InnoDB" is designed to amend this.   Inno warnings will be
    subject to the new facilities to the extent that they use them;
    increasing such use falls into the scope of WL#6755, not WL#6661.



NOMENCLATURE

#1  In the context of this worklog, information items for the error
    logging facilities are assumed to be tagged with one of three
    levels of urgency.  The most urgent type of message we log is
    the error (message), followed by the warning, and the note,
    respectively.

    In the server, this is modeled by sql_print_(errror|warning|information),
    and (ERROR|WARNING|INFORMATION)_LEVEL for use by the server proper,
    and MY_(ERROR|WARNING|INFORMATION)_LEVEL for plugins, respectively.

    The Windows event log uses error, warning, and information levels.

    The UNIX syslog offers Notice and Information levels, with Notice
    conveying more urgency.

    This document shall use the user-visible nomenclature of error, warning,
    note however, so for symmetry with the errors, warnings, and notes we
    issue to clients is maintained.  This is done in the hope of keeping
    the user experience consistent and accessible.



HELP / --help

#2a --help shows help, as before

  b The default verbosity shall be errors only, as per the request
    that lead to the worklog in the first place, but this shouldn't
    make much of a difference as we have a short code-path anyway
    (no loading of plugins etc.).



SHOW VARIABLES / --help --verbose

#3a --help --verbose shows help and variables, as before.

  b The default verbosity shall be errors only, as per the request
    that lead to the worklog in the first place, so that failures
    to load plugins (that could result in fewer options being shown)
    will still be flagged, but the output will remain as terse as
    circumstances allow otherwise.


VERBOSITY

--log_error_verbosity

#4a A new system-variable log_error_verbosity shall govern the
    verbosity of error messages not issued to the client, but
    logged on the server side, to the error log file, syslog,
    Windows event log, or similar facilities.

  b The default setting for log_error_verbosity is
    ERRORS+WARNINGS+NOTES.

    This maintains the current default of not gagging any of
    (previously ungaggable) errors, warnings, and notes that
    may be logged to the error logging facilities.

  c log_error_verbosity requires a numeric argument.
    The values 1, 2, 3  shall request errors, +warnings, +notes,
    respectively.

  d log_error_verbosity shall be settable on the command-line at startup.

  e log_error_verbosity shall be a global variable settable at runtime by
    those with the SUPER privilege.



--verbose

#5  Use of --verbose without --help shall result in an error being
    issued to the points the user to --log_error_verbosity.



--log_warnings

#6  Information items previously governed by --log_warnings/-W
    shall now be governed by --log_error_verbosity, like so:

  a Items previously printed when log_warnings=1 or higher was
    requested will now count as warnings and be printed at
    log_error_verbosity=2 or higher.

    [NOTE: log_warnings=1 was the previous default. These
           warnings will still be printed by default. cf. #4b]

  b Items previously printed when log_warnings=2 was requested
    will now count as notes and be printed at log_error_verbosity=3.

    [NOTE: This additional information will now also be printed
           by default. cf. #4b]

  c Gagging of all log_warnings items, previously achieved by
    requesting log_warnings=0, will now be achieved by setting
    log_error_verbosity=1, "errors only."



#7  Use of --log_warnings shall result in a deprecation warning,
    pointing the user to --log_error_verbosity.



#8  Use of --log_warnings shall, until such a time as it is removed,
    attempt to emulate its previous behavior:

  a --log_warnings= shall result in an appropriate --error_log_verbosity
    to be set, i.e. an explicit

      --log_warnings=0   will set --verbosity=1, errors only
      --log_warnings=1   will set --verbosity=2, +warnings
      --log_warnings=2   will set --verbosity=3, +notes


#9  Current behavior of -W (with no numerical argument) increases
    the current value by 1 for each time it is given. This shall be
    maintained while -W exists.

    Absent of any -W, the first -W currently therefore increases
    the default of 1 by 1 for a total of 2. It follows that the only
    way to set level 0 is therefore the explicit -W0.

    Henceforth, a) the effects formerly controlled by log_warnings
    shall be controlled by log_error_verbosity (cf. #6) and
    b) log_error_verbosity shall default to maximum verbosity (cf. #4b).
    The effect formerly requested by a sole -W (maximum verbosity) is
    already the default, so -W as the lone option will now have no
    visible effect.



COMBINATIONS

#A  No combination of options will cause the final verbosity
    to be outside of the allowed range (1..3).



#B  If log_error_verbosity= or log_warnings=
    are both given, or either form appears multiple times, the
    assignment that is processed last (the rightmost assignment
    on the command-line, or the bottommost assignment in a .cnf)
    shall prevail.



BOOTSTRAP / --bootstrap

#C  Like --help, --bootstrap shall lower the default for error_log_verbosity
    to "errors only".  This is done in an attempt not to "flood"
    the user with messages at install time, and potentially have them miss
    error messages.  This directly addresses one of the suggestions that led
    to this WL item.



TIMESTAMPS

[NOTE: The following applies to log-files.  Log tables already use
       the TIMESTAMP type and can be converted on SELECT at will.]



#D  Log file lines shall be timestamped in ISO 8601 / RFC 3339
    format %04d-%02d-%02dT%02d:%02d:%02d.%06lu%s, with the tail %s
    being Z for zulu time, and of the form +|- hh:mm otherwise.

    Log tables shall be changed to likewise support microsecond
    resolution in timestamps; new column type shall be TIMESTAMP(6).



#Ea A new command-line option / system-variable --log-timestamps
    can be used to select UTC or local time timestamps, respectively.

  b Valid values shall be:    SYSTEM | UTC

  c Default shall be:         UTC

  d The value of the global variable may be changed at runtime by
    those with the SUPER privilege.

  e "SYSTEM" is an analog of current behavior (use of localtime_r(),
    plus now the timezone variable).



OTHER CHANGES
(attn docs team)

Clients:
The default function for printing error messages will now
prepend the running program's name (just the name, for
"bin/mysqladmin.exe", we'd get "mysqladmin", no path, no ".exe").
This is particularly useful for debugging mysql_upgrade, which
in turn calls various other binaries: the new behavior makes it
much easier to tell whether a warning was thrown by mysql_upgrade,
or by one of its child process (and in the latter case, which).
Severity levels (which are added by the default wrapper around above
function) also now have brackets around them, so lines that used to
start with e.g. "Warning"/"WARNING" will now start "my_name: [Warning] "
These default reporters can be overridden, but in practice, the clients
don't, while the server does (early in its initialization, from whence
it goes through our new-and-improved function that adds ISO timestamps,
filters by --log_error_verbosity, etc.). So as a general rule, "this only
applies to clients."

Fixes for logging, after discussion with Mark:

The server proper prints the MySQL thread ID (connection ID)
to its general and slow query logs.

In contrast, it used to print the server's process ID to the
error log. This has changed; for consistency, this column now
also shows a connection ID in the rare case that that's applicable,
and 0 otherwise (e.g. when the condition was raised in a thread
that has no connection).

The server will now report its process ID at start-up instead.
This message is logged early at startup, so the PID will be
available to monitoring even if the server is accidentally
double-started by the DBA (in which case we likely won't get
all the way to ER_STARTUP, and won't see both PIDs in the pid-file).

Some legacy reporting in InnoDB prints a thread ID to the error
log that is not the MySQL thread-ID / connection ID, but the system's
thread ID. To distinguish this from connection IDs, this thread ID is
printed in hexadecimal. WL#6661 now adds a proper 0x prefix to those
hex values, for clarity when the value does not contain a...f.

James Day observes:

Doesn't seem to meet key diagnostic needs yet, Notes are not of equal
importance. Warnings aren't of equal importance either.

Some examples:

1. Critical notes that must never be suppressed, whatever the setting, Always
logged to error log because they are critical diagnostic and status information
that cannot normally be recovered later by changing a setting (looking back in
time is often needed for these):

2013-03-01 20:01:38 2752 [Note] InnoDB: 1.2.10 started; log sequence number 1625987
2013-02-25 22:25:04 5732 [Note] bin\mysqld: ready for connections.
Version: '5.6.10-enterprise-commercial-advanced'  socket: ''  port: 3306  MySQL
Enterprise Server - Advanced Edition (Commercial)

Replication start and stop IO and SQL thread positions.

2013-03-01 23:08:53 2752 [Note] InnoDB: Starting shutdown...
2013-03-01 23:08:54 2752 [Note] InnoDB: Shutdown completed; log sequence number
1804146
2013-02-25 22:28:08 5732 [Note] bin\mysqld: Normal shutdown

2. High noise level warnings, best treated as low priority warnings or notes.
These are noise because of their very high frequency if they happen:

client connection errors
warning that statement may not be safe for statement replication mode

Both of those can generate huge numbers of error log entries.

3. Low priority notes:

Routine plugin start and stop are examples.

So three verbosity settings might be:

1.
Errors: all
Warnings: critical only
Notes: critical only

2.
Errors: all
Warnings: critical, normal priority
Notes: critical, normal (but no notes currently expected to be normal priority).

3.
Errors: all
Warnings: critical, normal and low priority (all)
Notes: critical, normal, low priority (all)

All existing warnings are normal priority unless mentioned here or otherwise.
All existing notes are low priority unless mentioned here or otherwise mentioned.

Support team can generate a list of warnings that should be critical, normal,
low priority, Will not be many exceptions. Code can have priority level 1,2 or 3
in function call.

Result: level 1, errors and critical notes end up being displayed, highest
priority items, Turn on level 2 and you get all but common spam warnings and
most notes. Level 3 gets you all. And we can adjust priority based on Support
and other feedback.


Tatiana says:
I appreciate that some things should always be printed.
The name for that in the current nomenclature is "error."

Now, I understand that you may feel that that's not a very
good name for something that is not a problem, just generally
important to know. In that case, we'd have to change the system
to not gag according to error, warning, note at all, but according
to critical, important, supplementary.  Which I'm not sure you'll
notice is the exact same mechanism as specified anyway.

I very much appreciate beautiful (or at least clear) language as a
concern, but at the same time, it was suggested that we should mirror
the error, warning, note classification that is sent to the client.

This gives us three possible courses of action:

- Keep the 3 level gagging, let both factions reason out what the nomenclature
for the bike shed should be.

- Keep the 3 level gagging, use critical etc., and let individual messages use
words like "error" in the descriptive text if they wish to. This might result in
a mess as it'd hard to apply consistently, especially across localizations.

- Keep 3 level gagging, pass *two* parameters, one for severity (critical etc.)
one for "name" (error, warning, note), and then print the composite (e.g.
[critical error]), but filter according to severity, not the name tag. This
would imply changing every call to s.th. like error_log_message(prio,
funky_name_tag, actual_message), and then manually adjusting the prio for the
outliers. Might give us a chance to drop the print_(error|warning|information)
trifecta, of which sql_print_information is badly named anyway (as it prints a
label of "Note") -- in which case it would probably be reasonable to change the
symbol to NOTE_LEVEL too, as well as supporting MY_NOTE_LEVEL, while
keeping-but-discouraging MY_INFORMATION_LEVEL to maintain compatibility with
existing plugins.

No matter whether we beautify the naming, the upshot will be that we'll need a
change request for each of the outliers (that don't fall into the classical
combinations "error / high prio", "warning / mid prio", "note / low prio").


PREAMBLE

The thing to keep in mind is that the server now defaults to its most verbose,
"INFORMATION_LEVEL."  This gives us the following guidelines:

a) Items that previously were printed when running with defaults
   will be printed when running with defaults now.

b) Some items that previously weren't printed when running with
   defaults (but could be requested) will be printed now (but can
   be muted).

c) Some items that previously (wrongly, because they didn't use
   the proper facilities) could not be muted can be muted now
   (but aren't by default).



CODE

On a code level, (almost) all fprintf(stderr, ...) have been converted to
sql_print_(error|warning|information).  This means they can now be filtered.

sql_print_(error|warning|information) was already used in the code in many
places, but since there was no central gagging facility, most items could not be
gagged, and some implemented their own gagging, not always on the level the
label implied (a line could appear with a "warning" label in the label, while
actually requiring the higher verbosity level of "information" to be printed, etc.):

- Many lines of the form if(verbosity>WARNING_LEVEL) sql_print_warning()
  had their now redundant if() cut. => no change in observable behavior

- We do not at this point allow gagging of error messages
  (only of warning/info).  The behavior for all sql_print_error()
  is therefore also unchanged (couldn't be bagged before, can't be now).

- Where sql_print_warning / sql_print_information did heed the
  verbosity level (log_warning, the predecessor of log_error_verbosity),
  but the label ("Warning", "Information") did not reflect the
  verbosity level required for this item to be printed
  (WARNING_LEVEL, INFORMATION_LEVEL, respectively), care was taken
  to align label and level in a way reflecting the spirit of the
  original code, usually towards the end of no or minimal change
  in user-observable behavior (i.e. a different label on the log line).
  Additionally, these occurrences are listed below for a review beyond
  the immediate (and completed) code review, should this become
  desirable, as well as for the information of groups beyond #runtime.

- Where sql_print_warning / sql_print_information were unconditional,
  they now heed the verbosity level set by the user.  That level now
  defaults to maximum verbosity, meaning that all such items as were
  printed unconditionally before shall still be printed by default.
  Additionally, the user can now elect to mute information type log
  items, or both warning type and information type log items.
  It follows that by default, the goal was once again to have no change
  in user-observable behavior (but offer options beyond the default, in
  which case behavior will change).



EXPECTED IMPLICATIONS

--bootstrap and therefore the installation process should now print
fewer non-essential items, to guard against overwhelming the new MySQL user.
This should improve the user experience.

Similar for --help --verbose.

There are some additional "information" type log items that previously
were available on request, and are now printed by default.  This may
provide support & development with additional diagnostics.

Conversely, a mechanism for gagging many diagnostic log items can be utilized by
the user to suppress items that may have been beneficial in supporting said
user.  There may be a case for suggesting to supply log snippets with the
highest verbosity level set when requesting support, if this is not already the
case.  (The "Don't set verbosity too low,
shoot yourself in the foot, or try to ski through a revolving door."
clause.)



GLOBAL CHANGES

#0 --bootstrap and --help behave differently in that they gag
   everything but errors, to de-spam installation.

#1 log_warnings, both as a system variable and on the command
   line, will throw a deprecation warning pointing the user
   at log_error_verbosity.

#2 Where appropriate, tools (mysqlupdate, mysqladmin, mysql,
   "items that live in client/") might prefix warnings with
   their name. This is mostly useful in the context of mysqlupgrade,
   which in turn run mysql, mysqlcheck, etc. With the new semantics,
   it's easier to see which of the tools threw a warning (as opposed
   to just seeing that there was one).

#3 log_error_verbosity and log_timestamps were added, as per the spec.
   See there.

#4 Timestamps now show microseconds.

#5 Timestamps in log files (not tables) can be set to reflect UTC, or
   system time.  See #3.  They're ISO compliant.

#6 We now consistently log thread-ID (rather than process-ID)
   if we have it.
   Additionally, process-ID is issued during start-up, but can
   still also be retrieved from the PID file.



INDIVIDUAL CHANGES

Below is a list of explicitly affected items.
The list including implicitly affected items can easily be
retrieved using

   find mysql-5.7/ -iname "*.[ch]*" \
     -exec egrep -HI "sql_print_(warning|information)" {} \; 

a) In debug servers, mf_keycache.c will run a sanity check,
   cache_empty(). This is now deemed INFORMATION_LEVEL.
   Non-debug servers do not print at any level now, and never did.

b) In my_largepage.c, several non-lethal failure warnings were
   formerly prefixed with "Warning: "; these are now issued at
   WARNING_LEVEL.

c) In my_open.c, EXTRA_DEBUG function my_print_open_files()
   now prints its information at INFORMATION_LEVEL.

d) In my_thr_init.c, normal operation always prints its errors
   (ERROR_LEVEL), while the specific EXTRA_DEBUG_THREADS items
   are issued at WARNING_LEVEL for the one warning, and
   INFORMATION_LEVEL otherwise.

e) thr_alarm.c prints its single warning at WARNING_LEVEL.

f) my_default.cc prints the three items previously prefixed with
   "Warning:" at WARNING_LEVEL (everything else at ERROR_LEVEL).

g) my_getopt.cc's "password on command line insecure" warning is
   issued at WARNING_LEVEL.

h) "init called twice" warnings in semisync_(master|slave).cc
   are issued at WARNING_LEVEL.

i) SSL: client_authentication.cc
   "Can't locate server public key", "Public key is not in PEM format"
   issued at WARNING_LEVEL.

j) sql_authentication.cc
   "access denied: no password" was always issued at INFORMATION_LEVEL,
   but previously wrongly had a "Warning" label on the log line. It is
   now issued with the correct label.
   Likewise for regular "access denied" message.
   Likewise for "must change password"
 ! Conversely, X509 subject and issuer mismatch messages were called
 ! info, but issued at WARNING_LEVEL. Now issued at INFORMATION_LEVEL.

k) filesort.cc
   printed some information at INFORMATION_LEVEL, but labeled it a
   warning. Now labeled information.

l) log_event.cc
   issues where reported at INFORMATION_LEVEL, but received a warning
   label. label was corrected.

m) sql_connect.cc
   "aborting connection" was printed at INFORMATION_LEVEL, but labeled
   a warning. Label corrected.

n) sql_plugin.cc
 ! "plugin disabled" always labeled info, but previously printed
   at WARNING_LEVEL, now at INFORMATION_LEVEL.

o) ut0ut.cc
   already printed thread_ID in hex, but now uses a hex prefix to
   make that more clear

p) rpl_slave.cc
   Slave reconnect was labeled information, but could be
   printed at WARNING_LEVEL. Now consistently printed at
   INFORMATION_LEVEL.

q) mi_dbug.c
   fprintf(stderr, "Warning: ...") for EXTRA_DEBUG check_table_is_closed
   now printed at WARNING_LEVEL.

r) mysqld.cc
   Buffered log messages may now include "information" type items,
   which previously ALWAYS were discarded. 

-------------------------------------8<--------------------------

NOTES TO SELF

Current Error logging API:

typedef void (*my_error_reporter)(enum loglevel level, const char *format, ...)
  ATTRIBUTE_FORMAT_FPTR(printf, 2, 3);

Function pointers (for buffered logging until logger is ready)
mysqld.cc: my_charset_error_reporter
mysqld.cc: my_getopt_error_reporter

Implemented by:
conf_to_src.c: default_reporter(...) - dummy (for tool), drops all messages
charset.c:     default_reporter(...) - dummy,            drops all messages
my_getopt.cc:  default_reporter(...) - fprintf's all levels to stderr
mysqld.cc:     charset_error_reporter(...) - error_log_print()
mysqld.cc:     buffered_option_error_reporter(...) - Buffered_log::print()
               *drops* [Note]s. Should user-setting override this?
               The rest go through sql_print_error|warning()
mysqld.cc:     option_error_reporter(...) - error_log_print(),
               special cases --bootstrap

sql_plugin.cc: reap_plugins() - Shutdown notices via sql_print_information() 
               depend on !--bootstrap

the following 3 wrap error_log_print({ERROR|WARNING|INFORMATION}_LEVEL,
                                     format, args)
log.cc:        sql_print_error(format, args...)
log.cc:        sql_print_warning(format, args...)
log.cc:        sql_print_information(format, args...)

log.cc:        my_plugin_log_message(plugin, plugin_log_level, format, args) -
               translates MY_{ERROR|WARNING|INFORMATION}_LEVEL to
               to {ERROR|WARNING|INFORMATION}_LEVEL, prefixes plugin name,
               then calls error_log_print()

log.cc:        sql_print_message_func sql_print_message_handlers[];
               ln264, until WL#6613, only used in event_scheduler.cc,
               which now uses sql_print_error|warning|information(), depending
               on SL_ERROR|WARNING|NOTE:

log.cc:        error_log_print(loglevel, format, args...) - 
               print_buffer_to_file(),
               on Win also print_buffer_to_nt_eventlog()

log.cc:        init_error_log(uint error_log_printer) - removed by WL#6613

log.cc:        log_error(loglevel, format, args...) - removed by WL#6613