WL#14628: Compatibility option to enable old replication terminology in P_S instrumentation

Affects: Server-8.0   —   Status: Complete

EXECUTIVE SUMMARY
=================

We introduce a compatibility option to enable the previous terminology
in strings shown in performance_schema, information_schema, SHOW
PROCESSLIST, SHOW REPLICA STATUS, and slow query log.  The strings
were modified in 8.0.26 by WL#14194.

In WL#14194, a number of performance_schema instrumentation names were
changed. This includes locks, condition variables, memory allocations,
thread names, thread stages, and thread commands. The result is that
the observed contents of performance_schema tables have changed. This
may impact monitoring tools that check these strings. Such monitoring
tools can enable the option when upgrading to 8.0.26, to ensure that
they keep functioning as expected.

USER STORIES
============

U1. As a person that manages a MySQL deployment observed by monitoring
    tools,
    - I need to upgrade the server without changing instrumentation
      names,
    - so that upgrade does not have to wait for the monitoring tools
      to be adjusted to use the new names.

U2. As a person that manages a MySQL deployment observed by multiple
    independent monitoring tools,
    - I need to enable the old instrumentation names only for some
      sessions,
    - so that I can fix each tool independently, and in the process
      disable the compatibility mode selectively for those tools that
      have been fixed.

SCOPE
=====

The option only impacts terminology visible in performance_schema and
modified in WL#14194.

The option impacts all places where changed instrumentation names
occur: performance_schema and information_schema tables,
SHOW PROCESSLIST, SHOW REPLICA STATUS, and slow query log.

LIMITATIONS
===========

There is no way to enable compatibility for each string individually.
A monitoring tool needs to adjust all strings at once.

There is no way to enable compatibility for each table, log file, or
command individually.

The performance_schema/information_schema tables and SHOW
PROCESSLIST/SHOW REPLICA STATUS show the old names based on the
session option, whereas the slow query log shows old names based on
the global option.  The reason is that the tables and SHOW statements
compute their result sets dynamically, so this computation can be
intercepted, whereas the logs exist in shared global storage - files
or tables - which are accessed through generic methods such as SELECT,
which cannot be intercepted.

1. FUNCTIONAL REQUIREMENTS
==========================

FR1. There shall be a configuration option that controls whether
     instrumentation names existing prior to 8.0.26 are shown instead
     of the newer names.

FR2. In commands and tables whose contents are only materialized by a
     session that observes them, i.e.,
     performance_schema/information_schema tables, SHOW PROCESSLIST,
     and SHOW REPLICA STATUS, the session value for the configuration
     option shall decide whether the old or new names are used.

FR3. In logs whose contents are written to shared storage, i.e., slow
     query log, the global value for the configuration option shall
     decide whether the old or new names are used.

FR4. The configuration option shall be deprecated.

1. NEW OPTION
=============

NAME: terminology_use_previous

SCOPES: session, global, persisted, command-line

DYNAMIC: yes

PRIVILEGES: none

TYPE: enumeration

VALUES: NONE, BEFORE_8_0_26

DEFAULT: NONE

REPLICATED: NO

DEPRECATED: YES

DESCRIPTION:
  Make monitoring tables and statements use the identifiers that were
  in use before they were changed in a given release. That includes
  names for mutexes, read/write locks, condition variables, memory
  allocations, thread names, thread stages, and thread commands. When
  the session option is set to BEFORE_8_0_26, the session uses the
  names that were in use until 8.0.25, when it selects from
  performance_schema tables, or selects from
  INFORMATION_SCHEMA.PROCESSLIST, or issues SHOW PROCESSLIST or SHOW
  REPLICA STATUS.  When the global option is set to BEFORE_8_0_26, new
  sessions use BEFORE_8_0_26 as default for the session option, and in
  addition the thread commands that were in use until 8.0.25 are
  written to the slow query log.

2. IMPACT OF NEW OPTION
=======================

2.1. Categories of instrumentation names
----------------------------------------

The option affects the following categories of instrumentation names:

- Mutexes.
- Read/write locks.
- Condition variables.
- Memory allocations.
- Thread names.
- Thread stages.
- Thread commands.

2.2. Full list of instrumentation names affected by the option
--------------------------------------------------------------

In all the following cases, the words 'master', 'slave', and 'mts'
have been replaced by 'source', 'replica', and 'mta'. Capitalization
has been preserved. In two cases, other parts of the identifier were
modified; each of them is marked with a footnote with a justification.

Mutexes:
   Old: "Master_info::data_lock"
   New: "Source_info::data_lock"
   Old: "Master_info::run_lock"
   New: "Source_info::run_lock"
   Old: "Master_info::sleep_lock"
   New: "Source_info::sleep_lock"
   Old: "Master_info::info_thd_lock"
   New: "Source_info::info_thd_lock"
   Old: "Master_info::rotate_lock"
   New: "Source_info::rotate_lock"
   Old: "Slave_reporting_capability::err_lock"
   New: "Replica_reporting_capability::err_lock"
   Old: "key_mts_temp_table_LOCK"
   New: "key_mta_temp_table_LOCK"
   Old: "key_mts_gaq_LOCK"
   New: "key_mta_gaq_LOCK"
   Old: "Relay_log_info::slave_worker_hash_lock"
   New: "Relay_log_info::replica_worker_hash_lock"
   Old: "LOCK_slave_list"
   New: "LOCK_replica_list"
   Old: "LOCK_slave_net_timeout"
   New: "LOCK_replica_net_timeout"
   Old: "LOCK_sql_slave_skip_counter"
   New: "LOCK_sql_replica_skip_counter"

Read/write locks:
   Old: "LOCK_sys_init_slave"
   New: "LOCK_sys_init_replica"

Condition variables:
   Old: "Relay_log_info::slave_worker_hash_lock"
New[1]: "Relay_log_info::replica_worker_hash_cond"
   Old: "Master_info::data_cond"
   New: "Source_info::data_cond"
   Old: "Master_info::start_cond"
   New: "Source_info::start_cond"
   Old: "Master_info::stop_cond"
   New: "Source_info::stop_cond"
   Old: "Master_info::sleep_cond"
   New: "Source_info::sleep_cond"
   Old: "Master_info::rotate_cond"
   New: "Source_info::rotate_cond"
   Old: "Relay_log_info::mts_gaq_cond"
   New: "Relay_log_info::mta_gaq_cond"

Memory allocations:
   Old: "Slave_job_group::group_relay_log_name"
   New: "Replica_job_group::group_relay_log_name"
   Old: "rpl_slave::check_temp_dir"
   New: "rpl_replica::check_temp_dir"
   Old: "SLAVE_INFO"
   New: "REPLICA_INFO"
   Old: "show_slave_status_io_gtid_set"
   New: "show_replica_status_io_gtid_set"
   Old: "Relay_log_info::mts_coor"
   New: "Relay_log_info::mta_coor"

Thread names:
   Old: "slave_io"
   New: "replica_io"
   Old: "slave_sql"
   New: "replica_sql"
   Old: "slave_worker"
   New: "replica_worker"

Thread stages:
   Old: "Changing master"
   New: "Changing replication source"
   Old: "Checking master version"
   New: "Checking source version"
   Old: "Connecting to master"
   New: "Connecting to source"
   Old: "Flushing relay log and master info repository."
   New: "Flushing relay log and source info repository."
   Old: "Killing slave"
   New: "Killing replica"
   Old: "Master has sent all binlog to slave; waiting for more updates"
   New: "Source has sent all binlog to replica; waiting for more updates"
   Old: "Queueing master event to the relay log"
   New: "Queueing source event to the relay log"
   Old: "Reconnecting after a failed master event read"
New[3]: "Reconnecting after a failed source event read"
   Old: "Reconnecting after a failed registration on master"
New[3]: "Reconnecting after a failed registration on source"
   Old: "Registering slave on master"
   New: "Registering replica on source"
   Old: "Sending binlog event to slave"
   New: "Sending binlog event to replica"
   Old: "Slave has read all relay log; waiting for more updates"
   New: "Replica has read all relay log; waiting for more updates"
   Old: "Waiting for slave workers to process their queues"
   New: "Waiting for replica workers to process their queues"
   Old: "Waiting for Slave Worker queue"
   New: "Waiting for Replica Worker queue"
   Old: "Waiting for Slave Workers to free pending events"
   New: "Waiting for Replica Workers to free pending events"
   Old: "Waiting for Slave Worker to release partition"
   New: "Waiting for Replica Worker to release partition"
   Old: "Waiting for master to send event"
   New: "Waiting for source to send event"
   Old: "Waiting for master update"
   New: "Waiting for source update"
   Old: "Waiting for the slave SQL thread to free enough relay log space"
New[2]: "Waiting for the replica SQL thread to free relay log space"
   Old: "Waiting for slave mutex on exit"
   New: "Waiting for replica mutex on exit"
   Old: "Waiting for slave thread to start"
   New: "Waiting for replica thread to start"
   Old: "Waiting for the slave SQL thread to advance position"
   New: "Waiting for the replica SQL thread to advance position"
   Old: "Waiting to reconnect after a failed master event read"
New[3]: "Waiting to reconnect after a failed source event read"
   Old: "Waiting to reconnect after a failed registration on master"
New[3]: "Waiting to reconnect after a failed registration on source"
   Old: "Waiting until MASTER_DELAY seconds after master executed event"
   New: "Waiting until SOURCE_DELAY seconds after source executed event"

Thread commands:
   Old: "Register Slave"
   New: "Register Replica"

Footnotes:
[1] Also corrected 'lock' to 'cond'.
[2] Also removed the word 'enough' to keep the text within the length limit.
[3] These were missed in WL#14194, and this worklog renames them.
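
The general renaming rule stated at the top of this section can be
sketched in a few lines of C++.  This is only an illustration of the
rule, not code from the server: the function names are hypothetical,
and the sketch does not reproduce entries whose wording changed in
other ways (for instance the ones marked [1] and [2], or "Changing
master").  It also replaces 'mts' wherever it occurs, which is safe for
the names listed above but would not be safe for arbitrary text.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: give `replacement` the capitalization pattern of
// `matched` (all upper case, first letter upper case, or all lower case).
static std::string match_case(const std::string &matched,
                              std::string replacement) {
  bool all_upper = true;
  for (unsigned char c : matched)
    if (std::islower(c)) all_upper = false;
  if (all_upper) {
    for (char &c : replacement)
      c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  } else if (std::isupper(static_cast<unsigned char>(matched[0]))) {
    replacement[0] = static_cast<char>(
        std::toupper(static_cast<unsigned char>(replacement[0])));
  }
  return replacement;
}

// Case-insensitive comparison of text[pos..] against a lower-case word.
static bool matches_at(const std::string &text, std::size_t pos,
                       const std::string &word) {
  if (pos + word.size() > text.size()) return false;
  for (std::size_t j = 0; j < word.size(); ++j)
    if (std::tolower(static_cast<unsigned char>(text[pos + j])) != word[j])
      return false;
  return true;
}

// Apply master->source, slave->replica, mts->mta, preserving capitalization.
std::string to_new_terminology(const std::string &old_name) {
  static const std::vector<std::pair<std::string, std::string>> words = {
      {"master", "source"}, {"slave", "replica"}, {"mts", "mta"}};
  std::string result;
  std::size_t i = 0;
  while (i < old_name.size()) {
    bool replaced = false;
    for (const auto &w : words) {
      if (matches_at(old_name, i, w.first)) {
        result += match_case(old_name.substr(i, w.first.size()), w.second);
        i += w.first.size();
        replaced = true;
        break;
      }
    }
    if (!replaced) result += old_name[i++];
  }
  return result;
}
```

For example, to_new_terminology("LOCK_slave_list") yields
"LOCK_replica_list", matching the corresponding entry in the list.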

2.3. Monitoring tables, commands, and logs that expose the names
----------------------------------------------------------------

Mutexes are visible with a 'wait/synch/mutex/' prefix in:
- the NAME column of performance_schema.mutex_instances
- the EVENT_NAME column of all performance_schema.events_waits_*
  tables, i.e.:
  - events_waits_summary_by_account_by_event_name
  - events_waits_summary_by_host_by_event_name
  - events_waits_summary_by_instance
  - events_waits_summary_by_thread_by_event_name
  - events_waits_summary_by_user_by_event_name
  - events_waits_summary_global_by_event_name
  - events_waits_current
  - events_waits_history
  - events_waits_history_long

Read/write locks are visible with a 'wait/synch/rwlock/' prefix in:
- the NAME column of performance_schema.rwlock_instances
- the EVENT_NAME column of all performance_schema.events_waits_*
  tables

Condition variables are visible with a 'wait/synch/cond/' prefix in:
- the NAME column of performance_schema.cond_instances
- the EVENT_NAME column of all performance_schema.events_waits_*
  tables

Memory allocations are visible with a 'memory/sql/' prefix in:
- the EVENT_NAME column of all performance_schema.memory_summary_*
  tables, i.e.:
  - memory_summary_by_account_by_event_name
  - memory_summary_by_host_by_event_name
  - memory_summary_by_thread_by_event_name
  - memory_summary_by_user_by_event_name
  - memory_summary_global_by_event_name

Thread names are visible with a 'thread/sql/' prefix in:
- the NAME column of performance_schema.threads

Thread stages are visible:
- with a 'stage/sql/' prefix, in the EVENT_NAME column of all
  performance_schema.events_stages_* tables, i.e.:
  - events_stages_summary_by_account_by_event_name
  - events_stages_summary_by_host_by_event_name
  - events_stages_summary_by_thread_by_event_name
  - events_stages_summary_by_user_by_event_name
  - events_stages_summary_global_by_event_name
  - events_stages_current
  - events_stages_history
  - events_stages_history_long
- without prefix, in:
  - the PROCESSLIST_STATE column of performance_schema.threads
  - the STATE column of performance_schema.processlist
  - the STATE column of SHOW PROCESSLIST
  - the STATE column of INFORMATION_SCHEMA.PROCESSLIST
  - the Replica_IO_State column of SHOW REPLICA STATUS
  - the Slave_IO_State column of SHOW SLAVE STATUS
  - the Replica_SQL_Running_State column of SHOW REPLICA STATUS
  - the Slave_SQL_Running_State column of SHOW SLAVE STATUS

Thread commands are visible:
- with a 'statement/com/' prefix, in the EVENT_NAME column of all
  performance_schema.events_statements_history* and
  performance_schema.events_statements_summary_*_by_event_name tables,
  i.e.:
  - events_statements_summary_by_account_by_event_name
  - events_statements_summary_by_host_by_event_name
  - events_statements_summary_by_thread_by_event_name
  - events_statements_summary_by_user_by_event_name
  - events_statements_summary_global_by_event_name
  - events_statements_history
  - events_statements_history_long
- without a prefix, in:
  - the PROCESSLIST_COMMAND column of performance_schema.threads
  - the COMMAND column of performance_schema.processlist
  - the COMMAND column of SHOW PROCESSLIST
  - the COMMAND column of INFORMATION_SCHEMA.PROCESSLIST
  - the sql_text column of mysql.slow_log

Note: in general, thread commands are also visible in the command_type
column of mysql.general_log, and in the COMMAND_CLASS element of the
audit log. However, this worklog only alters one thread command -
Register Replica - and by chance that is excluded from both the
general log and from the audit log.

2.4. How the global/session options affect tables, commands, and logs
---------------------------------------------------------------------

Among all monitoring tables, commands, and logs listed in 2.3, some
are computed/materialized only in the moment a session observes them,
whereas others are sent to shared storage, such as a table or file:
- The values in performance_schema and information_schema tables are
  computed by the client that selects from them.
- The values in SHOW PROCESSLIST and SHOW REPLICA STATUS statements
  are computed by the client that executes SHOW.
- The values in the slow log are stored in a table or file.

For names computed by the client that shows them, the name is only
visible to that client.  For names sent to shared storage, any client
may observe them.  Therefore:
- the session option decides which name should be used in
  performance_schema and information_schema tables, and in SHOW
  PROCESSLIST and SHOW REPLICA STATUS;
- the global option decides which names should be used in the slow
  log.
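
The rule above amounts to a small decision function.  The following
is a minimal sketch, assuming hypothetical type and function names
(Output, Option_values, effective_terminology) that do not come from
the server source; it only illustrates which of the two option values
applies to which kind of output.

```cpp
// Values of the option, as specified in section 1.
enum class Terminology { NONE, BEFORE_8_0_26 };

// Hypothetical classification of the outputs listed in section 2.3.
enum class Output {
  PERFORMANCE_SCHEMA_TABLE,
  INFORMATION_SCHEMA_TABLE,
  SHOW_PROCESSLIST,
  SHOW_REPLICA_STATUS,
  SLOW_LOG
};

// Hypothetical snapshot of both scopes of the option.
struct Option_values {
  Terminology global_value;
  Terminology session_value;
};

// Outputs materialized by the observing session follow the session value;
// outputs written to shared storage follow the global value.
Terminology effective_terminology(Output output, const Option_values &opts) {
  switch (output) {
    case Output::SLOW_LOG:
      return opts.global_value;
    case Output::PERFORMANCE_SCHEMA_TABLE:
    case Output::INFORMATION_SCHEMA_TABLE:
    case Output::SHOW_PROCESSLIST:
    case Output::SHOW_REPLICA_STATUS:
      return opts.session_value;
  }
  return opts.session_value;  // not reached; keeps compilers quiet
}
```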

2.4.1. Slow log, User story U2, and overloading of global option
................................................................

It can be limiting that names in logs are only affected by the global
option, for two reasons:

 1. User Story U2, which justifies using a session option, may apply
    equally well to logs.  I.e., there may be two different monitoring
    applications observing the slow log, for instance, and one of them
    may be fixed to understand the new terminology before the other.

    However, we do not see a better solution, since it is not possible to
    translate the contents of a table or file on a per-session basis.
    The reason is that the string is in shared storage (not
    per-session), and will be processed using generic techniques that
    cannot be intercepted in order to translate a string.  For
    instance, the file may be accessed directly by third party tools
    such as grep, i.e., not even through a client.

    So we do not address user story U2 for logs.

 2. The global option is overloaded: on the one hand, like most other
    global options, it serves as the default for the session option in
    new sessions.  On the other hand, it controls what is written to
    the slow log.  A deployment whose clients expect the new
    terminology, but which uses a monitoring application that expects
    the old terminology in the slow log, cannot satisfy both the
    monitoring application and the expected session default.

    We overload the option in order to avoid creating too many options.

While we do not have first hand information, we guess that the impact
on applications of these limitations is likely to be small.  The
reason is that there is only one translated identifier that appears in
a log, "Register Replica", and:
- "Register Replica" is an internal command, used only by a replica
  when it initiates the replication connection.  So there is not much
  reason for users to watch it.
- "Register Replica" is fast - just a few CPU instructions.  So it is
  unlikely to ever end up in any slow log.

The first limitation cannot be fixed; users will have the trouble of
upgrading all slow log parsers at the same moment.  In case the second
limitation turns out to be an issue, we can introduce a separate
option at that time, allowing slow log parsers to expect different
terminology than the session default.  We leave that as possible
future work.

3. SECURITY
===========

No impact on security.

4. UPGRADE, DOWNGRADE, AND CROSS-VERSION REPLICATION
====================================================

It does not harm upgrade, downgrade, or cross-version replication that
this feature did not exist in 8.0.25 and was added in 8.0.26.

The feature provides a means to upgrade to 8.0.26 without affecting
applications.

We implement this worklog in several steps.

 0. Move LOCK_replica_list to the global list

    Context:

    This is a simplification that was relevant in an earlier version of
    the worklog.  Now the worklog does not depend on it, but we keep the
    refactoring since it improves the code.

    Background:

    There is an API for registering mutex names with performance_schema.
    Other server modules than replication don't invoke the API from the
    module, but add the mutex to the list in mysqld.cc and rely on the
    call to the API in mysqld.cc.  For other instrumentation classes than
    mutexes - condition variables, memory allocations, thread stages,
    thread names, etc - replication depends on the invocation in
    mysqld.cc.  So it is only replication mutexes that have a special
    case.

    Problems:

    - Replication mutexes are initialized in a different way, compared
      to other instrumentation, which is a bit inconsistent.
    - There is a need for a specific call to a cleanup routine in
      replication.
    - There is an error condition where slave_list_inited has not been
      initialized.  By initializing the mutex together with other mutexes,
      we ensure it happens very early during server start, so no code that
      uses the mutex can execute before that.

    Fix:

    Move the initialization of replication mutex instrumentation to the
    global initialization.

 1. Encapsulate access to P_S instrumentation names

    This is a refactoring only and has no observable consequences; hence
    no test case.

    We encapsulate all instrumentation names in performance_schema in a
    class.  This will allow us to change the implementation of the class
    to use a backward compatible name, in the future.

 2. New option that enables old terminology

    Introduce the new configuration option
    terminology_use_previous.  When set to NONE, the server is
    not in compatibility mode and uses the newest version of all names.
    When set to BEFORE_8_0_26, it uses the names that were present up to
    and including 8.0.25.

    We introduce a framework that translates new names to old names.  The
    framework uses std::unordered_map objects where the keys are new names
    and values are old names.  There is such an object for each
    instrumentation class, i.e., one for mutexes, one for thread names,
    etc. All those maps are collected in a std::unordered_map where the
    keys are instrumentation class identifiers and values are the
    maps. Finally, there is a vector where each element contains the
    translation maps for a specific version.  This makes the framework
    extensible, so in case we modify more names in the future, the
    configuration option can be extended with a new enumeration value and
    the vector with a new element.
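
    The layout described above can be sketched as follows.  This is
    a simplified illustration, not the server code: the type names,
    the Instr_class enumeration, and find_old_name are assumptions,
    and only a handful of the names from section 2.2 are included.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical identifiers for instrumentation classes.
enum class Instr_class { MUTEX, THREAD, STAGE };

// Explicit hash so the sketch does not depend on std::hash for enums.
struct Instr_class_hash {
  std::size_t operator()(Instr_class c) const noexcept {
    return static_cast<std::size_t>(c);
  }
};

// Innermost map: key is the new name, value is the old name.
using Name_map = std::unordered_map<std::string, std::string>;
// One Name_map per instrumentation class.
using Class_map = std::unordered_map<Instr_class, Name_map, Instr_class_hash>;

// One Class_map per compatibility version.  Element 0 corresponds to
// BEFORE_8_0_26; a future enumeration value would append another element.
static std::vector<Class_map> build_version_maps() {
  Class_map before_8_0_26;
  before_8_0_26[Instr_class::MUTEX]["LOCK_replica_list"] = "LOCK_slave_list";
  before_8_0_26[Instr_class::THREAD]["replica_io"] = "slave_io";
  before_8_0_26[Instr_class::THREAD]["replica_sql"] = "slave_sql";
  before_8_0_26[Instr_class::STAGE]["Killing replica"] = "Killing slave";

  std::vector<Class_map> versions;
  versions.push_back(before_8_0_26);
  return versions;
}

// Returns the old name, or nullptr if the name was not renamed in that
// version (or the version index is out of range).
const std::string *find_old_name(std::size_t version_index, Instr_class cls,
                                 const std::string &new_name) {
  static const std::vector<Class_map> versions = build_version_maps();
  if (version_index >= versions.size()) return nullptr;
  const Class_map &classes = versions[version_index];
  const auto class_it = classes.find(cls);
  if (class_it == classes.end()) return nullptr;
  const auto name_it = class_it->second.find(new_name);
  return name_it == class_it->second.end() ? nullptr : &name_it->second;
}
```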

    We modify the class introduced in Step 1, so that:

    - The setter for PFS_instr_name takes the instrumentation class as an
      argument.  For each version in the vector, it looks up the
      instrumentation class to get the translation map, and looks up the
      name in that map.  If the name is found, the setter caches the
      alternative name in member variables of PFS_instr_name.  It also
      caches the version in which the name appeared.

    - The getter for PFS_instr_name checks if the session configuration
      option is enabled.  If it is, and the version specified in the option
      is older than the version cached in the member variable, it returns
      the alternative name cached in the member variable.
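
    The setter/getter behaviour can be sketched as below.  This is a
    minimal illustration under stated assumptions: the class is named
    Instr_name_sketch to make clear it is not the real PFS_instr_name,
    the session option is passed as an argument instead of being read
    from the session, and only one version (BEFORE_8_0_26) exists, so
    the loop over versions collapses to a single lookup.  The ordering
    of enumeration values (smaller means older) is also an assumption.

```cpp
#include <string>
#include <unordered_map>

// Option values; a smaller non-zero value denotes an older terminology.
enum Terminology { NONE = 0, BEFORE_8_0_26 = 1 };

// Hypothetical identifiers for instrumentation classes.
enum class Instr_class { MUTEX, THREAD };

// Hypothetical stand-in for the translation maps of one version.
static const std::string *old_name_before_8_0_26(Instr_class cls,
                                                 const std::string &new_name) {
  static const std::unordered_map<std::string, std::string> mutexes = {
      {"LOCK_replica_list", "LOCK_slave_list"},
      {"Source_info::run_lock", "Master_info::run_lock"}};
  static const std::unordered_map<std::string, std::string> threads = {
      {"replica_io", "slave_io"}, {"replica_sql", "slave_sql"}};
  const auto &map = (cls == Instr_class::MUTEX) ? mutexes : threads;
  const auto it = map.find(new_name);
  return it == map.end() ? nullptr : &it->second;
}

class Instr_name_sketch {
 public:
  // Setter: store the new name; if it was renamed, cache the old name and
  // the version in which the rename happened.
  void set(Instr_class cls, const std::string &new_name) {
    m_name = new_name;
    m_old_name.clear();
    m_renamed_in = NONE;
    if (const std::string *old = old_name_before_8_0_26(cls, new_name)) {
      m_old_name = *old;
      m_renamed_in = BEFORE_8_0_26;
    }
  }

  // Getter: return the cached old name only if the option is enabled and
  // asks for terminology at least as old as the cached rename.
  const std::string &str(Terminology session_option) const {
    const bool want_old = session_option != NONE && m_renamed_in != NONE &&
                          session_option <= m_renamed_in;
    return want_old ? m_old_name : m_name;
  }

 private:
  std::string m_name;
  std::string m_old_name;
  Terminology m_renamed_in = NONE;
};
```

    Because the lookup happens once in the setter, the getter only
    performs a comparison, which keeps reads of instrumentation names
    cheap.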

 3. Fix tests after adding option

    Fix tests that break whenever you add an option.

 4. Encapsulate access to command_name array

    This is a refactoring that has no observable consequences; hence no
    test case.

    We encapsulate all access to the 'command_name' array.  This will
    allow us to change the implementation of the class to use a compatible
    name when the option is enabled.

 5. Translate statement names

    Intercept the Command_name class, so that it translates to old names
    according to the @@terminology_use_previous option.  When
    the name is going to be returned in a result-set, it should use the
    session option to determine whether to use the old or new name.  When
    the name is going to be written to a log, it should use the global
    option, since the observer of the log is not the current session.

    Since we only have one command name that needs translation, and don't
    currently plan to have others, we simplify the implementation of
    the Command_name class and allow only one name to be translated.
    The limitation is only in the layout of private members, not in the
    public API, and can be generalized later if needed.

 6. Encapsulate THD::proc_info

    This is a refactoring that has no observable consequences; hence no
    test case.

    We do the following small refactorings related to thread stage info:

    - Encapsulate all access to THD::proc_info.  This will allow us to
      change the implementation of the class to use a compatible name when
      the option is enabled.  This is the primary objective of the patch,
      and the majority of the changes.

    - Remove a case where group replication checks the current thread
      stage by comparing a string, and instead make it compare the thread
      stage keys.  This is more efficient, and not sensitive to changing the
      string.

    - Change type of THD::m_current_stage_key from unsigned int to
      PSI_stage_key.  PSI_stage_key is a type alias for unsigned int, and
      describes better the meaning of this variable.

    - Move the definition of THD::m_proc_info near where related members
      are defined.

    - Change name of the parameter to sql_show.cc:thread_state_info to a
      more descriptive one.  This also facilitates the next patch where
      another parameter is added.

 7. Translate thread stage stored in THD

    Intercept access to THD::proc_info where it is necessary to convert
    the name according to @@session.terminology_use_previous.
    This makes INFORMATION_SCHEMA.PROCESSLIST and SHOW REPLICA STATUS show
    results according to @@session.terminology_use_previous.

    Many places in the code should not translate the string, because they
    just save the old stage in order to revert it later.  So we introduce
    a new member function that translates the string, and use it only
    where needed.

    The existing terminology_use_previous framework assumes that
    translated strings have a prefix, corresponding to the
    performance_schema instrumentation class, for instance, "stage/sql/".
    However, this prefix is added only in performance_schema tables, and
    not in INFORMATION_SCHEMA.PROCESSLIST or SHOW REPLICA STATUS.
    Therefore, we add an argument to
    terminology_use_previous::lookup that allows the caller to
    specify that the given string does not have a prefix and the return
    value should not have one.
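
    One way to picture the extra argument is the sketch below.  It is
    an assumption-laden illustration, not the signature of
    terminology_use_previous::lookup: the function name
    lookup_old_name, its parameters, and the choice to return the
    input unchanged when no translation exists are all hypothetical.
    The table keys carry the prefix, so an unprefixed caller has the
    prefix added for the lookup and removed from the result.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical translation table for thread stages: prefixed new name ->
// prefixed old name.
static const std::unordered_map<std::string, std::string> &stage_names() {
  static const std::unordered_map<std::string, std::string> names = {
      {"stage/sql/Connecting to source", "stage/sql/Connecting to master"},
      {"stage/sql/Killing replica", "stage/sql/Killing slave"}};
  return names;
}

// Look up the old name for `name`.  If `name_has_prefix` is false, `name`
// is given without `class_prefix` and the result is returned without it.
// Names that were not renamed are returned unchanged.
std::string lookup_old_name(const std::string &class_prefix,
                            const std::string &name, bool name_has_prefix) {
  const std::string key = name_has_prefix ? name : class_prefix + name;
  const auto it = stage_names().find(key);
  if (it == stage_names().end()) return name;
  return name_has_prefix ? it->second
                         : it->second.substr(class_prefix.size());
}
```

    With this shape, a performance_schema caller would pass
    "stage/sql/Killing replica" with name_has_prefix set to true,
    whereas a SHOW PROCESSLIST caller would pass "Killing replica"
    with name_has_prefix set to false.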

 8. Test case

    Test cases to verify that the compatibility option works as expected.

 9. Translate four more stage names

    Background:

    WL#14194 replaced outdated replication terminology in, among other
    things, thread stages.  This was implemented by inspecting all
    PSI_stage_info definitions.

    WL#14628 adds a compatibility option that enables the old names.

    Problem:

     1. The thread stage text can be set without going through the
        PSI_stage_info framework. There were four such texts that
        contained outdated terminology, which were missed in WL#14194.

     2. The code for this was using arrays of char* where C++ classes are
        more appropriate. And the last array element was never used.

     3. There were three debug symbols that enabled big code blocks, which
        were never used in any tests.

    Fix:

    1a. Use the PSI_stage_info framework. Change to new terminology.

    1b. Add the strings to the terminology_use_previous
        framework introduced in WL#14628.

     2. Restructure the code to use a class instead of arrays of char*.
        Do not re-introduce the last array element in the class.

     3. Remove the debug symbols and the code blocks they enable.