WL#9452: Log Position Lock

Affects: Server-8.0   —   Status: Complete   —   Priority: Medium

This worklog implements a mechanism to block the server for a very short window
of time, e.g. during a backup, so that consistent information regarding data,
replication, and logs is gathered. This eliminates the need to run FLUSH TABLES
WITH READ LOCK, which in the general case is quite an expensive operation since
it locks the entire server down.

The goal of this work is to define a server state, which blocks the server for a
very short moment only just to retrieve a consistent and current set of required
information from the backup point of view. Examples:
- Server GTID state (GTID_EXECUTED);
- Binary log positions;
- Replication channels receiver thread relay log positions;
- Storage engines state (i.e. InnoDB Log Sequence Number (LSN));

There is no need to have explicit ways to enter and leave the state. It is
sufficient, if there is a way for backup to retrieve the required consistent
information. The state can then be reached implicitly, which does further reduce
the locking time.
Functional requirements
=======================

F-1: A new performance schema table shall be introduced.

  F-1.1: Queries to the new table without BACKUP_ADMIN privileges shall return
         nothing and shall generate a warning message.

  F-1.2: The new table shall display the following information:

    F-1.2.1: SERVER_UUID;

    F-1.2.2: Server master state (GTID_EXECUTED + binary log coordinates);

    F-1.2.3: Replication channels receiver thread coordinates w.r.t. its own
             relay log (file name and position);

    F-1.2.4: Storage engines information (i.e. InnoDB LSN).

  F-1.3: The client session will enter "Collecting Logs Positions State" before
         collecting the information to be displayed;

  F-1.4: The client session will leave "Collecting Logs Positions State" after
         collecting the information to be displayed;

F-2: When the "Collecting Logs Positions State" is reached:

  F-2.1: The creation of new log entries for the binary log is blocked;

  F-2.2: Changes into GTID_EXECUTED are blocked;

  F-2.3: The creation/removal of new replications channels is blocked;

  F-2.4: Storage engines may block their activities;

  F-2.5: I/O thread positions cannot be flushed.

F-3: The blockage of F2.* must leave the logs in consistent states.

F-4: The "Collecting Logs Positions State" must be reachable independent from
     FLUSH TABLES <table list> WITH READ LOCK and the "Backup Lock".
     That is, the following sequence must be possible:
     - LOCK INSTANCE FOR BACKUP (take the "Backup Lock");
     - FLUSH TABLES <table list> WITH READ LOCK;
     - Collect log positions (enter/leave "Collecting Logs Positions State");
     - UNLOCK TABLES;
     - UNLOCK INSTANCE;


Non-Functional requirements
===========================

NF-1: It shall not be possible to deadlock the MySQL instance by querying the
      new performance schema table and:
      - Operating replication channels;
      - Operating the binary log;
      - Operating GTID_EXECUTED;
      - Applying any InnoDB workload;
      - Applying any non-transactional workload;
      - Installing and uninstalling new (storage engines) plugins;
      - Querying the new performance schema table in another client session.
Interface Specification
=======================

I-1: Introduce a new perpetual table at performance_schema database:

       NAME: instance_logs_status
       DESCRIPTION: Display all instance logs information necessary to copy log
                    files that represent the server state.
       FIELDS: SERVER_UUID VARCHAR(36),
               MASTER JSON,
               CHANNELS JSON,
               STORAGE_ENGINES JSON

I-2: An error message shall be presented to the client session when trying to
     collect the contents of the table at I-1 without having BACKUP_ADMIN
     privileges: "Access denied; you need (at least one of) the BACKUP_ADMIN
     privilege(s) for this operation". This message is already defined as
     ER_SPECIFIC_ACCESS_DENIED_ERROR.

I-3: An error message shall be presented to the client session when trying to
     collect the contents of the table at I-1 and any resource required for the
     operation fails to provide the related information:

       NAME: ER_UNABLE_TO_COLLECT_INSTANCE_LOG_STATUS
       TEXT: "Unable to collect information for column '%-.192s': %-.192s."
       FIELDS: 1st: field name (MASTER, CHANNELS or STORAGE_ENGINES)
               2nd: the reason of the failure. Up to now, the only reason of
                    such failure is OOM (either failed to instantiate resources
                    or failed to allocate memory to collect information);

I-4: No new options
I-5: No new files
I-6: No changes in existing syntax
I-7: No new commands
I-8: No new tools

High Level Specification
========================

The goal of this worklog is to provide all the necessary information to copy
related log files properly in a consistent way.

The necessary information is composed of:

- SERVER_UUID:
  Identifies the instance.

- GTID_EXECUTED:
  To restore the server GTID_STATE.

- Binary log coordinates:
  To tell backup up to which point it should copy.

- Replication channels coordinates:
  To tell backup up to which point it should copy.

- InnoDB Log Sequence Number (LSN):
  To tell backup up to which point it should copy.

A highly desired (but optional) information is:

- InnoDB Log Sequence Number of last checkpoint (LSN_checkpoint):

In order to collect this information in a consistent way, the server will first
check for the BACKUP_ADMIN privilege. When not having the privilege, no
information will be collected, no row will be displayed when querying the table
and the server will throw the error specified at I-2.

Having required privileges, the server will acquire the following mutexes:

1) Replication channels map (write lock)

   Once acquired, it shall block:
   - Creation of new channels;
   - Removal of existing channels;
   - Channels administrative tasks (START SLAVE, STOP SLAVE, CHANGE MASTER TO,
     RESET SLAVE, FLUSH RELAY LOGS);

2) LOCK_plugin

   Needed to iterate over installed and ready storage engines plugins,
   collecting the handlerton of those able to participate in the collection
   process.

   This mutex is released as soon as the iteration finished.

3) Replication channels receiver thread data lock (per replication channel)

   Once acquired, it shall block:
   - The receiver thread to flush its positions after writing content to the
     relay log.

   This must be done before blocking storage engines updates, because the
   receiver thread will attempt to write into slave master info repositories
   after writing a new event to the relay log.

4) Binary log lock

   Once acquired, it shall block:
   - Any transaction to be committed and binary logged;
   - FLUSH BINARY LOGS;
   - RESET MASTER;

5) Global SID map (write lock)

   Once acquired, it shall block:
   - SET @@GLOBAL.GTID_EXECUTED = '< GTID_SET >';

6) Storage engines

   Any storage engine required to show is log information should provide a way
   of blocking its activity.

   Example: InnoDB should lock its log to prevent new changes to be added before
            collecting its LSN information.


MASTER JSON field output
------------------------

It shall display a single JSON object with three keys: GTID_EXECUTED, binary log
file name and binary log position.

Example:

{ "binary_log_file": "relay-log-bin.000099",
  "binary_log_position": 8192,
  "gtid_executed": "0d366d21-be71-11e7-90d4-e4b318a30e85:1-2"},


CHANNELS JSON field output
--------------------------

It shall display an array of channels, composed by the channel name, the relay
log file name and the relay log file position that is consistent with master
info repositories.

Example:

{ "channels": [
  { "channel_name": "",
    "relay_log_file": "relay-log-bin.000099",
    "relay_log_pos": 8192},
  { "channel_name": "ch1",
    "relay_log_file": "ch-relay-log-bin.000099",
    "relay_log_pos": 16384},
  ]
}

Being consistent with master info repositories information, backup can copy the
relay log files for each channel, up to the informed one, up to the informed
position.

Restoring master info repositories and relay logs up to the specified
coordinates should allow a START SLAVE to collect events from the master
starting at the correct master coordinates without losing or duplicating events
in the relay log file.

Note: Such information is currently not collected nor stored. It must be
      implemented a way of collecting this information and keeping it consistent
      with master coordinates on master info repositories. More details about
      this at Step-1 of Low-Level Design.


STORAGE_ENGINES JSON field output
---------------------------------

It shall display a JSON object with one key per relevant storage engines. Each
storage engine should provide its own relevant information from backup point of
view.

Example:
  {"InnoDB": {"LSN": 14426884}}


Table output sample
-------------------

mysqld.2>SELECT * FROM performance_schema.instance_log_status\G
*************************** 1. row ***************************
    SERVER_UUID: e089ec66-e0f4-11e7-87e8-e4b318a30e85
         MASTER: {"gtid_executed": "",
                  "binary_log_file": "slave-bin.000001",
                  "binary_log_position": 155}
       CHANNELS: {"channels": [{"channel_name": "",
                                "relay_log_file": "slave-relay-bin.000002",
                                "relay_log_position": 371}]}
STORAGE_ENGINES: {"InnoDB": {"LSN": 18050415,
                             "LSN_checkpoint": 18050415}}
1 row in set (0,03 sec)



Minimal table output sample
---------------------------

In this example, GTID_MODE is OFF, GTID_EXECUTED is empty, binary logging is
disabled and there are no replication channels configured.

mysqld.1>SELECT * FROM performance_schema.instance_log_status\G
*************************** 1. row ***************************
    SERVER_UUID: 57e69083-c67b-11e7-8b8e-b86b232ba6ca
         MASTER: {"gtid_executed": ""}
       CHANNELS: {"channels": []}
STORAGE_ENGINES: {"InnoDB": {"LSN": 14426884,
                             "LSN_checkpoint": 14426884}}}
1 row in set (0,02 sec)
Low-Level Design
================

The implementation of the worklog can be split into 5 steps to easy the review
process (of both WL design and patches).

Step-1: Introduce Master_info::flushed_relay_log_info
-----------------------------------------------------

Collecting current relay log coordinates directly from its relay_log is
error prone from the backup/restore point of view.

After writing into the relay log file, the receiver thread will acquire the
Master_info::data_lock to flush the new master coordinates into the Master_info
repository.

When a thread acquires Master_info::data_lock after the relay log write and
before the receiver thread acquire Master_info::data_lock will make the receiver
thread to block before flushing into Master_info repository. At this point, the
relay_log coordinates are not in sync with Master_info repository.

Collecting instance log status at the above mentioned state would lead to
restoring the relay log to a state other than the one represented at
slave_master_info table.

So, this step will introduce a new LOG_INFO variable into Master_info class to
hold the relay log coordinates (file name and position) representing the last
master coordinates that were flushed into the Master_info repository.

Two new functions will be added to Master_info:

  a) Master_info::update_flushed_relay_log_info()

     This function will be called by Master_info::flush_info() at the end of a
     successful flush of the Master_info content into the repository.

     It will sync flushed_relay_log_info with current relay_log coordinates.

  b) void Master_info::get_flushed_relay_log_info(LOG_INFO* linfo)

     This function will be used by Step-4(b) to collect relay log coordinates.


Step-2: Introduce Instance_log_resource class
---------------------------------------------

Any resource going to participate in the instance log collection shall inherit
from this class.

Below follows the design of the class:

class Instance_log_resource
{
public:
  Instance_log_resource(Json_object* json_object_arg)
    : json_object(json_object_arg)
  {
  };

  // Functions to be called to collect the backup information
  void lock() { do_lock() };
  void unlock()
  {
    do_unlock();
    json_object= nullptr;
  };
  bool collect_info() { return do_collect_info(); };

  // Functions to be overrided by the resources
  virtual void do_lock() { };
  virtual void do_unlock() { };
  virtual bool do_collect_info() { return false; };

private:
  Json_object *json_object= nullptr;
};

The worklog will use wrappers, generated by a factory class, to support the
distinct object types that will participate in the collecting process.

Step-3: Introduce the new performance schema table and a new mutex
------------------------------------------------------------------

A global mutex (LOCK_collect_instance_log) will be created to protect
against concurrent attempts to collect the instance logs information.

The new table specification is defined at I-1.

The table_instance_log_status::make_row() function will perform the following
actions:

  a) Check the BACKUP_ADMIN privilege:

     /* Return nothing if THD has no BACKUP_ADMIN privilege */
     Security_context *sctx= current_thd->security_context();
     if (!sctx->has_global_grant(STRING_WITH_LEN("BACKUP_ADMIN")).first)
     {
       // TODO: Generate a warning message here.
       DBUG_RETURN(HA_ERR_RECORD_DELETED);
     }

  b) Acquire LOCK_collect_instance_log;

  c) Acquire channel_map->rwlock();

  d) Generate wrappers for all resources that will collect information in the
     order they shall be locked, informing the Json_object to them insert the
     information at (e). Each wrapper will be added to a resources list. If any
     resource cannot instantiate a wrapper, the process is aborted with an error
     (I-3) and jumps to (j).

  e) For each wrapper in the list, call the "lock" function;

  f) For each wrapper in the list, call the "collect" function. If any of the
     information cannot be collected successfully, set the error flag and stop
     iterating over the wrappers.

  g) For each wrapper in the list, call the "unlock" function;

  h) For each wrapper in the list, remove it from the list an delete it;

  i) Release channel_map mutex;

  j) Release LOCK_collect_instance_log mutex;

  k) If there was no error, fill the performance schema rows.


Step-4: Introduce Master_info, MYSQL_BIN_LOG and Gtid_state wrappers
--------------------------------------------------------------------

  a) Introduce Instance_log_resource_factory class to generate the wrappers
     depending on the object type to be wrapped.

  b) Introduce Master_info (channel) wrapper:

     - Introduce Instance_log_resource_mi_wrapper class;
     - Introduce the factory function of the new wrapper type;
     - Add the wrapper to the resource list extending Step-3 (d);

  c) Introduce MYSQL_BIN_LOG (binary log) wrapper;

     - Introduce Instance_log_resource_binlog_wrapper class;
     - Introduce the factory function of the new wrapper type;
     - Add the wrapper to the resource list extending Step-3 (d);

  d) Introduce Gtid_state (GTID_EXECUTED) wrapper;

     - Introduce Instance_log_resource_gtid_set_wrapper class;
     - Introduce the factory function of the new wrapper type;
     - Add the wrapper to the resource list extending Step-3 (d);


Step-5: Introduce handlerton wrapper
------------------------------------

  a) At struct handlerton:

    1) Introduce lock_hton_log_t to handle the "do_lock";
       It will require a single parameter:
       - handlerton *hton: the handlerton of the storage engine;

    2) Introduce unlock_hton_log_t to handle the "do_unlock";
       It will require a single parameter:
       - handlerton *hton: the handlerton of the storage engine;

    3) Introduce collect_hton_log_info_t to handle the "do_collect_info";
       It will require two parameters:
       - handlerton *hton: the handlerton of the storage engine;
       - Json_object *json: the JSON object to be filled with the log info;

  b) Make handlerton (storage engines) a resource:

     - Introduce Instance_log_resource_hton_wrapper class;
     - Introduce the factory function of the new wrapper type;
     - Add the wrappers to the resource list extending Step-3 (d): only storage
       engines implementing the functions described in (a) will be wrapped and
       added to the resource list.


Step-6: Make relevant storage engines to participate in the process
-------------------------------------------------------------------

Each relevant storage engine to the process should implement its own functions
to support the following three handlerton functions:

  a) lock_hton_log: This function will be called before collecting the relevant
     information;
  b) unlock_hton_log: This function will be called after collecting the relevant
     information;
  c) collect_hton_log_info: This function will be called to collect the relevant
     information. A Json_object* will be passed to the function to add the
     content to the JSON object in a structured way.

Example (please do not consider this a proper implementation yet):

@ storage/innobase/handler/ha_innodb.cc

  innobase_hton->lock_hton_log = innobase_lock_hton_log;
  innobase_hton->unlock_hton_log = innobase_unlock_hton_log;
  innobase_hton->collect_hton_log = innobase_collect_hton_log_info;

...

/** Implements support for do_lock of Instance_log_resource_hton_wrapper
@param[in]  hton    the innodb handlerton
@return false on success */
static
bool
innobase_lock_hton_log(
  handlerton* hton)
{
  bool      ret_val= false;
  DBUG_ENTER("innodb_lock_hton_log");
  DBUG_ASSERT(hton == innodb_hton_ptr);
  log_lock();
  DBUG_RETURN(ret_val);
}

/** Implements support for do_unlock of Instance_log_resource_hton_wrapper
@param[in]  hton    the innodb handlerton
@return false on success */
static
bool
innobase_unlock_hton_log(
  handlerton* hton)
{
  bool      ret_val= false;
  DBUG_ENTER("innodb_unlock_instance");
  DBUG_ASSERT(hton == innodb_hton_ptr);
  log_unlock();
  DBUG_RETURN(ret_val);
}

/** Implements support for do_collect_info of Instance_log_resource_hton_wrapper
@param[in]  hton    the innodb handlerton
@param[out] json_engines   the Json_object to fill with the log information
@return false on success */
static
bool
innobase_collect_hton_log_info(

  handlerton* hton,
  Json_object*    json_engines)
{
  bool      ret_val= false;
  lsn_t  lsn;
  DBUG_ENTER("innodb_collect_hton_log_info");
  DBUG_ASSERT(hton == innodb_hton_ptr);
  log_collect_lsn(&lsn);
  Json_object json_innodb;
  Json_int json_lsn(lsn);
  json_innodb.add_clone("LSN", &json_lsn);
  json_engines->add_clone("InnoDB", &json_innodb);
  DBUG_RETURN(ret_val);
}

@ storage/innobase/include/log0log.h:

/******************************************************//**
Lock log. */
void
log_lock(void);
/******************************************************//**
Unlock log. */
void
log_unlock(void);
/******************************************************//**
Collect log LSN. */
void
log_collect_lsn(
/*=========*/
  lsn_t*  lsn); /*!< out: current lsn */

@ storage/innobase/log/log0log.cc:

/******************************************************//**
Lock log. */
void
log_lock(void)
{
  log_mutex_enter();
}

/******************************************************//**
Unlock log. */
void
log_unlock(void)
{
  log_mutex_exit();
}

/******************************************************//**
Collect log LSN. */
void
log_collect_lsn(
/*=========*/
  lsn_t*  lsn)  /*!< out: current lsn */
{
  *lsn = log_sys->lsn;
}