WL#5125: Refactory of Slave's master.info and relay_log.info

Affects: Server-5.6 — Status: Complete


CONTEXT
=======
We aim at having a slave that, after a crash, can resume normal operation
without any human intervention. The idea is to make master.info and
relay_log.info, i.e. the replication positions or simply positions,
transactionally persistent and reliable. In other words, the idea is to keep the
positions in sync with the execution of transactions on the slave, advancing the
positions when a transaction commits and restoring the previous positions when a
transaction rolls back.
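
The commit/rollback behavior described above can be pictured with a minimal
in-memory sketch. The names Position and Transactional_position are hypothetical
and are not part of the server code; the sketch ignores persistence entirely and
only shows how a pending position relates to the last committed one.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of a replication position (not a server structure).
struct Position {
  std::string log_file;
  unsigned long long log_pos;
};

// Hypothetical helper: keeps the last committed position and a pending one.
class Transactional_position {
public:
  explicit Transactional_position(const Position &start)
    : committed_(start), pending_(start) {}

  // Called while events of the current transaction are being applied.
  void advance(unsigned long long new_pos) {
    assert(new_pos >= pending_.log_pos);  // positions only move forward
    pending_.log_pos = new_pos;
  }

  // On commit, the pending position becomes the committed one.
  void commit() { committed_ = pending_; }

  // On rollback, the pending position is reset to the committed one.
  void rollback() { pending_ = committed_; }

  const Position &committed() const { return committed_; }

private:
  Position committed_;
  Position pending_;
};
```

In a crash-safe implementation, the committed position would have to be made
durable atomically with the transaction itself, which is what the two proposals
below address.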

There are two different proposals to accomplish our goals: WL#2775 and WL#3970.
The former proposes to exploit the transactional properties of the storage
engines (e.g. InnoDB) and the latter to use a 2PC mechanism. Further details
follow.

BACKGROUND
==========

Transactional Engines:
----------------------
"In a system using write ahead logging, all modifications are written to a log
before they are applied. Usually both redo and undo information is stored in the
log. The changes are applied in memory, and asynchronously flushed to disk."

2PC:
----
"The two phases of the algorithm are the prepare phase, in which a coordinator
process attempts to prepare all the transaction's participating processes (named
participants or cohorts) to take the necessary steps for either committing or
aborting the transaction, and the commit phase, in which, based on voting
(either "Yes," commit, or "No," abort) of the cohorts, the coordinator decides
whether to commit (only if all vote "Yes") or abort the transaction, and
notifies the result to the cohorts, which follow with the needed actions (commit
or abort) with their transactional resources and their respective portions in
the transaction's output."
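
As an illustration of the two phases quoted above, here is a minimal sketch of a
coordinator. It assumes a hypothetical Participant interface and is not the
server's 2PC implementation; durability (logging and fsyncs) is left out on
purpose.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical participant interface (an assumption for this sketch).
class Participant {
public:
  virtual ~Participant() {}
  virtual bool prepare() = 0;   // vote: true means "Yes", false means "No"
  virtual void commit() = 0;
  virtual void rollback() = 0;
};

// Runs both phases and returns true if the transaction committed.
inline bool two_phase_commit(const std::vector<Participant *> &participants) {
  // Phase 1 (prepare): collect votes, stopping at the first "No".
  bool all_yes = true;
  for (std::size_t i = 0; i < participants.size() && all_yes; ++i) {
    assert(participants[i] != 0);
    all_yes = participants[i]->prepare();
  }
  // Phase 2 (commit): notify every participant of the decision.
  for (std::size_t i = 0; i < participants.size(); ++i) {
    if (all_yes)
      participants[i]->commit();
    else
      participants[i]->rollback();
  }
  return all_yes;
}

// Trivial participant that records its state, for illustration only.
class Recording_participant : public Participant {
public:
  enum State { IDLE, PREPARED, COMMITTED, ROLLED_BACK };
  explicit Recording_participant(bool vote) : vote_(vote), state_(IDLE) {}
  bool prepare() { if (vote_) state_ = PREPARED; return vote_; }
  void commit() { state_ = COMMITTED; }
  void rollback() { state_ = ROLLED_BACK; }
  State state() const { return state_; }
private:
  bool vote_;
  State state_;
};
```

A single "No" vote is enough to make every participant roll back, including
those that had already prepared.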


DESCRIPTION
===========

WL#2775
-------
It proposes the use of system tables to store the positions and takes advantage
of the transactional properties of the engine.

Requirements:
1 - If the data and positions are stored in different engines, all the engines
involved must support 2PC in order to provide crash-safety.

2 - If the data and positions are stored in the same engine, the engine must be
transactional in order to provide crash-safety.

Advantages:
1 - It may be the fastest approach if data and positions are stored in the same
engine.

2 - No special requirement is needed if data and positions are stored in the
same engine, which means that all the current transactional engines can be used
with this approach.

Disadvantages:
1 - Customers are used to managing files (i.e. master.info and relay-log.info)
and this approach will eliminate those files. Since all position data is stored
in database tables, it will not be possible to check the master.info and
relay-log.info files offline. If administrators are used to manipulating the
files to "fix" replication, this approach will complicate matters for them.

WL#3970
-------
It proposes to keep using the current files, i.e. master.info and
relay_log.info, and augment the current code base with a 2PC mechanism to make
the positions transactionally persistent and reliable.

Requirements:
1 - The engines must support 2PC in order to provide crash-safety.

Advantages:
1 - Customers are used to managing files (i.e. master.info and relay-log.info)
and this approach will keep the same infrastructure. Thus administrators who are
used to manipulating the files to "fix" replication can still check the
master.info and relay-log.info files offline.

Disadvantages:
1 - It will require the engines to provide 2PC.
2 - It may harm performance due to extra fsyncs. See the analysis that follows.


ANALYSIS FSYNC
===============

In this analysis, we compare a vanilla MySQL with possible implementations in
order to figure out the number of extra fsyncs required to make the solution
crash-safe.

1 - Storing positions along with the XID

  BACKGROUND:
  If the binlog is enabled, the current implementation of the 2-PC uses the XID
  stored in the binlog to decide whether a transaction should commit after a
  failure. In other words, if a failure happens while writing to the binlog in
  the second phase of a 2-PC, after all the participants have voted to commit,
  the transaction is rolled back when MySQL recovers.

  Although the binlog is a participant in the 2-PC, it does nothing in the
  prepare phase and only needs an fsync in the commit phase.

  In this approach, we propose to store the positions along with XID in the
  binlog file.

  EXTRA-FSYNCS:
  . One extra fsync per storage engine in the prepare phase of the protocol.
  . An extra fsync while writing the positions along with the XID.
  - Total 2 extra fsyncs.

2 - Storing the positions in a different file from the binlog.

  BACKGROUND:
  The new file is a participant in the 2-PC protocol.

  In contrast to the approach described in (1), we propose here to store the
  positions in a new file to be specified by the user.

  EXTRA-FSYNCS:
  . One extra fsync per storage engine in the prepare phase of the protocol.
  . Two extra fsyncs while writing the positions into the new file (prepare and
  commit phases).
  - Total 3 extra fsyncs.

3 - Storing the positions in a different file with the binlog enabled.

  BACKGROUND:
  The new file is a participant in the 2-PC protocol.

  This approach is similar to the one described in (2), but now the slave also
  acts as a master and as such the binlog is enabled. Thus, regarding fsyncs,
  this approach adds the XID fsync of (1) to the fsyncs of (2).

  EXTRA-FSYNCS:
  . One extra fsync per storage engine in the prepare phase of the protocol.
  . Two extra fsyncs while writing the positions into the new file (prepare and
  commit phases).
  . And an extra fsync while writing the XID.
  - Total 4 extra fsyncs.

4 - Storing the positions in a system table using the same engine as the data

  BACKGROUND:
  The transactional mechanism of the storage engine will hide any performance
  penalties. Note, however, that the implementation needs to be well designed
  to avoid creating unnecessary entries in the transactional log and keep the
  data in memory.

  EXTRA-FSYNCS:
  - This is the best case and there is no need for extra fsyncs.

5 - Storing the positions in a system table but using a different storage engine
    from the data.

  BACKGROUND:
  Note that if the data is stored in a different storage engine from the
  positions a 2-PC is required. This is equivalent to case 1.

  EXTRA-FSYNCS:
  . One extra fsync per storage engine in the prepare phase of the protocol.
  . An extra fsync while writing the positions along with the XID.
  - Total 2 extra fsyncs.
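
The totals of the five cases can be tallied with a small sketch. The struct
Fsync_case and the function extra_fsyncs() are hypothetical names introduced
only for this illustration. The totals quoted in the analysis appear to assume a
single storage engine holding the data, so the engine count is a parameter that
defaults to 1.

```cpp
#include <cassert>

// Hypothetical breakdown of the extra fsyncs of one case in the analysis.
struct Fsync_case {
  int engine_prepare;  // extra fsyncs in the prepare phase, per storage engine
  int positions;       // extra fsyncs for writing the positions
  int xid;             // extra fsyncs for writing the XID on its own
};

// Total extra fsyncs per transaction for a given number of storage engines.
inline int extra_fsyncs(const Fsync_case &c, int engines = 1) {
  assert(engines >= 0);
  return c.engine_prepare * engines + c.positions + c.xid;
}

const Fsync_case CASE_1 = {1, 1, 0};  // positions stored along with the XID
const Fsync_case CASE_2 = {1, 2, 0};  // positions in a separate file
const Fsync_case CASE_3 = {1, 2, 1};  // separate file, binlog enabled
const Fsync_case CASE_4 = {0, 0, 0};  // system table, same engine as the data
const Fsync_case CASE_5 = {1, 1, 0};  // system table, different engine
```

With one engine, extra_fsyncs() yields 2, 3, 4, 0 and 2 for cases 1 to 5, which
matches the totals listed above. With more engines taking part in the 2-PC, only
the prepare-phase term grows.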

RELATED ISSUES
==============

There are other bugs and worklogs that also have the goal of making the
slave safe. See a brief list below:

1 - BUG#45292 aims at making the index file safe.

2 - WL#4621 handles the case that the master.info and relay_log.info are not in
sync and the relay log is corrupted.

3 - There is no worklog or bug to handle the case that the master gets its
binary log corrupted due to a crash. There is no positional information similar
to what we have on the slave.

CONTEXT
=======
The current code base is structured as follows:

                            -------------------------------
                            |  Slave_reporting_capability |
                            -------------------------------
                                     ^         ^
                                     |         |
                  ----------------------     ----------------------
                  |     Master_info    |     |   Relay_log_info   |
                  ----------------------     ----------------------

Slave_reporting_capability - provides reporting capabilities.

Master_info - handles information in the master.info file. 

Relay_log_info - handles information in the relay_log.info. 

Both the Master_info and Relay_log_info are designed to store data into files
and have several common internal structures such as thread pointers and mutexes
that are duplicated.

PROPOSAL
========
In this work, we propose to refactor the code and create the class Rpl_info
that will be inherited by the Master_info and Relay_log_info classes. These
classes will provide methods that handle operations that are independent of
the type of persistence used (file, system tables, etc.) and will hide their
associated information behind a set of getter and setter methods.


                          -------------------------------  
                          |  Slave_reporting_capability |  
                          -------------------------------  
                                         ^     
                                         |     
                               ----------------------
                               |      Rpl_info      |
                               ----------------------
                                     ^         ^
                                     |         |
                  ----------------------     ----------------------
                  |     Master_info    |     |   Relay_log_info   |
                  ----------------------     ----------------------


Each type of persistence will have its own specialization as follows:


                               ----------------------
                               |   Rpl_info_handler |
                               ----------------------
                                 ^                ^
                                 |                |
                  ----------------------     ----------------------
                  |   Rpl_info_file    |     |   Rpl_info_table   |
                  ----------------------     ----------------------


A factory is responsible for creating and assembling this set of components,
i.e., setting the dependencies between the instances as follows:

                             ----------------------
                             |   Rpl_info_factory |
                             ---------------------- 

              ----------------------     ----------------------
              |     Master_info    |<>---|    Rpl_info_file   |
              ----------------------     ----------------------

              ----------------------     ----------------------
              |   Relay_log_info   |<>---|    Rpl_info_table  |
              ----------------------     ----------------------

In that case, the Master_info will call the handler Rpl_info_file whenever
necessary in order to, among other things, initialize the storage system, reset
it, read from it or write to it. Something similar happens with the
Relay_log_info which may have a different persistent storage system from the
Master_info.
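
The composition above can be sketched as follows. This is a simplified
illustration, not the proposed classes: every name ending in _sketch is
hypothetical, fields are addressed by name rather than by position, storage is
an in-memory map, and C++11 std::unique_ptr is used for ownership, which the
worklog does not prescribe.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Stand-in for Rpl_info_handler: hides where the fields are stored.
class Handler_sketch {
public:
  virtual ~Handler_sketch() {}
  virtual std::string kind() const = 0;
  void write(const std::string &field, const std::string &value) {
    store_[field] = value;  // in-memory stand-in for a file or a table
  }
  std::string read(const std::string &field) const {
    std::map<std::string, std::string>::const_iterator it = store_.find(field);
    return it == store_.end() ? std::string() : it->second;
  }
private:
  std::map<std::string, std::string> store_;
};

class File_handler_sketch : public Handler_sketch {
public:
  std::string kind() const { return "file"; }
};

class Table_handler_sketch : public Handler_sketch {
public:
  std::string kind() const { return "table"; }
};

// Stand-in for Master_info / Relay_log_info: delegates persistence.
class Info_sketch {
public:
  void set_rpl_info_handler(std::unique_ptr<Handler_sketch> handler) {
    handler_ = std::move(handler);
  }
  std::string storage_kind() const { assert(handler_); return handler_->kind(); }
  void save(const std::string &field, const std::string &value) {
    assert(handler_);
    handler_->write(field, value);
  }
  std::string load(const std::string &field) const {
    assert(handler_);
    return handler_->read(field);
  }
private:
  std::unique_ptr<Handler_sketch> handler_;
};

enum Storage_kind { STORAGE_FILE, STORAGE_TABLE };

// Stand-in for Rpl_info_factory: creates the objects and wires them together.
struct Factory_sketch {
  static std::unique_ptr<Info_sketch> create(Storage_kind kind) {
    std::unique_ptr<Info_sketch> info(new Info_sketch());
    if (kind == STORAGE_FILE)
      info->set_rpl_info_handler(
        std::unique_ptr<Handler_sketch>(new File_handler_sketch()));
    else
      info->set_rpl_info_handler(
        std::unique_ptr<Handler_sketch>(new Table_handler_sketch()));
    return info;
  }
};
```

Because the info object only sees the handler interface, the choice between file
and table is confined to the factory.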

Requirements
============
R1. It shall preserve the current behavior, which uses files to unreliably store
both the master.info and relay_log.info data.

R2. It shall make it easy to create other persistence mechanisms as described
in WL#2775 and WL#3970.

R3. It shall make it easy to introduce options to choose among the available
persistence mechanisms.


Rpl_info
========
. This is an abstract class that is extended by both the Master_info and
Relay_log_info.

Slave_reporting_capability
==========================
. This is the reporting class already used by both the Master_info and
Relay_log_info.

Master_info
===========
. Master_info will handle the following information:
  - Master_Log_File - The name of the master binary log currently being read
    from the master.
  - Read_Master_Log_Pos - The current position within the master binary log
    that has been read from the master.
  - Master_Host - The host name of the master.
  - Master_User - The user name used to connect to the master.
  - Master_Password (not shown by SHOW SLAVE STATUS) - The password used to
    connect to the master.
  - Master_Port - The network port used to connect to the master.
  - Connect_Retry - The period (in seconds) that the slave will wait before
    trying to reconnect to the master.
  - Master_SSL_Allowed - Indicates whether the server supports SSL connections.
  - Master_SSL_CA_File - The file used for the Certificate Authority (CA)
    certificate.
  - Master_SSL_CA_Path - The path to the Certificate Authority (CA)
    certificates.
  - Master_SSL_Cert - The name of the SSL certificate file.
  - Master_SSL_Cipher - The name of the cipher in use for the SSL connection.
  - Master_SSL_Key - The name of the SSL key file.
  - Master_SSL_Verify_Server_Cert - Whether to verify the server certificate.

Relay_log_info
==============
. Relay_log_info will handle the following information:
  - Relay_Log_File - The name of the current relay log file.
  - Relay_Log_Pos - The current position within the relay log file. Events up
    to this position have been executed on the slave database.
  - Relay_Master_Log_File - The name of the master binary log file from which
    the events in the relay log file were read.
  - Exec_Master_Log_Pos - The equivalent position within the master's binary log
    file of events that have already been executed.

Rpl_info_handler
================
. This is an abstract class that provides a simple interface to build
persistence mechanisms.

Rpl_info_file
=============
. This is a class that encapsulates the common code to access, flush,
initiate and delete a set of information stored in a file.

Rpl_info_table
==============
. This is a class that encapsulates the common code to access, flush,
initiate and delete a set of information stored in a system table. This is going
to be the subject of WL#2775.

Rpl_info_factory
================
. Creates and assembles the set of components presented in this section.

We propose the following design for Rpl_info, Master_info, Relay_log_info,
Rpl_info_handler, Rpl_info_file and Rpl_info_table:

. Explicitly disable copying (copy construction and copy assignment).
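
For illustration, copying can be disabled in two common ways. The class names
below are hypothetical. The first form matches the private, undefined
declarations used in the class definitions in this worklog; the second is the
C++11 equivalent and is shown only as an alternative.

```cpp
#include <type_traits>

// C++03 idiom: declare the copy operations private and never define them.
class Private_copy_sketch {
public:
  Private_copy_sketch() {}
private:
  Private_copy_sketch(const Private_copy_sketch &);             // not defined
  Private_copy_sketch &operator=(const Private_copy_sketch &);  // not defined
};

// C++11 alternative: mark the copy operations as deleted.
class Deleted_copy_sketch {
public:
  Deleted_copy_sketch() {}
  Deleted_copy_sketch(const Deleted_copy_sketch &) = delete;
  Deleted_copy_sketch &operator=(const Deleted_copy_sketch &) = delete;
};

// Both classes reject copies from outside code at compile time.
static_assert(!std::is_copy_constructible<Private_copy_sketch>::value,
              "copy construction must be disabled");
static_assert(!std::is_copy_constructible<Deleted_copy_sketch>::value,
              "copy construction must be disabled");
```

Either way, an accidental copy of an object that owns mutexes, thread pointers
or a storage handle fails to compile instead of misbehaving at run time.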

class Rpl_info : public Slave_reporting_capability
================
This is an abstract class and provides the following interface:

init_info() initializes internal structures that cannot be initialized in any
constructor.

check_info() verifies if the persistent storage system (e.g. file or table) is
correctly created.

flush_info() flushes data to a persistent storage system.

reset_info() cleans the storage system and internal data structures.

end_info() flushes data and closes the storage system.

set_rpl_info_handler() sets the handler that will be responsible for
the persistence.

class Rpl_info_handler :
========================
This class has the same set of life cycle calls as Rpl_info and, along with
a set of setters and getters, makes it easy to decouple business logic from
persistence. This is an abstract class and provides the following interface:

init_info() initializes internal structures that cannot be initialized in any
constructor.

check_info() verifies if the persistent storage system (e.g. file or table) is
correctly created.

flush_info() flushes data to a persistent storage system.

reset_info() cleans the storage system and internal data structures.

end_info() flushes data and closes the storage system.

prepare_info_for_read() enables the storage system to receive reads. In the
case of a file, this is necessary to enable reads from the cache and put the
cursor in the right place.

prepare_info_for_write() enables the storage system to receive writes.

set_info() writes a field to the storage system.

get_info() reads a field from the storage system.

get_number_info() gets the number of fields available.

set_sync_period() configures the number of events after which the info (e.g.
master info, relay log info) must be synced when flush_info() is called.
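
One possible reading of how set_sync_period() and flush_info() interact is
sketched below. The class Sync_counter_sketch and its members are hypothetical.
The sketch assumes that flush_info() is called once per event, that a period of
0 means "never sync unless forced", and that a forced flush restarts the
counter; none of these details is stated in this worklog.

```cpp
#include <cassert>

// Hypothetical model of the sync policy only; no data is written anywhere.
class Sync_counter_sketch {
public:
  Sync_counter_sketch() : sync_period_(0), sync_counter_(0), syncs_(0) {}

  void set_sync_period(unsigned int period) { sync_period_ = period; }

  // Returns true if this call synced to stable storage.
  bool flush_info(const bool force = false) {
    bool do_sync = force;
    // Count the call only when it was not forced and syncing is enabled.
    if (!do_sync && sync_period_ != 0 && ++sync_counter_ >= sync_period_)
      do_sync = true;
    if (do_sync) {
      sync_counter_ = 0;  // start a new period after every sync
      ++syncs_;           // a real handler would sync its storage here
    }
    // Invariant: the counter never stays at or above a non-zero period.
    assert(sync_period_ == 0 || sync_counter_ < sync_period_);
    return do_sync;
  }

  unsigned int syncs() const { return syncs_; }

private:
  unsigned int sync_period_;
  unsigned int sync_counter_;
  unsigned int syncs_;
};
```

With a period of 3, the third unforced call syncs, and a forced call syncs
regardless of the counter.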

class Rpl_info_file : public Rpl_info_handler
=====================
This is a concrete class that inherits from the Rpl_info_handler.

class Rpl_info_handler :
========================
{
public:
  Rpl_info_handler();
  virtual ~Rpl_info_handler();

  void set_sync_period(uint period);

  int init_info();
  int check_info();
  int flush_info(const bool force=FALSE);
  int reset_info();
  void end_info();

  int prepare_info_for_read();
  int prepare_info_for_write();
  int get_number_info();

  bool set_info(const char *value);
  bool set_info(int const value);
  bool set_info(float const value);
  bool set_info(my_off_t const value);
  bool set_info(const Server_ids *value);
  bool get_info(char *value, const size_t size,
                const char *default_value);
  bool get_info(int *value,
                int const default_value);
  bool get_info(float *value,
                float const default_value);
  bool get_info(my_off_t *value,
                my_off_t const default_value);
  bool get_info(Server_ids *value,
                const Server_ids *default_value);

private:
  virtual int do_init_info()= 0;
  virtual int do_check_info()= 0;
  virtual int do_flush_info(const bool force)= 0;
  virtual int do_reset_info()= 0;
  virtual void do_end_info()= 0;
  virtual int do_prepare_info_for_read()= 0;
  virtual int do_prepare_info_for_write()= 0;

  virtual bool do_set_info(const int pos, const char *value)= 0;
  virtual bool do_set_info(const int pos, const int value)= 0;
  virtual bool do_set_info(const int pos, const float value)= 0;
  virtual bool do_set_info(const int pos, const my_off_t value)= 0;
  virtual bool do_set_info(const int pos, const Server_ids *value)= 0;
  virtual bool do_get_info(const int pos, char *value, const size_t size,
                           const char *default_value)= 0;
  virtual bool do_get_info(const int pos, int *value,
                           const int default_value)= 0;
  virtual bool do_get_info(const int pos, float *value,
                           const float default_value)= 0;
  virtual bool do_get_info(const int pos, my_off_t *value,
                           const my_off_t default_value)= 0;
  virtual bool do_get_info(const int pos, Server_ids *value,
                           const Server_ids *default_value)= 0;

  Rpl_info_handler& operator=(const Rpl_info_handler& handler);
  Rpl_info_handler(const Rpl_info_handler& handler);
};
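
The split between the public set_info()/get_info() calls, which take no
position, and the private do_set_info()/do_get_info() calls, which do, suggests
that the base class tracks a cursor and forwards it to the specialization. The
sketch below illustrates that reading with hypothetical names, a single string
field type and an in-memory store. The cursor handling and the "true means
error" return convention are assumptions, not statements from this worklog.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for Rpl_info_handler with a position cursor.
class Handler_cursor_sketch {
public:
  explicit Handler_cursor_sketch(int nfields) : ninfo_(nfields), cursor_(0) {}
  virtual ~Handler_cursor_sketch() {}

  int get_number_info() const { return ninfo_; }
  int prepare_info_for_read() { cursor_ = 0; return 0; }
  int prepare_info_for_write() { cursor_ = 0; return 0; }

  // Writes the field at the cursor and advances; returns true on error.
  bool set_info(const std::string &value) {
    if (cursor_ >= ninfo_) return true;
    const bool error = do_set_info(cursor_, value);
    if (!error) ++cursor_;
    return error;
  }

  // Reads the field at the cursor and advances; returns true on error.
  bool get_info(std::string *value, const std::string &default_value) {
    assert(value != 0);
    if (cursor_ >= ninfo_) return true;
    const bool error = do_get_info(cursor_, value, default_value);
    if (!error) ++cursor_;
    return error;
  }

private:
  virtual bool do_set_info(int pos, const std::string &value) = 0;
  virtual bool do_get_info(int pos, std::string *value,
                           const std::string &default_value) = 0;
  int ninfo_;
  int cursor_;
};

// In-memory specialization, standing in for Rpl_info_file or Rpl_info_table.
class Memory_handler_sketch : public Handler_cursor_sketch {
public:
  explicit Memory_handler_sketch(int nfields)
    : Handler_cursor_sketch(nfields), fields_(nfields),
      present_(nfields, false) {}
private:
  bool do_set_info(int pos, const std::string &value) {
    fields_[pos] = value;
    present_[pos] = true;
    return false;
  }
  bool do_get_info(int pos, std::string *value,
                   const std::string &default_value) {
    *value = present_[pos] ? fields_[pos] : default_value;
    return false;
  }
  std::vector<std::string> fields_;
  std::vector<bool> present_;
};
```

Callers write or read the fields in a fixed order after the matching prepare
call, and only the specialization knows how a position maps to a line in a file
or a column in a table.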


class Rpl_info_file : public Rpl_info_handler
=====================
{
public:
  Rpl_info_file(const char* param_info_fname);
  virtual ~Rpl_info_file();

private:
  int do_init_info();
  int do_check_info();
  void do_end_info();
  int do_flush_info(const bool force);
  int do_reset_info();

  int do_prepare_info_for_read();
  int do_prepare_info_for_write();
  bool do_set_info(const int pos, const char *value);
  bool do_set_info(const int pos, const int value);
  bool do_set_info(const int pos, const float value);
  bool do_set_info(const int pos, const my_off_t value);
  bool do_set_info(const int pos, const Server_ids *value);
  bool do_get_info(const int pos, char *value, const size_t size,
                   const char *default_value);
  bool do_get_info(const int pos, int *value,
                   const int default_value);
  bool do_get_info(const int pos, float *value,
                   const float default_value);
  bool do_get_info(const int pos, my_off_t *value,
                   const my_off_t default_value);
  bool do_get_info(const int pos, Server_ids *value,
                   const Server_ids *default_value);

  Rpl_info_file& operator=(const Rpl_info_file& info);
  Rpl_info_file(const Rpl_info_file& info);
};


class Rpl_info : public Slave_reporting_capability
================
{
public:
  Rpl_info(const char* type);
  virtual ~Rpl_info();

  int check_info();
  int reset_info();
  void set_rpl_info_handler(Rpl_info_handler * handler);

private:
  Rpl_info& operator=(const Rpl_info& info);
  Rpl_info(const Rpl_info& info);
};


class Master_info: public Rpl_info
=================
{
public:
  int init_info();
  int check_info();
  int flush_info(const bool force= FALSE);

private:
  Master_info& operator=(const Master_info& info);
  Master_info(const Master_info& info);
};


class Relay_log_info: public Rpl_info
=====================
{
public:
  int init_info();
  int check_info();
  int flush_info(const bool force= FALSE);

private:
  Relay_log_info& operator=(const Relay_log_info& info);
  Relay_log_info(const Relay_log_info& info);
};


Additional Information
=======================

The following files shall be added to accommodate the changes:

sql/rpl_info.cc
sql/rpl_info.h
sql/rpl_info_factory.cc
sql/rpl_info_factory.h
sql/rpl_info_file.cc
sql/rpl_info_file.h
sql/rpl_info_handler.h
sql/rpl_info_handler.cc