WL#5125: Refactory of Slave's master.info and relay_log.info

Affects: Server-5.6   —   Status: Complete

CONTEXT
=======
We aim at having a slave that, after a crash, can continue its normal operation
without any human intervention. The idea is to make master.info and
relay-log.info, i.e., the replication positions (or simply positions),
transactionally persistent and reliable. In other words, the idea is to keep the
positions in sync with the execution of transactions on the slave: the positions
advance when a transaction commits and keep their previous values when a
transaction rolls back.
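The commit/rollback rule above can be illustrated with a minimal in-memory sketch in Python. The names `Position` and `ReplicationPositions` are hypothetical and do not correspond to any MySQL server API. The sketch models only the bookkeeping (a pending position that becomes the committed one on commit and is discarded on rollback), not durability, which is what the proposals below have to provide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Position:
    """A replication position: a log file name and a byte offset in it."""
    log_file: str
    offset: int


class ReplicationPositions:
    """Hypothetical model: the committed position moves only on commit."""

    def __init__(self, initial: Position) -> None:
        self.committed = initial                   # where a restarted slave resumes
        self._pending: Optional[Position] = None   # position of the open transaction

    def begin(self) -> None:
        self._pending = self.committed

    def advance(self, new_position: Position) -> None:
        # Called while the transaction's events are applied; not yet visible.
        if self._pending is None:
            raise RuntimeError("no transaction in progress")
        self._pending = new_position

    def commit(self) -> None:
        # The position advances together with the transaction's changes.
        if self._pending is None:
            raise RuntimeError("no transaction in progress")
        self.committed = self._pending
        self._pending = None

    def rollback(self) -> None:
        # The pending position is discarded; the previous one stays in force.
        self._pending = None
```

For example, after `begin()`, `advance(Position("relay-bin.000001", 120))` and `rollback()`, `committed` still holds the position the transaction started from. The file name `relay-bin.000001` is only an example.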

There are two proposals for accomplishing this goal: WL#2775 and WL#3970. The
former exploits the transactional properties of the storage engines (e.g.,
InnoDB), and the latter uses a 2PC mechanism. Both are described below.

BACKGROUND
==========

Transactional Engines:
----------------------
"In a system using write ahead logging, all modifications are written to a log
before they are applied. Usually both redo and undo information is stored in the
log. The changes are applied in memory, and asynchronously flushed to disk."
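To make the quoted definition concrete, here is a toy write-ahead log in Python. It illustrates the general technique only and makes no claim about how InnoDB implements it. The class `ToyWAL` is hypothetical: the log is a Python list assumed to survive a crash, each update records both undo (old value) and redo (new value) information, pages are flushed lazily, and recovery redoes every logged update and then undoes those of transactions without a commit record.

```python
from typing import Dict, List, Optional


class ToyWAL:
    """Toy write-ahead log keeping redo and undo information per update."""

    def __init__(self) -> None:
        self.log: List[tuple] = []            # survives a crash in this model
        self.disk: Dict[str, int] = {}        # data pages, flushed asynchronously
        self.memory: Dict[str, int] = {}      # buffer pool, lost on a crash
        self.log_fsyncs = 0

    def write(self, txid: int, key: str, value: int) -> None:
        old: Optional[int] = self.memory.get(key)
        # WAL rule: log the change (undo = old, redo = new) before applying it.
        self.log.append(("update", txid, key, old, value))
        self.memory[key] = value

    def commit(self, txid: int) -> None:
        self.log.append(("commit", txid))
        self.log_fsyncs += 1                  # the log is forced at commit

    def flush_pages(self) -> None:
        # Asynchronous flush: may write changes of uncommitted transactions.
        self.disk = dict(self.memory)

    def crash_and_recover(self) -> None:
        self.memory = dict(self.disk)         # in-memory changes are lost
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        # Redo: reapply every logged update in log order.
        for rec in self.log:
            if rec[0] == "update":
                self.memory[rec[2]] = rec[4]
        # Undo: revert updates of transactions that never committed.
        for rec in reversed(self.log):
            if rec[0] == "update" and rec[1] not in committed:
                _, _, key, old, _ = rec
                if old is None:
                    self.memory.pop(key, None)
                else:
                    self.memory[key] = old
        self.disk = dict(self.memory)
```

Note that only `commit()` forces the log; page flushes happen whenever convenient, which is why the undo information is needed.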

2PC:
----
"The two phases of the algorithm are the prepare phase, in which a coordinator
process attempts to prepare all the transaction's participating processes (named
participants or cohorts) to take the necessary steps for either committing or
aborting the transaction, and the commit phase, in which, based on voting
(either "Yes," commit, or "No," abort) of the cohorts, the coordinator decides
whether to commit (only if all vote "Yes") or abort the transaction, and
notifies the result to the cohorts, which follow with the needed actions (commit
or abort) with their transactional resources and their respective portions in
the transaction's output."
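The protocol described in the quotation can be sketched in Python as follows. `Participant` and `two_phase_commit` are hypothetical names. The sketch omits timeouts, coordinator failures and the durable decision record that a real coordinator needs (in MySQL with the binlog enabled, the XID written to the binlog plays that role, as discussed in the fsync analysis).

```python
from typing import List


class Participant:
    """Hypothetical cohort: votes in the prepare phase, then obeys the decision."""

    def __init__(self, name: str, vote_yes: bool = True) -> None:
        self.name = name
        self.vote_yes = vote_yes
        self.state = "active"

    def prepare(self) -> bool:
        # A real participant makes its changes durable (fsync) before voting "Yes".
        self.state = "prepared" if self.vote_yes else "aborted"
        return self.vote_yes

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    # Phase 1 (prepare): collect every cohort's vote.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2 (commit): commit only if all voted "Yes", otherwise abort everywhere.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision
```

For simplicity this coordinator asks every cohort to prepare even after a "No" vote; a real one may stop early.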


DESCRIPTION
===========

WL#2775
-------
It proposes storing the positions in system tables, so that they are updated
within the same transaction as the replicated data and take advantage of the
transactional properties of the engine.

Requirements:
1 - If the data and positions are stored in different engines, all the engines
involved must support 2PC in order to provide crash-safety.

2 - If the data and positions are stored in the same engine, the engine must be
transactional in order to provide crash-safety.

Advantages:
1 - It may be the fastest approach if data and positions are stored in the same
engine.

2 - No special requirement is needed if data and positions are stored in the
same engine, which means that any current transactional engine can be used with
this approach.

Disadvantages:
1 - Customers are used to managing files (i.e., master.info and relay-log.info),
and this approach eliminates those files. Since all position data is stored in
database tables, it will not be possible to inspect master.info and
relay-log.info offline. Administrators who are used to manipulating the files to
"fix" replication will find their work more complicated.

WL#3970
-------
It proposes to keep using the current files, i.e., master.info and
relay-log.info, and to augment the current code base with a 2PC mechanism that
makes the positions transactionally persistent and reliable.
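A positions file could take part in 2PC roughly as sketched below in Python: the prepare phase writes the new positions to a side file and fsyncs it, and the commit phase atomically replaces the old file with `os.replace` and syncs the directory. `PositionsFile`, the `.prepared` suffix and the one-line `<file> <offset>` format are assumptions made for illustration; the actual master.info and relay-log.info formats are not reproduced. Each phase costs one fsync, which matches the two fsyncs per file counted in the analysis below.

```python
import os
import tempfile


def _fsync_dir(path: str) -> None:
    # Persist a rename; directories cannot be opened this way on every platform.
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return
    try:
        os.fsync(fd)
    except OSError:
        pass
    finally:
        os.close(fd)


class PositionsFile:
    """Hypothetical 2PC participant keeping positions in a one-line file."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.prepared_path = path + ".prepared"

    def read(self) -> tuple:
        with open(self.path) as f:
            log_file, log_pos = f.read().split()
        return log_file, int(log_pos)

    def prepare(self, log_file: str, log_pos: int) -> None:
        # Phase 1: make the new positions durable in a side file (fsync #1).
        with open(self.prepared_path, "w") as f:
            f.write(f"{log_file} {log_pos}\n")
            f.flush()
            os.fsync(f.fileno())

    def commit(self) -> None:
        # Phase 2: atomically install the prepared file, then persist the
        # rename by syncing the directory (fsync #2).
        os.replace(self.prepared_path, self.path)
        _fsync_dir(os.path.dirname(os.path.abspath(self.path)))

    def rollback(self) -> None:
        # Discard the prepared positions; the current file is untouched.
        if os.path.exists(self.prepared_path):
            os.remove(self.prepared_path)


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "relay-log.info")
        with open(path, "w") as f:
            f.write("relay-bin.000001 4\n")
        pf = PositionsFile(path)
        pf.prepare("relay-bin.000001", 120)
        pf.rollback()                 # the coordinator decided to abort
        after_abort = pf.read()
        pf.prepare("relay-bin.000001", 120)
        pf.commit()                   # the coordinator decided to commit
        return after_abort, pf.read()
```

If the slave crashes between the two phases, the leftover `.prepared` file must be committed or discarded according to the coordinator's decision during recovery.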

Requirements:
1 - The engines must support 2PC in order to provide crash-safety.

Advantages:
1 - Customers are used to managing files (i.e., master.info and relay-log.info),
and this approach keeps the same infrastructure. Administrators can still
inspect the files offline and, if they are used to doing so, manipulate them to
"fix" replication.

Disadvantages:
1 - It requires the engines to support 2PC.

2 - It may harm performance because of extra fsyncs; see the analysis below.


FSYNC ANALYSIS
==============

In this analysis, we compare a vanilla MySQL server with several possible
implementations in order to determine how many extra fsyncs each one needs to
be crash-safe.
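The totals worked out for each case below can be cross-checked with a short Python tally. It only encodes the extra fsyncs listed under each case, with the prepare-phase fsync counted once (i.e., a single storage engine), which is how the stated totals add up; the case labels and component names are shorthand introduced here.

```python
# Extra fsyncs per case, as listed in the analysis (one storage engine).
EXTRA_FSYNCS = {
    "1 - positions with the XID in the binlog": {
        "engine prepare": 1, "binlog XID + positions": 1},
    "2 - positions in a separate file": {
        "engine prepare": 1, "file prepare": 1, "file commit": 1},
    "3 - separate file, binlog enabled": {
        "engine prepare": 1, "file prepare": 1, "file commit": 1,
        "binlog XID": 1},
    "4 - system table, same engine": {},
    "5 - system table, different engine": {
        "engine prepare": 1, "XID + positions": 1},
}


def total(case: str) -> int:
    return sum(EXTRA_FSYNCS[case].values())


for case in EXTRA_FSYNCS:
    print(f"{case}: {total(case)} extra fsync(s)")
```

Running it prints totals of 2, 3, 4, 0 and 2 extra fsyncs for cases 1 to 5.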

1 - Storing positions along with the XID

  BACKGROUND:
  If the binlog is enabled, the current implementation of 2PC uses the XIDs
  stored in the binlog to decide whether a prepared transaction should commit
  after a failure. In other words, in the second phase of 2PC, after all the
  participants have voted to commit a transaction, a failure before the XID is
  written to the binlog causes the transaction to be rolled back when MySQL
  recovers; if the XID is found in the binlog, the transaction is committed.

  Although the binlog is a participant in 2PC, it does nothing in the prepare
  phase and only needs an fsync in the commit phase.

  In this approach, we propose to store the positions along with the XID in the
  binlog file.

  EXTRA-FSYNCS:
  . One extra fsync per storage engine in the prepare phase of the protocol.
  . An extra fsync while writing the positions along with the XID.
  - Total 2 extra fsyncs.

2 - Storing the positions in a different file from the binlog.

  BACKGROUND:
  In contrast to the approach described in (1), we propose here to store the
  positions in a new file specified by the user. This new file is a participant
  in the 2PC protocol.

  EXTRA-FSYNCS:
  . One extra fsync per storage engine in the prepare phase of the protocol.
  . Two extra fsyncs while writing the positions into the new file (prepare and
  commit phases).
  - Total 3 extra fsyncs.

3 - Storing the positions in a different file with the binlog enabled.

  BACKGROUND:
  The new file is a participant in the 2PC protocol.

  This approach is similar to the one described in (2), but now the slave also
  acts as a master and, as such, has the binlog enabled. Thus, regarding fsyncs,
  this approach combines (1) and (2): it needs the fsyncs of (2) plus the binlog
  fsync for the XID, while the storage engine's prepare fsync is counted only
  once.

  EXTRA-FSYNCS:
  . One extra fsync per storage engine in the prepare phase of the protocol.
  . Two extra fsyncs while writing the positions into the new file (prepare and
  commit phases).
  . And an extra fsync while writing the XID.
  - Total 4 extra fsyncs.

4 - Storing the positions in a system table using the same engine as the data

  BACKGROUND:
  The position update is part of the same transaction as the data, so it is
  written to the same transactional log and made durable by the same fsync at
  commit. Note, however, that the implementation must be designed carefully to
  avoid creating unnecessary entries in the transactional log and to keep the
  position data in memory.

  EXTRA-FSYNCS:
  - This is the best case and there is no need for extra fsyncs.

5 - Storing the positions in a system table but using a different storage engine
    from the data.

  BACKGROUND:
  If the data is stored in a different storage engine from the positions, 2PC
  is required. Regarding fsyncs, this is equivalent to case (1).

  EXTRA-FSYNCS:
  . One extra fsync per storage engine in the prepare phase of the protocol.
  . An extra fsync while writing the positions along with the XID.
  - Total 2 extra fsyncs.
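The recovery rule described in the background of case (1), which case (5) reuses, can be sketched in Python. The binlog is modelled as a list of hypothetical `XidEvent` records that carry the slave positions next to the XID; this layout is an assumption for illustration, not the actual binlog event format, and only relay log coordinates are shown. On restart, a prepared transaction is committed if its XID is in the binlog and rolled back otherwise, and the slave resumes from the positions in the last XID record, so in this model data and positions cannot diverge.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class XidEvent:
    """Hypothetical binlog record carrying the slave positions with the XID."""
    xid: int
    relay_log_file: str
    relay_log_pos: int


def recover(binlog: Iterable[XidEvent], prepared: Set[int]
            ) -> Tuple[Dict[int, str], Optional[Tuple[str, int]]]:
    """Resolve prepared transactions and find the position to resume from."""
    logged: Set[int] = set()
    last_position: Optional[Tuple[str, int]] = None
    for event in binlog:                      # events are in commit order
        logged.add(event.xid)
        last_position = (event.relay_log_file, event.relay_log_pos)
    # A prepared transaction commits only if its XID reached the binlog.
    decisions = {xid: "commit" if xid in logged else "rollback"
                 for xid in sorted(prepared)}
    return decisions, last_position
```

Because the positions travel in the same record as the XID, a single binlog fsync makes both the commit decision and the new positions durable.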

RELATED ISSUES
==============

There are other bugs and worklogs that also have the goal of making the
slave safe. See a brief list below:

1 - BUG#45292 aims at making the index file safe.

2 - WL#4621 handles the case in which master.info and relay-log.info are not in
sync and the relay log is corrupted.

3 - There is no worklog or bug that handles the case in which the master's
binary log is corrupted by a crash: the master has no positional information
similar to what we have on the slave.