WL#344: Time-delayed Replication

Affects: Server-5.6 — Status: Complete

Description
High Level Architecture
Low Level Design

Pushed to next-mr on 2010-06-26.


Summary
=======

Make replication slave to lag a specified amount of time behind the master.

Purposes and use cases
======================

This feature can be used for several purposes:

(P1) Protect against user mistakes on master. A DBA that makes a disaster on a
     master that has a delayed slave can roll back to the time just before the
     disaster.

     Example: assume there is one master and one slave, the slave is running
     with a delay, and the DBA accidentally drops all tables on the master. The
     mistake can be undone with minimal downtime, as follows:

     slave> STOP SLAVE
     Replace the master by the slave and shut down the old master.
     slave> CHANGE MASTER TO MASTER_DELAY = 0
     slave> START SLAVE SQL_THREAD UNTIL 

     Variations of the above scheme can be used depending on the precise setup
     and the application's requirements.

     Cf. BUG#21639, BUG#22072

(P2) Test how the system behaves when there is a lag. For example, in a real
     application a lag might be caused by a big load on the slave. However, it
     can be difficult to generate a big load. This feature comes in handy to
     simulate the lag without having to simulate the load.

(P3) Debug conditions related to a lagging slave. For example, in BUG#28760,
     the reporter had problems which they suspected were caused by a network
     lag. Debugging these problems was greatly simplified using an early
     version of this worklog.

(P4) Inspect what the database looked like long ago, without having to reload a
     backup. For instance, if the delay is one week, and the DBA did development
     for a couple of days, and the DBA needs to go back and see what the
     database used to look like before the development, then the delayed slave
     can be inspected.

     Cf. BUG#22072 ("how to repeat")

Requirements
============


CHANGE MASTER: new option to set the delay
------------------------------------------

Introduce the following new syntax in CHANGE MASTER:

  CHANGE MASTER TO MASTER_DELAY = ;

 is of type INTEGER and specifies the number of seconds to delay.

If a non-integer value is specified, or if a negative value is specified, then
an error shall be generated and the statement shall not execute.


Semantics: how the option affects the slave threads
---------------------------------------------------

The slave IO thread is not affected by the delay. The master is not affect by
the delay either.

After the slave SQL thread has read an event and before it has executed the
event, the slave SQL thread shall read the event's timestamp (see "Chained
replication and time differences" below for a precise definition of what
timestamp to use). If the event's timestamp is greater than the current time
minus the delay, then the SQL thread sleeps until the delay is reached before
executing the event.

Exception: the SQL thread shall not wait for the following event types:
  START_EVENT_V3
  ROTATE_EVENT
  FORMAT_DESCRIPTION_EVENT
There are two reasons why we should not wait for these events:
 (1) These events don't affect anything else than the internal state of the SQL
     thread;
 (2) ROTATE_EVENT sometimes has wrong timestamp.

If the slave SQL thread is stopped or killed while it is sleeping, or if the
slave server is killed, then:
 - the sleep shall be interrupted immediately (it shall not finish sleeping)
 - the slave SQL thread shall not execute the event it was trying to delay
   (nor shall it execute any other event).


Chained replication and time differences
----------------------------------------

In a chained replication setup, the event's timestamp shall reflect the time
that the event was executed on the immediate master, not the time it was
executed on the master that first executed the event. For example, if the
replication topology is master_1 -> master_2 -> master_3, then the timestamp of
an event received by master_3 is equal to the time when the event was executed
on master_2, not the time when the event was executed on master_1.

If the master and slave have different system times, then the difference shall
be compensated: the delay time shall be independent of any unsynchronized clocks
and reflect the absolute time after the master executed the event.

Note: due to BUG#52308, the timestamp of the immediate master is not known to
the slave. Only the timestamp of the originating master is known. So before
BUG#52308 is fixed, the timestamp will (incorrectly) be the timestamp of the
originating master.


RESET SLAVE: clear the delay
----------------------------

RESET SLAVE shall set the delay to 0.


SHOW PROCESSLIST: new state for the SQL thread
----------------------------------------------

If the SQL thread has read an event that is newer than the current time minus
the delay, then the SQL thread has the state:

  "Waiting until MASTER_DELAY seconds after master executed event"


SHOW SLAVE STATUS: three new fields
-----------------------------------

Introduce three new fields to the output from SHOW SLAVE STATUS:

  SQL_Delay
    The number of seconds that the slave must be behind the master:
    a nonnegative integer.

  Slave_SQL_Running_State
    The state of the SQL thread (similar to Slave_IO_State): a string. The
    value is identical to the "State" of the SQL thread as displayed by
    SHOW PROCESSLIST.
    The name Slave_SQL_Running_State was chosen instead of the name
    Slave_SQL_State (which would rhyme better with the existing
    Slave_IO_State) to avoid possible confusion with SQLSTATE.

  SQL_Remaining_Delay
    When Slave_SQL_Running_State is "Waiting until MASTER_DELAY seconds after
    master executed event", then this field contains the number of seconds
    left of the delay: an integer.
    In other cases, this field is NULL.


Durability of option: relay-log.info
------------------------------------

When the server is restarted, the previous delay value shall be restored. This
is achieved by saving the delay in relay-log.info.

Note: the format of relay-log.info has to be modified in a non-trivial manner.
The previous format of relay-log.info was as follows:

rli->group_relay_log_name [\n-terminated ascii string]
rli->group_relay_log_pos [\n-terminated ascii-encoded number]
rli->group_master_log_name [\n-terminated ascii string]
rli->group_master_log_pos [\n-terminated ascii-encoded number]
possibly a string of "garbage"

Note that the file may end with garbage. The reason is an optimization in
writing the relay log: when a new relay log is written, the old relay log is
overwritten in place. If the new relay log is shorter than the old relay log,
the new relay log is not truncated, so the end of the relay log may contain
parts of previous relay logs.

This means that we cannot add the delay as just another \n-encoded ascii string
at the end of relay-log.info, because there would be no way to distinguish a
relay log in the old format that has garbage after the fourth field from a new
relay log that has a delay as a fifth field. Suppose the DBA executes the
following actions: (1) run a slave using a version before this worklog; (2) shut
down the old slave server when there is garbage at the end of relay-log.info;
(3) upgrade the old slave server to a version after this worklog; (4) start the
new slave server. Then the new slave server would try to read the garbage as a
delay. In the worst case it would use a previous very big binlog position as a
delay, effectively preventing the slave SQL thread from running.

So instead, we will use a new format for relay-log.info, as follows:

number_of_fields_in_file [\n-terminated ascii-encoded integer]
rli->group_relay_log_name [\n-terminated ascii string]
rli->group_relay_log_pos [\n-terminated ascii-encoded integer]
rli->group_master_log_name [\n-terminated ascii string]
rli->group_master_log_pos [\n-terminated ascii-encoded integer]
rli->delay [\n-terminated ascii-encoded integer]
possibly a string of "garbage"

Note: number_of_fields_in_file will be 5 after this worklog, but can be
increased in the future if more fields are needed.

When the new server reads the file, it shall first read just the first line.
If the first line is an integer, then:
 (1) Check that the integer value is 5 or more; if not, generate an error.
 (2) Assume the file uses the new format and read it accordingly.
If the first line is not a number, then read the old format file.
This is analogous to the format of master.info.

It is not possible to confuse the two formats, because rli->group_relay_log_name
always contains a dot ('.'); the integer does not.


CHANGES THAT NEED TO BE DOCUMENTED
==================================

New syntax for CHANGE MASTER statement
--------------------------------------

For http://dev.mysql.com/doc/refman/5.1/en/change-master-to.html

CHANGE MASTER now supports this syntax:

CHANGE MASTER TO MASTER_DELAY = N

N is an integer, 0 <= N < 2^31. After a subsequent execution of START SLAVE
[SQL_THREAD], the slave will guarantee that it does not execute statements until
N seconds later than they were executed on the master. The default is 0.


New restriction in BINLOG statement
-----------------------------------

For http://dev.mysql.com/doc/refman/5.1/en/binlog.html

Before this patch, the BINLOG statement could execute all types of events.

After this patch, the BINLOG statement can only execute
format_description_log_events and row events (including TABLE_MAP_EVENT,
WRITE_ROWS_EVENT, UPDATE_ROWS_EVENT, DELETE_ROWS_EVENT, PRE_GA_WRITE_ROWS_EVENT,
PRE_GA_UPDATE_ROWS_EVENT, PRE_GA_DELETE_ROWS_EVENT). It means that the output
from mysqlbinlog --base64-output=always cannot be parsed (--base64-output=always
is documented as "only for debugging and should not be used in a production
system").


(Note [2011-01-11]: After WL#5404, BINLOG will also be able to execute 
ROWS_QUERY_LOG_EVENT. /Sven)

New fields in the output from SHOW SLAVE STATUS
-----------------------------------------------

For http://dev.mysql.com/doc/refman/5.1/en/show-slave-status.html

Three new fields have been added to SHOW SLAVE STATUS. See above, header "SHOW
SLAVE STATUS: three new fields"


New functionality of RESET SLAVE
--------------------------------

For http://dev.mysql.com/doc/refman/5.1/en/reset-slave.html

RESET SLAVE now also does the equivalent of "CHANGE MASTER TO MASTER_DELAY = 0".