WL#342: Replication Heartbeat

Affects: Server-5.5   —   Status: Complete

1. To solve BUG#20435 - extra relay log rotation;
2. to detect failures more easily and precisely;
3. to have a better Seconds_Behind_Master value on the slave (BUG#29309).


1. New extension to CHANGE MASTER:
   CHANGE MASTER TO MASTER_HEARTBEAT_PERIOD= val;
   The value is stored in the master info file and in the mi object,
   which supplies the value sent to the master.

   The value 'val' needs to be within some reasonable interval.
   As the cost of creating, sending and handling the event
   on the slave side is expected to be low, val can be as small as
   1 second or even less.

2. The slave IO thread prepares and sends the query
     'SET @master_heartbeat_period= val'
   to the master. From this query the master's dump thread learns
   the slave's preferred heartbeat sending period.

3. The heartbeat event is created by the master's DUMP thread and sent each
   time @master_heartbeat_period elapses with nothing having been added to
   the binlog during that period.

4. On the slave's side the heartbeat event is handled exclusively by the IO
   thread; it is not recorded in the relay log and does not engage the slave
   SQL thread. Upon receipt, the coordinates of the last sent event carried
   by the heartbeat are compared against the ones the slave IO thread
   maintains and updates for each received event, except for some "phantoms",
   including the new Heartbeat.
   The heartbeat event does not update that local information.

5. The heartbeat period and the number of received heartbeats should be
   monitorable via
        SHOW STATUS like 'slave_heartbeat_period'  and
        SHOW STATUS like 'slave_received_heartbeats' respectively.
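
Items 2 and 3 above can be illustrated with a minimal sketch of the master
side. It is not the server code: BinlogSignal, dump_step and their
parameters are hypothetical names, and a Python condition variable stands
in for the binlog update signal. It only shows the idea that a heartbeat is
produced when the requested period elapses with no new binlog data.

```python
import threading


class BinlogSignal:
    """Hypothetical stand-in for the binlog update condition on the master."""

    def __init__(self):
        self._cond = threading.Condition()
        self._last_pos = 0  # position of the last event written to the binlog

    def append_event(self, size):
        # Called by a writer: advance the position and wake the dump thread.
        with self._cond:
            self._last_pos += size
            self._cond.notify_all()

    def wait_for_update(self, seen_pos, timeout):
        # Return the new position, or None if `timeout` seconds passed
        # without any event beyond `seen_pos` being written.
        with self._cond:
            updated = self._cond.wait_for(
                lambda: self._last_pos > seen_pos, timeout)
            return self._last_pos if updated else None


def dump_step(binlog, seen_pos, heartbeat_period):
    """One iteration of a simplified dump loop.

    Returns ("event", new_pos) when there is binlog data to send, or
    ("heartbeat", seen_pos) when the period elapsed with nothing new.
    A period of zero is treated as "heartbeats disabled": wait indefinitely.
    """
    timeout = heartbeat_period if heartbeat_period > 0 else None
    new_pos = binlog.wait_for_update(seen_pos, timeout)
    if new_pos is None:
        return ("heartbeat", seen_pos)
    return ("event", new_pos)
```

Because every binlog update wakes the waiter, the timeout restarts on each
call, which mirrors the rule that the heartbeat wait is reset whenever the
binlog is updated.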

A newer heartbeat-aware slave will not get any error response from
an old master that knows nothing of heartbeats when it issues the
connection-time query
  set @master_heartbeat_period=val;
and naturally the old master will not send heartbeats.
In that case the slave will show its chosen heartbeat value in the status,
but no heartbeats will actually be exchanged.

Using the user variable @master_heartbeat_period instead of a system one
avoids displaying the name in the list of available variables for a plain
user session.
The existence of the user variable master_heartbeat_period can be noticed only
via the general query log.

User observable behaviour
1. Requesting from the slave that the master send heartbeats with a period:

   CHANGE MASTER TO MASTER_HEARTBEAT_PERIOD= val
    where val is the period, of the decimal type, with a value in the
    range [0.001, 4294967] seconds.
    Notice that heartbeats are sent by the master only if there are no
    more unsent events in the actual binlog file for a period longer than
    the heartbeat period.
    Whenever the master's binlog is updated with an event, the wait
    for the heartbeat sending condition is reset.

    If `val' is zero, no heartbeats will be sent.
    Notice, heartbeat is active by default with the period
    `slave_net_timeout/2' (see 2.).

2.  SHOW STATUS like 'slave_heartbeat_period'
    Slave-side status variable which gets its value from either
    CHANGE MASTER, master.info or implicitly as `slave_net_timeout/2'
    (the default).
    The denominator 2 provides a reasonable default period to guarantee
    that no reconnection to an idle master will happen upon
    slave_net_timeout elapsing.

3. SHOW STATUS like 'slave_received_heartbeats';

    A counter that is initialized at slave init time, incremented with
    every received heartbeat, and reset to zero by CHANGE MASTER.
    The counter is stored as a ulonglong, i.e. normally 8 bytes.
    Overflowing it, even with the fastest heartbeat, is possible only
    on a cosmic time scale.

    resets the current heartbeat's period to the default (see 2.).
    The valid-range check still applies after slave_net_timeout/2 is
    computed: if the ratio would exceed the maximum allowable period,
    the period is lowered to that maximum.

5.  SET @@global.slave_net_timeout=`value less than the current hb period`

    produces a warning, as such a setting would be irrational.
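
The rules for the period in the list above can be summarised in a short
sketch. The function names are hypothetical, and rejecting an out-of-range
value with an exception is an assumption made for illustration; the text
only states the valid range, the meaning of zero, the default, and the
warning condition.

```python
MIN_PERIOD = 0.001      # seconds, lower bound stated in the text
MAX_PERIOD = 4294967.0  # seconds, upper bound stated in the text


def validate_period(val):
    """Check a requested period: zero disables heartbeats, any other
    value must lie in [MIN_PERIOD, MAX_PERIOD]."""
    if val == 0:
        return 0.0
    if not (MIN_PERIOD <= val <= MAX_PERIOD):
        # Assumption: out-of-range values are rejected outright.
        raise ValueError(f"heartbeat period {val} out of range")
    return float(val)


def default_period(slave_net_timeout):
    """Default period: slave_net_timeout/2, lowered to MAX_PERIOD if larger."""
    return min(slave_net_timeout / 2.0, MAX_PERIOD)


def net_timeout_warning(new_slave_net_timeout, current_period):
    """True when a new slave_net_timeout is below the current period,
    the case that item 5 says should be warned about."""
    return new_slave_net_timeout < current_period
```

For example, with slave_net_timeout set to 3600 seconds the default period
would be 1800 seconds, so an idle master sends two heartbeats before the
slave's network timeout could expire.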

Suggestion from Jeremy Zawodny at OSCON, July 2001, as a discussion idea:

Need active heartbeat detection mechanism for replication monitoring.  
Writing an error to log file is insufficient notice of failure.  
Instead need a way to trigger an alert message to DBA or 
a method for a watcher program to immediately detect 
when a slave has failed.

Detailed design

A) Data structs and the new object

 1.  Heartbeat event class with fields:
   - master_log_file (the last binlog file name on master)
     master_log_pos (last written event on master) and
     master_current_time at the moment of creating the heartbeat event
     are recorded into existing members of the parent Log_event class.

   Other reasons to inherit the heartbeat class from Log_event are as follows:

     - the cost of processing a single heartbeat event will be low enough in
       this case, even if heartbeats are sent several times a second.

     - Extensibility. If we later want to add fields to the event, the
       log_event has dynamic headers...

  2. master_info struct on slave is augmented with

    - heartbeat_period of type float, to allow selecting a real number
      with precision up to 1 millisecond.
    - received_heartbeat, a counter of type ulonglong.

      Given 1 msec precision and 4 bytes of storage for the heartbeat value,
      the maximum value of the heartbeat period is bound to lie within the
      interval from 0 to ULONG_MAX/1000. I.e. the effective interval
      for the period is [0.001, 4294967]; zero is excluded because it
      means that no heartbeats are sent.

    The value of master_info.heartbeat_period is initialized in one of three ways:

      - the syntactically extended CHANGE MASTER with MASTER_HEARTBEAT_PERIOD=val.
      - reading from master.info (float number format)
      - default (slave_net_timeout/2)

      Notice, there is no way to do that from server startup options, which is
      a consequence of the deprecation in BUG#21490.

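  3. The master's dump thread is changed to wait with a timeout in
       MYSQL_LOG::wait_for_update() as
       pthread_cond_timedwait(&update_cond, &LOCK_log, &abstime);
       where abstime is the absolute deadline, i.e. the current time plus
       master_heartbeat_period (pthread_cond_timedwait() takes an absolute
       time, not an interval).

       If the wait returns with an ETIMEDOUT or ETIME
       error condition, a heartbeat event is to be sent.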

  4. The slave IO thread receives a heartbeat and handles it without recording
     it in the relay log. The slave-side wait status for the master's activity
     is reset upon receiving anything from the socket, which produces the
     desired effect: no reconnection, although no real events are received but
     only heartbeats.
     The slave IO thread still instantiates the event to check its validity,
     as is done for most of the replication events.
     The event's members log_file_name and log_pos are compared against the
     slave's local knowledge, and the IO thread is stopped if log_pos does not
     match the value from the last non-heartbeat event the slave has received.
     The file names and the log positions must be equal, except in the case
     when the slave starts with an empty master.info and thus does not know
     the last received event from the master; the slave will update its local
     mi->log_file_name upon receiving a Rotate event (normally this should
     happen within a fraction of a second after connecting).
  5. The slave does not need to do anything special if a heartbeat does not
     come. The existing logic for reconnecting once slave_net_timeout elapses
     does its job.
  6. Monitoring facilities are added via the "standard" procedure: a new
     status variable and its display through a function. Notice that the rli
     struct remains intact and changes are made to master_info only,
     as relay_log_info is supposed to deal with relay-loggable features, which
     the heartbeat does not belong to.
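
The slave-side handling described in item 4 can be sketched as follows. This
is an illustration under stated assumptions, not the server code: MasterInfo,
Event and handle_event are hypothetical names, an empty log_file_name models
a slave that started with an empty master.info, and counting only heartbeats
that pass the coordinate check is an assumption.

```python
from dataclasses import dataclass


@dataclass
class MasterInfo:
    """Hypothetical subset of the slave's master_info state."""
    log_file_name: str = ""      # empty when master.info was empty at start
    log_pos: int = 0
    received_heartbeats: int = 0


@dataclass
class Event:
    kind: str            # "heartbeat" or any other event type
    log_file_name: str
    log_pos: int


def handle_event(mi, ev, relay_log):
    """Process one event from the master; return False to stop the IO thread."""
    if ev.kind != "heartbeat":
        # Ordinary events go to the relay log and advance the coordinates.
        relay_log.append(ev)
        mi.log_file_name, mi.log_pos = ev.log_file_name, ev.log_pos
        return True
    # Heartbeats are never written to the relay log and never move the
    # coordinates; they are only checked against what the slave knows.
    known = (mi.log_file_name, mi.log_pos)
    if mi.log_file_name and (ev.log_file_name, ev.log_pos) != known:
        return False
    mi.received_heartbeats += 1
    return True
```

A heartbeat whose coordinates differ from those of the last non-heartbeat
event makes handle_event return False, which corresponds to stopping the IO
thread; the relay log list is untouched by heartbeats in every case.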