WL#6355: Semisync: externalize transactions only after ACK received

Affects: Server-5.7 — Status: Complete

This worklog implements an option that makes failover lossless, which the
original implementation cannot guarantee.

PROBLEM
=======
Before this worklog, the master waits for the ACK from the slave only after it
has committed to the storage engine. This ensures the slave cannot lag
arbitrarily, but it can still lag by a bounded amount:
- If there are N clients on the master, then the master may have committed N
  transactions that are not on the slave.
- Moreover, even if the committing client on the master has not received the
  ACK, concurrent clients on the master can already see the change.

SOLUTION
========
In this worklog, an option is implemented to make the master wait for ACK after 
preparing the storage engine and writing to the binary log, but before 
committing to the storage engine.

This allows for true lossless failover: if the master crashes, every
transaction that was committed to the storage engine on the master has already
been acknowledged by the slave.
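
The difference between the two orderings can be sketched as follows. This is
only an illustration of the step order described above, not server code; the
names WaitPoint and commit_steps and the step labels are assumptions made for
this sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical names for illustration; not MySQL server code.
enum class WaitPoint { AfterSync, AfterCommit };

// Returns the order of the commit steps for one transaction on the master.
std::vector<std::string> commit_steps(WaitPoint wp) {
  std::vector<std::string> steps = {"prepare_engine", "write_binlog",
                                    "sync_binlog"};
  if (wp == WaitPoint::AfterSync) {
    // New option: the engine commit happens only after the ACK arrives, so
    // no client can see the change before the slave has it.
    steps.push_back("wait_for_ack");
    steps.push_back("commit_engine");
  } else {
    // Old behaviour: the change is committed (and visible to concurrent
    // clients) while the committing session is still waiting for the ACK.
    steps.push_back("commit_engine");
    steps.push_back("wait_for_ack");
  }
  return steps;
}
```

With AfterSync the last two steps are wait_for_ack then commit_engine; with
AfterCommit they are swapped.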


DOCUMENTATION NOTES
===================

User documentation:
http://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_rpl_semi_sync_master_wait_point
http://dev.mysql.com/doc/refman/5.7/en/replication-semisync.html
http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-2.html

Dependent task: WL#5721: Refactor replication dump thread

WHERE TO WAIT
=============
Semi-sync waits for binlog ACKs just after the log file is synced to disk and
LOCK_sync is released. Waiting while LOCK_sync is still held would block the
sync actions of other threads.

It is also possible to wait for binlog ACKs right after the events are written
to the log file. But this would block other threads from syncing the file, which
may hurt binlog performance.

Because of binlog group commit, the events of a group of transactions are
written to the log file together, and only one binlog update action is reported
to semisync. So semisync waits until it receives the ACK for the whole group of
transactions. The transactions at the front of the queue (group) might be
delayed for a long time if the group is big and many events have to be
transferred. Users may also find that the data can be seen on the slave but not
yet on the master.

But this does not matter for this WL: if users already see the data on the
slave, the data has been replicated to the slave and its transaction is at
least prepared on the master. Whether the slave or the master crashes, the data
is still available on one server.
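
The "one report per group" behaviour can be sketched as below. This is a
minimal illustration under the assumption that all transactions of a group land
in the same log file; BinlogPos, GroupedTrx and group_wait_position are
hypothetical names, not server types.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical types for illustration; not MySQL server code.
struct BinlogPos {
  std::string file;
  std::uint64_t offset;
};

struct GroupedTrx {
  int session_id;
  std::uint64_t end_offset;  // offset just past this transaction's events
};

// Assumes the whole group is written to one log file. An ACK for the highest
// end offset covers every transaction in the group, so only one position
// needs to be reported and waited for.
BinlogPos group_wait_position(const std::string &log_file,
                              const std::vector<GroupedTrx> &group) {
  BinlogPos pos{log_file, 0};
  for (const GroupedTrx &trx : group) {
    if (trx.end_offset > pos.offset) pos.offset = trx.end_offset;
  }
  return pos;
}
```

Every session in the group, including the one at the front of the queue, is
released only when the ACK for this single position arrives.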


MySQL Options
=============
Right now, it is not clear whether the feature has side effects on performance.
So it is implemented as an option and the old behaviour (waiting after commit)
is kept, which lets the user choose between the two.

* rpl_semi_sync_master_wait_point
  Values: AFTER_SYNC, AFTER_COMMIT
  Default: AFTER_SYNC
  Type: System variable; Global; Dynamic; Can be used at startup

  AFTER_SYNC: the new behaviour introduced by this worklog.
  AFTER_COMMIT: the previous behaviour.
  

Deadlock Problem
================
The AFTER_SYNC wait point introduces a deadlock problem. The sequence of events
is shown below.

  SESSION 1                      SESSION 2            DUMP THREAD
  ---------                      ---------            -----------
1 Flush Stage(mark rotate flag)  

2 Sync Stage

3 Waiting binlog ACK <------------- ACK ------------- Send binlog

4 Commit Stage                                        Waiting New Events
        
5                                Flush Stage

6 Rotate(hold LOCK_log)             
  Waiting for Session2 to commit
7                                                     Waiting for Session1
                                                      release LOCK_log
8                                Sync Stage
                                 Waiting Group ACK

- Session1's rotation can only be done after all prepared transactions are
  committed. While waiting for the prepared transactions, it has to hold
  LOCK_log to block other sessions from appending binary log events. This also
  blocks dump threads from reading the binary log.
- Session2 is waiting for an ACK from a slave. It can only get the ACK after all
  its events are sent to the slave.
- But the dump threads are blocked by Session1 in the meantime, so none of the
  three can make progress.
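
The three waits above form a cycle, which can be checked with a small wait-for
graph. This is an illustrative sketch only: WaitForGraph, in_wait_cycle and
after_sync_rotate_scenario are hypothetical names, and the graph simply encodes
the three bullet points.

```cpp
#include <map>
#include <set>
#include <string>

// Each party waits for at most one other party in this scenario.
using WaitForGraph = std::map<std::string, std::string>;

// Follows wait-for edges from `start`; returns true if the walk comes back
// to `start`, which means `start` is part of a deadlock cycle.
bool in_wait_cycle(const WaitForGraph &g, const std::string &start) {
  std::set<std::string> seen;
  std::string cur = start;
  while (true) {
    auto it = g.find(cur);
    if (it == g.end()) return false;  // cur waits for nobody
    cur = it->second;
    if (cur == start) return true;
    if (!seen.insert(cur).second) return false;  // a cycle without start
  }
}

// The scenario from the table above, as wait-for edges.
WaitForGraph after_sync_rotate_scenario() {
  return {
      {"session1", "session2"},     // rotate waits for prepared trx to commit
      {"session2", "dump_thread"},  // ACK needs the events to be sent first
      {"dump_thread", "session1"},  // reading needs LOCK_log, held by session1
  };
}
```

Removing any one of the three edges breaks the cycle.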

rpl_semi_sync_master_wait_point
===============================
It is defined as a semisync master global variable, which can be changed
dynamically. Its type is MYSQL_SYSVAR_ENUM and its default value is
WAIT_AFTER_SYNC.

/* The point at which semisync waits for ACKs of binlog events */
enum enum_wait_point {
  WAIT_AFTER_SYNC,
  WAIT_AFTER_COMMIT
};

static const char *wait_point_names[]= {"AFTER_SYNC", "AFTER_COMMIT", NullS};
static TYPELIB wait_point_typelib= {
  array_elements(wait_point_names) - 1,
  "",
  wait_point_names,
  NULL
};
static MYSQL_SYSVAR_ENUM(
  wait_point,                      /* name     */
  rpl_semi_sync_master_wait_point, /* var      */
  PLUGIN_VAR_OPCMDARG,             /* flags    */
  "help",
  NULL,                            /* check()  */
  NULL,                            /* update() */
  WAIT_AFTER_SYNC,                 /* default  */
  &wait_point_typelib              /* typelib  */
);
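
The position of a name in the NULL-terminated name list is what ties the string
"AFTER_SYNC" or "AFTER_COMMIT" to the enum value. A minimal sketch of that
mapping follows; find_wait_point and wait_point_names_sketch are hypothetical,
and the exact-match comparison is an assumption (the server's own matching
rules may differ).

```cpp
#include <cstring>

// Same order as enum_wait_point: index 0 is WAIT_AFTER_SYNC, index 1 is
// WAIT_AFTER_COMMIT. The trailing nullptr marks the end of the list.
static const char *wait_point_names_sketch[] = {"AFTER_SYNC", "AFTER_COMMIT",
                                                nullptr};

// Hypothetical helper: returns the index of `name` in the list, or -1 if the
// name is not a valid wait point. Uses an exact, case-sensitive comparison.
int find_wait_point(const char *name) {
  for (int i = 0; wait_point_names_sketch[i] != nullptr; ++i) {
    if (std::strcmp(wait_point_names_sketch[i], name) == 0) return i;
  }
  return -1;
}
```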

After sync hook
===============
There is a hook infrastructure that supports the interaction between the
semisync plugin and the server.

The framework includes:
* Hook and hook point
* Binlog_storage_delegate
* Binlog_storage_observer Interface
* Binlog_storage_observer implementation in semisync

At the hook point, the hook calls Binlog_storage_delegate's after_sync function,
and that function calls semisync's observer implementation.
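
The call chain can be sketched as below. The types here are simplified
stand-ins written for this illustration: the real Binlog_storage_param and
delegate carry more state, std::uint64_t replaces my_off_t, and
Storage_delegate_sketch and record_after_sync are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// Simplified stand-ins for the server types (assumptions for this sketch).
struct Binlog_storage_param {
  std::uint32_t server_id;
};

struct Binlog_storage_observer {
  int (*after_sync)(Binlog_storage_param *param, const char *log_file,
                    std::uint64_t log_pos);
};

class Storage_delegate_sketch {
 public:
  void add_observer(const Binlog_storage_observer *obs) {
    observers_.push_back(obs);
  }

  // Calls every registered after_sync callback in registration order and
  // returns the first non-zero result, or 0 if all callbacks succeed.
  int after_sync(std::uint32_t server_id, const char *log_file,
                 std::uint64_t log_pos) {
    Binlog_storage_param param{server_id};
    for (const Binlog_storage_observer *obs : observers_) {
      if (obs->after_sync == nullptr) continue;  // hook not implemented
      if (int err = obs->after_sync(&param, log_file, log_pos)) return err;
    }
    return 0;
  }

 private:
  std::vector<const Binlog_storage_observer *> observers_;
};

// Example observer: records the last position it was told about.
static std::uint64_t g_last_synced_pos = 0;
static int record_after_sync(Binlog_storage_param *, const char *,
                             std::uint64_t log_pos) {
  g_last_synced_pos = log_pos;
  return 0;
}
```

A plugin would register a Binlog_storage_observer whose after_sync pointer is
set to its own function, in the same way record_after_sync is used here.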

Hook point
==========
Because we don't want to hang other sessions that are syncing at the same time,
the hook is called after LOCK_sync is released. As a result, it is called in
two places.

* For ordered commit, it is called after the COMMIT_STAGE queue is fetched:
  THD *commit_queue= stage_manager.fetch_queue_for(Stage_manager::COMMIT_STAGE);

* For non-ordered commit, it is called after LOCK_sync is released:
  mysql_mutex_unlock(&LOCK_sync);
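
The "release the lock first, then call the hook" rule can be sketched with a
standard mutex. This is an illustration only: lock_sync_sketch stands in for
LOCK_sync, and sync_then_call_hook and its callback parameters are hypothetical.

```cpp
#include <functional>
#include <mutex>

std::mutex lock_sync_sketch;  // stand-in for LOCK_sync

// Syncs the file under the lock, releases the lock, and only then runs the
// hook, which may block for a long time while waiting for an ACK.
int sync_then_call_hook(const std::function<void()> &sync_file,
                        const std::function<int()> &after_sync_hook) {
  {
    std::lock_guard<std::mutex> guard(lock_sync_sketch);
    sync_file();
  }  // lock released here, so other sessions can sync while we wait
  return after_sync_hook();
}
```

If the hook ran inside the locked scope instead, every other session that
needs to sync would wait for this session's ACK as well.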

Binlog_storage_delegate
=======================
Add an after_sync function:
int Binlog_storage_delegate::after_sync(THD *thd,
                                        const char *log_file,
                                        my_off_t log_pos);

Binlog_storage_observer Interface
=================================
Add an after_sync function pointer:
int (*after_sync)(Binlog_storage_param *param, 
                  const char *log_file, my_off_t log_pos);

Semisync Binlog_storage_observer implementation
===============================================
The semisync master implements an after_sync function. If
rpl_semi_sync_master_wait_point is set to WAIT_AFTER_SYNC, it waits for ACKs
from the slave in this function. Otherwise it waits in repl_semi_report_commit.

int repl_semi_report_binlog_sync(Binlog_storage_param *param,
                                 const char *log_file,
                                 my_off_t log_pos);
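
How the two callbacks share the wait can be sketched as follows. This is not the
plugin's code: the _sketch functions and variables are hypothetical,
wait_for_ack_sketch is a placeholder that only counts calls instead of blocking,
and std::uint64_t stands in for my_off_t.

```cpp
#include <cstdint>

enum enum_wait_point { WAIT_AFTER_SYNC, WAIT_AFTER_COMMIT };

// Stand-in for the plugin's global variable (assumption for this sketch).
static unsigned long wait_point_sketch = WAIT_AFTER_SYNC;
static int ack_waits = 0;  // how many times the placeholder wait was called

// Hypothetical placeholder for "block until the slave ACKs this position".
static int wait_for_ack_sketch(const char *, std::uint64_t) {
  ++ack_waits;
  return 0;
}

// Called from the after_sync hook: waits only when AFTER_SYNC is selected.
int report_binlog_sync_sketch(const char *log_file, std::uint64_t log_pos) {
  if (wait_point_sketch == WAIT_AFTER_SYNC)
    return wait_for_ack_sketch(log_file, log_pos);
  return 0;
}

// Called after the engine commit: waits only when AFTER_COMMIT is selected.
int report_commit_sketch(const char *log_file, std::uint64_t log_pos) {
  if (wait_point_sketch == WAIT_AFTER_COMMIT)
    return wait_for_ack_sketch(log_file, log_pos);
  return 0;
}
```

Both callbacks run for every transaction, but for any given setting exactly one
of them performs the wait.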