WL#6355: Semisync: externalize transactions only after ACK received
Affects: Server-5.7
—
Status: Complete
This worklog implements an option that provides safer, lossless failover compared with the original semisync implementation.

PROBLEM
=======

Before this worklog, the master waited for the ACK from the slave after it had committed to the storage engine. This ensures the slave cannot lag arbitrarily, but it can still lag by a bounded amount:

- If there are N clients on the master, the master may have committed N transactions that are not on the slave.
- Moreover, even if the committing client on the master has not yet received the ACK, concurrent clients on the master can already see the change.

SOLUTION
========

In this worklog, an option is implemented that makes the master wait for the ACK after preparing in the storage engine and writing to the binary log, but before committing to the storage engine. This allows for true lossless failover: if the master crashes, every transaction it has committed to the storage engine is already on the slave.

DOCUMENTATION NOTES
===================

User documentation:
http://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_rpl_semi_sync_master_wait_point
http://dev.mysql.com/doc/refman/5.7/en/replication-semisync.html
http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-2.html
WHERE TO WAIT
=============

Semisync waits for binlog ACKs just after the log file is synced to disk and LOCK_sync is released. Waiting while LOCK_sync is still held would block the sync actions of other threads.

It would also be possible to wait for binlog ACKs right after the events are written to the log file, but that would prevent other threads from syncing the file and may hurt binlog performance.

Because of binlog group commit, the events of a group of transactions are written to the log file together, and only one binlog update action is reported to semisync. Semisync therefore waits until it receives the ACK for the whole group. Transactions at the front of the queue (group) may be delayed for a long time if the group is big and many events have to be transferred.

Users may also find that data can be seen on the slave but not yet on the master. This does not matter for this WL: if users already see the data on the slave, the data has been replicated to the slave and its transaction is at least prepared on the master. Whether the slave or the master crashes, the data is still available on one server.

MySQL Options
=============

It is not yet clear whether the feature has side effects on performance. It is therefore implemented as an option, and the old behaviour (waiting after commit) is kept, so that users can choose between the two.

* rpl_semi_sync_master_wait_point
  Values: AFTER_SYNC, AFTER_COMMIT
  Default: AFTER_SYNC
  Type: System variable; Global; Dynamic; Can be used at startup

  AFTER_SYNC: the new behaviour introduced by this worklog.
  AFTER_COMMIT: the previous behaviour.

Deadlock Problem
================

The AFTER_SYNC wait point introduces a deadlock problem, described below.
   SESSION 1                        SESSION 2            DUMP THREAD
   ---------                        ---------            -----------
1  Flush Stage (mark rotate flag)
2  Sync Stage
3  Waiting binlog ACK  <------------- ACK -------------  Send binlog
4  Commit Stage                                          Waiting New Events
5                                   Flush Stage
6  Rotate (hold LOCK_log)
   Waiting for Session2 to commit
7                                                        Waiting for Session1
                                                         to release LOCK_log
8                                   Sync Stage
                                    Waiting Group ACK

- Session1's rotation can only be done after all prepared transactions are committed. While waiting for the prepared transactions, it has to hold LOCK_log to prevent other sessions from appending binary log events. Holding LOCK_log also blocks dump threads from reading the binary logs.
- Session2 is waiting for the ACK from a slave. It can only get the ACK after all its events have been sent to the slave.
- But the dump threads are blocked by Session1 in the meantime.
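
The circular wait described above can be made explicit with a small wait-for graph. The following is a minimal, illustrative sketch and not server code: the WaitForGraph type, its methods, and the actor names are hypothetical, chosen only to mirror the three actors in the diagram.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical model, not MySQL server code: an entry "A -> B" means
// "A cannot make progress until B does".
struct WaitForGraph {
  std::map<std::string, std::string> waits_for;  // at most one blocker per actor

  void add_wait(const std::string &waiter, const std::string &blocker) {
    waits_for[waiter] = blocker;
  }

  // Follow the chain of blockers starting at `start`. A deadlock exists if
  // the chain reaches an actor that was already visited.
  bool has_cycle_from(const std::string &start) const {
    std::set<std::string> seen;
    std::string current = start;
    while (true) {
      if (!seen.insert(current).second) return true;  // revisited: cycle
      auto it = waits_for.find(current);
      if (it == waits_for.end()) return false;        // chain ends: no cycle
      current = it->second;
    }
  }
};

// Builds the wait-for edges corresponding to steps 6 to 8 of the diagram.
inline WaitForGraph build_after_sync_scenario() {
  WaitForGraph g;
  g.add_wait("session1", "session2");     // rotation waits for Session2 to commit
  g.add_wait("session2", "dump_thread");  // Session2 waits for the slave's ACK
  g.add_wait("dump_thread", "session1");  // dump thread waits for LOCK_log
  return g;
}
```

In this model, has_cycle_from returns true for any of the three actors, and removing any single edge makes it return false.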
rpl_semi_sync_master_wait_point
===============================

It is defined as a global variable of the semisync master plugin and can be changed dynamically. Its type is MYSQL_SYSVAR_ENUM and its default value is WAIT_AFTER_SYNC.

  /* The point at which semisync waits for binlog events */
  enum enum_wait_point {
    WAIT_AFTER_SYNC,
    WAIT_AFTER_COMMIT
  };

  static const char *wait_point_names[]= {"AFTER_SYNC", "AFTER_COMMIT", NullS};

  static TYPELIB wait_point_typelib= {
    array_elements(wait_point_names) - 1,
    "",
    wait_point_names,
    NULL
  };

  static MYSQL_SYSVAR_ENUM(
    wait_point,                      /* name     */
    rpl_semi_sync_master_wait_point, /* var      */
    PLUGIN_VAR_OPCMDARG,             /* flags    */
    "help",
    NULL,                            /* check()  */
    NULL,                            /* update() */
    WAIT_AFTER_SYNC,                 /* default  */
    &wait_point_typelib              /* typelib  */
  );

After sync hook
===============

A hook infrastructure supports the interaction between the semisync plugin and the server. The framework includes:

* Hook and hook point
* Binlog_storage_delegate
* Binlog_storage_observer interface
* Binlog_storage_observer implementation in semisync

At the hook point, the hook calls Binlog_storage_delegate's after_sync function, which in turn calls semisync's observer implementation.

Hook point
==========

Because we do not want to hang other sessions that are syncing at the same time, the hook is called after LOCK_sync is released. It is therefore called in two places:

* For ordered commit, it is called after the COMMIT_STAGE queue is fetched:

    THD *commit_queue= stage_manager.fetch_queue_for(Stage_manager::COMMIT_STAGE);

* For non-ordered commit, it is called after LOCK_sync is released:

    mysql_mutex_unlock(&LOCK_sync);

Binlog_storage_delegate
=======================

Add an after_sync function.

  int Binlog_storage_delegate::after_sync(THD *thd, const char *log_file, my_off_t log_pos);

Binlog_storage_observer Interface
=================================

Add an after_sync function pointer.
  int (*after_sync)(Binlog_storage_param *param, const char *log_file, my_off_t log_pos);

Semisync Binlog_storage_observer implementation
===============================================

The semisync master implements an after_sync function. If rpl_semi_sync_master_wait_point is set to WAIT_AFTER_SYNC, it waits for ACKs from the slave in this function; otherwise it waits in repl_semi_report_commit.

  int repl_semi_report_binlog_sync(Binlog_storage_param *param, const char *log_file, my_off_t log_pos);
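
The division of work between the two observer callbacks can be sketched as follows. This is a simplified, self-contained model and not the plugin's code: SemisyncModel, wait_for_ack, on_after_sync, on_after_commit, and run_transaction are hypothetical names standing in for the ACK wait and for repl_semi_report_binlog_sync and repl_semi_report_commit, and the real callback signatures are deliberately reduced.

```cpp
#include <string>
#include <vector>

// Mirrors the enum from the design above.
enum enum_wait_point { WAIT_AFTER_SYNC, WAIT_AFTER_COMMIT };

// Hypothetical stand-in for the plugin state; records where waits happen.
struct SemisyncModel {
  enum_wait_point wait_point;
  std::vector<std::string> trace;

  // Hypothetical placeholder for "block until the slave ACKs this position".
  void wait_for_ack(const std::string &where) {
    trace.push_back("wait@" + where);
  }

  // Models the after_sync observer: waits only for WAIT_AFTER_SYNC.
  int on_after_sync() {
    if (wait_point == WAIT_AFTER_SYNC) wait_for_ack("after_sync");
    return 0;
  }

  // Models the after-commit observer: waits only for WAIT_AFTER_COMMIT.
  int on_after_commit() {
    if (wait_point == WAIT_AFTER_COMMIT) wait_for_ack("after_commit");
    return 0;
  }
};

// Runs one transaction through the simplified sequence
// sync -> after_sync hook -> engine commit -> after-commit hook.
inline std::vector<std::string> run_transaction(enum_wait_point wp) {
  SemisyncModel m{wp, {}};
  m.trace.push_back("sync");
  m.on_after_sync();
  m.trace.push_back("engine_commit");
  m.on_after_commit();
  return m.trace;
}
```

With WAIT_AFTER_SYNC the recorded wait precedes the engine commit, and with WAIT_AFTER_COMMIT it follows it, which is the ordering difference this worklog introduces.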
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.