WL#7169: Semisync: make master wait for more than one slave to ack back
Affects: Server-5.7
Status: Complete
EXECUTIVE SUMMARY
=================

This worklog implements an option to make the master wait for N slaves to acknowledge back, instead of just one, when semisync is ON. Choosing to wait for N slaves (N > 1) increases resiliency to consecutive random failures. It also improves transaction durability, because a transaction is persisted in more than two servers (the master plus N slaves) before its results are externalized on the master.

USER BENEFIT
============

When semisync is turned ON, the master holds the session until it gets an acknowledgement from a slave that the slave has written the transaction to its relay log. If both the master and the semisync slave crash, the transaction is lost.

After this worklog, the user will be able to specify that a transaction must be written to the relay log of N slaves before the master returns the session to the client. As such, if the master and F slaves crash at the same time (where F < N), either the client has not seen the results of its transaction, or the transaction is already persisted on at least one of the N - F acknowledging slaves that survived. Consequently, when a slave is promoted to become the new master, it will have learned about this transaction (since promotion will create a candidate that knows all transactions in the system [1]).

REFERENCES
==========

[1] http://svenmysql.blogspot.co.uk/2013/03/flexible-fail-over-policies-using-mysql.html
BUG#11762792: NUMBER OF SLAVES THAT MASTER WAITS BEFORE COMMIT

USER DOCUMENTATION
==================

http://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_rpl_semi_sync_master_wait_for_slave_count
http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-3.html
Interface Specification
=======================

I-1: rpl_semi_sync_master_wait_for_slave_count

A system variable defined in the semisync master plugin. It sets the minimum number of acks, each from a different slave, that a transaction must wait for before it can proceed to engine commit or to replying to the user.

Scope: GLOBAL
Dynamic: YES
Type: NUMERIC
Range: 1-65535
Default: 1

Note: The current design does not perform well with large values. We chose to optimize it for small values, because almost no users need to set a large value.
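As a rough illustration of the interface specification above, the following C++ sketch encodes the documented range and default of rpl_semi_sync_master_wait_for_slave_count, and the N - 1 ack-array size that the design below allocates. All names here (kWaitForSlaveCountMin, check_wait_for_slave_count, ack_array_slots) are hypothetical, and rejecting out-of-range values is a choice of this sketch, not a statement of how the server handles them.

```cpp
#include <cassert>
#include <optional>

// Hypothetical constants mirroring the interface specification of
// rpl_semi_sync_master_wait_for_slave_count (GLOBAL, dynamic).
constexpr unsigned long kWaitForSlaveCountMin = 1;
constexpr unsigned long kWaitForSlaveCountMax = 65535;
constexpr unsigned long kWaitForSlaveCountDefault = 1;

// Hypothetical validator: returns the accepted value, or std::nullopt
// when the request falls outside the documented range 1-65535.
// (Rejecting is this sketch's choice; the server may behave differently.)
std::optional<unsigned int> check_wait_for_slave_count(unsigned long requested) {
  if (requested < kWaitForSlaveCountMin || requested > kWaitForSlaveCountMax)
    return std::nullopt;
  return static_cast<unsigned int>(requested);
}

// Hypothetical helper: number of ack-array slots the design uses for a
// given N, namely N - 1, so N == 1 needs no array at all.
unsigned int ack_array_slots(unsigned int wait_for_slave_count) {
  assert(wait_for_slave_count >= kWaitForSlaveCountMin);
  return wait_for_slave_count - 1;
}
```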
Analysis On Original Semisync Process
=====================================

* Transaction sessions
  1. Create an ack request node when binlogging.
  2. Wait until reply_file_name_ and reply_file_pos_ reach its position.
  3. Engine commit.

* Dump threads
  1. Send events and the ack request.
  2. Receive the ack.
  3. reportReplyBinlog(): after receiving an ack, update reply_file_name_ and reply_file_pos_, remove all ack request nodes whose positions are smaller than the ack's position, and wake up the waiting transaction sessions.

  Note: after WL#6630, steps 2 and 3 will be done in a separate thread.

Design of the worklog
=====================

The analysis above shows that reply_file_name_ and reply_file_pos_ are the bridge between transaction sessions and dump threads: any transaction whose position is not beyond these variables can go ahead. In this design, the variables are interpreted as meaning that all events before them have already been replicated to at least N (rpl_semi_sync_master_wait_for_slave_count) slaves. Because that single bridge is kept, the design only changes dump thread code, not transaction session code.

* Changes on the dump thread (replacing step 3 of the dump threads above)
  3. If rpl_semi_sync_master_wait_for_slave_count == 1, call reportReplyBinlog(); otherwise go to step 4.
  4. AckContainer::insert() updates the ack information and checks whether some position has now been acked by at least N (rpl_semi_sync_master_wait_for_slave_count) slaves. If so, it returns that ack information.
  5. If insert() returned any ack information, call reportReplyBinlog() with it.

* AckContainer
  A container that maintains the latest ack information of each slave. The ack information includes:

  AckInfo {
    slave's id;
    max ack's log file name;
    max ack's log file pos;
  };

* Ack array
  The container is an array whose size equals rpl_semi_sync_master_wait_for_slave_count - 1. Each slave can take at most one slot; if it already holds a slot, its new ack overwrites its old one.
  When the array is full and a new ack arrives from a slave that does not hold a slot, the array and the new ack together cover rpl_semi_sync_master_wait_for_slave_count distinct slaves. The events before the minimum ack among them have therefore already been replicated to at least rpl_semi_sync_master_wait_for_slave_count slaves, so call reportReplyBinlog() to update the related variables, and empty the slots holding the minimum ack.

* insert()
  Maintains the ack array after an ack is received.

  DEFINITION:
  AckInfo* insert(uint32 server_id, const char *log_file_name, my_off_t log_file_pos)

  LOGIC:
  1. If the array already holds an ack from this slave (a slot whose server id equals server_id), update that slot's position with log_file_name and log_file_pos, and return NULL: the set of slaves in the array has not changed, so no new position can have reached N acks. Otherwise, continue with step 2.
  2. If the container is full {
       find the minimum ack of the array, including the one being inserted;
       empty the slots whose ack positions equal the minimum ack;
       insert the incoming ack into a slot if it is not the minimum ack;
       return the minimum ack.
     }
  3. Insert the ack into an empty slot of the array.
  4. Return NULL.

* resize()
  Changes the array's size when rpl_semi_sync_master_wait_for_slave_count is changed.

  DEFINITION:
  int resize(unsigned int size, const AckInfo **ackinfo);

  LOGIC:
  Back up the ack array.
  Create a new ack array with size-1 slots if size-1 is greater than 0.
  Call insert() to insert each ack of the old array into the new one.
  Free the old ack array.
  If insert() returns any ack pointer, return it to the caller through ackinfo.

* Sys_rpl_semi_sync_master_wait_for_slave_count
  The object of the system variable rpl_semi_sync_master_wait_for_slave_count.

  DEFINITION:
  static Sys_var_uint Sys_rpl_semi_sync_master_wait_for_slave_count;

* fix_rpl_semi_sync_master_wait_for_slave_count()
  Handles and initializes internal state (e.g. the global ack array) when rpl_semi_sync_master_wait_for_slave_count is changed. It is called by Sys_rpl_semi_sync_master_wait_for_slave_count.
  DEFINITION:
  static void fix_rpl_semi_sync_master_wait_for_slave_count(MYSQL_THD thd, SYS_VAR *var, void *ptr, const void *val);

  LOGIC:
  if the new value differs from the old value {
    call AckContainer::resize() to change the array;
    call reportReplyBinlog() if resize() returns any ack pointer.
  }

* Switching off the semisync master
  The semisync master switches off if the number of slaves is less than N (rpl_semi_sync_master_wait_for_slave_count), and it does not switch back on until it has received an ack from at least N (rpl_semi_sync_master_wait_for_slave_count) slaves.

* How to recover a crashed master
  Currently, all binlogged transactions are recovered. This worklog does not change that behavior. WL#7042 will make changes to make recovery more reasonable and practical.
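The analysis of the original semisync process describes reply_file_name_ and reply_file_pos_ as the bridge between transaction sessions and dump threads. The C++ sketch below models that bridge with a mutex and a condition variable. It is not the plugin's code: the names BinlogPos, compare and ReplyTracker are hypothetical, it assumes zero-padded binlog file names so that lexical order matches file order, and it lets a session proceed once the reply position has reached its own position.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>

// Hypothetical binlog coordinate: file name plus byte offset.
// Assumes zero-padded file names (e.g. "binlog.000001"), so lexical
// order of names matches the order in which the files were written.
struct BinlogPos {
  std::string file;
  unsigned long long pos;
};

// strcmp-style comparison: by file name first, then by offset.
int compare(const BinlogPos &a, const BinlogPos &b) {
  int c = a.file.compare(b.file);
  if (c != 0) return c < 0 ? -1 : 1;
  if (a.pos == b.pos) return 0;
  return a.pos < b.pos ? -1 : 1;
}

// Hypothetical stand-in for reply_file_name_/reply_file_pos_ plus the
// wake-up logic shared by transaction sessions and dump threads.
class ReplyTracker {
 public:
  // Dump-thread side (reportReplyBinlog): move the reply position
  // forward only if the ack is newer, then wake every waiting session.
  void report_reply(const BinlogPos &ack) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (!has_reply_ || compare(ack, reply_) > 0) {
        reply_ = ack;
        has_reply_ = true;
      }
    }
    cv_.notify_all();
  }

  // Session side: block until the reply position has reached `trx`.
  void wait_for(const BinlogPos &trx) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return has_reply_ && compare(reply_, trx) >= 0; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  BinlogPos reply_{"", 0};
  bool has_reply_ = false;
};
```

Because the worklog keeps this single bridge, only the caller of report_reply() changes when N > 1: it reports a position only once N distinct slaves have acked it.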
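To make the insert() and resize() logic of the AckContainer concrete, here is a simplified C++ sketch of the ack array. It is not the plugin's implementation: it stores the N - 1 slots as std::vector<std::optional<AckInfo>>, uses std::string and unsigned long long in place of const char * and my_off_t, returns std::optional instead of an AckInfo pointer, and assumes zero-padded binlog file names. The helpers less_than() and same_pos() are hypothetical. When several re-inserted acks reach the quorum during resize(), this sketch reports the greatest such position; that is an assumption about what "return it to the caller" should mean when insert() returns more than once.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Latest ack from one slave (simplified AckInfo).
struct AckInfo {
  unsigned int server_id;
  std::string log_file_name;
  unsigned long long log_file_pos;
};

// Hypothetical helpers: order acks by file name, then by offset.
bool less_than(const AckInfo &a, const AckInfo &b) {
  if (a.log_file_name != b.log_file_name)
    return a.log_file_name < b.log_file_name;
  return a.log_file_pos < b.log_file_pos;
}
bool same_pos(const AckInfo &a, const AckInfo &b) {
  return a.log_file_name == b.log_file_name && a.log_file_pos == b.log_file_pos;
}

class AckContainer {
 public:
  // n = rpl_semi_sync_master_wait_for_slave_count; keeps n - 1 slots.
  explicit AckContainer(unsigned int n) {
    assert(n >= 1);
    slots_.resize(n - 1);
  }

  // Returns the position now acked by n distinct slaves, if any.
  std::optional<AckInfo> insert(const AckInfo &ack) {
    // Step 1: one slot per slave; a newer ack overwrites the old one.
    // The set of distinct slaves is unchanged, so nothing to report.
    for (auto &slot : slots_) {
      if (slot && slot->server_id == ack.server_id) {
        slot = ack;
        return std::nullopt;
      }
    }
    // Step 2: full array + ack from a new slave = n distinct slaves.
    if (full()) {
      AckInfo min_ack = ack;
      for (const auto &slot : slots_)
        if (less_than(*slot, min_ack)) min_ack = *slot;
      for (auto &slot : slots_)  // empty the slots holding the minimum
        if (same_pos(*slot, min_ack)) slot.reset();
      if (!same_pos(ack, min_ack)) put(ack);  // keep a newer incoming ack
      return min_ack;
    }
    // Steps 3-4: there is room, so store the ack and report nothing.
    put(ack);
    return std::nullopt;
  }

  // Rebuilds the array with new_n - 1 slots and re-inserts the old acks,
  // reporting the greatest position that reached the new quorum.
  std::optional<AckInfo> resize(unsigned int new_n) {
    assert(new_n >= 1);
    std::vector<std::optional<AckInfo>> old = std::move(slots_);
    slots_.assign(new_n - 1, std::nullopt);
    std::optional<AckInfo> reported;
    for (const auto &slot : old) {
      if (!slot) continue;
      std::optional<AckInfo> r = insert(*slot);
      if (r && (!reported || less_than(*reported, *r))) reported = r;
    }
    return reported;
  }

 private:
  bool full() const {
    for (const auto &slot : slots_)
      if (!slot) return false;
    return true;
  }
  void put(const AckInfo &ack) {
    for (auto &slot : slots_)
      if (!slot) { slot = ack; return; }
    assert(false && "put() on a full array");
  }
  std::vector<std::optional<AckInfo>> slots_;
};
```

With N = 3, acks from servers 1 and 2 fill both slots; the first ack from server 3 then reports the smallest of the three positions, which all three slaves have reached.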
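The LOGIC of fix_rpl_semi_sync_master_wait_for_slave_count() amounts to a small piece of glue, sketched here in C++. Instead of the plugin's real signature (MYSQL_THD thd, SYS_VAR *var, void *ptr, const void *val), this hypothetical fix_wait_for_slave_count() takes the stored value by reference and receives AckContainer::resize() and reportReplyBinlog() as injected callbacks (the hypothetical ResizeFn and ReportFn types), so the example stays self-contained.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

struct AckInfo {
  unsigned int server_id;
  std::string log_file_name;
  unsigned long long log_file_pos;
};

// Hypothetical hooks standing in for AckContainer::resize() and
// reportReplyBinlog(); resize() may hand back an ack position that has
// enough acknowledgements under the new count.
using ResizeFn = std::function<std::optional<AckInfo>(unsigned int)>;
using ReportFn = std::function<void(const AckInfo &)>;

// Hypothetical simplification of
// fix_rpl_semi_sync_master_wait_for_slave_count().
void fix_wait_for_slave_count(unsigned int &current, unsigned int requested,
                              const ResizeFn &resize, const ReportFn &report) {
  if (requested == current) return;  // same value: nothing to rebuild
  current = requested;
  if (std::optional<AckInfo> ack = resize(requested))
    report(*ack);  // update the reply position to the reported ack
}
```

Lowering the count is the case where resize() can report: acks already held in the array may now satisfy the smaller quorum, so waiting sessions can be released without waiting for another ack.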
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.