WL#6120: Change master without stopping slave
Affects: Server-5.7
—
Status: Complete
To add or alter an option with the CHANGE MASTER TO command, it is currently necessary to issue STOP SLAVE before CHANGE MASTER. This worklog relaxes that constraint. The three scenarios below describe the task.

1) BOTH IO AND SQL THREADS ARE STOPPED
   When both slave threads are stopped, there is no change in behaviour. The CHANGE MASTER command behaves as it does now.

2) IO THREAD IS STOPPED, SQL THREAD IS RUNNING
   To switch the I/O thread over to read from another master, it is currently necessary to stop the SQL thread as well. This worklog implements support for redirecting the I/O thread to another master without having to stop the SQL thread first, wherever possible.

3) SQL THREAD IS STOPPED, IO THREAD IS RUNNING
   To run CHANGE MASTER TO RELAY_LOG_FILE/RELAY_LOG_POS/MASTER_DELAY, we currently have to stop the IO thread as well. This worklog allows these CHANGE MASTER options without having to stop the IO thread, if possible.
1) Currently, the way we move from a topology M1->S to M2->S is:
   a) STOP SLAVE
   b) SHOW SLAVE STATUS to get (Read_Master_Log_Pos, Master_Log_File)
   c) START SLAVE UNTIL
   d) SELECT MASTER_POS_WAIT( )
   e) CHANGE MASTER
   f) START SLAVE

   The proposal is to reduce these steps to just CHANGE MASTER wherever applicable, subject to rules (a-d) below.
   a) If the IO thread is running and the SQL thread is stopped:
      - CHANGE MASTER TO RELAY_LOG_FILE/RELAY_LOG_POS/MASTER_DELAY will be allowed.
      - All other CHANGE MASTER options will be disallowed.
   b) If the SQL thread is running and the IO thread is stopped:
      - CHANGE MASTER TO RELAY_LOG_FILE/RELAY_LOG_POS/MASTER_DELAY will be disallowed.
      - All other CHANGE MASTER options will be allowed.
   c) CHANGE MASTER TO MASTER_AUTO_POSITION=1 will be allowed only if both the IO and SQL threads are stopped.
   d) If the receiver/applier is running and the slave has open temporary tables, we print a warning on CHANGE MASTER.

2) With this change, there can be a period when the IO thread is reading from M2 while the SQL thread is still executing events that were received from M1, *both at the same time*. Also, there is no overhead of killing and spawning new threads.

3) Currently, CHANGE MASTER purges relay log files unless the command uses the RELAY_LOG_FILE/RELAY_LOG_POS options. This behavior is kept intact when both threads are stopped. It cannot apply otherwise, because we can't remove the relay log(s) with a running SQL thread. When either thread is running during a CHANGE MASTER, we don't delete relay logs. Relay log deletion can be handled with the relay-log-purge option.

4) With statement-based replication (SBR), we don't recommend using temporary tables. One reason is that temporary tables may be left open forever on a failover. To warn users about this situation, we introduce warnings in the error log when one does a CHANGE MASTER or STOP SLAVE.
More precisely, we apply the following rules:
   4.1 CHANGE MASTER should never drop temp tables.
   4.2 We introduce a new command to drop temp tables.
   4.3 The CHANGE MASTER options fall into three groups:
       a) Options that change the connection configuration but stay connected to the same master.
       b) Options that change positions in the binary or relay log (e.g. master_log_pos).
       c) Options that change the master you are replicating from.
       CHANGE MASTER should generate a warning if there are open temp tables in cases a and b above.
   4.4 STOP SLAVE should generate a warning if there are open temp tables.
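The relay log purge rule (point 3) and the temporary-table warning rules (4.3 and 4.4) can be sketched as small predicates. This is only an illustration under stated assumptions: the type and function names (ThreadState, OptionGroup, should_purge_relay_logs, and so on) are hypothetical and do not come from the server source, and the sketch models the rules as written above rather than the actual implementation.

```cpp
// Hypothetical sketch of rules 3, 4.3 and 4.4; names are illustrative only.

struct ThreadState {
  bool io_running;   // receiver (IO) thread is running
  bool sql_running;  // applier (SQL) thread is running
};

// The three option groups from rule 4.3.
enum class OptionGroup {
  ConnectionConfig,  // (a) same master, different connection settings
  LogPosition,       // (b) binary or relay log position, e.g. master_log_pos
  MasterSwitch       // (c) replicate from a different master
};

// Rule 3: relay logs are purged only when both threads are stopped and the
// statement does not use RELAY_LOG_FILE/RELAY_LOG_POS.
bool should_purge_relay_logs(const ThreadState &threads,
                             bool uses_relay_log_file_or_pos) {
  if (threads.io_running || threads.sql_running) return false;
  return !uses_relay_log_file_or_pos;
}

// Rule 4.3: CHANGE MASTER warns about open temp tables in groups (a) and (b).
bool change_master_warns(OptionGroup group, unsigned open_temp_tables) {
  if (open_temp_tables == 0) return false;
  return group == OptionGroup::ConnectionConfig ||
         group == OptionGroup::LogPosition;
}

// Rule 4.4: STOP SLAVE warns whenever temp tables are open.
bool stop_slave_warns(unsigned open_temp_tables) {
  return open_temp_tables > 0;
}
```

Keeping each rule as a separate side-effect-free predicate makes it easy to check a rule in isolation, which is roughly what the mtr tests listed in the design do at the SQL level.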
1) Removed the following check, which required both the IO and SQL threads to be stopped, from change_master():

     if (thread_mask) // We refuse if any slave thread is running
     {
       my_message(ER_SLAVE_MUST_STOP, ER(ER_SLAVE_MUST_STOP), MYF(0));
       ret= true;
       goto err;
     }

2) Introduced an error/warning in the following cases:
   a) Added a new error, ER_SLAVE_IO_THREAD_MUST_STOP (This operation cannot be performed with a running slave io thread; run STOP SLAVE IO_THREAD first).
   b) Added a new warning, ER_WARN_OPEN_TEMP_TABLES_MUST_BE_ZERO (This operation may not be safe when the slave has temporary tables. The tables will be kept open until the server restarts or until the tables are deleted by any replicated DROP statement. Suggest to wait until slave_open_temp_tables = 0.)

4) Added checks that disallow all CHANGE MASTER options except RELAY_LOG_FILE/RELAY_LOG_POS/MASTER_DELAY when the IO thread is running.

5) Added code to allow CHANGE MASTER TO RELAY_LOG_FILE/RELAY_LOG_POS/MASTER_DELAY when the IO thread is running and the SQL thread is stopped.

6) The following pseudocode describes the changes in this WL:

     if both receiver and applier are stopped:
       no change in behavior
     if any change master options affect the receiver:
       if receiver is executing:
         error(ER_SLAVE_IO_THREAD_MUST_STOP)
     if any change master options affect the applier:
       if applier is executing:
         error(ER_SLAVE_SQL_THREAD_MUST_STOP)
     if applier is executing and applier has open temporary tables
        and the user does a change master/stop slave:
       warning(ER_WARN_OPEN_TEMP_TABLES_MUST_BE_ZERO)
     if there was an error above:
       return TRUE (consistent with current semantics)

7) Added the following tests in mtr's rpl suite:
   a) rpl_change_master_without_stopping_slaves.test
   b) rpl_change_master_open_temp_tables.test
   c) rpl_change_master_relay_log_purge.test

8) While doing this, we also improved the readability and maintainability of the change_master() function.
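As a rough illustration, the pseudocode in item 6 could be expressed in C++ along the following lines. This is a minimal sketch, not the code in change_master(): the types SlaveState, ChangeMasterOptions and CheckResult and the function check_change_master are hypothetical, and reporting only the first error when both checks fail is a simplifying assumption.

```cpp
// Hypothetical sketch of the item 6 pseudocode; not the server implementation.

enum class CheckError {
  None,
  IoThreadMustStop,   // stands in for ER_SLAVE_IO_THREAD_MUST_STOP
  SqlThreadMustStop   // stands in for ER_SLAVE_SQL_THREAD_MUST_STOP
};

struct SlaveState {
  bool receiver_running;      // IO thread
  bool applier_running;       // SQL thread
  unsigned open_temp_tables;  // open temporary tables held by the applier
};

struct ChangeMasterOptions {
  // Assumption: RELAY_LOG_FILE/RELAY_LOG_POS/MASTER_DELAY affect the applier,
  // every other option affects the receiver, and MASTER_AUTO_POSITION=1 sets
  // both flags because it requires both threads to be stopped.
  bool affects_receiver;
  bool affects_applier;
};

struct CheckResult {
  CheckError error;            // None means the statement may proceed
  bool warn_open_temp_tables;  // ER_WARN_OPEN_TEMP_TABLES_MUST_BE_ZERO
};

CheckResult check_change_master(const SlaveState &state,
                                const ChangeMasterOptions &opts) {
  CheckResult result{CheckError::None, false};

  // When both threads are stopped neither branch fires: behaviour unchanged.
  if (opts.affects_receiver && state.receiver_running) {
    result.error = CheckError::IoThreadMustStop;
  } else if (opts.affects_applier && state.applier_running) {
    result.error = CheckError::SqlThreadMustStop;
  }

  // The warning is evaluated independently of the error checks,
  // mirroring the order of the pseudocode.
  if (state.applier_running && state.open_temp_tables > 0) {
    result.warn_open_temp_tables = true;
  }
  return result;
}
```

A caller would treat any result whose error is not CheckError::None as the "return TRUE" case of the pseudocode and would push the warning separately.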
Copyright (c) 2000, 2025, Oracle Corporation and/or its affiliates. All rights reserved.