After a successful failover, it is sometimes required to restore the initial topology and promote the crashed server to become the master again (or even a new server with distinctive hardware characteristics). Sometimes failover can be triggered by a simple network issue (not affecting the health of the initial master server) and after being resolved, it may be desirable to put the old master back in the replication topology. We can do this with several of the high availability utilities.
The goal of this task is simply to replace the new master of a replication topology with the previous one that might have been demoted as result of successful automatic failover execution. It is assumed that the server to be restored as master is healthy and any previous issue (that triggered failover) have been resolved.
Let's consider the previous topology after failover, now with a new master (server2:3312) and three slaves (server3:3313, server4:3314, server:3315), and that we want to promote the initial server (server1:3311) to master again.
Performing this task manually can be delicate as one wrong or missing step can lead to errors and errors in the replication topology or even to the lost of some transaction. Once more MySQL Utilities can provide a precious assistance to perform this task, in this case requiring the user to follow three simple steps to restore the initial topology as shown below.
There are several steps involved in solving this problem. We walk through each in turn.
You must first stop running the mysqlfailover utility instance and start the (old) master to be restored, i.e. server1:3311.
Next, set the old master (server1:3311) as a slave of the current new master (server2:3312):
shell> mysqlreplicate --master=root@server2:3312 --slave=root@server1:3311 -rpl-user=rpl:rpl # master on localhost: ... connected. # slave on localhost: ... connected. # Checking for binary logging on master... # Setting up replication... # ...done.
Next, switchover to the previous master to restore the initial replication topology:
shell> mysqlrpladmin --master=root@server2:3312 \ --slaves=root@server2:3313,root@server4:3314,root@server5:3315 \ --rpl-user=rpl:rpl --new-master=root@server1:3311 --demote-master switchover # Checking privileges. # Performing switchover from master at server2:3312 to slave at server1:3311. # Checking candidate slave prerequisites. # Checking slaves configuration to master. # Waiting for slaves to catch up to old master. # Stopping slaves. # Performing STOP on all slaves. # Demoting old master to be a slave to the new master. # Switching slaves to new master. # Starting all slaves. # Performing START on all slaves. # Checking slaves for errors. # Switchover complete. # # Replication Topology Health: +----------+-------+---------+--------+------------+---------+ | host | port | role | state | gtid_mode | health | +----------+-------+---------+--------+------------+---------+ | server1 | 3311 | MASTER | UP | ON | OK | | server2 | 3312 | SLAVE | UP | ON | OK | | server3 | 3313 | SLAVE | UP | ON | OK | | server4 | 3314 | SLAVE | UP | ON | OK | | server5 | 3315 | SLAVE | UP | ON | OK | +----------+-------+---------+--------+------------+---------+ # ...done.
The initial replication topology is now restored and mysqlfailover can be restarted (but using --force) as initially:
shell> mysqlfailover --master=root@server1:3311 \ --slaves=root@server2:3312,root@server3:3313,root@server4:3314,server5:3315 \ --log=log.txt --rpl-user=rpl:rpl --force # Checking privileges. MySQL Replication Failover Utility Failover Mode = auto Next Interval = Sat Jul 27 02:17:12 2013 Master Information ------------------ Binary Log File Position Binlog_Do_DB Binlog_Ignore_DB master-bin.000002 151 GTID Executed Set None Replication Health Status +----------+-------+---------+--------+------------+---------+ | host | port | role | state | gtid_mode | health | +----------+-------+---------+--------+------------+---------+ | server1 | 3311 | MASTER | UP | ON | OK | | server2 | 3312 | SLAVE | UP | ON | OK | | server3 | 3313 | SLAVE | UP | ON | OK | | server4 | 3314 | SLAVE | UP | ON | OK | | server5 | 3315 | SLAVE | UP | ON | OK | +----------+-------+---------+--------+------------+---------+ Q-quit R-refresh H-health G-GTID Lists U-UUIDs L-log entries
The most important step is the execution of the switchover command with the mysqlrpladmin utility. The previous steps can be seen as a preparation for switchover. The first step simply makes sure that the server is running and that there is no mysqlfailover instance still running that could affect the correct execution of switchover. The second step sets the old master as a slave of the new master, because the switchover command can only be performed with slaves. This step also allows the old master to catch up with the new master. If many transaction have been performed on the new master it is recommended to wait a while to let the old master catch up before switchover, otherwise the switchover command might take longer.
As expected, the execution of the switchover command requires
the specification of the current and new master with the
--new-master options as
well as the list of slaves in the topology using the
--slaves option (without
need to list the new master). The replication user is
specified with the
option. In this specific example, the use of the option
important, because without it the current master
(server2:3312) is not be demoted and set as a slave of the new
The mysqlrpladmin utility executes and displays information about all required actions to perform switchover. After completing the switchover process, a health report is displayed that you can use to confirm the successful execution of the command and verify that the topology has changed as expected.
After completing these simple steps, the replication topology is back to its initial structure (before failover) with its old master. Therefore, mysqlfailover is ready to be executed again to monitor the topology and reestablish automatic failover.
It is important to wait for the old master to catch up with the new master in order to ensure that no transactions are lost. Depending on the time the old master was down or not accessible it might take a considerable time for the old master to execute all missing transactions. MySQL Utilities provide tools that allow the visualizations of the slaves status, namely the 'health' command of the mysqlrpladmin utility.
An alternative set of steps could have been followed to
perform the desired task, using the failover command from
mysqlrpladmin instead of switchover. In
this case, the old master should be specified in the
candidates list using the option
--candidates to be
chosen as the preferred slave to become the new master (no
need for the --master, --new-master and --demote-master
options). However, an additional step are required to set the
previous master (server2:3312) as a slave of the old master
(server1:3311) using the mysqlreplicate
utility because failover does not demote the previous master
as it assumes that it is not available. Notice that unlike
switchover that fails if the server specified by the
--new-master option does
not meet the requirements to become master, failover choses
another server from the slaves list to become the new master
if the one specified in by the
option is not suitable. It is important to keep this behavior
differences in mind when deciding which command to apply.
The mysqlfailover utility registers its
execution on the servers in order to avoid concurrent
executions of the utility, which may lead to errors and
inconsistent state during failover. If the utility detects
that another instance might be running, it is started in
"fail" mode (not taking any action when it detects that the
master failed). The mysqlfailover instance
registration is cleared when the utility exits, and it is
expected that registration process can fail on crashed or not
accessible servers. The
--force option overwrite
the instance execution check allowing to surpass registration
failure on (crashed) old masters, allowing the
mysqlfailover utility to start in 'auto'