After a successful failover it is sometime required to restore the initial topology and promote the crashed server to master again (or even a new server with distinctive hardware characteristics). Sometimes failover can be triggered by simple network issue (not affecting the health of the initial master server) and after being resolved it might be desirable to put the old master back in the replication topology.
The goal of this task is simply to replace the new master of a replication topology with the previous one that might have been demoted as result of successful automatic failover execution due to some failure. It is assumed that the server to be restored as master is healthy and any previous issue (that triggered failover) have been resolved.
Let's consider the previous topology after failover, now with a new master (server2:3312) and three slaves (server3:3313, server4:3314, server:3315) and that we want to promote the initial server (server1:3311) to master again.
Perform this task manually can be delicate as one wrong or missing step can lead to errors and incorrect replication topology or even to the lost of some transaction. Once more MySQL Utilities can provide a precious assistance to perform this task, in this case requiring the user to follow three simple steps to restore the initial topology as shown below.
There are several steps involved in solving this problem. We walk through each in turn.
You must first stop running the mysqlfailover utility instance and start the (old) master to be restored, i.e. server1:3311.
Next, set the old master (server1:3311) as a slave of the current new master (server2:3312):
mysqlreplicate --master=root@server2:3312 --slave=root@server1:3311 -rpl-user=rpl:rpl# master on localhost: ... connected. # slave on localhost: ... connected. # Checking for binary logging on master... # Setting up replication... # ...done.
Next, switchover to the previous master to restore the initial replication topology:
mysqlrpladmin --master=root@server2:3312 \
--rpl-user=rpl:rpl --new-master=root@server1:3311 --demote-master switchover# Checking privileges. # Performing switchover from master at server2:3312 to slave at server1:3311. # Checking candidate slave prerequisites. # Checking slaves configuration to master. # Waiting for slaves to catch up to old master. # Stopping slaves. # Performing STOP on all slaves. # Demoting old master to be a slave to the new master. # Switching slaves to new master. # Starting all slaves. # Performing START on all slaves. # Checking slaves for errors. # Switchover complete. # # Replication Topology Health: +----------+-------+---------+--------+------------+---------+ | host | port | role | state | gtid_mode | health | +----------+-------+---------+--------+------------+---------+ | server1 | 3311 | MASTER | UP | ON | OK | | server2 | 3312 | SLAVE | UP | ON | OK | | server3 | 3313 | SLAVE | UP | ON | OK | | server4 | 3314 | SLAVE | UP | ON | OK | | server5 | 3315 | SLAVE | UP | ON | OK | +----------+-------+---------+--------+------------+---------+ # ...done.
The initial replication topology is now restored and mysqlfailover can be restarted (but using --force) as initially:
mysqlfailover --master=root@server1:3311 \
--log=log.txt --rpl-user=rpl:rpl --force# Checking privileges. MySQL Replication Failover Utility Failover Mode = auto Next Interval = Sat Jul 27 02:17:12 2013 Master Information ------------------ Binary Log File Position Binlog_Do_DB Binlog_Ignore_DB master-bin.000002 151 GTID Executed Set None Replication Health Status +----------+-------+---------+--------+------------+---------+ | host | port | role | state | gtid_mode | health | +----------+-------+---------+--------+------------+---------+ | server1 | 3311 | MASTER | UP | ON | OK | | server2 | 3312 | SLAVE | UP | ON | OK | | server3 | 3313 | SLAVE | UP | ON | OK | | server4 | 3314 | SLAVE | UP | ON | OK | | server5 | 3315 | SLAVE | UP | ON | OK | +----------+-------+---------+--------+------------+---------+ Q-quit R-refresh H-health G-GTID Lists U-UUIDs L-log entries
In reality, the main and most important step is the execution of the switchover command with the mysqlrpladmin utility. The previous steps can be seen as a preparation for switchover. The first step simply makes sure that the server is running and that there is no mysqlfailover instance still running that could affect the correct execution of switchover. The second step sets the old master as a slave of the new master, because the switchover command can only be performed with slaves. This step will also allows the old master to catch up with the new master. If many transaction have been performed on the new master it is recommended to wait a while here to let the old master catch up before switchover, otherwise the switchover command might take too much time.
As expected, the execution of the switchover command requires
the specification of the current and new master with the
--new-master options, as
well as the list of slaves in the topology using the
--slaves option (without
need to list the new master). The replication user is
specified with the
--rpl-user option. In
this specific example, the use of the option
important, because without it the current master
(server2:3312) will not be demoted and set as a slave of the
new master (server1:3311).
The mysqlrpladmin utility will execute and display information about all required action to perform switchover. After completing the switchover process, an health report is displayed where it is possible to confirm the successful execution of the command and verify that the topology has changed as expected.
After completing these simple steps, the replication topology
is back to its initial structure (before failover) with its
old master. Therefore, mysqlfailover is
ready to be executed again to monitor the topology and enable
automatic failover. Note that in this particular situation,
the use of the
option might be required because it is likely that some
registration data from the previous failover instance
execution might have be left on the recovered master (due to
its previous crash).
The mysqlfailover utility registers its
execution on the servers in order to avoid concurrent
executions of the utility which will likely lead to errors and
inconsistent state during failover. If the utility detects
that another instance might be running, it will be started in
"fail" mode (not taking any action when it detects that the
master failed). The mysqlfailover instance
registration is cleared when the utility exits, and it is
expected that unregistration fails on crashed or not
accessible servers. The
--force option overwrite
the instance execution check allowing to surpass
unregistration failure on (crashed) old masters, allowing the
mysqlfailover utility to start in 'auto'
It is important to wait for the old master to catch up with the new master in order to guaranty that no transactions are lost. Depending on the time the old master was down or not accessible it might take a considerable time for the old master to execute all missing transactions. MySQL Utilities provide tools that allow the visualizations of the slaves status, namely the 'health' command of the mysqlrpladmin utility.
An alternative set of steps could have been followed to
perform the desired task, using the failover command from
mysqlrpladmin instead of switchover. In
this case, the old master should be specified in the
candidates list using the option
--candidates to be
chosen as the preferred slave to become the new master (no
need for the --master, --new-master and --demote-master
options). However, an additional step will be required to set
the previous master (server2:3312) as a slave of the old
master (server1:3311) using the
mysqlreplicate utility, because failover
will not demote the previous master as it assumes that it is
not available. Notice that unlike switchover that will fail if
the server specified by the
--new-master option does
not meet the requirements to become master, failover will
chose another server from the slaves list to become the new
master if the one specified in by the
--candidates option is
not suitable. It is important to keep this behavior
differences in mind when deciding which command to apply.