Once your replication topology is set up, it is important to consider possible failures in order to maintain the high availability of your system. Several kinds of failure, independently of their cause (network connection issue, hard drive crash, cosmic lightning, etc.), can stop the replication process by making the master no longer accessible to its slaves.
In this situation, it is desirable to promote one of the slaves to master while the problem with the old master is being solved, as that can take considerable time. It is even better to have an application that monitors the replication topology and automatically performs failover, minimizing downtime and keeping replication running.
The goal is to start the mysqlfailover utility to monitor a replication topology and perform failover automatically when required.
When the current master fails, manually promoting a slave to the new master can be a very tedious and error-prone task: all the remaining slaves have to be pointed to the new master, and the new master needs to catch up with all the slaves to make sure that no transaction is lost.
Fortunately, the mysqlfailover utility is capable of executing this full process automatically and in an optimized way.
Let's assume that a replication topology with one master (server1:3311) and four slaves (server2:3312, server3:3313, server4:3314, server5:3315) was previously set up.
Start the mysqlfailover utility (in console mode - default):
mysqlfailover --master=root@server1:3311 \
--log=log.txt --rpl-user=rpl:rpl
NOTE: Log file 'log.txt' does not exist. Will be created.
# Checking privileges.

MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Fri Jul 26 10:17:52 2013

Master Information
------------------
Binary Log File    Position  Binlog_Do_DB  Binlog_Ignore_DB
master-bin.000001  151

GTID Executed Set
None

Replication Health Status
+----------+-------+---------+--------+------------+---------+
| host     | port  | role    | state  | gtid_mode  | health  |
+----------+-------+---------+--------+------------+---------+
| server1  | 3311  | MASTER  | UP     | ON         | OK      |
| server2  | 3312  | SLAVE   | UP     | ON         | OK      |
| server3  | 3313  | SLAVE   | UP     | ON         | OK      |
| server4  | 3314  | SLAVE   | UP     | ON         | OK      |
| server5  | 3315  | SLAVE   | UP     | ON         | OK      |
+----------+-------+---------+--------+------------+---------+

Q-quit R-refresh H-health G-GTID Lists U-UUIDs L-log entries
Now imagine that the master crashes or is no longer reachable. After a predefined time interval (by default, 15 seconds), the failover process starts automatically:
Failover starting in 'auto' mode...
# Candidate slave server2:3312 will become the new master.
# Checking slaves status (before failover).
# Preparing candidate for failover.
# Creating replication user if it does not exist.
# Stopping slaves.
# Performing STOP on all slaves.
# Switching slaves to new master.
# Disconnecting new master as slave.
# Starting slaves.
# Performing START on all slaves.
# Checking slaves for errors.
# Failover complete.

Failover console will restart in 5 seconds.

[...]

MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Fri Jul 26 10:25:17 2013

Master Information
------------------
Binary Log File    Position  Binlog_Do_DB  Binlog_Ignore_DB
master-bin.000001  151

GTID Executed Set
None

Replication Health Status
+----------+-------+---------+--------+------------+---------+
| host     | port  | role    | state  | gtid_mode  | health  |
+----------+-------+---------+--------+------------+---------+
| server2  | 3312  | MASTER  | UP     | ON         | OK      |
| server3  | 3313  | SLAVE   | UP     | ON         | OK      |
| server4  | 3314  | SLAVE   | UP     | ON         | OK      |
| server5  | 3315  | SLAVE   | UP     | ON         | OK      |
+----------+-------+---------+--------+------------+---------+

Q-quit R-refresh H-health G-GTID Lists U-UUIDs L-log entries
The above example illustrates how to start the mysqlfailover utility to check the health of the replication topology and shows the displayed output when failover occurs.
Basically, we simply need to specify the master's connection with the --master option, the list of slaves with the --slaves option, and the replication user (login and password) with the --rpl-user option. As an alternative to the --slaves option, one can use the --discover-slaves-login option, specifying a user and password (or login-path) to connect to the slaves; the utility will then attempt to discover all the slaves connected to the master using the specified login.
For the above example, '--discover-slaves-login=root' could be used. This option can be very handy, especially if there is a huge number of slaves in the topology, but bear in mind that the explicit specification of slaves is safer, as discovery can fail to find some servers. In particular, it is important to note that in order for slaves to be discovered, they must be started with the '--report-host' and '--report-port' options set to appropriate values, and they must be correctly connected to the master (I/O thread running); otherwise discovery will fail.
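For instance, the two alternative ways of identifying the slaves look like this (hostnames, ports, and credentials are the ones from the example topology; adapt them to your environment):

```shell
# Explicit (safer): list every slave by hand with --slaves.
mysqlfailover --master=root@server1:3311 \
  --slaves=root@server2:3312,root@server3:3313,root@server4:3314,root@server5:3315 \
  --rpl-user=rpl:rpl --log=log.txt

# Discovery-based: let the utility find slaves that were started with
# --report-host/--report-port and are connected to the master.
mysqlfailover --master=root@server1:3311 \
  --discover-slaves-login=root \
  --rpl-user=rpl:rpl --log=log.txt
```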
It is also recommended to use the --log option to specify a file in which to register all events, warnings, and errors. This is useful to keep a record of what happened, for example to later determine when failover occurred and whether the process completed without errors.
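Once a log file is specified, it can later be searched to reconstruct what happened. The snippet below is a minimal sketch: the log lines are illustrative placeholders created for the example, not verbatim mysqlfailover output.

```shell
# Create a stand-in log file (illustrative content only) and search it
# for failover-related entries, as one would with the real log.txt.
cat > log.txt <<'EOF'
2013-07-26 10:25:12 PM CRITICAL Master may be down.
2013-07-26 10:25:17 PM INFO Failover starting in 'auto' mode...
2013-07-26 10:25:20 PM INFO Failover complete.
EOF

# Show when failover started and whether it completed without errors.
grep -i 'failover' log.txt
```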
An important matter to discuss is the order in which the servers are selected as candidates for failover. No distinction is made in terms of the number of executed transactions; that is, the utility does not try to select the most up-to-date slave to become the new master. The reason is very simple: this criterion is non-deterministic, as many circumstances (e.g., network load, server maintenance operations) can temporarily influence the performance of a slave and could lead to an incorrect selection of the most appropriate candidate. For example, the slave with the best hardware should in the long run be the most appropriate candidate to become the new master, but for some unanticipated reason it might actually have executed fewer transactions than other servers when the master crashed. Therefore, a more deterministic criterion based on the order in which the servers are specified is used, allowing the user to control the order in which candidates are considered. The first server that meets the required election criteria, consisting of simple sanity checks (server reachable and running with the required options: GTID mode ON and binary logging enabled), will be chosen.
More specifically, the selection of the new master follows this order: first, sequentially check the list of servers specified with the --candidates option; then the servers listed in the --slaves option; and finally, check any discovered slaves in an unordered way.
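As a rough sketch of this selection order (not the utility's actual code), the logic amounts to walking an ordered list and taking the first server that passes the sanity checks; here the check itself is faked, since a real implementation would query each server for its GTID and binary-logging settings:

```shell
# Fake eligibility check: pretend server2:3312 fails the sanity checks
# (in reality this would test reachability, gtid_mode=ON, and binary logging).
is_eligible() {
  [ "$1" != "server2:3312" ]
}

# Walk the servers in the order given (candidates first, then slaves)
# and print the first eligible one.
pick_new_master() {
  for srv in "$@"; do
    if is_eligible "$srv"; then
      printf '%s\n' "$srv"
      return 0
    fi
  done
  return 1
}

pick_new_master server2:3312 server3:3313 server4:3314   # server2 is skipped
```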
In the above example the mysqlfailover utility was started in the default console mode, but it can also be executed as a daemon. For that purpose, the --daemon option needs to be used; more specifically, simply add '--daemon=start' to the command line. When mysqlfailover is executed as a daemon, no output is displayed and all the information is logged to the file specified with the --log option, which is mandatory in this case. To stop the execution of the mysqlfailover daemon, simply invoke the utility with the option '--daemon=stop'. No other option is required to stop the daemon, unless a specific pidfile (to store the process PID) was specified with the --pidfile option when starting the daemon, in which case the same option value is also required to stop it.
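Put together, a daemon session could look like the following sketch (the log and pidfile paths are illustrative assumptions):

```shell
# Start as a daemon: --log is mandatory in this mode, --pidfile is optional.
mysqlfailover --master=root@server1:3311 \
  --discover-slaves-login=root --rpl-user=rpl:rpl \
  --log=failover.log --pidfile=/tmp/mysqlfailover.pid \
  --daemon=start

# Stop it later; the same --pidfile value must be repeated if one was used.
mysqlfailover --daemon=stop --pidfile=/tmp/mysqlfailover.pid
```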
Another useful feature is the possibility to run external scripts along the execution of the utility to perform customized actions. The following options can be used to execute different scripts at distinct moments of the failover process: --exec-fail-check to specify a script to run periodically at each predefined interval, instead of the default check (i.e., master is reachable and alive), to detect the need for failover; --exec-before to specify a script to execute before starting failover; --exec-after to execute a script at the end of the failover process; and --exec-post-failover to run a script after completing the failover process (before displaying the health report).
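For instance, a simple notification hook to pass to one of these options could look like the following sketch (the script name and the event file are hypothetical; the commented mysqlfailover line shows how it would be attached):

```shell
# Hypothetical hook script: append a line to an event file whenever it runs.
cat > notify_failover.sh <<'EOF'
#!/bin/sh
echo "failover finished at $(date)" >> failover_events.txt
EOF
chmod +x notify_failover.sh
./notify_failover.sh   # simulate the utility invoking the hook

# Attaching it (sketch):
# mysqlfailover --master=root@server1:3311 --discover-slaves-login=root \
#   --rpl-user=rpl:rpl --log=log.txt --exec-after=./notify_failover.sh
```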