Once your replication topology is set up, it is important to consider possible failures in order to maintain the high availability of your system. Various failures, regardless of their cause (network connection issues, hard drive crashes, lightning strikes, etc.), can stop the replication process by making the master no longer accessible to its slaves.
In this type of situation, it is desirable to promote one of the slaves to master while the problem with the old master is resolved. It is better to have an application monitor the replication topology and perform failover automatically, minimizing downtime and keeping replication running smoothly. This is where the mysqlfailover utility shines.
The goal is to start the mysqlfailover utility to monitor a replication topology and perform failover automatically when required.
When the current master fails, manually promoting a slave to be the new master can be a very tedious and error-prone task: all the remaining slaves have to be redirected to the new master, and the new master needs to catch up with all the slaves to make sure that no transactions are lost.
Fortunately, the mysqlfailover utility is capable of executing this full process automatically and in an optimized way.
Let's assume that a replication topology with one master (server1:3311) and four slaves (server2:3312, server3:3313, server4:3314, server5:3315) was previously set up.
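Such a topology could be built, for instance, with the mysqlreplicate utility from the same MySQL Utilities package; the following commands are a sketch (the server names, ports, and replication user match the example above, but any equivalent setup works):

```shell
# Attach each slave to the master, creating the replication user rpl
# (password rpl) on the master if it does not already exist.
mysqlreplicate --master=root@server1:3311 --slave=root@server2:3312 --rpl-user=rpl:rpl
mysqlreplicate --master=root@server1:3311 --slave=root@server3:3313 --rpl-user=rpl:rpl
mysqlreplicate --master=root@server1:3311 --slave=root@server4:3314 --rpl-user=rpl:rpl
mysqlreplicate --master=root@server1:3311 --slave=root@server5:3315 --rpl-user=rpl:rpl
```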
Start the mysqlfailover utility (in console mode - default):
shell> mysqlfailover --master=root@server1:3311 \
          --slaves=root@server2:3312,root@server3:3313,root@server4:3314,root@server5:3315 \
          --log=log.txt --rpl-user=rpl:rpl
NOTE: Log file 'log.txt' does not exist. Will be created.
# Checking privileges.

MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Fri Jul 26 10:17:52 2013

Master Information
------------------
Binary Log File    Position  Binlog_Do_DB  Binlog_Ignore_DB
master-bin.000001  151

GTID Executed Set
None

Replication Health Status
+----------+-------+---------+--------+------------+---------+
| host     | port  | role    | state  | gtid_mode  | health  |
+----------+-------+---------+--------+------------+---------+
| server1  | 3311  | MASTER  | UP     | ON         | OK      |
| server2  | 3312  | SLAVE   | UP     | ON         | OK      |
| server3  | 3313  | SLAVE   | UP     | ON         | OK      |
| server4  | 3314  | SLAVE   | UP     | ON         | OK      |
| server5  | 3315  | SLAVE   | UP     | ON         | OK      |
+----------+-------+---------+--------+------------+---------+

Q-quit R-refresh H-health G-GTID Lists U-UUIDs L-log entries
Now imagine that the master crashes or becomes unreachable. After a predefined time interval (by default, 15 seconds), we can observe that the failover process starts automatically:
Failover starting in 'auto' mode...
# Candidate slave server2:3312 will become the new master.
# Checking slaves status (before failover).
# Preparing candidate for failover.
# Creating replication user if it does not exist.
# Stopping slaves.
# Performing STOP on all slaves.
# Switching slaves to new master.
# Disconnecting new master as slave.
# Starting slaves.
# Performing START on all slaves.
# Checking slaves for errors.
# Failover complete.

Failover console will restart in 5 seconds.

[...]

MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Fri Jul 26 10:25:17 2013

Master Information
------------------
Binary Log File    Position  Binlog_Do_DB  Binlog_Ignore_DB
master-bin.000001  151

GTID Executed Set
None

Replication Health Status
+----------+-------+---------+--------+------------+---------+
| host     | port  | role    | state  | gtid_mode  | health  |
+----------+-------+---------+--------+------------+---------+
| server2  | 3312  | MASTER  | UP     | ON         | OK      |
| server3  | 3313  | SLAVE   | UP     | ON         | OK      |
| server4  | 3314  | SLAVE   | UP     | ON         | OK      |
| server5  | 3315  | SLAVE   | UP     | ON         | OK      |
+----------+-------+---------+--------+------------+---------+

Q-quit R-refresh H-health G-GTID Lists U-UUIDs L-log entries
The above example illustrates how to start the mysqlfailover utility to monitor the health of the replication topology and reconfigure the topology when failover occurs.
To set up this feature, we simply need to specify the master's connection with the --master option, the list of slaves with the --slaves option, and the replication user (login and password) with the --rpl-user option. As an alternative to the --slaves option, you can use the --discover-slaves-login option, specifying a user and password (or login-path) to connect to the slaves. The utility then attempts to discover all of the slaves connected to the master using the specified login and password. For the above example, '--discover-slaves-login=root' could be used.
Slave discovery can be very handy, especially if there is a huge number of slaves in the topology, but bear in mind that explicitly specifying the slaves is safer, as discovery can fail to find some servers. In particular, note that in order for slaves to be discovered, they must be started with the '--report-host' and '--report-port' options set to the correct values, and they must be connected to the master (I/O thread running); otherwise, discovery fails.
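Putting these requirements together, a discovery-based invocation might look like the following sketch (the mysqld options shown are abbreviated; a real server needs its full configuration as well):

```shell
# Each slave must report itself to the master to be discoverable:
shell> mysqld --report-host=server2 --report-port=3312 ...

# Then the slaves can be discovered instead of listed explicitly:
shell> mysqlfailover --master=root@server1:3311 \
          --discover-slaves-login=root \
          --log=log.txt --rpl-user=rpl:rpl
```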
It is also recommended to use the --log option to specify a file in which to record all events, warnings, and errors. This is useful to keep a record of what happened: for example, to determine when failover occurred and whether the process completed without errors or warnings.
An important matter to discuss is the order in which the servers are selected as candidates for failover. No distinction is made in terms of the number of executed transactions in order to select the most up-to-date slave as the new master. The reason is very simple: this criterion is non-deterministic, as many circumstances (e.g., network load, server maintenance operations) can temporarily influence the performance of a slave and could lead to an incorrect selection of the most appropriate candidate. For example, the slave with the best hardware should in the long run be the most appropriate candidate to become the new master, but for some unanticipated reason it might actually have fewer transactions than other servers at the moment the master crashed. Therefore, a more deterministic criterion based on the order in which the servers are specified is used, allowing the user to control the order in which the candidates are considered. The first server that meets the required election criteria, consisting of simple sanity checks (server reachable and running with the required options: GTID mode ON and binary logging enabled), is chosen. More specifically, the selection of the new master follows this order: first, sequentially check the list of servers specified with the --candidates option, then the servers listed with the --slaves option, and finally any discovered slaves, in an arbitrary order.
In the above example the mysqlfailover utility was started in the default console mode, but it can also be executed as a daemon. For that purpose, the --daemon option needs to be used; more specifically, simply add '--daemon=start' to the command line. When mysqlfailover is executed as a daemon, no output is displayed and all the information is logged to the file specified with the --log option, which is mandatory in this case. To stop the execution of the mysqlfailover daemon, simply invoke the utility with the option '--daemon=stop'. No other option is required to stop the daemon, unless a specific pidfile (which contains the process PID) was specified with the --pidfile option when starting the daemon; in that case, the same option value is also required to stop it.
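For instance, a daemonized run and its later shutdown might look like this sketch (the pidfile path is illustrative):

```shell
# Start as a daemon; --log is mandatory in this mode.
shell> mysqlfailover --master=root@server1:3311 \
          --slaves=root@server2:3312,root@server3:3313,root@server4:3314,root@server5:3315 \
          --log=log.txt --rpl-user=rpl:rpl \
          --daemon=start --pidfile=/var/run/mysqlfailover.pid

# Stop it later, passing the same pidfile value:
shell> mysqlfailover --daemon=stop --pidfile=/var/run/mysqlfailover.pid
```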
Another useful feature is the possibility to run external scripts during the execution of the utility to perform customized actions. The following options can be used to execute different scripts at distinct moments of the utility's execution: --exec-fail-check, to specify a script to run periodically at each predefined interval, instead of the default check (i.e., master is reachable and alive), to detect the need for failover; --exec-before, to specify a script to execute before starting failover; --exec-after, to execute a script at the end of the failover process; and --exec-post-failover, to run a script after completing the failover process (before displaying the health report).
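As a simple illustration, a notification script passed via one of these options (for example --exec-after) could append a timestamped line to a log file each time it is invoked; the script name, log path, and use of arguments below are assumptions for the sketch, not requirements of the utility:

```shell
#!/bin/sh
# notify_failover.sh -- hypothetical helper script: append a timestamped
# record of the event (plus any arguments it receives) to a log file.
echo "$(date '+%Y-%m-%d %H:%M:%S') failover event: $*" >> /tmp/failover_events.log
```

It would then be wired in with, e.g., '--exec-after=/path/to/notify_failover.sh' on the mysqlfailover command line.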