Documentation Home
MySQL Utilities 1.5 Manual
Related Documentation Download this Manual
PDF (US Ltr) - 1.5Mb
PDF (A4) - 1.5Mb
EPUB - 366.9Kb
HTML Download (TGZ) - 288.4Kb
HTML Download (Zip) - 300.4Kb


3.4.3 Setup Automatic Failover

Once your replication topology is setup, it is important to consider the possible occurrences of failures in order to maintain the high availability level of your system. Several failures independently from their cause (network connection issue, hard drive crash, cosmic lightning, etc.) can stop the replication process by making the master no longer accessible by its slaves.

In this type of situation, it is desirable to promote one of the slaves to master while the problem with the old master is being solve as it can take some considerable time. It will be even better to have an application to monitor the replicate topology and automatically perform failover, minimizing downtime and keeping replication running.

Objectives

The goal is to start the mysqlfailover utility to monitor a replication topology and perform failover automatically when required.

When the current master fails, manually promoting a slave to the new master can be a very tedious and error prone task, as all the remaining slave have to point out to the new master and the new master needs to catch up with all the slaves to make sure that no transactions is lost.

Fortunately, the mysqlfailover utility is capable of executing this full process automatically and in a optimized way.

Let's assume that a replication topology with one master (server1:3311) and four slaves (server2:3312, server3:3313, server4:3314, server:3315) was previously setup.

Example Execution

Start the mysqlfailover utility (in console mode - default):

shell> mysqlfailover --master=root@server1:3311 \
--slaves=root@server2:3312,root@server3:3313,root@server4:3314,root@server5:3315 \
--log=log.txt --rpl-user=rpl:rpl
NOTE: Log file 'log.txt' does not exist. Will be created.
# Checking privileges.

MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Fri Jul 26 10:17:52 2013

Master Information
------------------
Binary Log File    Position  Binlog_Do_DB  Binlog_Ignore_DB  
master-bin.000001  151                                       

GTID Executed Set
None

Replication Health Status
+----------+-------+---------+--------+------------+---------+
| host     | port  | role    | state  | gtid_mode  | health  |
+----------+-------+---------+--------+------------+---------+
| server1  | 3311  | MASTER  | UP     | ON         | OK      |
| server2  | 3312  | SLAVE   | UP     | ON         | OK      |
| server3  | 3313  | SLAVE   | UP     | ON         | OK      |
| server4  | 3314  | SLAVE   | UP     | ON         | OK      |
| server5  | 3315  | SLAVE   | UP     | ON         | OK      |
+----------+-------+---------+--------+------------+---------+

Q-quit R-refresh H-health G-GTID Lists U-UUIDs L-log entries        

Now imagine that the master crashed or is no longer reachable, then after a predefined time interval (by default 15 seconds) we can observe that the failover process will start automatically:

Failover starting in 'auto' mode...
# Candidate slave server2:3312 will become the new master.
# Checking slaves status (before failover).
# Preparing candidate for failover.
# Creating replication user if it does not exist.
# Stopping slaves.
# Performing STOP on all slaves.
# Switching slaves to new master.
# Disconnecting new master as slave.
# Starting slaves.
# Performing START on all slaves.
# Checking slaves for errors.
# Failover complete.

Failover console will restart in 5 seconds.

[...]

MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Fri Jul 26 10:25:17 2013

Master Information
------------------
Binary Log File    Position  Binlog_Do_DB  Binlog_Ignore_DB  
master-bin.000001  151                                       

GTID Executed Set
None

Replication Health Status
+----------+-------+---------+--------+------------+---------+
| host     | port  | role    | state  | gtid_mode  | health  |
+----------+-------+---------+--------+------------+---------+
| server2  | 3312  | MASTER  | UP     | ON         | OK      |
| server3  | 3313  | SLAVE   | UP     | ON         | OK      |
| server4  | 3314  | SLAVE   | UP     | ON         | OK      |
| server5  | 3315  | SLAVE   | UP     | ON         | OK      |
+----------+-------+---------+--------+------------+---------+


Q-quit R-refresh H-health G-GTID Lists U-UUIDs L-log entries        

Discussion

The above example illustrates how to start the mysqlfailover utility to check the health of the replication topology and shows the displayed output when failover occurs.

Basically, we simply need to specify the master's connection with the --master option, the list of slaves with the --slaves option and the replication user (login and password) using the --rpl-user options. In alternative to the --slaves options, one can use the --discover-slaves-login specifying a user and password (or login-path) to connect to the slaves and the utility will attempt to discover all the slaves connected with the master using the specified login. For the above example, '--discover-slaves-login=root' could be used.

The --discover-slaves-login can be very handy especially if there is a huge number of slaves in the topology, but bear in mind that the explicit specification of slaves is safer and that discovery can fail to find some servers. In particular, it is important to note that in order for slaves to be discovered, they must be started with the '--report-host' and '--report-port' options with a appropriate values and they must be correctly connected to the master (I/O thread running) otherwise discovery will fail.

It is also recommended to use the --log options to specify a file to register all events, warning and errors. This is useful to keep a record of what happened, for example to later determine when failover occurred and if the process occurred without any errors.

An important matter to discuss is the order in which the server are select as candidates for failover. No distinction is made in terms of the number of transaction to select the most up-to-date slave to become the new master. The reason is very simple, this criteria is non-deterministic as many circumstances (i.e., network load, server maintenance operations) can temporarily influence the performance of a slave and could lead to an incorrect selection of the most appropriate candidate. For example, the slave with the best hardware should be in the long run the most appropriate candidate to become the new master, but for some unanticipated reason it might actually had less transactions than other servers when the master crashed. Therefore, a more deterministic criteria based on the order in which the servers are specified is used, allowing the user to control the order in which the candidates are selected. The first server that will meet the required election criteria, consisting on simple sanity checks (server reachable and running with the required options: GTID ON and binary logging enabled), will be chosen. More specifically, the section of the new master will follow this order: first, sequentially check the list of servers specified by the --candidates option, then the servers listed in the --slaves option, finally check any discovered slave in an unordered way.

Permissions Required

The user must have permissions to configure replication.

Tips and Tricks

In the above example the mysqlfailover utility was started in the default console mode, but it can also be executed as a daemon. For that purpose, the --daemon option needs to be used, more specifically simply add '--daemon=start' to the command line. When mysqlfailover is executed as a daemon, no output is displayed and all the information is logged to file specified for the --log option which is mandatory in this case. To stop the execution of the mysqlfailover daemon simply invoke the utility using the option '--daemon=stop'. No other options is required to stop the daemon, unless a specific pidfile (to store the process PID) was specified with the --pidfile option to start the daemon and in this case the same option value is also required to stop it.

Another useful feature is the possibility to run external script along the execution of the utility to perform customized actions. The following options can be used to execute different scripts at distinct moments of the mysqlfailover execution: --exec-fail-check to specify a script to run periodically at each predefined interval instead of the default check (i.e., master is reachable and alive) to detect the need to failover, --exec-before to specify a script to execute before starting failover, --exec-after to execute a script at the end of failover process, --exec-post-failover to run a script after completing the failover process (before displaying the health report).


User Comments
Sign Up Login You must be logged in to post a comment.