Documentation Home
MySQL Utilities 1.5 Manual
Related Documentation Download this Manual
PDF (US Ltr) - 1.5Mb
PDF (A4) - 1.5Mb
EPUB - 366.6Kb
HTML Download (TGZ) - 289.8Kb
HTML Download (Zip) - 301.9Kb


MySQL Utilities 1.5 Manual  /  ...  /  How do you setup and use automatic failover?

3.3.3 How do you setup and use automatic failover?

Once your replication topology is setup, it is important to consider the possible occurrences of failures in order to maintain the high availability level of your system. Several failures independently from their cause (network connection issue, hard drive crash, cosmic lightning, etc.) can stop the replication process by making the master no longer accessible by its slaves.

In this type of situation, it is desirable to promote one of the slaves to the master while the problem with the old master is resolved. It will be even better to have an application to monitor the replicate topology and perform failover automatically, minimizing downtime and keeping replication running smoothly. This is where the mysqlfailover utility shines.

Objectives

The goal is to start the mysqlfailover utility to monitor a replication topology and perform failover automatically when required.

When the current master fails, manually promoting a slave to the new master can be a very tedious and error prone task, as all the remaining slave have to be redirected to the new master and the new master needs to catch up with all the slaves to make sure that no transactions is lost.

Fortunately, the mysqlfailover utility is capable of executing this full process automatically and in a optimized way.

Let's assume that a replication topology with one master (server1:3311) and four slaves (server2:3312, server3:3313, server4:3314, server:3315) was previously setup.

Example Execution

Start the mysqlfailover utility (in console mode - default):

shell> mysqlfailover --master=root@server1:3311 \
--slaves=root@server2:3312,root@server3:3313,root@server4:3314,root@server5:3315 \
--log=log.txt --rpl-user=rpl:rpl
NOTE: Log file 'log.txt' does not exist. Will be created.
# Checking privileges.

MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Fri Jul 26 10:17:52 2013

Master Information
------------------
Binary Log File    Position  Binlog_Do_DB  Binlog_Ignore_DB
master-bin.000001  151                                     

GTID Executed Set
None

Replication Health Status
+----------+-------+---------+--------+------------+---------+
| host     | port  | role    | state  | gtid_mode  | health  |
+----------+-------+---------+--------+------------+---------+
| server1  | 3311  | MASTER  | UP     | ON         | OK      |
| server2  | 3312  | SLAVE   | UP     | ON         | OK      |
| server3  | 3313  | SLAVE   | UP     | ON         | OK      |
| server4  | 3314  | SLAVE   | UP     | ON         | OK      |
| server5  | 3315  | SLAVE   | UP     | ON         | OK      |
+----------+-------+---------+--------+------------+---------+

Q-quit R-refresh H-health G-GTID Lists U-UUIDs L-log entries      

Now imagine that the master crashed or is no longer reachable, then after a predefined time interval (by default 15 seconds) we can observe that the failover process will start automatically:

Failover starting in 'auto' mode...
# Candidate slave server2:3312 will become the new master.
# Checking slaves status (before failover).
# Preparing candidate for failover.
# Creating replication user if it does not exist.
# Stopping slaves.
# Performing STOP on all slaves.
# Switching slaves to new master.
# Disconnecting new master as slave.
# Starting slaves.
# Performing START on all slaves.
# Checking slaves for errors.
# Failover complete.

Failover console will restart in 5 seconds.

[...]

MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Fri Jul 26 10:25:17 2013

Master Information
------------------
Binary Log File    Position  Binlog_Do_DB  Binlog_Ignore_DB
master-bin.000001  151                                     

GTID Executed Set
None

Replication Health Status
+----------+-------+---------+--------+------------+---------+
| host     | port  | role    | state  | gtid_mode  | health  |
+----------+-------+---------+--------+------------+---------+
| server2  | 3312  | MASTER  | UP     | ON         | OK      |
| server3  | 3313  | SLAVE   | UP     | ON         | OK      |
| server4  | 3314  | SLAVE   | UP     | ON         | OK      |
| server5  | 3315  | SLAVE   | UP     | ON         | OK      |
+----------+-------+---------+--------+------------+---------+


Q-quit R-refresh H-health G-GTID Lists U-UUIDs L-log entries      

Discussion

The above example illustrates how to start the mysqlfailover utility to monitor the health of the replication topology and reconfigure the topology when failover occurs.

To setup this feature, we simply need to specify the master's connection with the --master option, the list of slaves with the --slaves option and the replication user (login and password) using the --rpl-user option. As an alternative to the --slaves options, you can use the --discover-slaves-login specifying a user and password (or login-path) to connect to the slaves. The utility will attempt to discover all of the slaves connected with the master using the specified login and password. For the above example, '--discover-slaves-login=root' could be used.

The --discover-slaves-login can be very handy especially if there is a huge number of slaves in the topology, but bear in mind that the explicit specification of slaves is safer and that discovery can fail to find some servers. In particular, it is important to note that in order for slaves to be discovered, they must be started with the '--report-host' and '--report-port' options with the correct values and they must be connected to the master (I/O thread running) otherwise discovery will fail.

It is also recommended to use the --log options to specify a file to register all events, warning and errors. This is useful to keep a record of what happened. For example, to determine when failover occurred and if the process completed without errors or warnings.

An important matter to discuss is the order in which the servers are select as candidates for failover. No distinction is made in terms of the number of transactions to select the most up-to-date slave to become the new master. The reason is very simple; this criteria is non-deterministic as many circumstances (i.e., network load, server maintenance operations) can temporarily influence the performance of a slave and could lead to an incorrect selection of the most appropriate candidate. For example, the slave with the best hardware should be in the long run the most appropriate candidate to become the new master, but for some unanticipated reason it might actually have fewer transactions than other servers when the master crashed. Therefore, a more deterministic criteria based on the order in which the servers are specified is used, allowing the user to control the order in which the candidates are selected. The first server that will meet the required election criteria, consisting on simple sanity checks (server reachable and running with the required options: GTID ON and binary logging enabled), will be chosen. More specifically, the selection of the new master will follow this order: first, sequentially check the list of servers specified by the --candidates option, then the servers listed in the --slaves option, and finally check any discovered slaves in an unordered way.

Permissions Required

The user must have permissions to configure replication.

Tips and Tricks

In the above example the mysqlfailover utility was started in the default console mode, but it can also be executed as a daemon. For that purpose, the --daemon option needs to be used, more specifically simply add '--daemon=start' to the command line. When mysqlfailover is executed as a daemon, no output is displayed and all the information is logged to file specified for the --log option, which is mandatory in this case. To stop the execution of the mysqlfailover daemon, simply invoke the utility using the option '--daemon=stop'. No other options is required to stop the daemon unless a specific pidfile (which contains the process PID) was specified with the --pidfile option to start the daemon and in this case the same option value is also required to stop it.

Another useful feature is the possibility to run external scripts along the execution of the utility to perform customized actions. The following options can be used to execute different scripts at distinct moments of the mysqlfailover execution: --exec-fail-check to specify a script to run periodically at each predefined interval instead of the default check (i.e., master is reachable and alive) to detect the need to failover, --exec-before to specify a script to execute before starting failover, --exec-after to execute a script at the end of failover process, --exec-post-failover to run a script after completing the failover process (before displaying the health report).


User Comments
Sign Up Login You must be logged in to post a comment.