MySQL :: MySQL 5.7.6: It is easier to switch master now!

Introduction

One of the primary objectives of MySQL replication is providing an easy failover process i.e. switching to a redundant system if the primary MySQL server fails. In MySQL this translates to switching to the most appropriate slave server in the eventuality of a failure of the master server.

The promotion of a candidate slave to become a new master is based on a lot of factors, undoubtedly the most important factor being that the chosen slave should be most up to date with the primary server (the old master) before we lost it! This blog explains how to use new features in MySQL 5.7.4 and later to make failover easier.

To find the most up to date slave server, a failover script looks at the set of transactions received by the slave and compares it with every other slave to find the one that has received the biggest set of transactions. There could be more sophisticated ways of doing this, for instance you could choose one preferred slave that you want to promote (perhaps it has a better hardware configuration, it has no filters, physical location etc) and make sure it receives every transaction that has been received by all the other slaves. For simplicity though, let’s narrow down our definition of the most appropriate slave to promote to be the one that is most up to date with the lost master.

How to failover using GTID based replication

To denote a set of transactions, MySQL uses GTID sets (a set of global transaction identifiers). To read more about GTIDs, you can refer to our official documentation or developer blogs. To find the set of transactions received by a MySQL slave server, you simply execute:

mysql> SELECT RECEIVED_TRANSACTION_SET FROM peformance_schema.replication_connection_status;
+------------------------------------------+
| RECEIVED_TRANSACTION_SET                 |
+------------------------------------------+
| 4D8B564F-03F4-4975-856A-0E65C3105328:1-4 |
+------------------------------------------+

mysql> SELECT RECEIVED_TRANSACTION_SET FROM peformance_schema.replication_connection_status;

+------------------------------------------+

| RECEIVED_TRANSACTION_SET |

+------------------------------------------+

| 4D8B564F-03F4-4975-856A-0E65C3105328:1-4 |

+------------------------------------------+

Execute this on every slave and compare the sets to find the slave with largest received transaction set- that’s your candidate slave for promotion. Let us call this slave the new master. Before you switch a slave to replicate from the new master, you earlier had to make sure all the transactions the slave has received are executed. In versions of MySQL prior to 5.7.4, you needed to do the following steps:

stop slave.
start slave to replicate until all received transactions are executed.
wait until all received transactions are executed.
switch master to redirect slaves to replicate from new master.
start slave

This would translate to the following MySQL commands:

STOP SLAVE;
START SLAVE UNTIL SQL_AFTER_GTIDS= <received_transaction_set>;
SELECT WAIT_FOR_EXECUTED_GTID_SET(<received_transaction_set>);
CHANGE MASTER TO <new_master_def>;
START SLAVE;

However, in MySQL-5.7.4, we introduced a feature which allows one to selectively stop only that component of replication which requires a change. This means that in the present context, to switch to a new master we only need to stop the receiver module (or in technical terms the I/O thread). The applier threads can continue applying transactions, if there are any pending, while we switch master. Building on this infrastructure we can now reduce the above steps to the following:

Stop the receiver module (STOP SLAVE IO_THREAD).
Switch master (CHANGE MASTER TO <new_master_def>).
Start the receiver module (START SLAVE IO_THREAD).

Note the removal of the wait function (wait_for_gtid_executed_set) to ensure that the received transactions are executed before you switch master. There is no need for that step anymore!

How to failover using non-GTID based replication

If you are not using GTIDs, though we highly recommend you should, you can still take advantage of the current improvements. This means you can change the following part of your failover script:

STOP SLAVE
SHOW SLAVE STATUS to get coordinates (Read_Master_Log_Pos, Master_Log_File)
START SLAVE UNTIL coordinates saved in step (2).
SELECT MASTER_POS_WAIT (coordinates,…).
CHANGE MASTER TO <new_master_def>.
START SLAVE.

to the following simpler steps:

STOP SLAVE IO_THREAD;
CHANGE MASTER TO <new_master_def>;
START SLAVE IO_THREAD.

Conclusion

Using the improvements in the newer versions of MySQL 5.7, failover becomes easier. Our effort to improve MySQL high availability continues and we remain committed to easing the processes. Please do try this, and as always let us know your feedback. You can also use mysqlfailover or MySQL Fabric that automate the failover process.