MySQL Blog Archive
For the latest blogs go to blogs.oracle.com/mysql
Automatic Asynchronous Replication Connection Failover

MySQL 8.0.22 introduces Automatic Asynchronous Replication connection failover. This feature makes MySQL asynchronous replication fault-tolerant to source connection failure, by automating the process of re-establishment of a replication connection to another available source. If there are source servers in multiple data centers, the user can also configure the replica to fail over between them, to provide Disaster Recovery from failures of an entire data center. The availability of a database can be maintained by keeping an extra copy of the data in a standby replica that receives it continuously from the source using replication. But, as always, there could be cases where the connection to the source fails and the stream of replicated data is interrupted.

In order to explain how the source failure would be handled with this feature, let’s consider the following scenario: there are two sites – New York and London, as shown in Fig 1 below, connected through asynchronous replication channel C2. The servers (S1, S2) in New York site are also connected through replication channel C1 (asynchronous or Group Replication for automatic fail-over), which means that S2 has a copy of the data on S1.

The server (R1) in London site is configured with S1 and S2’s source connection details, so when connection C2 fails (Fig 2) between S1 and R1, R1 establishes a new connection C3 between S2 and R1, as shown in Fig 3.

In the next section, we will walk you through the configuration and usage of the feature.

Usage

Prerequisites

Before enabling the asynchronous connection failover feature, all servers need to use gtid_mode=ON, and the replica that uses asynchronous connection failover must have configured MASTER_AUTO_POSITION=1 with CHANGE MASTER TO.

1. Create a replication channel

On the replica server, create a replication channel ‘ch1’ with the replication credentials of the source servers.

Note: The default values of MASTER_RETRY_COUNT (86400 times), and MASTER_CONNECT_RETRY (60 seconds), were designed for a connection to a single source, and will make the replica retry the same connection for 60 days. When these attempts are exhausted, the asynchronous connection failover mechanism takes over. So to ensure that the asynchronous connection failover mechanism can be activated promptly, set MASTER_RETRY_COUNT to a minimal number that just allows a few retry attempts with the same source, in case the connection failure is caused by a transient network outage.

2. Add the alternate source configuration details for the asynchronous replication channel

The feature is controlled on the replica. The UDF asynchronous_connection_failover_add_source() can be used to add connection configuration for an alternate source.

The arguments for this UDF have the following meanings:

  • channel/host/port/network_namespace: The replication channel connection attributes specifying how to connect to this alternate source.
  • weight: When multiple alternate sources are specified, the failover operation will try to connect to source having highest weight. If weight is not specified, it defaults to 50.

On replica, configure two sources:

The values inserted can be verified in the replication_asynchronous_connection_failover table of the performance_schema database.

3. Enable the feature

Enable the feature for the replication channel ‘ch1’ using CHANGE MASTER TO SOURCE_CONNECTION_AUTO_FAILOVER=1. This statement can be used when the replication threads are running.

4. Start replica channel

Start the replication channel ‘ch1’ on the replica server, and check its I/O thread status:

5. Stop mysqld on source1 server

To see the asynchronous connection failure mechanism working, log in to the host machine of source1 (node1 in this example), and stop the source server.

6. Verify that the replication channel has automatically failed over, that is, connected to source2

Before failing over to the next configured alternate source, the receiver thread on the replica will attempt to connect to the current source MASTER_RETRY_COUNT times (which is 2 in this case), as configured by CHANGE MASTER TO command in the first step. See the error log below.

Note: Asynchronous connection failover kicks in only when a replication connection fails. So even if an old source which has a higher weightage comes back online, the replication connection is not re-connected to the old source. Similarly in our example, even if source1 comes back online (mysqld is started) after some time, the asynchronous replication connection will stay connected to source2 till the connection fails or the user stops it.

7. Delete source configurations

If you later need to delete one or more alternate sources, use the asynchronous_connection_failover_delete_source() UDF on the replica.

Example:

MySQL Router

Please note that this feature does not fully replace the MySQL Router
when it acts as the replication source endpoint in front of a MySQL
InnoDB Cluster. Most notably, this feature has the following
limitations:

  • it does not automatically learn about new members or members
    that are removed
  • it does not follow the primary role, it stays connected to whatever
    host it was connected to
  • it does not follow the majority network partition
  • it does not care if a host is not part of the group any longer, as
    long as it can connect, it will

So, please, be aware that this feature does not replace the router
in those use cases scenarios that must observe such requirements.

Conclusion

We showed how to configure replicas to enable asynchronous connection fail over between multiple source servers. When a source or the connection to it fails, this feature allows the replica, even in different data center, to automatically connect to a new source seamlessly. Hope you will enjoy this feature, and that it provides improved fault tolerance for your cross-datacenter replication channels.

See documentation for more details on how to configure and use the feature, and see worklog for the technical specification.