MySQL Blog Archive
For the latest blogs go to blogs.oracle.com/mysql
Asynchronous Replication Connection Failover - Automatic source list

In our last blog post about the Automatic Asynchronous Replication Connection Failover feature which was released in MySQL 8.0.22, we showed you its usage and benefits. We now present here additional changes we added in MySQL 8.0.23 to further enhance this feature by making the receiver aware of the source’s group membership changes.

The MySQL 8.0.23 introduced these important enhancement to the feature:

  • This feature keeps the receiver’s source list in sync with the source membership and quorum changes if the source is a group member (Group Replication enabled). Earlier the potential source list needs to be maintained by user/DBA, but in MySQL 8.0.23 this has been automated for Group Replication members, so that
    • new group members or potential sources are automatically added to the source list,
    • failed group members or sources are automatically removed from the source list,
    • asynchronous replication channel to a source who is no longer in the group majority (lost quorum) will be stopped and re-connected to another source.
  • It also introduces another important change where the receiver will always stay connected to the source having the highest failover weight through the asynchronous replication channel, even when the existing asynchronous replication connection isn’t failed/disconnected.

There are many other changes also added to the feature, please check the Requirements section for details.

Now, let’s see more about these enhancements in the next Usage section with examples.

Usage

To explain the feature’s enhancements, let’s consider the following scenario:

  • We have two sites – New York and London, as shown in Fig 1 below.
  • The group setup in the New York site has the following Group Replication members: S1 is Primary, S2 and S3 are Secondaries.
  • A standby replica R1 is set up in London which is connected to the New York site through asynchronous replication channel C1.

1. Create a replication channel

On the replica server R1, create a replication channel ‘C1’ with the replication credentials of the source servers.

Note: The default values of SOURCE_RETRY_COUNT (86400 times), and SOURCE_CONNECT_RETRY (60 seconds), were designed for a connection to a single source, and will make the replica retry the same connection for 60 days. When these attempts are exhausted, the asynchronous connection failover mechanism takes over. So to ensure that the asynchronous connection failover mechanism can be activated promptly, set SOURCE_RETRY_COUNT and SOURCE_CONNECT_RETRY to a minimal number that just allows a few retry attempts with the same source, in case the connection failure is caused by a transient network outage.

2. Add the alternate source configuration details for the asynchronous replication channel

The asynchronous_connection_failover_add_managed() UDF can be used to add managed source configuration details.

The arguments for this UDF have the following meanings:

  • channel/host/port: The replication channel connection attributes specifying how to connect to this alternate source.
  • network_namespace: The parameter network_namespace is reserved for future use. Please use the CHANGE REPLICATION SOURCE command to set channel network_namespace parameter.
  • managed_type: The type of managed service which needs to be provided. Currently, the only value it would accept and work upon is ‘GroupReplication’, through which group member connection details can be synced.
  • managed_name: The identifier to determine the source which needs to be managed. When the managed_type is ‘GroupReplication’, the managed_name would be group_replication_group_name system variable’s value.
  • primary_weight/secondary_weight: When the managed_type is ‘GroupReplication’, the primary_weight and secondary_weight would be assigned to failover weight of the primary and secondary of the group respectively.

The user/DBA only needs to execute asynchronous_connection_failover_add_managed, the rest of potential sources (group members) would be gathered and added automatically by the feature, which can be verified using performance_schema.replication_asynchronous_connection_failover table. Initially this table has only one entry, the rest of potential sources (group members) will be gathered and added later.

3. Enable the feature

Enable the feature for the replication channel ‘C1’ using CHANGE REPLICATION SOURCE TO SOURCE_CONNECTION_AUTO_FAILOVER=1.

4. Start replica channel

Start the replication channel ‘C1’ on the replica server, and check its I/O thread status:

5. Potential sources automatically gathered and added

After the feature is enabled, it would automatically add new sources to the source list and if required re-connects asynchronous replication. This can be verified in the performance_schema.replication_asynchronous_connection_failover table.

6. State change of source which is also a group member

When a new source is added or an existing source fails or is removed, the source list is updated on the receiver automatically without any need for user/DBA intervention.

After S1 fails, the receiver will automatically connect to the next available source with the highest failover weight which is S2 in this case.

7. Failover weight change of source

Whenever a source with a higher failover weight is added/updated then-current connected source, then asynchronous replication channel is re-connected to a source with highest failover weight, and source list on the receiver is also updated accordingly.

As shown in Fig 3 below, a new source S4 with SOURCE_UUID: 37d345fs-341d-66hf-5f54-2345e247680 is added and connected with the rest of the sources using asynchronous replication channel C2.

The new source S4 is added to the source list using asynchronous_connection_failover_add_source() UDF with the highest failover weight i.e. 90.

It can be clearly seen below in the performance_schema.replication_asynchronous_connection_failover table on the receiver R1 that S4 has the highest failover weight.

The asynchronous replication channel C1 should re-connect to S4, which can be verified through performance_schema.replication_connection_status on receiver R1.

8. Role change of source which is also a group member

When a role of a source who is also a group member changes (i.e. when Primary is switched), the new Primary is assigned a weight of the Primary_weight variable, and if it is higher than the rest of the sources for the channel, then the asynchronous replication connection gets re-connected to this new Primary. This change will be useful for User/DBA who want the receiver to always stay connected to the Primary of the group, to keep binary logs on the receiver in sync with minimum delay.

Initially performance_schema.replication_asynchronous_connection_failover table has below configuration for channel C1. The primary S1 has the highest failover weight of 80 for channel C1.

The primary for the group is changed to S2.

The asynchronous replication connection C1 is re-connected to S2 as its having the highest failover weight now, which can be seen below in the performance_schema.replication_asynchronous_connection_failover table.

9. Delete managed source configuration details

The asynchronous_connection_failover_delete_managed() UDF can be used to delete managed configuration details.

Example:

Conclusion

We showed you new enhancements added to the Asynchronous replication connection failover feature with examples. We showed you when there are state or role changes in the source’s group membership configuration, how the receiver sync those changes, and re-connect if needed, automatically without any user intervention. These enhancements would help increase MySQL system availability and also ease maintenance by automating the process of addition and removal of potential sources. Hope you will enjoy these enhancements to the feature, and that this automation would improve and ease system setup and would reduce errors, thereby providing improved fault tolerance and better system availability.

See documentation for more details on how to configure and use the feature, and see worklog for the technical specification.