Beginning with MySQL 8.0.22, you can use the asynchronous connection failover mechanism to automatically establish an asynchronous (source to replica) replication connection to a new source after the existing connection from a replica to its source fails. The asynchronous connection failover mechanism can be used to keep a replica synchronized with multiple MySQL servers or groups of servers that share data. The list of potential source servers is stored on the replica, and in the event of a connection failure, a new source is selected from the list based on a weighted priority that you set.
From MySQL 8.0.23, the asynchronous connection failover mechanism also supports Group Replication topologies, by automatically monitoring changes to group membership and distinguishing between primary and secondary servers. When you add a group member to the source list and define it as part of a managed group, the asynchronous connection failover mechanism updates the source list to keep it in line with membership changes, adding and removing group members automatically as they join or leave. Only online group members that are in the majority are used for connections and obtaining status. The last remaining member of a managed group is not removed automatically if it leaves the group, so that the configuration of the managed group is kept, but you can delete a managed group manually if it is no longer needed.
To activate asynchronous connection failover for a replication
channel set SOURCE_CONNECTION_AUTO_FAILOVER=1
on the CHANGE REPLICATION SOURCE TO
statement (from MySQL 8.0.23) or CHANGE
MASTER TO
statement (before MySQL 8.0.23) for the
channel. GTID auto-positioning must be in use for the channel
(SOURCE_AUTO_POSITION = 1
|
MASTER_AUTO_POSITION = 1
). You can set this
option while the replica is running.
When the existing connection to a source fails, the replica
first retries the same connection the number of times specified
by the SOURCE_RETRY_COUNT
|
MASTER_RETRY_COUNT
option of the
CHANGE REPLICATION SOURCE TO
|
CHANGE MASTER TO
statement. The
interval between attempts is set by the
SOURCE_CONNECT_RETRY
|
MASTER_CONNECT_RETRY
option. When these
attempts are exhausted, the asynchronous connection failover
mechanism takes over. Note that the defaults for these options,
which were designed for a connection to a single source, make
the replica retry the same connection for 60 days. To ensure
that the asynchronous connection failover mechanism can be
activated promptly, set SOURCE_RETRY_COUNT
|
MASTER_RETRY_COUNT
and
SOURCE_CONNECT_RETRY
|
MASTER_CONNECT_RETRY
to minimal numbers that
just allow a few retry attempts with the same source, in case
the connection failure is caused by a transient network outage.
Suitable values are SOURCE_RETRY_COUNT=3
|
MASTER_RETRY_COUNT=3
and
SOURCE_CONNECT_RETRY=10
|
MASTER_CONNECT_RETRY=10
, which make the
replica retry the connection 3 times with 10-second intervals
between.
Also set a source list on the replica for the replication channel.
You set and manage source lists using the
asynchronous_connection_failover_add_source
and
asynchronous_connection_failover_delete_source
UDFs to add and remove single replication source servers. To add
and remove managed groups of servers, use the
asynchronous_connection_failover_add_managed
and
asynchronous_connection_failover_delete_managed
UDFs instead.
The UDFs name the relevant replication channel and specify the
host name, port number, network namespace, and weighted priority
(1-100, with 100 being the highest priority) of a MySQL instance
to add to or delete from the channel's source list. For a managed
group, you also specify the type of managed service (currently
only Group Replication is available), and the identifier of the
managed group (for Group Replication, this is the value of the
group_replication_group_name
system variable). When you add a managed group, you only need to
add one group member, and the replica automatically adds the rest
from the current group membership. When you delete a managed
group, you delete the entire group together.
In MySQL 8.0.22, the asynchronous connection failover mechanism is
activated following the failure of the replica's connection to the
source, and it issues a
START REPLICA |
SLAVE
statement to attempt to connect to a new source.
In this release, the connection fails over if the replication I/O
thread stops due to the source stopping or due to a network
failure. The connection does not fail over in any other
situations, such as when the replication threads are stopped by a
STOP REPLICA |
SLAVE
statement.
From MySQL 8.0.23, the asynchronous connection failover mechanism also fails over the connection if another available server on the source list has a higher priority (weight) setting. This function ensures that the replica stays connected to the most suitable source server at all times, and it applies to both managed groups and single (non-managed) servers. For a managed group, a source’s weight is assigned depending on whether it is a primary or a secondary server. So assuming that you set up the managed group to give a higher weight to a primary and a lower weight to a secondary, when the primary changes, the higher weight is assigned to the new primary, so the replica changes over the connection to it. The asynchronous connection failover mechanism additionally changes connection if the currently connected managed source server leaves the managed group, or is no longer in the majority in the managed group.
When failing over a connection, the source with the highest
priority (weight) setting among the alternative sources listed in
the source list for the channel is chosen for the first connection
attempt.
. The replica checks first that it can connect to the source
server, or in the case of a managed group, that the source server
has ONLINE
status in the group. If the highest
weighted source is not available, the replica tries with all the
listed sources in descending order of weight, then starts again
from the highest weighted source. If multiple sources have the
same weight, the replica orders them randomly. If the replica
needs to start working through the list again, it includes and
retries the source to which the original connection failure
occurred.
The source lists are stored in the
mysql.replication_asynchronous_connection_failover
and
mysql.replication_asynchronous_connection_failover_managed
tables, and can be viewed in the Performance Schema table
replication_asynchronous_connection_failover
.
The replica uses a monitor thread to track the membership of
managed groups and update the source list
(thread/sql/replica_monitor
). The setting for
the SOURCE_CONNECTION_AUTO_FAILOVER
option of
the CHANGE REPLICATION SOURCE TO
|
CHANGE MASTER TO
statement, and the
source list, are transferred to a clone of the replica during a
remote cloning operation.
The requirements for using the asynchronous connection failover mechanism are as follows:
GTIDs must be in use on the source and the replica (
gtid_mode=ON
), and theSOURCE_AUTO_POSITION
|MASTER_AUTO_POSITION
option of theCHANGE REPLICATION SOURCE TO
|CHANGE MASTER TO
statement must be enabled on the replica, so that GTID auto-positioning is used for the connection to the source.The same replication user account and password must exist on all the source servers in the source list for the channel. This account is used for the connection to each of the sources. You can set up different accounts for different channels.
The replication user account must be given
SELECT
permissions on the Performance Schema tables, for example, by issuingGRANT SELECT ON performance_schema.* TO '
repl_user
';The replication user account and password cannot be specified on the statement used to start replication, because they need to be available on the automatic restart for the connection to the alternative source. They must be set for the channel using the
CHANGE REPLICATION SOURCE TO
|CHANGE MASTER TO
statement on the replica, and recorded in the replication metadata repositories.