To activate asynchronous connection failover for a replication
channel set SOURCE_CONNECTION_AUTO_FAILOVER=1
in a CHANGE REPLICATION SOURCE TO
statement for this channel. GTID auto-positioning must be in use
for the channel (SOURCE_AUTO_POSITION = 1
).
When the existing connection to a source fails, the replica
first retries the same connection the number of times
specified by the SOURCE_RETRY_COUNT
option
of the CHANGE REPLICATION SOURCE
TO
. The interval between attempts is set by the
SOURCE_CONNECT_RETRY
option. When these
attempts are exhausted, the asynchronous connection failover
mechanism takes over. Note that the defaults for these
options, which were designed for a connection to a single
source, make the replica retry the same connection for 60
days. To ensure that the asynchronous connection failover
mechanism can be activated promptly, set
SOURCE_RETRY_COUNT
and
SOURCE_CONNECT_RETRY
to minimal numbers
that just allow a few retry attempts with the same source, in
case the connection failure is caused by a transient network
outage. Suitable values are
SOURCE_RETRY_COUNT=3
and
SOURCE_CONNECT_RETRY=10
, which make the
replica retry the connection 3 times with 10-second intervals
between.
You also need to set the source list for the replication
channel, to specify the sources that are available for failover.
You set and manage source lists using the
asynchronous_connection_failover_add_source
and
asynchronous_connection_failover_delete_source
functions to add and remove single replication source servers.
To add and remove managed groups of servers, use the
asynchronous_connection_failover_add_managed
and
asynchronous_connection_failover_delete_managed
functions instead.
The functions name the relevant replication channel and specify
the host name, port number, network namespace, and weighted
priority (1-100, with 100 being the highest priority) of a MySQL
instance to add to or delete from the channel's source list. For
a managed group, you also specify the type of managed service
(currently only Group Replication is available), and the
identifier of the managed group (for Group Replication, this is
the value of the
group_replication_group_name
system variable). When you add a managed group, you only need to
add one group member, and the replica automatically adds the
rest from the current group membership. When you delete a
managed group, you delete the entire group together.
The asynchronous connection failover mechanism also fails over the connection if another available server on the source list has a higher priority (weight) setting. This feature ensures that the replica stays connected to the most suitable source server at all times, and it applies to both managed groups and single (non-managed) servers. For a managed group, a source’s weight is assigned depending on whether it is a primary or a secondary server. So assuming that you set up the managed group to give a higher weight to a primary and a lower weight to a secondary, when the primary changes, the higher weight is assigned to the new primary, so the replica changes over the connection to it. The asynchronous connection failover mechanism additionally changes connection if the currently connected managed source server leaves the managed group, or is no longer in the majority in the managed group.
When failing over a connection, the source with the highest
priority (weight) setting among the alternative sources listed
in the source list for the channel is chosen for the first
connection attempt.
The replica checks first that it can connect to the source
server, or in the case of a managed group, that the source
server has ONLINE
status in the group (not
RECOVERING
or unavailable). If the highest
weighted source is not available, the replica tries with all the
listed sources in descending order of weight, then starts again
from the highest weighted source. If multiple sources have the
same weight, the replica orders them randomly. If the replica
needs to start working through the list again, it includes and
retries the source to which the original connection failure
occurred.
The source lists are stored in the
mysql.replication_asynchronous_connection_failover
and
mysql.replication_asynchronous_connection_failover_managed
tables, and can be viewed in the Performance Schema
replication_asynchronous_connection_failover
and
replication_asynchronous_connection_failover_managed
tables. The replica uses a monitor thread to track the
membership of managed groups and update the source list
(thread/sql/replica_monitor
). The setting for
the
SOURCE_CONNECTION_AUTO_FAILOVER
option of the CHANGE REPLICATION SOURCE
TO
statement, and the source list, are transferred to
a clone of the replica during a remote cloning operation.