When a joining member connects to an online existing member for
state transfer during distributed recovery, the joining member
acts as a client on the connection and the existing member acts
as a server. When state transfer from the donor's binary log is
in progress over this connection (using the asynchronous
replication channel
group_replication_recovery
), the joining
member acts as the replica and the existing member acts as the
source. When a remote cloning operation is in progress over this
connection, the joining member acts as a recipient and the
existing member acts as a donor. Configuration settings that
apply to those roles outside the Group Replication context can
apply for Group Replication also, unless they are overridden by
a Group Replication-specific configuration setting or behavior.
The connection that an existing member offers to a joining member for distributed recovery is not the same connection that is used by Group Replication for communication between online members of the group.
The connection used by the group communication engine for Group Replication (XCom, a Paxos variant) for TCP communication between remote XCom instances is specified by the
group_replication_local_address
system variable. This connection is used for TCP/IP messages between online members. Communication with the local instance is over an input channel using shared memory.For distributed recovery, by default, group members offer their standard SQL client connection to joining members, as specified by
hostname
andport
. If an alternative port number is specified byreport_port
, that one is used instead.Group members may instead advertise an alternative list of distributed recovery endpoints as dedicated client connections for joining members, allowing you to control distributed recovery traffic separately from connections by regular client users of the member. A member transmits the list of distributed recovery endpoints specified by
group_replication_advertise_recovery_endpoints
to the group when it joins. By default, the member continues to offer the standard SQL client connection as in earlier releases.
Distributed recovery can fail if a joining member cannot
correctly identify the other members using the host name as
defined by MySQL Server's
hostname
system variable. It
is recommended that operating systems running MySQL have a
properly configured unique host name, either using DNS or
local settings. The host name that the server is using for SQL
client connections can be verified in the
Member_host
column of the Performance
Schema table
replication_group_members
. If
multiple group members externalize a default host name set by
the operating system, there is a chance of the joining member
not resolving it to the correct member address and not being
able to connect for distributed recovery. In this situation
you can use MySQL Server's
report_host
system variable
to configure a unique host name to be externalized by each of
the servers.
The steps for a joining member to establish a connection for distributed recovery are as follows:
When the member joins the group, it connects with one of the seed members included in the list in its
group_replication_group_seeds
system variable, initially using thegroup_replication_local_address
connection as specified in that list. The seed members might be a subset of the group.Over this connection, the seed member uses Group Replication's membership service to provide the joining member with a list of all the members that are online in the group, in the form of a view. The membership information includes the details of the distributed recovery endpoints or standard SQL client connection offered by each member for distributed recovery.
The joining member selects a suitable group member from this list to be its donor for distributed recovery, following the behaviors described in Section 20.5.4.4, “Fault Tolerance for Distributed Recovery”.
The joining member then attempts to connect to the donor using the donor's advertised distributed recovery endpoints, trying each in turn in the order they are specified in the list. If the donor provides no endpoints, the joining member attempts to connect using the donor's standard SQL client connection. The SSL requirements for the connection are as specified by the
group_replication_recovery_ssl_*
options described in Section 20.5.4.1.4, “SSL and Authentication for Distributed Recovery”.If the joining member is not able to connect to the selected donor, it retries with other suitable donors, following the behaviors described in Section 20.5.4.4, “Fault Tolerance for Distributed Recovery”. Note that if the joining member exhausts the list of advertised endpoints without making a connection, it does not fall back to the donor's standard SQL client connection, but switches to another donor.
When the joining member establishes a distributed recovery connection with a donor, it uses that connection for state transfer as described in Section 20.5.4, “Distributed Recovery”. The host and port for the connection that is used are shown in the joining member's log. Note that if a remote cloning operation is used, when the joining member has restarted at the end of the operation, it establishes a connection with a new donor for state transfer from the binary log. This might be a connection to a different member from the original donor used for the remote cloning operation, or it might be a different connection to the original donor. In any case, the distributed recovery process continues in the same way as it would have with the original donor.
IP addresses supplied by the
group_replication_advertise_recovery_endpoints
system variable as distributed recovery endpoints do not have
to be configured for MySQL Server (that is, they do not have
to be specified by the
admin_address
system variable
or in the list for the
bind_address
system
variable). They do have to be assigned to the server. Any host
names used must resolve to a local IP address. IPv4 and IPv6
addresses can be used.
The ports supplied for the distributed recovery endpoints do
have to be configured for MySQL Server, so they must be
specified by the port
,
report_port
, or
admin_port
system variable.
The server must listen for TCP/IP connections on these ports.
If you specify the
admin_port
, the replication
user for distributed recovery needs the
SERVICE_CONNECTION_ADMIN
privilege to connect. Selecting the
admin_port
keeps distributed
recovery connections separate from regular MySQL client
connections.
Joining members try each of the endpoints in turn in the order
they are specified on the list. If
group_replication_advertise_recovery_endpoints
is set to DEFAULT
rather than a list of
endpoints, the standard SQL client connection is offered. Note
that the standard SQL client connection is not automatically
included on a list of distributed recovery endpoints, and is
not offered as a fallback if the donor's list of endpoints is
exhausted without a connection. If you want to offer the
standard SQL client connection as one of a number of
distributed recovery endpoints, you must include it explicitly
in the list specified by
group_replication_advertise_recovery_endpoints
.
You can put it in the last place so that it acts as a last
resort for connection.
A group member's distributed recovery endpoints (or standard
SQL client connection if endpoints are not provided) do not
need to be added to the Group Replication allowlist specified
by the
group_replication_ip_allowlist
system variable. The allowlist is only for the address
specified by
group_replication_local_address
for each member. A joining member must have its initial
connection to the group permitted by the allowlist in order to
retrieve the address or addresses for distributed recovery.
The distributed recovery endpoints that you list are validated
when the system variable is set and when a
START GROUP_REPLICATION
statement has been issued. If the list cannot be parsed
correctly, or if any of the endpoints cannot be accessed on
the host because the server is not listening on them, Group
Replication logs an error and does not start.
You can optionally configure compression for distributed
recovery by the method of state transfer from a donor's
binary log. Compression can benefit distributed recovery where
network bandwidth is limited and the donor has to transfer
many transactions to the joining member. The
group_replication_recovery_compression_algorithms
and
group_replication_recovery_zstd_compression_level
system variables determine permitted compression algorithms,
and the zstd
compression level used when
carrying out state transfer from a donor's binary log.
For more information, see
Section 6.2.8, “Connection Compression Control”.
These compression settings do not apply to remote cloning
operations. When a remote cloning operation is used for
distributed recovery, the clone plugin's setting for
clone_enable_compression
applies.
Distributed recovery requires a replication user that has the correct permissions so that Group Replication can establish direct member-to-member replication channels. The replication user must also have the correct permissions to act as the clone user on the donor for a remote cloning operation. The same replication user must be used for distributed recovery on every group member. For instructions to set up this replication user, see Section 20.2.1.3, “User Credentials For Distributed Recovery”. For instructions to secure the replication user credentials, see Section 20.6.3.1, “Secure User Credentials for Distributed Recovery”.
SSL for distributed recovery is configured separately from SSL
for normal group communications, which is determined by the
server's SSL settings and the
group_replication_ssl_mode
system variable. For distributed recovery connections,
dedicated Group Replication distributed recovery SSL system
variables are available to configure the use of certificates
and ciphers specifically for distributed recovery.
By default, SSL is not used for distributed recovery
connections. To activate it, set
group_replication_recovery_use_ssl=ON
,
and configure the Group Replication distributed recovery SSL
system variables as described in
Section 20.6.3, “Securing Distributed Recovery Connections”.
You need a replication user that is set up to use SSL.
When distributed recovery is configured to use SSL, Group
Replication applies this setting for remote cloning
operations, as well as for state transfer from a donor's
binary log. Group Replication automatically configures the
settings for the clone SSL options
(clone_ssl_ca
,
clone_ssl_cert
, and
clone_ssl_key
) to match your
settings for the corresponding Group Replication distributed
recovery options
(group_replication_recovery_ssl_ca
,
group_replication_recovery_ssl_cert
,
and
group_replication_recovery_ssl_key
).
If you are not using SSL for distributed recovery (so
group_replication_recovery_use_ssl
is set to OFF
), and the replication user
account for Group Replication authenticates with the
caching_sha2_password
plugin (the default)
or the sha256_password
plugin (deprecated),
RSA key pairs are used for password exchange. In this case,
either use the
group_replication_recovery_public_key_path
system variable to specify the RSA public key file, or use the
group_replication_recovery_get_public_key
system variable to request the public key from the source, as
described in
Section 20.6.3.1.1, “Replication User With The Caching SHA-2 Authentication Plugin”.