This section describes how to rejoin a server instance to an InnoDB Cluster, restore an InnoDB Cluster from quorum loss or reboot it after an outage, and rescan an InnoDB Cluster after changes.
If an instance leaves the cluster, for example because it lost connection, and for some reason it could not automatically rejoin the cluster, it might be necessary to rejoin it to the cluster at a later stage. To rejoin an instance to a cluster, issue Cluster.rejoinInstance(instance).
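For example, to rejoin an instance running on a hypothetical host ic-4 (substitute the address and administrative account of your own instance):
mysql-js> cluster.rejoinInstance("icadmin@ic-4:3306")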
If the instance has super_read_only=ON then you might need to confirm that AdminAPI can set super_read_only=OFF. See Instance Configuration in Super Read-only Mode for more information.
In the case where an instance has not had its configuration persisted (see Section 6.1.5, “Persisting Settings”), upon restart the instance does not rejoin the cluster automatically. The solution is to issue cluster.rejoinInstance() so that the instance is added to the cluster again and ensure the changes are persisted. Once the InnoDB Cluster configuration is persisted to the instance's option file, it rejoins the cluster automatically.
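For example, one possible sequence is to rejoin the instance and then persist its configuration locally. This is a sketch assuming a hypothetical instance at ic-2:3306 whose settings cannot be persisted remotely; dba.configureLocalInstance() must be issued from a MySQL Shell running on the instance's own host so that it can write to the option file:
mysql-js> cluster.rejoinInstance("icadmin@ic-2:3306")
mysql-js> dba.configureLocalInstance("icadmin@ic-2:3306")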
If you are rejoining an instance which has changed in some way, then you might have to modify the instance to make the rejoin process work correctly. For example, when you restore a MySQL Enterprise Backup backup, the server_uuid changes. Attempting to rejoin such an instance fails because InnoDB Cluster instances are identified by the server_uuid variable. In such a situation, information about the instance's old server_uuid must be removed from the InnoDB Cluster metadata and then a Cluster.rescan() must be executed to add the instance to the metadata using its new server_uuid. For example:
cluster.removeInstance("root@instanceWithOldUUID:3306", {force: true})
cluster.rescan()
In this case you must pass the force option to the Cluster.removeInstance() method because the instance is unreachable from the cluster's perspective and we want to remove it from the InnoDB Cluster metadata anyway.
If an instance (or instances) fail, then a cluster can lose its quorum, which is the ability to vote in a new primary. This can happen when there is a failure of enough instances that there is no longer a majority of the instances which make up the cluster to vote on Group Replication operations. See Fault-tolerance. When a cluster loses quorum you can no longer process write transactions with the cluster, or change the cluster's topology, for example by adding, rejoining, or removing instances. However, if you have an instance online which contains the InnoDB Cluster metadata, it is possible to restore a cluster with quorum. This assumes you can connect to an instance that contains the InnoDB Cluster metadata, and that instance can contact the other instances you want to use to restore the cluster.
This operation is potentially dangerous because it can create a split-brain scenario if incorrectly used and should be considered a last resort. Make absolutely sure that there are no partitions of this group that are still operating somewhere in the network, but not accessible from your location.
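To check whether a cluster has lost quorum, connect to one of its members and inspect the cluster's status; as an abridged illustration (exact output varies by version), a cluster without quorum reports a status such as NO_QUORUM:
mysql-js> var cluster = dba.getCluster()
mysql-js> cluster.status()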
Connect to an instance which contains the cluster's metadata, then use the Cluster.forceQuorumUsingPartitionOf(instance) operation, which restores the cluster based on the metadata on instance, and then all the instances that are ONLINE from the point of view of the given instance definition are added to the restored cluster.
mysql-js> cluster.forceQuorumUsingPartitionOf("icadmin@ic-1:3306")
Restoring replicaset 'default' from loss of quorum, by using the partition composed of [icadmin@ic-1:3306]
Please provide the password for 'icadmin@ic-1:3306': ******
Restoring the InnoDB cluster ...
The InnoDB cluster was successfully restored using the partition from the instance 'icadmin@ic-1:3306'.
WARNING: To avoid a split-brain scenario, ensure that all other members of the replicaset
are removed or joined back to the group that was restored.
In the event that an instance is not automatically added to the cluster, for example if its settings were not persisted, use Cluster.rejoinInstance() to manually add the instance back to the cluster.
The restored cluster might not, and does not have to, consist of all of the original instances which made up the cluster. For example, if the original cluster consisted of the following five instances:
ic-1
ic-2
ic-3
ic-4
ic-5
and the cluster experiences a split-brain scenario, with ic-1, ic-2, and ic-3 forming one partition while ic-4 and ic-5 form another partition, then if you connect to ic-1 and issue Cluster.forceQuorumUsingPartitionOf('icadmin@ic-1:3306') to restore the cluster, the resulting cluster would consist of these three instances:
ic-1
ic-2
ic-3
because ic-1 sees ic-2 and ic-3 as ONLINE and does not see ic-4 and ic-5.
If your cluster suffers from a complete outage, you can ensure it is reconfigured correctly using dba.rebootClusterFromCompleteOutage(). This operation takes the instance which MySQL Shell is currently connected to and uses its metadata to recover the cluster. In the event that a cluster's instances have completely stopped, the instances must be started and only then can the cluster be started. For example, if the machine a sandbox cluster was running on has been restarted, and the instances were at ports 3310, 3320 and 3330, issue:
mysql-js> dba.startSandboxInstance(3310)
mysql-js> dba.startSandboxInstance(3320)
mysql-js> dba.startSandboxInstance(3330)
This ensures the sandbox instances are running. In the case of a production deployment you would have to start the instances outside of MySQL Shell. Once the instances have started, you need to connect to an instance with the GTID superset, which means the instance which had applied the most transactions before the outage. If you are unsure which instance contains the GTID superset, connect to any instance and follow the interactive messages from the dba.rebootClusterFromCompleteOutage() operation, which detects if the instance you are connected to contains the GTID superset. Reboot the cluster by issuing:
mysql-js> var cluster = dba.rebootClusterFromCompleteOutage();
The dba.rebootClusterFromCompleteOutage()
operation then follows these steps to ensure the cluster is
correctly reconfigured:
The InnoDB Cluster metadata found on the instance which MySQL Shell is currently connected to is checked to see if it contains the GTID superset, in other words the transactions applied by the cluster. If the currently connected instance does not contain the GTID superset, the operation aborts with that information. See the subsequent paragraphs for more information.
If the instance contains the GTID superset, the cluster is recovered based on the metadata of the instance.
Assuming you are running MySQL Shell in interactive mode, a wizard is run that checks which instances of the cluster are currently reachable and asks if you want to rejoin any discovered instances to the rebooted cluster.
Similarly, in interactive mode the wizard also detects instances which are currently not reachable and asks if you would like to remove such instances from the rebooted cluster.
If you are not using MySQL Shell's interactive mode, you can use the rejoinInstances and removeInstances options to manually configure instances which should be joined or removed during the reboot of the cluster.
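For example, a sketch of a non-interactive reboot, where the cluster name "myCluster" and the instance addresses are placeholders for your own deployment:
mysql-js> var cluster = dba.rebootClusterFromCompleteOutage("myCluster", {rejoinInstances: ["ic-2:3306"], removeInstances: ["ic-3:3306"]})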
If you encounter an error such as "The active session instance isn't the most updated in comparison with the ONLINE instances of the Cluster's metadata.", then the instance you are connected to does not have the GTID superset of transactions applied by the cluster. In this situation, connect MySQL Shell to the instance suggested in the error message and issue dba.rebootClusterFromCompleteOutage() from that instance.
To manually detect which instance has the GTID superset rather than using the interactive wizard, check the gtid_executed variable on each instance. For example, issue:
mysql-sql> SHOW VARIABLES LIKE 'gtid_executed';
The instance which has applied the largest GTID set of transactions contains the GTID superset.
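To compare two gtid_executed values directly, you can use the GTID_SUBSET() SQL function: GTID_SUBSET(set1, set2) returns 1 (true) if every GTID in set1 is also in set2. For example, using placeholder UUID and transaction intervals:
mysql-sql> SELECT GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:1-5', '3E11FA47-71CA-11E1-9E33-C80AA9429562:1-7');
A result of 1 means the first instance's transactions are contained in the second's, so the second instance is at least as up to date.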
If this process fails, and the cluster metadata has become badly corrupted, you might need to drop the metadata and create the cluster again from scratch. You can drop the cluster metadata using dba.dropMetadataSchema().
The dba.dropMetadataSchema() method should only be used as a last resort, when it is not possible to restore the cluster. It cannot be undone.
If you are using MySQL Router with the cluster, when you drop the metadata, all current connections are dropped and new connections are forbidden. This causes a full outage.
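As an illustration only, and assuming you have confirmed the cluster cannot be restored, the operation prompts for confirmation unless you pass the force option (availability of options can vary by MySQL Shell version):
mysql-js> dba.dropMetadataSchema({force: true})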
If you make configuration changes to a cluster outside of the AdminAPI commands, for example by changing an instance's configuration manually to resolve configuration issues or after the loss of an instance, you need to update the InnoDB Cluster metadata so that it matches the current configuration of instances. In these cases, use the Cluster.rescan() operation, which enables you to update the InnoDB Cluster metadata either manually or using an interactive wizard. The Cluster.rescan() operation can detect new active instances that are not registered in the metadata and add them, or obsolete instances (no longer active) still registered in the metadata, and remove them. You can automatically update the metadata depending on the instances found by the command, or you can specify a list of instance addresses to either add to the metadata or remove from the metadata. You can also update the topology mode stored in the metadata, for example after changing from single-primary mode to multi-primary mode outside of AdminAPI.
The syntax of the command is Cluster.rescan([options]). The options dictionary supports the following:
interactive: boolean value used to disable or enable the wizards in the command execution. Controls whether prompts and confirmations are provided. The default value is equal to MySQL Shell wizard mode, specified by shell.options.useWizards.
addInstances: list with the connection data of the new active instances to add to the metadata, or “auto” to automatically add missing instances to the metadata. The value “auto” is case-insensitive.
- Instances specified in the list are added to the metadata, without prompting for confirmation.
- In interactive mode, you are prompted to confirm the addition of newly discovered instances that are not included in the addInstances option.
- In non-interactive mode, newly discovered instances that are not included in the addInstances option are reported in the output, but you are not prompted to add them.
removeInstances: list with the connection data of the obsolete instances to remove from the metadata, or “auto” to automatically remove obsolete instances from the metadata.
- Instances specified in the list are removed from the metadata, without prompting for confirmation.
- In interactive mode, you are prompted to confirm the removal of obsolete instances that are not included in the removeInstances option.
- In non-interactive mode, obsolete instances that are not included in the removeInstances option are reported in the output, but you are not prompted to remove them.
updateTopologyMode: boolean value used to indicate if the topology mode (single-primary or multi-primary) in the metadata should be updated (true) or not (false) to match the one being used by the cluster. By default, the metadata is not updated (false).
- If the value is true, then the InnoDB Cluster metadata is compared to the current mode being used by Group Replication, and the metadata is updated if necessary. Use this option to update the metadata after making changes to the topology mode of your cluster outside of AdminAPI.
- If the value is false, then the InnoDB Cluster metadata about the cluster's topology mode is not updated, even if it is different from the topology used by the cluster's Group Replication group.
- If the option is not specified and the topology mode in the metadata is different from the topology used by the cluster's Group Replication group, then:
  - In interactive mode, you are prompted to confirm the update of the topology mode in the metadata.
  - In non-interactive mode, if there is a difference between the topology used by the cluster's Group Replication group and the InnoDB Cluster metadata, it is reported and no changes are made to the metadata.
- When the metadata topology mode is updated to match the Group Replication mode, the auto-increment settings on all instances are updated as described at InnoDB Cluster and Auto-increment.
updateViewChangeUuid: boolean value used to indicate if a value should be generated and set for the group_replication_view_change_uuid system variable on the cluster instances. This system variable supplies an alternative UUID for view change events generated by the group. For MySQL Server instances at release 8.0.27 and above, for an InnoDB Cluster that is part of an InnoDB ClusterSet, the group_replication_view_change_uuid system variable is required and must be set to the same value on all member servers in the cluster. From MySQL Shell 8.0.27, an InnoDB Cluster that is created using the dba.createCluster() command gets a value generated and set for the system variable on all the member servers. An InnoDB Cluster created before MySQL Shell 8.0.27 might not have the system variable set, but the InnoDB ClusterSet creation process checks for this and fails with a warning if it is absent.
By default, updateViewChangeUuid is set to false, and if the system variable is not found or does not match on any of the instances, a warning message is returned to let you know you must set a value for the system variable and reboot the InnoDB Cluster. If you set updateViewChangeUuid to true, the rescan operation generates and sets a value for group_replication_view_change_uuid on all the member servers, following which you must reboot the cluster to implement the changes. Before MySQL Shell 8.0.29, this option is not available, and the Cluster.rescan() command automatically generates and sets the system variable value in the same way as if true was set, with a cluster reboot required afterwards to implement the changes. When you have rebooted the cluster, you can retry the InnoDB ClusterSet creation process.
upgradeCommProtocol: boolean value used to indicate if the Group Replication communication protocol version should be upgraded (true) or not (false) to the version supported by the instance in the cluster that is at the lowest MySQL release. By default, the communication protocol version is not upgraded (false). AdminAPI operations before MySQL Shell 8.0.26 upgraded the protocol version automatically where possible, but the process can cause delays in the cluster. From MySQL Shell 8.0.26, AdminAPI operations that cause a topology change return a message if the communication protocol version can be upgraded, and you can use this option to carry out the upgrade at a suitable time. It is advisable to upgrade to the highest available version of the Group Replication communication protocol to support the latest features, such as message fragmentation for large transactions. For more information, see Setting a Group's Communication Protocol Version.
- If the value is true, then the Group Replication communication protocol version is upgraded to the version supported by the instance in the cluster that is at the lowest MySQL release.
- If the value is false, then the Group Replication communication protocol version is not upgraded.
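For example, a sketch of a non-interactive rescan that automatically adds newly discovered instances, removes obsolete ones, and aligns the recorded topology mode with the running cluster (adjust the options to your situation):
mysql-js> cluster.rescan({addInstances: "auto", removeInstances: "auto", updateTopologyMode: true, interactive: false})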
Following an emergency failover, when there is a risk of the transaction sets differing between parts of the ClusterSet, you have to fence the cluster either from write traffic or from all traffic. Even though you primarily use fencing on clusters belonging to an InnoDB ClusterSet, it is also possible to fence standalone clusters from all traffic.
From MySQL Shell 8.0.28, three fencing operations are available:
<Cluster>.fenceWrites(): Stops write traffic to a primary cluster of a ClusterSet.
<Cluster>.unfenceWrites(): Resumes write traffic.
<Cluster>.fenceAllTraffic(): Fences a cluster from all traffic.
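For example, a minimal sketch for a primary cluster of a ClusterSet, assuming the Cluster object has already been assigned to the cluster variable (note that a cluster fenced with fenceAllTraffic() is instead brought back with dba.rebootClusterFromCompleteOutage()):
mysql-js> cluster.fenceWrites()
mysql-js> cluster.unfenceWrites()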
For more details, see Section 8.9.1, “Fencing Clusters in an InnoDB ClusterSet”.