8.10.1 Fencing Clusters in an InnoDB ClusterSet

Following an emergency failover, if there is a risk of the transaction sets differing between parts of the ClusterSet, you have to fence the cluster either from write traffic or from all traffic.

If a network partition happens, there is the possibility of a split-brain situation, where instances lose synchronization and cannot communicate correctly to define the synchronization state. A split-brain can occur, for example, when a DBA decides to forcibly elect a replica cluster to become the primary cluster, which creates more than one primary.

In this situation, a DBA can choose to fence the original primary cluster from:

  • Writes.

  • All traffic.

Three fencing operations are available:

  • <Cluster>.fenceWrites(): Stops write traffic to a primary cluster of a ClusterSet. Replica clusters do not accept writes, so this operation has no effect on them.

    This operation can be used on INVALIDATED Replica clusters. Also, if it is run against a Replica cluster with super_read_only disabled, it enables super_read_only.

  • <Cluster>.unfenceWrites(): Resumes write traffic. This operation can be run on a cluster that was previously fenced from write traffic using the <Cluster>.fenceWrites() operation.

    It is not possible to use <Cluster>.unfenceWrites() on a Replica cluster.

  • <Cluster>.fenceAllTraffic(): Fences a cluster, and all Read Replicas in that cluster, from all traffic. If you have fenced a cluster from all traffic using <Cluster>.fenceAllTraffic(), you have to reboot the cluster using the dba.rebootClusterFromCompleteOutage() MySQL Shell command.

    For more information on dba.rebootClusterFromCompleteOutage(), see Section 7.8.3, “Rebooting a Cluster from a Major Outage”.
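
The relationship between the three operations can be pictured as a small state machine. The following JavaScript sketch is only an illustration and does not call MySQL Shell: the state labels and the helper nextState() are hypothetical names chosen for this example, and any transition that the list above does not describe is simply rejected by the sketch.

```javascript
// Illustrative model only: the state labels and nextState() are hypothetical
// and are not part of the MySQL Shell AdminAPI.
const TRANSITIONS = {
  UNFENCED: {
    fenceWrites: "FENCED_WRITES",
    fenceAllTraffic: "FENCED_ALL_TRAFFIC",
  },
  // A cluster fenced from writes is released with unfenceWrites().
  FENCED_WRITES: {
    unfenceWrites: "UNFENCED",
  },
  // A cluster fenced from all traffic has to be rebooted.
  FENCED_ALL_TRAFFIC: {
    rebootClusterFromCompleteOutage: "UNFENCED",
  },
};

// Own-property check, so inherited keys such as "toString" never match.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function nextState(state, operation) {
  if (!has(TRANSITIONS, state) || !has(TRANSITIONS[state], operation)) {
    throw new Error(`'${operation}' is not modelled for state '${state}'`);
  }
  return TRANSITIONS[state][operation];
}

// Walk through fencing from all traffic and recovering afterwards.
let state = "UNFENCED";
state = nextState(state, "fenceAllTraffic");
state = nextState(state, "rebootClusterFromCompleteOutage");
console.log(state);
```

The table makes the recovery path explicit: in this model, unfenceWrites() only applies after fenceWrites(), while a cluster fenced from all traffic returns to service only through a reboot.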

fenceWrites()

Issuing .fenceWrites() on a replica cluster returns an error:

ERROR: Unable to fence Cluster from write traffic: operation not permitted on REPLICA Clusters
Cluster.fenceWrites: The Cluster '<Cluster>' is a REPLICA Cluster of the ClusterSet '<ClusterSet>' (MYSQLSH 51616)

Although fencing is primarily used on clusters that belong to a ClusterSet, it is also possible to fence standalone clusters using <Cluster>.fenceAllTraffic().

  1. To fence a primary cluster from write traffic, use the <Cluster>.fenceWrites() command as follows:

    <Cluster>.fenceWrites()

    After running the command:

    • The automatic super_read_only management is disabled on the cluster.

    • super_read_only is enabled on all the instances in the cluster.

    • All applications are blocked from performing writes on the cluster.

    cluster.fenceWrites()
    The Cluster 'primary' will be fenced from write traffic

    * Disabling automatic super_read_only management on the Cluster...
    * Enabling super_read_only on '127.0.0.1:3311'...
    * Enabling super_read_only on '127.0.0.1:3312'...
    * Enabling super_read_only on '127.0.0.1:3313'...

    NOTE: Applications will now be blocked from performing writes on Cluster 'primary'. Use <Cluster>.unfenceWrites() to resume writes if you are certain a split-brain is not in effect.

    Cluster successfully fenced from write traffic

  2. To check that you have fenced a primary cluster from write traffic, use the <ClusterSet>.status() command as follows:

    <ClusterSet>.status()

    The output is as follows:

    clusterset.status()
    {
        "clusters": {
            "primary": {
                "clusterErrors": [
                    "WARNING: Cluster is fenced from Write traffic. Use cluster.unfenceWrites() to unfence the Cluster."
                ],
                "clusterRole": "PRIMARY",
                "globalStatus": "OK_FENCED_WRITES",
                "primary": null,
                "status": "FENCED_WRITES",
                "statusText": "Cluster is fenced from Write Traffic."
            },
            "replica": {
                "clusterRole": "REPLICA",
                "clusterSetReplicationStatus": "OK",
                "globalStatus": "OK"
            }
        },
        "domainName": "primary",
        "globalPrimaryInstance": null,
        "primaryCluster": "primary",
        "status": "UNAVAILABLE",
        "statusText": "Primary Cluster is fenced from write traffic."
    }

  3. To unfence a cluster and resume write traffic to a primary cluster, use the <Cluster>.unfenceWrites() command as follows:

    <Cluster>.unfenceWrites()

    Automatic super_read_only management is re-enabled on the cluster, and super_read_only is disabled on the primary instance of the cluster.

    cluster.unfenceWrites()
    The Cluster 'primary' will be unfenced from write traffic

    * Enabling automatic super_read_only management on the Cluster...
    * Disabling super_read_only on the primary '127.0.0.1:3311'...

    Cluster successfully unfenced from write traffic

  4. To fence a cluster from all traffic, use the <Cluster>.fenceAllTraffic() command as follows:

    <Cluster>.fenceAllTraffic()

    The super_read_only system variable is enabled on the primary instance of the cluster. Then offline_mode is enabled on all the instances in the cluster, and Group Replication is stopped on each instance:

    cluster.fenceAllTraffic()
    The Cluster 'primary' will be fenced from all traffic

    * Enabling super_read_only on the primary '127.0.0.1:3311'...
    * Enabling offline_mode on the primary '127.0.0.1:3311'...
    * Enabling offline_mode on '127.0.0.1:3312'...
    * Stopping Group Replication on '127.0.0.1:3312'...
    * Enabling offline_mode on '127.0.0.1:3313'...
    * Stopping Group Replication on '127.0.0.1:3313'...
    * Stopping Group Replication on the primary '127.0.0.1:3311'...

    Cluster successfully fenced from all traffic

  5. To unfence a cluster from all traffic, use the dba.rebootClusterFromCompleteOutage() MySQL Shell command. When you have restored the cluster, you rejoin the instances to the cluster by selecting Y when asked if you want to rejoin the instance to the cluster:

    cluster = dba.rebootClusterFromCompleteOutage()
    Restoring the cluster 'primary' from complete outage...

    The instance '127.0.0.1:3312' was part of the cluster configuration.
    Would you like to rejoin it to the cluster? [y/N]: Y

    The instance '127.0.0.1:3313' was part of the cluster configuration.
    Would you like to rejoin it to the cluster? [y/N]: Y

    * Waiting for seed instance to become ONLINE...
    127.0.0.1:3311 was restored.
    Rejoining '127.0.0.1:3312' to the cluster.
    Rejoining instance '127.0.0.1:3312' to cluster 'primary'...

    The instance '127.0.0.1:3312' was successfully rejoined to the cluster.

    Rejoining '127.0.0.1:3313' to the cluster.
    Rejoining instance '127.0.0.1:3313' to cluster 'primary'...

    The instance '127.0.0.1:3313' was successfully rejoined to the cluster.

    The cluster was successfully rebooted.

    <Cluster:primary>
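
The check in step 2 can also be scripted. The following JavaScript sketch assumes that the ClusterSet status is available as a plain object with the same fields as the output shown in step 2. The helper name isFencedFromWrites() and the abridged sample object are illustrative and are not part of the AdminAPI.

```javascript
// Hypothetical helper: returns true when the named cluster reports that it
// is fenced from write traffic in a ClusterSet status object.
function isFencedFromWrites(clusterSetStatus, clusterName) {
  const clusters = (clusterSetStatus && clusterSetStatus.clusters) || {};
  const cluster = clusters[clusterName];
  // A missing cluster entry is treated as "not fenced" in this sketch.
  return Boolean(cluster) && cluster.status === "FENCED_WRITES";
}

// Abridged copy of the status output shown in step 2.
const sampleStatus = {
  clusters: {
    primary: {
      clusterRole: "PRIMARY",
      globalStatus: "OK_FENCED_WRITES",
      status: "FENCED_WRITES",
    },
    replica: {
      clusterRole: "REPLICA",
      clusterSetReplicationStatus: "OK",
      globalStatus: "OK",
    },
  },
  primaryCluster: "primary",
  status: "UNAVAILABLE",
};

console.log(isFencedFromWrites(sampleStatus, "primary")); // true
console.log(isFencedFromWrites(sampleStatus, "replica")); // false
```

In a MySQL Shell JavaScript session, the value returned by <ClusterSet>.status() could be passed in place of sampleStatus, assuming it exposes the same fields as the printed output.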