
8.9.1 Fencing Clusters in an InnoDB ClusterSet

Following an emergency failover, if there is a risk of the transaction sets differing between parts of the ClusterSet, you have to fence the cluster from either write traffic or all traffic.

If a network partition happens, a split-brain situation is possible, in which instances lose synchronization and cannot communicate correctly to determine the synchronization state. A split-brain can occur, for example, when a DBA forcibly elects a replica cluster to become the primary cluster, which results in more than one primary cluster.

In this situation, a DBA can choose to fence the original primary cluster from:

  • Writes.

  • All traffic.

Three fencing operations are available:

  • <Cluster>.fenceWrites(): Stops write traffic to a primary cluster of a ClusterSet. Replica clusters do not accept writes, so this operation has no effect on them.

    As of 8.0.31, this operation can also be used on INVALIDATED Replica clusters. If it is run against a Replica cluster that has super_read_only disabled, it enables super_read_only.

  • <Cluster>.unfenceWrites(): Resumes write traffic. This operation can be run on a cluster that was previously fenced from write traffic using the <Cluster>.fenceWrites() operation.

    It is not possible to use cluster.unfenceWrites() on a Replica Cluster.

  • <Cluster>.fenceAllTraffic(): Fences a cluster from all traffic. If you have fenced a cluster from all traffic using <Cluster>.fenceAllTraffic(), you have to reboot the cluster using the dba.rebootClusterFromCompleteOutage() MySQL Shell command.

    For more information on dba.rebootClusterFromCompleteOutage(), see Section 7.8.3, “Rebooting a Cluster from a Major Outage”.
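The operations above, and the way each one is reversed, can be summarized in a small lookup table. The following is a minimal sketch in JavaScript (the default MySQL Shell language). The names FENCING_OPERATIONS and undoFor are hypothetical helpers introduced for illustration and are not part of AdminAPI; only the operation names themselves come from this section.

```javascript
// Hypothetical summary table (not part of AdminAPI).
// Keys are the fencing operations; values describe what is blocked
// and which command reverses the fence.
const FENCING_OPERATIONS = {
  fenceWrites: {
    blocks: "writes",
    undoWith: "<Cluster>.unfenceWrites()",
  },
  fenceAllTraffic: {
    blocks: "all traffic",
    undoWith: "dba.rebootClusterFromCompleteOutage()",
  },
};

// Returns the command that reverses the given fencing operation.
// Throws if the operation name is not one of the two fencing operations.
function undoFor(operation) {
  if (!Object.prototype.hasOwnProperty.call(FENCING_OPERATIONS, operation)) {
    throw new Error("Unknown fencing operation: " + operation);
  }
  return FENCING_OPERATIONS[operation].undoWith;
}
```

For example, undoFor("fenceAllTraffic") returns the string "dba.rebootClusterFromCompleteOutage()", which reflects that a cluster fenced from all traffic has to be rebooted rather than unfenced.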

fenceWrites()

Issuing .fenceWrites() on a replica cluster returns an error:

ERROR: Unable to fence Cluster from write traffic: 
operation not permitted on REPLICA Clusters
Cluster.fenceWrites: The Cluster '<Cluster>' is a REPLICA Cluster 
of the ClusterSet '<ClusterSet>' (MYSQLSH 51616)
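One way to avoid this error is to check the cluster's role before fencing. The sketch below assumes a ClusterSet status document shaped like the status output shown later in this section, that is, a clusters object whose entries carry a clusterRole field. The function canFenceWrites is a hypothetical helper, not an AdminAPI function, and it is deliberately conservative: it does not account for the 8.0.31 exception for INVALIDATED Replica clusters.

```javascript
// Hypothetical helper (not part of AdminAPI).
// Returns true only when the named cluster is reported with
// clusterRole "PRIMARY" in the given ClusterSet status document.
// Conservative: ignores the 8.0.31 exception for INVALIDATED Replica clusters.
function canFenceWrites(clusterSetStatus, clusterName) {
  const clusters = (clusterSetStatus && clusterSetStatus.clusters) || {};
  if (!Object.prototype.hasOwnProperty.call(clusters, clusterName)) {
    return false; // unknown cluster name
  }
  return clusters[clusterName].clusterRole === "PRIMARY";
}
```

In a MySQL Shell session, the first argument would typically be the result of calling status() on a ClusterSet object, on the assumption that the returned dictionary can be read with ordinary property access.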

Although fencing is primarily used on clusters that belong to a ClusterSet, it is also possible to fence standalone clusters using <Cluster>.fenceAllTraffic().

  1. To fence a primary cluster from write traffic, use the Cluster.fenceWrites command as follows:

            <Cluster>.fenceWrites()

    After running the command:

    • The automatic super_read_only management is disabled on the cluster.

    • super_read_only is enabled on all the instances in the cluster.

    • All applications are blocked from performing writes on the cluster.

    cluster.fenceWrites()
    The Cluster 'primary' will be fenced from write traffic

    * Disabling automatic super_read_only management on the Cluster...
    * Enabling super_read_only on '127.0.0.1:3311'...
    * Enabling super_read_only on '127.0.0.1:3312'...
    * Enabling super_read_only on '127.0.0.1:3313'...

    NOTE: Applications will now be blocked from performing writes on Cluster 'primary'.
    Use <Cluster>.unfenceWrites() to resume writes if you are certain a split-brain is not in effect.

    Cluster successfully fenced from write traffic
  2. To check that you have fenced a primary cluster from write traffic, use the <ClusterSet>.status() command as follows:

          <ClusterSet>.status()

    The output is as follows:

    clusterset.status()
    {
        "clusters": {
            "primary": {
                "clusterErrors": [
                    "WARNING: Cluster is fenced from Write traffic. Use cluster.unfenceWrites() to unfence the Cluster."
                ],
                "clusterRole": "PRIMARY",
                "globalStatus": "OK_FENCED_WRITES",
                "primary": null,
                "status": "FENCED_WRITES",
                "statusText": "Cluster is fenced from Write Traffic."
            },
            "replica": {
                "clusterRole": "REPLICA",
                "clusterSetReplicationStatus": "OK",
                "globalStatus": "OK"
            }
        },
        "domainName": "primary",
        "globalPrimaryInstance": null,
        "primaryCluster": "primary",
        "status": "UNAVAILABLE",
        "statusText": "Primary Cluster is fenced from write traffic."
    }
  3. To unfence a cluster and resume write traffic to a primary cluster, use the Cluster.unfenceWrites command as follows:

            <Cluster>.unfenceWrites()

    Automatic super_read_only management is enabled again on the primary cluster, and super_read_only is disabled on the primary instance of the cluster:

            cluster.unfenceWrites()
            The Cluster 'primary' will be unfenced from write traffic
    
            * Enabling automatic super_read_only management on the Cluster...
            * Disabling super_read_only on the primary '127.0.0.1:3311'...
    
            Cluster successfully unfenced from write traffic
  4. To fence a cluster from all traffic, use the Cluster.fenceAllTraffic command as follows:

          <Cluster>.fenceAllTraffic()

    super_read_only is enabled on the primary instance of the cluster. offline_mode is then enabled on all the instances in the cluster, and Group Replication is stopped on each of them:

          cluster.fenceAllTraffic()
            The Cluster 'primary' will be fenced from all traffic
    
            * Enabling super_read_only on the primary '127.0.0.1:3311'...
            * Enabling offline_mode on the primary '127.0.0.1:3311'...
            * Enabling offline_mode on '127.0.0.1:3312'...
            * Stopping Group Replication on '127.0.0.1:3312'...
            * Enabling offline_mode on '127.0.0.1:3313'...
            * Stopping Group Replication on '127.0.0.1:3313'...
            * Stopping Group Replication on the primary '127.0.0.1:3311'...
    
            Cluster successfully fenced from all traffic
  5. To unfence a cluster from all traffic, use the dba.rebootClusterFromCompleteOutage() MySQL Shell command. While the cluster is being restored, the command asks whether to rejoin each instance that was part of the cluster configuration. Enter Y to rejoin the instance to the cluster:

    cluster = dba.rebootClusterFromCompleteOutage()
    Restoring the cluster 'primary' from complete outage...

    The instance '127.0.0.1:3312' was part of the cluster configuration.
    Would you like to rejoin it to the cluster? [y/N]: Y

    The instance '127.0.0.1:3313' was part of the cluster configuration.
    Would you like to rejoin it to the cluster? [y/N]: Y

    * Waiting for seed instance to become ONLINE...
    127.0.0.1:3311 was restored.
    Rejoining '127.0.0.1:3312' to the cluster.
    Rejoining instance '127.0.0.1:3312' to cluster 'primary'...

    The instance '127.0.0.1:3312' was successfully rejoined to the cluster.

    Rejoining '127.0.0.1:3313' to the cluster.
    Rejoining instance '127.0.0.1:3313' to cluster 'primary'...

    The instance '127.0.0.1:3313' was successfully rejoined to the cluster.

    The cluster was successfully rebooted.

    <Cluster:primary>
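The fence-and-verify sequence from steps 1 and 2 can be scripted. The following is a minimal sketch, assuming that the status document has the fields shown in the step 2 output (primaryCluster, and a status field on each entry under clusters). The functions isFencedFromWrites and fenceWritesAndVerify are hypothetical helpers, not AdminAPI functions. The only AdminAPI calls used are fenceWrites() on a Cluster object and status() on a ClusterSet object, both of which are passed in by the caller.

```javascript
// Hypothetical helper (not part of AdminAPI).
// Returns true when the primary cluster named in the status document
// reports the status "FENCED_WRITES", as in the step 2 output.
function isFencedFromWrites(clusterSetStatus) {
  const doc = clusterSetStatus || {};
  const clusters = doc.clusters || {};
  const name = doc.primaryCluster;
  if (!Object.prototype.hasOwnProperty.call(clusters, name)) {
    return false; // primary cluster not listed in the document
  }
  return clusters[name].status === "FENCED_WRITES";
}

// Hypothetical helper (not part of AdminAPI).
// Fences the given primary cluster from writes, then checks the
// ClusterSet status to confirm that the fence is reported.
// `cluster` is assumed to be a Cluster object and `clusterSet`
// the ClusterSet object it belongs to.
function fenceWritesAndVerify(cluster, clusterSet) {
  cluster.fenceWrites();
  return isFencedFromWrites(clusterSet.status());
}
```

Keeping the status check separate from the fencing call makes it possible to reuse isFencedFromWrites on its own, for example before deciding whether unfenceWrites() is needed.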