Unlike InnoDB Cluster, which supports automatic failover in the
event of an unexpected failure of the primary, InnoDB ReplicaSet
does not have automatic failure detection or a consensus-based
protocol such as that provided by Group Replication. If the
primary is not available, a manual failover is required. An
InnoDB ReplicaSet which has lost its primary is effectively
read-only, and for any write changes to be possible a new primary
must be chosen. If you cannot connect to the primary, and you
cannot use ReplicaSet.setPrimaryInstance() to safely perform a
switchover to a new primary as described in
Section 9.6, “Changing the Primary Instance”, use the
ReplicaSet.forcePrimaryInstance() operation to perform a forced
failover of the primary. This is a last-resort operation that must
only be used in a disaster-type scenario where the current primary
is unavailable and cannot be restored in any way.
A forced failover is a potentially destructive action and must be used with caution.
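As a rough illustration, the following sketch wraps a forced failover in a helper written for MySQL Shell's Python mode, where ReplicaSet.forcePrimaryInstance() is exposed as force_primary_instance(). It assumes that it runs inside MySQL Shell with a session open on a surviving member, so that the global dba object exists. The helper name force_failover, its apply flag, and the address in the usage note are hypothetical. The dryRun option is used first so that the validations run without changing the topology.

```python
def force_failover(dba, target=None, apply=False):
    """Validate, and optionally perform, a forced failover of a ReplicaSet.

    dba    -- MySQL Shell's global AdminAPI object (assumed to be available).
    target -- address of the instance to promote; None leaves the choice
              to MySQL Shell.
    apply  -- hypothetical safety flag: only when True is the topology changed.
    """
    rs = dba.get_replica_set()
    # First pass: run the checks only; nothing is changed.
    rs.force_primary_instance(target, {"dryRun": True})
    if apply:
        # Second pass: really promote the target. Transactions may be lost.
        rs.force_primary_instance(target, {"dryRun": False})
    return rs.status()
```

Inside MySQL Shell, a call such as force_failover(dba, "host2:3306") would only validate the operation; passing apply=True would carry it out.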
If a target instance is not given (or is null), the most
up-to-date instance is automatically selected and promoted to be
the new primary. If a target instance is given, it is promoted
to be the new primary. Other reachable secondary instances
replicate from this new primary. The target instance must have the
most up-to-date GTID_EXECUTED set among
reachable instances, otherwise the operation fails.
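The notion of "most up-to-date" can be pictured with a simplified model: an instance qualifies if its GTID_EXECUTED set contains the set of every other reachable instance. The sketch below is an illustration under that assumption, not MySQL Shell's actual implementation. The function names, host addresses, and UUID are made up, and the parser handles only the plain uuid:interval[:interval] form of GTID sets.

```python
def parse_gtid_set(text):
    """Expand a GTID set such as 'uuid:1-5:7' into a set of (uuid, number) pairs."""
    txns = set()
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A single number such as '7' is an interval of length one.
            for n in range(int(start), int(end or start) + 1):
                txns.add((uuid, n))
    return txns


def most_up_to_date(gtid_executed):
    """Return the instance whose set contains all other sets, or None."""
    parsed = {name: parse_gtid_set(s) for name, s in gtid_executed.items()}
    for name, txns in parsed.items():
        if all(txns >= other for other in parsed.values()):
            return name
    return None  # the sets have diverged; no instance contains the others


# Invented values for two reachable secondaries.
reachable = {
    "host2:3306": "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-8",
    "host3:3306": "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-10",
}
print(most_up_to_date(reachable))  # host3:3306
```

In this model, naming host2:3306 as the target would be rejected, because host3:3306 has applied transactions that host2:3306 has not.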
A failover is different from a planned primary change because it promotes a secondary instance without synchronizing with or updating the old primary. That has the following major consequences:
Any transactions that had not yet been applied by a secondary at the time the old primary failed are lost.
If the old primary is still running and processing transactions, there is a split-brain, and the datasets of the old and new primaries diverge.
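To make these two consequences concrete, the sketch below models each instance's executed transactions as a plain Python set of (originating server, sequence number) pairs. The server names and numbers are invented for illustration; real instances track this information in GTID_EXECUTED.

```python
# Transactions are modelled as (originating server, sequence number) pairs.
old_primary = {("old", n) for n in range(1, 11)}  # committed 1..10
secondary = {("old", n) for n in range(1, 9)}     # had applied only 1..8

# A forced failover promotes the secondary without consulting the old primary.
new_primary = set(secondary)
lost = old_primary - new_primary
print(sorted(lost))  # [('old', 9), ('old', 10)]

# If the old primary is still running, both sides keep accepting writes.
old_primary.add(("old", 11))
new_primary.add(("new", 1))

# Each side now holds transactions the other has never seen: a split-brain.
only_on_old = old_primary - new_primary
only_on_new = new_primary - old_primary
diverged = bool(only_on_old) and bool(only_on_new)
print(diverged)  # True
```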
If the last known primary is still reachable, the
ReplicaSet.forcePrimaryInstance() operation fails, to reduce the
risk of split-brain situations. However, it remains the
administrator's responsibility to ensure that the old primary is
not reachable by the other instances, to prevent or minimize such
scenarios.
After a forced failover, the old primary is considered invalid by the new primary and can no longer be part of the ReplicaSet. If you later find an instance that can be recovered, you must remove it from the ReplicaSet and add it as a new instance. A secondary instance is considered invalid if it cannot be switched to the new primary during the failover.
Data loss is possible after a failover because the old primary might have had transactions that were not yet replicated to the secondary being promoted. Moreover, if the instance that was presumed to have failed can still process transactions, for example because the network where it is located is still functioning but unreachable from MySQL Shell, its data continues to diverge from that of the promoted instance. Recovering once the transaction sets on instances have diverged requires manual intervention and might not be possible in some situations, even if the failed instances can be recovered. Often, the fastest and simplest way to recover from a disaster that required a forced failover is to discard the diverged transactions and re-provision a new instance from the newly promoted primary.
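The re-provisioning approach could look roughly like the following in MySQL Shell's Python mode. This is a sketch under assumptions: rs is a ReplicaSet object obtained from dba.get_replica_set() while connected to the new primary, the helper name reprovision and the address in the usage note are hypothetical, and clone-based recovery is assumed to be available on the instances involved.

```python
def reprovision(rs, address):
    """Remove an invalidated instance from the ReplicaSet and add it back.

    rs      -- ReplicaSet object from dba.get_replica_set() (assumed).
    address -- address of the recovered instance.
    """
    # 'force' allows the removal to proceed even if the instance
    # cannot be reached.
    rs.remove_instance(address, {"force": True})
    # Clone replaces the instance's data with a copy taken from the
    # ReplicaSet, discarding any diverged transactions it still holds.
    rs.add_instance(address, {"recoveryMethod": "clone"})
    return rs.status()
```

For example, reprovision(rs, "host1:3306") would bring a recovered old primary back as a fresh secondary, at the cost of whatever data existed only on that instance.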