Group Replication's failure detection mechanism is designed to identify group members that are no longer communicating with the group, and to expel them once it seems likely that they have failed. Having a failure detection mechanism increases the chance that the group contains a majority of correctly working members, and that requests from clients are therefore processed correctly.
Normally, all group members regularly exchange messages with all other group members. If a group member receives no messages from a particular fellow member for 5 seconds, it creates a suspicion of that member at the end of this detection period. When a suspicion times out, the suspected member is assumed to have failed, and is expelled from the group. An expelled member is removed from the membership list seen by the other members, but it does not know that it has been expelled from the group, so it sees itself as online and the other members as unreachable. If the member has not in fact failed (for example, because it was only disconnected by a temporary network issue) and it is able to resume communication with the other members, it receives a view containing the information that it has been expelled from the group.
The responses of group members, including the failed member itself, to these situations can be configured at a number of points in the process. By default, the following behaviors happen if a member is suspected of having failed:
When a suspicion is created, it times out immediately (its lifetime is set to 0), so the suspected member is expelled as soon as the expired suspicion is identified. The member could potentially survive for a further few seconds after the timeout because the check for expired suspicions is carried out periodically.
If an expelled member resumes communication and realizes that it was expelled, it accepts its expulsion and does not try to rejoin the group.
When an expelled member accepts its expulsion, it switches to super read only mode and awaits operator attention. (The exception is in releases from MySQL 8.0.12 to 8.0.15, where the default was for the member to shut itself down. From MySQL 8.0.16, the behavior was changed to match the behavior in MySQL 5.7.)
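The default timing described above can be sketched as a small calculation. This is an illustrative model only, not Group Replication's implementation: the function name, the `ExpelTimeline` type, and the `check_interval_s` parameter are hypothetical, and the interval of the periodic check for expired suspicions is an assumed value because the text does not state it. The 5-second detection period and the default suspicion lifetime of 0 are taken from the text.

```python
import math
from dataclasses import dataclass

# From the text: silence from a member for this long creates a suspicion.
DETECTION_PERIOD_S = 5.0


@dataclass
class ExpelTimeline:
    suspicion_created_at: float
    suspicion_expires_at: float
    expelled_at: float


def expel_timeline(last_message_at: float,
                   expel_timeout_s: float = 0.0,
                   check_interval_s: float = 1.0) -> ExpelTimeline:
    """Estimate when a silent member is expelled.

    expel_timeout_s is the suspicion lifetime (0 by default, per the text).
    check_interval_s is an ASSUMED period for the expired-suspicion check;
    checks are modelled as running at whole multiples of this interval.
    """
    created = last_message_at + DETECTION_PERIOD_S
    expires = created + expel_timeout_s
    # Expulsion happens at the first periodic check at or after expiry,
    # which is why a member can outlive its timeout by a short while.
    expelled = math.ceil(expires / check_interval_s) * check_interval_s
    return ExpelTimeline(created, expires, expelled)


if __name__ == "__main__":
    # Last message seen at t=10.3 s, default lifetime, assumed 2 s check period.
    print(expel_timeline(10.3, check_interval_s=2.0))
```

With a lifetime of 0 the suspicion is created and expires at the same instant, so any remaining delay in this model comes only from waiting for the next periodic check.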
These defaults are set to prioritize the correct operation of the group and the correct handling of requests. However, they might be inconvenient in the case of slower networks or networks with a high rate of transient failures, because in these situations there could be a frequent requirement for operator intervention to fix expelled members. They also do not allow for continued operation of the group to be planned in the case of expected network failures or machine slowdowns. You can use the Group Replication configuration options described in this section to change these behaviors either permanently or temporarily, to suit your system's requirements and your priorities.
Members that have not failed might lose contact with part, but not all, of the replication group due to a network partition. For example, in a group of 5 servers (S1, S2, S3, S4, S5), a disconnection between (S1, S2) and (S3, S4, S5) creates a network partition. The first group (S1, S2) is now in a minority because it cannot contact more than half of the group. Any transactions processed by the members in the minority are blocked, because those members cannot reach a majority of the group and therefore cannot achieve quorum. For a detailed description of this scenario, see Section 18.4.4, “Network Partitioning”. In this situation, the default behavior is for the members in both the minority and the majority to remain in the group, continue to accept transactions (although they are blocked on the members in the minority), and wait for operator intervention. This behavior is also configurable.
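The majority rule in the five-server example can be checked with a short sketch. The helper name `has_quorum` is hypothetical and the code only illustrates the arithmetic; it assumes that a member counts itself among the members it can reach.

```python
def has_quorum(reachable: int, group_size: int) -> bool:
    """Return True when a partition can contact more than half of the group.

    `reachable` counts the members on this side of the partition,
    including the member doing the counting (an assumption of this sketch).
    """
    return 2 * reachable > group_size


if __name__ == "__main__":
    group = ["S1", "S2", "S3", "S4", "S5"]
    partitions = {"minority": ["S1", "S2"], "majority": ["S3", "S4", "S5"]}
    for name, members in partitions.items():
        print(name, members, has_quorum(len(members), len(group)))
```

Using integer arithmetic (`2 * reachable > group_size`) avoids rounding questions and makes an even split, such as 2 of 4, fail the check, since neither half holds more than half of the group.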
Note that where group members are at an older MySQL Server release
that does not support a relevant setting, or at a release with a
different default, they act towards themselves and other group
members according to the default behaviors stated above. For
example, a member that does not support the
system variable expels other members as soon as an expired
suspicion is detected, and this expulsion is accepted by other
members even if they support the system variable and have a longer