Documentation Home
MySQL 9.1 Reference Manual
Related Documentation Download this Manual
PDF (US Ltr) - 40.3Mb
PDF (A4) - 40.4Mb
Man Pages (TGZ) - 259.3Kb
Man Pages (Zip) - 366.4Kb
Info (Gzip) - 4.0Mb
Info (Zip) - 4.0Mb


20.7.7.1 Expel Timeout

You can use the group_replication_member_expel_timeout system variable to allow additional time between the creation of a suspicion and the expulsion of the suspect member. A suspicion is created when one server does not receive messages from another server, as explained in Section 20.1.4.2, “Failure Detection”.

There is an initial 5-second detection period before a Group Replication group member creates a suspicion of another member (or of itself). A group member is then expelled when another member's suspicion of it (or its own suspicion of itself) times out. A further short period of time might elapse after that before the expelling mechanism detects and implements the expulsion. group_replication_member_expel_timeout specifies the period of time in seconds, called the expel timeout, that a group member waits between creating a suspicion, and expelling the suspected member. Suspect members are listed as UNREACHABLE during this waiting period, but are not removed from the group's membership list.

  • If a suspect member becomes active again before the suspicion times out at the end of the waiting period, the member applies all the messages that were buffered by the remaining group members in XCom's message cache and enters ONLINE state, without operator intervention. In this situation, the member is considered by the group as the same incarnation.

  • If a suspect member becomes active only after the suspicion times out and is able to resume communications, it receives a view where it is expelled and at that point realises it was expelled. You can use group_replication_autorejoin_tries to make the member try to rejoin the group automatically at this point. This feature is active by default in MySQL 9.1, and the member makes three auto-rejoin attempts. If the auto-rejoin procedure does not succeed or is not attempted, the expelled member then follows the exit action specified by group_replication_exit_state_action.

The waiting period before expelling a member only applies to members that have previously been active in the group. Non-members that were never active in the group do not get this waiting period and are removed after the initial detection period because they took too long to join.

If group_replication_member_expel_timeout is set to 0, there is no waiting period, and a suspected member is liable for expulsion immediately after the initial 5-second detection period ends. The default is 5, meaning that a suspected member is liable for expulsion 5 seconds after the 5-second detection period has expired. It is not mandatory for all members of a group to have the same setting for group_replication_member_expel_timeout, but it is recommended in order to avoid unexpected expulsions. Any member can create a suspicion of any other member, including itself, so the effective expel timeout is that of the member with the lowest setting.

Consider increasing the value of group_replication_member_expel_timeout from the default in the following scenarios:

  • The network is slow and the default 5 or 10 seconds before expulsion is not long enough for group members to always exchange at least one message.

  • The network sometimes has transient outages and you want to avoid unnecessary expulsions and primary member changes at these times.

  • The network is not under your direct control and you want to minimize the need for operator intervention.

  • A temporary network outage is expected and you do not want some or all of the members to be expelled due to this.

  • An individual machine is experiencing a slowdown and you do not want it to be expelled from the group.

You can specify an expel timeout up to a maximum of 3600 seconds (1 hour). It is important to ensure that XCom's message cache is sufficiently large to contain the expected volume of messages in your specified time period, plus the initial 5-second detection period, otherwise members cannot reconnect. You can adjust the cache size limit using the group_replication_message_cache_size system variable. For more information, see Section 20.7.6, “XCom Cache Management”.

If any members in a group are currently under suspicion, the group membership cannot be reconfigured (by adding or removing members or electing a new leader). If group membership changes need to be implemented while one or more members are under suspicion, and you want the suspect members to remain in the group, take any actions required to make the members active again, if that is possible. If you cannot make the members active again and you want them to be expelled from the group, you can force the suspicions to time out immediately. Do this by changing the value of group_replication_member_expel_timeout on any active members to a value lower than the time that has already elapsed since the suspicions were created. The suspect members then become liable for expulsion immediately.

If a replication group member stops unexpectedly and is immediately restarted (for example, because it was started with mysqld_safe), it automatically attempts to rejoin the group if group_replication_start_on_boot=on is set. In this situation, it is possible for the restart and rejoin attempt to take place before the member's previous incarnation has been expelled from the group, in which case the member cannot rejoin. Group Replication automatically uses a Group Communication System (GCS) feature to retry the rejoin attempt for the member 10 times, with a 5-second interval between each retry. This should cover most cases and allow enough time for the previous incarnation to be expelled from the group, letting the member rejoin. Note that if the group_replication_member_expel_timeout system variable is set to specify a longer waiting period before the member is expelled, the automatic rejoin attempts might still not succeed.

For alternative mitigation strategies to avoid unnecessary expulsions where the group_replication_member_expel_timeout system variable is not available, see Section 20.3.2, “Group Replication Limitations”.