The
group_replication_autorejoin_tries
system variable makes a member that has been expelled or reached
its unreachable majority timeout try to rejoin the group
automatically. The default (3), means that the member
automatically makes 3 attempts to rejoin the group, with 5
minutes between each attempt.
When auto-rejoin is not activated, a member accepts its
expulsion as soon as it resumes communication, and proceeds to
the action specified by the
group_replication_exit_state_action
system variable. After this, manual intervention is needed to
bring the member back into the group. Using the auto-rejoin
feature is appropriate if you can tolerate the possibility of
stale reads and want to minimize the need for manual
intervention, especially where transient network issues fairly
often result in the expulsion of members.
With auto-rejoin, when the member's expulsion or unreachable
majority timeout is reached, it makes an attempt to rejoin
(using the current plugin option values), then continues to make
further auto-rejoin attempts up to the specified number of
tries. After an unsuccessful auto-rejoin attempt, the member
waits 5 minutes before the next try. The auto-rejoin attempts
and the time between them are called the auto-rejoin procedure.
If the specified number of tries is exhausted without the member
rejoining or being stopped, the member proceeds to the action
specified by the
group_replication_exit_state_action
system
variable.
During and between auto-rejoin attempts, a member remains in
super read only mode and displays an ERROR
state on its view of the replication group. During this time,
the member does not accept writes. However, reads can still be
made on the member, with an increasing likelihood of stale reads
over time. If you do want to intervene to take the member
offline during the auto-rejoin procedure, the member can be
stopped manually at any time by using a
STOP GROUP_REPLICATION
statement
or shutting down the server. If you cannot tolerate the
possibility of stale reads for any period of time, set the
group_replication_autorejoin_tries
system variable to 0.
You can monitor the auto-rejoin procedure using the Performance
Schema. While an auto-rejoin procedure is taking place, the
Performance Schema table
events_stages_current
shows the
event “Undergoing auto-rejoin procedure”, with the
number of retries that have been attempted so far during this
instance of the procedure (in the
WORK_COMPLETED
column). The
events_stages_summary_global_by_event_name
table shows the number of times the server instance has
initiated the auto-rejoin procedure (in the
COUNT_STAR
column). The
events_stages_history_long
table
shows the time each of these auto-rejoin procedures was
completed (in the TIMER_END
column). While a
member is rejoining a replication group, its status can be
displayed as OFFLINE
or
ERROR
before the group completes the
compatibility checks and accepts it as a member. When the member
is catching up with the group's transactions, its status is
RECOVERING
.