WL#11568: Group Replication: option to shutdown server when dropping out of the group
Affects: Server-8.0
Status: Complete
EXECUTIVE SUMMARY
=================
This worklog implements an option that lets the user define the behavior
of the server once it drops out of the group. The option specifies whether
the server voluntarily shuts itself down or switches itself to super read
only mode instead (the current behavior).
USER STORIES
============
- As a developer using MySQL, I want to always read data from a MySQL
  server that is connected to replication, so that I minimize my chances
  of reading stale data.
- As a MySQL DBA, I want my servers to automatically shoot themselves in
  the head if they involuntarily drop out of replication, so that other
  components in my system do not engage stale servers, or remove
  connections to stale servers automatically, or both.
- As a system builder, I want my system to react (e.g., close connections,
  remove the server from the pool of "good" servers, etc.) whenever a
  server goes involuntarily offline w.r.t. replication, so that I do not
  have to proactively poll the system to figure that out.
- As a proxy tool routing connections to a MySQL server, I want my
  connections to stale servers to be closed automatically, so that these
  connections are evicted from my routing cache automatically.
PROBLEMS
========
Issues:
- When a server drops out of the group, it sets itself as super-read-only,
  thus still allowing stale reads.
- When a server is stuck on a minority partition, it is still readable,
  thus still allowing stale reads.
- No notification is emitted to other parts of the infrastructure, and
  connections are not even closed automatically, when a server drops out
  of replication.
Users want:
- MySQL Router to kill all open connections to a server that leaves the
  group (setting the instance to RO is not sufficient).
- MySQL Router to kill all open connections to a server that is stuck on a
  minority.
- Those who do not use MySQL Router want a more autonomic system where the
  server automatically restricts access if it runs into an unrecoverable
  local error (has gone out of sync).
What automatic shutdown does (side-effects):
- Closes all open connections and prevents client apps from doing stale
  reads or failed writes.
- Allows systemd or a watchdog tool to restart the server (and thus rejoin
  automatically).
Functional requirements
=======================
FR1: In the case of applier error, the member will change to ERROR
state and leave the group, consequently if
group_replication_exit_state_action= READ_ONLY then the plugin
will set super_read_only=ON and disallow write operations.
FR2: In the case of applier error, the member will change to ERROR
state and leave the group, consequently if
group_replication_exit_state_action= ABORT_SERVER then the
server process must abort.
FR3: In the case of member expel from the group, the member will
change to ERROR state, consequently if
group_replication_exit_state_action= READ_ONLY then the plugin
will set super_read_only=ON and disallow write operations.
FR4: In the case of member expel from the group, the member will
change to ERROR state, consequently if
group_replication_exit_state_action= ABORT_SERVER then the
server process must abort.
FR5: In the case that a member is unable to contact a majority of the
group members, after the timeout specified by the
group_replication_unreachable_majority_timeout[1] option, if
group_replication_exit_state_action= READ_ONLY then the plugin
will set super_read_only=ON and disallow write operations.
FR6: In the case that a member is unable to contact a majority of the
group members, after the timeout specified by the
group_replication_unreachable_majority_timeout[1] option, if
group_replication_exit_state_action= ABORT_SERVER then the
server process must abort.
FR7: In the case of recovery error, the member will change to ERROR
state and leave the group, consequently if
group_replication_exit_state_action= READ_ONLY then the plugin
will set super_read_only=ON and disallow write operations.
FR8: In the case of recovery error, the member will change to ERROR
state and leave the group, consequently if
group_replication_exit_state_action= ABORT_SERVER then the
server process must abort.
[1]
https://dev.mysql.com/doc/refman/8.0/en/group-replication-options.html#sysvar_group_replication_unreachable_majority_timeout
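The eight requirements form a 4x2 matrix: four exit causes crossed with the
two option values. The sketch below encodes that matrix so it can be used,
for example, to name test cases. All type and function names here are
hypothetical and chosen for illustration; they are not identifiers from the
plugin source.

```cpp
// Hypothetical names for illustration only; not taken from the plugin source.
enum class ExitCause {
  ApplierError = 0,        // FR1, FR2
  MemberExpelled = 1,      // FR3, FR4
  MajorityUnreachable = 2, // FR5, FR6
  RecoveryError = 3        // FR7, FR8
};
enum class ExitStateAction { ReadOnly = 0, AbortServer = 1 };
enum class Outcome { SuperReadOnly, ProcessAborted };

// Per FR1-FR8, the outcome depends only on the configured action,
// whichever cause made the member leave the group.
constexpr Outcome outcome_for(ExitCause /*cause*/, ExitStateAction action) {
  return action == ExitStateAction::AbortServer ? Outcome::ProcessAborted
                                                : Outcome::SuperReadOnly;
}

// Maps a (cause, action) pair back to the requirement number in the list.
constexpr int requirement_number(ExitCause cause, ExitStateAction action) {
  return 2 * static_cast<int>(cause) + 1 + static_cast<int>(action);
}
```

For instance, requirement_number(ExitCause::MajorityUnreachable,
ExitStateAction::AbortServer) yields 6, matching FR6.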
Non-functional requirements
===========================
None.
While we do not have a complete solution, we can
provide a quick solution that would introduce the option:
- name: group_replication_exit_state_action
- values: { READ_ONLY, ABORT_SERVER }
- default: ABORT_SERVER
- scope: global
- dynamic: yes
This is similar to binlog_error_action[1]: in MySQL 5.7.7 and later, that
variable defaults to ABORT_SERVER, which makes the server halt logging and
shut down whenever it encounters such an error with the binary log.
Applying this approach to GR, we will pair
group_replication_exit_state_action with
group_replication_unreachable_majority_timeout[2]
-------------------------------------------------
This would provide the following behaviour:
group_replication_exit_state_action= ABORT_SERVER
group_replication_unreachable_majority_timeout= 10 (seconds)
In the case of applier error, the member must change to ERROR state
and that would trigger ABORT_SERVER.
In the case of member expel, the member must change to ERROR state
and that would trigger ABORT_SERVER.
In the case of unreachable majority, after the 10 seconds, the member
must change to ERROR state and that would trigger ABORT_SERVER.
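The unreachable-majority case is the only one gated by a timer. A minimal
sketch of that gate follows; the function name is hypothetical and this is
not plugin code. It assumes that a timeout of 0 means "wait forever", which
is the documented meaning of group_replication_unreachable_majority_timeout.

```cpp
// Hypothetical helper, not plugin code. Returns true once a member that has
// lost contact with a majority should move to ERROR state and apply the
// configured group_replication_exit_state_action.
// Assumption: timeout_seconds == 0 means the member waits indefinitely.
constexpr bool majority_timeout_expired(unsigned long timeout_seconds,
                                        unsigned long seconds_without_majority) {
  return timeout_seconds > 0 && seconds_without_majority >= timeout_seconds;
}
```

With the values above (timeout of 10 seconds), the gate stays closed at 9
seconds without a majority and opens at 10.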
We must not confuse this new fencing ability with the
group_replication_defer_expel option, which only kicks in when a
majority does exist.
Example timeline:
0) Set up a group of 3 members: M1, M2, M3. All members are online at this
point.
-----------------
View from member1:
SELECT MEMBER_ID, MEMBER_STATE FROM
performance_schema.replication_group_members ORDER BY MEMBER_PORT ASC;
MEMBER_ID f05a00b8-1583-11e8-a2e7-0010e0734796
MEMBER_STATE ONLINE
MEMBER_ID f0539a9c-1583-11e8-aa19-0010e0734796
MEMBER_STATE ONLINE
MEMBER_ID f0614c0d-1583-11e8-aa49-0010e0734796
MEMBER_STATE ONLINE
-----------------
View from member2:
SELECT MEMBER_ID, MEMBER_STATE FROM
performance_schema.replication_group_members ORDER BY MEMBER_PORT ASC;
MEMBER_ID f05a00b8-1583-11e8-a2e7-0010e0734796
MEMBER_STATE ONLINE
MEMBER_ID f0539a9c-1583-11e8-aa19-0010e0734796
MEMBER_STATE ONLINE
MEMBER_ID f0614c0d-1583-11e8-aa49-0010e0734796
MEMBER_STATE ONLINE
-----------------
View from member3:
SELECT MEMBER_ID, MEMBER_STATE FROM
performance_schema.replication_group_members ORDER BY MEMBER_PORT ASC;
MEMBER_ID f05a00b8-1583-11e8-a2e7-0010e0734796
MEMBER_STATE ONLINE
MEMBER_ID f0539a9c-1583-11e8-aa19-0010e0734796
MEMBER_STATE ONLINE
MEMBER_ID f0614c0d-1583-11e8-aa49-0010e0734796
MEMBER_STATE ONLINE
1) Due to network issues, M1 becomes unreachable to M2 and M3.
-----------------
View from member1:
SELECT MEMBER_ID, MEMBER_STATE FROM
performance_schema.replication_group_members ORDER BY MEMBER_PORT ASC;
MEMBER_ID f05a00b8-1583-11e8-a2e7-0010e0734796
MEMBER_STATE ONLINE
MEMBER_ID f0539a9c-1583-11e8-aa19-0010e0734796
MEMBER_STATE UNREACHABLE
MEMBER_ID f0614c0d-1583-11e8-aa49-0010e0734796
MEMBER_STATE UNREACHABLE
-----------------
View from member2:
SELECT MEMBER_ID, MEMBER_STATE FROM
performance_schema.replication_group_members ORDER BY MEMBER_PORT ASC;
MEMBER_ID f05a00b8-1583-11e8-a2e7-0010e0734796
MEMBER_STATE UNREACHABLE
MEMBER_ID f0539a9c-1583-11e8-aa19-0010e0734796
MEMBER_STATE ONLINE
MEMBER_ID f0614c0d-1583-11e8-aa49-0010e0734796
MEMBER_STATE ONLINE
-----------------
View from member3:
SELECT MEMBER_ID, MEMBER_STATE FROM
performance_schema.replication_group_members ORDER BY MEMBER_PORT ASC;
MEMBER_ID f05a00b8-1583-11e8-a2e7-0010e0734796
MEMBER_STATE UNREACHABLE
MEMBER_ID f0539a9c-1583-11e8-aa19-0010e0734796
MEMBER_STATE ONLINE
MEMBER_ID f0614c0d-1583-11e8-aa49-0010e0734796
MEMBER_STATE ONLINE
2) M2 and M3 hold a majority. They will wait
group_replication_defer_expel seconds for M1 to return. M1 does not
return, so M2 and M3 expel M1 from the group.
-----------------
View from member1:
None, since the process called abort(). If the server is automatically
restarted by a watchdog tool (systemd, for example), it will not be able to
rejoin the group immediately, because its address is still seen as
UNREACHABLE by the group (i.e. it still belongs to the group, but the group
thinks it is unreachable). Eventually its membership is dropped and the
server will be able to join the group again.
-----------------
View from member2:
SELECT MEMBER_ID, MEMBER_STATE FROM
performance_schema.replication_group_members ORDER BY MEMBER_PORT ASC;
MEMBER_ID f0539a9c-1583-11e8-aa19-0010e0734796
MEMBER_STATE ONLINE
MEMBER_ID f0614c0d-1583-11e8-aa49-0010e0734796
MEMBER_STATE ONLINE
-----------------
View from member3:
SELECT MEMBER_ID, MEMBER_STATE FROM
performance_schema.replication_group_members ORDER BY MEMBER_PORT ASC;
MEMBER_ID f0539a9c-1583-11e8-aa19-0010e0734796
MEMBER_STATE ONLINE
MEMBER_ID f0614c0d-1583-11e8-aa49-0010e0734796
MEMBER_STATE ONLINE
group_replication_exit_state_action operates on the M1 side, the member
that does not hold a majority. Such a member will never propose or decide
to expel any member.
Since the group_replication_exit_state_action default is ABORT_SERVER, in
the unlikely scenario of a split-brain, with
group_replication_unreachable_majority_timeout > 0, all the servers will
abort, which will consequently cause a full group shutdown.
The DBA/operator will need to set group_replication_bootstrap_group=ON on
one server and bootstrap the group; only then can the other servers join.
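The difference between the timeline (M1 alone against M2 and M3) and the
full group shutdown comes down to whether any partition still holds a
strict majority. A minimal sketch of that check, with hypothetical function
names that are not part of the plugin:

```cpp
#include <cstddef>
#include <vector>

// True when a partition that can reach `reachable` members (itself included)
// out of `group_size` holds a strict majority.
constexpr bool holds_majority(std::size_t reachable, std::size_t group_size) {
  return 2 * reachable > group_size;
}

// True when no partition holds a majority. In that case every member
// eventually hits group_replication_unreachable_majority_timeout (if > 0)
// and applies its exit state action, so with ABORT_SERVER the whole group
// shuts down.
inline bool whole_group_exits(const std::vector<std::size_t>& partition_sizes) {
  std::size_t group_size = 0;
  for (std::size_t size : partition_sizes) group_size += size;
  for (std::size_t size : partition_sizes) {
    if (holds_majority(size, group_size)) return false;
  }
  return true;
}
```

In the 3-member timeline the partitions are {1, 2}: M2 and M3 keep the
majority and only M1 exits. A 4-member group split into {2, 2} leaves no
majority on either side, which is the full shutdown case described above.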
Later, when WL#11648 is implemented, we will change the Group Replication
plugin to notify the server that a critical failure happened and that the
server's configured failure mode must be engaged.
[1]
https://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html#sysvar_binlog_error_action
[2]
https://dev.mysql.com/doc/refman/8.0/en/group-replication-options.html#sysvar_group_replication_unreachable_majority_timeout
SUMMARY OF CHANGES
==================
--------------------------------------------
1. New sysvar
--------------------------------------------
A new plugin sysvar will be added, settable via the command line, the SET
statement or a configuration file. This sysvar shall be named
'group_replication_exit_state_action' and shall be a string whose value
shall be one of { READ_ONLY, ABORT_SERVER } (case-insensitive). The string
value of the variable shall be mapped to the following enum:
enum enum_exit_state_action {
  EXIT_STATE_ACTION_READ_ONLY = 0,
  EXIT_STATE_ACTION_ABORT
};
The default value of the sysvar will be EXIT_STATE_ACTION_ABORT. This changes
the default behaviour completely from previous versions, so we must be careful
when configuring new instances of MySQL GR from now on. Furthermore, for
testing purposes on MTR, the variable must be set at my.cnf level to
EXIT_STATE_ACTION_READ_ONLY so we don't break the rest of the tests.
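To illustrate the case-insensitive mapping from the string value to the
enum, here is a minimal standalone sketch. The helper
parse_exit_state_action is hypothetical: the plugin itself presumably relies
on the server's sysvar machinery for this, so treat the code as a model of
the mapping rather than the actual implementation.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Mirrors the enum given in the text.
enum enum_exit_state_action {
  EXIT_STATE_ACTION_READ_ONLY = 0,
  EXIT_STATE_ACTION_ABORT
};

// Hypothetical helper: returns true and stores the mapped value in *action
// when `value` names a valid option value (compared case-insensitively);
// returns false and leaves *action untouched otherwise.
inline bool parse_exit_state_action(std::string value,
                                    enum_exit_state_action* action) {
  std::transform(value.begin(), value.end(), value.begin(),
                 [](unsigned char c) {
                   return static_cast<char>(std::toupper(c));
                 });
  if (value == "READ_ONLY") {
    *action = EXIT_STATE_ACTION_READ_ONLY;
    return true;
  }
  if (value == "ABORT_SERVER") {
    *action = EXIT_STATE_ACTION_ABORT;
    return true;
  }
  return false;
}
```

Both "abort_server" and "ABORT_SERVER" map to EXIT_STATE_ACTION_ABORT, while
an unknown string such as "SHUTDOWN" is rejected.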
--------------------------------------------
2. Aborting the server when it involuntarily leaves the group
--------------------------------------------
This new variable shall be stored at the global namespace under the name
'exit_state_action_var' in order to be consumed by all the relevant components.
These components are Applier_module and Group_partition_handling.
2.1 Aborting upon member expel or applier error
These scenarios can be handled by checking exit_state_action_var in
Applier_module::kill_pending_transactions(), right after unblocking all pending
transactions on the server that left the group. This ordering allows the
local transactions that are waiting for certification to roll back.
If the state is EXIT_STATE_ACTION_ABORT we call abort() right there. Otherwise
we will set the read mode to super_read_only.
2.2 Aborting upon majority loss
This scenario is handled by checking exit_state_action_var in
Group_partition_handling::kill_transactions_and_leave(), again right after
unblocking all pending transactions on the server that left the group. We do
this for the same reasons as described in scenario 2.1.
If the state is EXIT_STATE_ACTION_ABORT we call abort() right there. Otherwise
we will set the read mode to super_read_only.
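Sections 2.1 and 2.2 share the same ordering: unblock pending transactions
first, then either abort or enable super_read_only. The sketch below models
that ordering with injectable hooks so it can be exercised without killing
the process. Exit_hooks and leave_group_on_error are hypothetical names; the
real logic lives inside Applier_module::kill_pending_transactions() and
Group_partition_handling::kill_transactions_and_leave().

```cpp
#include <functional>

// Mirrors the enum from section 1.
enum enum_exit_state_action {
  EXIT_STATE_ACTION_READ_ONLY = 0,
  EXIT_STATE_ACTION_ABORT
};

// Hypothetical hooks standing in for plugin internals.
struct Exit_hooks {
  std::function<void()> unblock_pending_transactions;
  std::function<void()> abort_server;            // in the server: abort()
  std::function<void()> enable_super_read_only;  // sets super_read_only=ON
};

// Applies the configured action after the member has left the group.
// Pending transactions are unblocked first so that local transactions
// waiting for certification can roll back before the action is taken.
inline void leave_group_on_error(enum_exit_state_action action,
                                 const Exit_hooks& hooks) {
  hooks.unblock_pending_transactions();
  if (action == EXIT_STATE_ACTION_ABORT) {
    hooks.abort_server();
  } else {
    hooks.enable_super_read_only();
  }
}
```

Passing the hooks in, rather than calling abort() directly, is a choice made
for this sketch only: it lets a test record the call order instead of
terminating.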
Copyright (c) 2000, 2025, Oracle Corporation and/or its affiliates. All rights reserved.