WL#11568: Group Replication: option to shutdown server when dropping out of the group
Affects: Server-8.0
—
Status: Complete
EXECUTIVE SUMMARY
=================

This worklog implements an option that lets the user define the behavior of the server once it drops out of the group. The option specifies whether the server voluntarily shuts itself down or switches itself to super read only mode instead (the current behavior).

USER STORIES
============

- As a developer using MySQL, I want to always read data from a MySQL server that is connected to replication, so that I minimize my chances of reading stale data.

- As a MySQL DBA, I want my servers to automatically shoot themselves in the head if they involuntarily drop out of replication, so that other components in my system do not engage stale servers, or automatically remove connections to stale servers, or both.

- As a system builder, I want my system to react (e.g., close connections, remove the server from the pool of "good" servers, etc.) whenever a server goes involuntarily offline w.r.t. replication, so that I avoid proactively polling the system to figure that out.

- As a proxy tool routing connections to a MySQL server, I want my connections to stale servers to be closed automatically, so that these connections are evicted from my routing cache automatically.

PROBLEMS
========

Issues:

- When a server drops out of the group, it sets itself as super-read-only, thus still allowing stale reads.

- When a server is stuck on a minority partition, it is still readable, thus still allowing stale reads.

- No notification is emitted to other parts of the infrastructure, and connections are not even closed automatically when a server drops out of replication.

Users want:

- MySQL Router to kill all open connections to a server that leaves the group (setting the instance to RO is not sufficient).

- MySQL Router to kill all open connections to a server that is stuck on a minority.

- Those who do not use MySQL Router want a more autonomic system where the server automatically restricts access if it runs into an unrecoverable local error (has gone out of sync).

What automatic shutdown does (side-effects):

- Closes all open connections and prevents client apps from doing stale reads or failed writes.

- Allows systemd or a watchdog tool to restart the server (and thus rejoin automatically).
Functional requirements
=======================

FR1: In the case of an applier error, the member will change to ERROR state and leave the group; consequently, if group_replication_exit_state_action=READ_ONLY, the plugin will set super_read_only=ON and disallow write operations.

FR2: In the case of an applier error, the member will change to ERROR state and leave the group; consequently, if group_replication_exit_state_action=ABORT_SERVER, the server process must abort.

FR3: In the case of a member being expelled from the group, the member will change to ERROR state; consequently, if group_replication_exit_state_action=READ_ONLY, the plugin will set super_read_only=ON and disallow write operations.

FR4: In the case of a member being expelled from the group, the member will change to ERROR state; consequently, if group_replication_exit_state_action=ABORT_SERVER, the server process must abort.

FR5: In the case of a member being unable to contact a majority of the group members, after the timeout specified by the group_replication_unreachable_majority_timeout[1] option, if group_replication_exit_state_action=READ_ONLY, the plugin will set super_read_only=ON and disallow write operations.

FR6: In the case of a member being unable to contact a majority of the group members, after the timeout specified by the group_replication_unreachable_majority_timeout[1] option, if group_replication_exit_state_action=ABORT_SERVER, the server process must abort.

FR7: In the case of a recovery error, the member will change to ERROR state and leave the group; consequently, if group_replication_exit_state_action=READ_ONLY, the plugin will set super_read_only=ON and disallow write operations.

FR8: In the case of a recovery error, the member will change to ERROR state and leave the group; consequently, if group_replication_exit_state_action=ABORT_SERVER, the server process must abort.

[1] https://dev.mysql.com/doc/refman/8.0/en/group-replication-options.html#sysvar_group_replication_unreachable_majority_timeout

Non-functional requirements
===========================

None.
While we do not have a complete solution, we can provide a quick solution that would introduce the option:

- name: group_replication_exit_state_action
- values: { READ_ONLY, ABORT_SERVER }
- default: ABORT_SERVER
- scope: global
- dynamic: yes

This is similar to binlog_error_action[1]:

  In MySQL 5.7.7 and later, this variable defaults to ABORT_SERVER, which makes the server halt logging and shut down whenever it encounters such an error with the binary log.

Applying this approach to GR, we will pair group_replication_exit_state_action with group_replication_unreachable_majority_timeout[2].

-------------------------------------------------

This would provide the following behaviour:

  group_replication_exit_state_action= ABORT_SERVER
  group_replication_unreachable_majority_timeout= 10 (seconds)

In the case of an applier error, the member must change to ERROR state and that would trigger ABORT_SERVER.

In the case of a member expel, the member must change to ERROR state and that would trigger ABORT_SERVER.

In the case of an unreachable majority, after the 10 seconds, the member must change to ERROR state and that would trigger ABORT_SERVER.

We must not confuse this new fencing ability with the group_replication_defer_expel option, which only kicks in when a majority does exist.

Example timeline:

0) Set up a group of 3 members: M1, M2, M3. All members are online at this point.

-----------------
View from member1:

  SELECT MEMBER_ID, MEMBER_STATE FROM performance_schema.replication_group_members ORDER BY MEMBER_PORT ASC;

  MEMBER_ID                             MEMBER_STATE
  f05a00b8-1583-11e8-a2e7-0010e0734796  ONLINE
  f0539a9c-1583-11e8-aa19-0010e0734796  ONLINE
  f0614c0d-1583-11e8-aa49-0010e0734796  ONLINE

-----------------
View from member2:

  SELECT MEMBER_ID, MEMBER_STATE FROM performance_schema.replication_group_members ORDER BY MEMBER_PORT ASC;

  MEMBER_ID                             MEMBER_STATE
  f05a00b8-1583-11e8-a2e7-0010e0734796  ONLINE
  f0539a9c-1583-11e8-aa19-0010e0734796  ONLINE
  f0614c0d-1583-11e8-aa49-0010e0734796  ONLINE

-----------------
View from member3:

  SELECT MEMBER_ID, MEMBER_STATE FROM performance_schema.replication_group_members ORDER BY MEMBER_PORT ASC;

  MEMBER_ID                             MEMBER_STATE
  f05a00b8-1583-11e8-a2e7-0010e0734796  ONLINE
  f0539a9c-1583-11e8-aa19-0010e0734796  ONLINE
  f0614c0d-1583-11e8-aa49-0010e0734796  ONLINE

1) Due to network issues, M1 becomes unreachable to M2 and M3.

-----------------
View from member1:

  SELECT MEMBER_ID, MEMBER_STATE FROM performance_schema.replication_group_members ORDER BY MEMBER_PORT ASC;

  MEMBER_ID                             MEMBER_STATE
  f05a00b8-1583-11e8-a2e7-0010e0734796  ONLINE
  f0539a9c-1583-11e8-aa19-0010e0734796  UNREACHABLE
  f0614c0d-1583-11e8-aa49-0010e0734796  UNREACHABLE

-----------------
View from member2:

  SELECT MEMBER_ID, MEMBER_STATE FROM performance_schema.replication_group_members ORDER BY MEMBER_PORT ASC;

  MEMBER_ID                             MEMBER_STATE
  f05a00b8-1583-11e8-a2e7-0010e0734796  UNREACHABLE
  f0539a9c-1583-11e8-aa19-0010e0734796  ONLINE
  f0614c0d-1583-11e8-aa49-0010e0734796  ONLINE

-----------------
View from member3:

  SELECT MEMBER_ID, MEMBER_STATE FROM performance_schema.replication_group_members ORDER BY MEMBER_PORT ASC;

  MEMBER_ID                             MEMBER_STATE
  f05a00b8-1583-11e8-a2e7-0010e0734796  UNREACHABLE
  f0539a9c-1583-11e8-aa19-0010e0734796  ONLINE
  f0614c0d-1583-11e8-aa49-0010e0734796  ONLINE

2) M2 and M3 hold a majority. They will wait group_replication_defer_expel seconds for M1 to return. M1 does not return and as such M2 and M3 expel M1 from the group.

-----------------
View from member1:

  None, since the process abort()'ed.

  If the server is automatically restarted by some watchdog tool (like systemd, for example), it will not be able to automatically join the group, because its address is still seen as UNREACHABLE by the group (i.e. it still belongs to the group but the group thinks it is unreachable). Eventually its membership is dropped and the server will be able to join the group again.

-----------------
View from member2:

  SELECT MEMBER_ID, MEMBER_STATE FROM performance_schema.replication_group_members ORDER BY MEMBER_PORT ASC;

  MEMBER_ID                             MEMBER_STATE
  f0539a9c-1583-11e8-aa19-0010e0734796  ONLINE
  f0614c0d-1583-11e8-aa49-0010e0734796  ONLINE

-----------------
View from member3:

  SELECT MEMBER_ID, MEMBER_STATE FROM performance_schema.replication_group_members ORDER BY MEMBER_PORT ASC;

  MEMBER_ID                             MEMBER_STATE
  f0539a9c-1583-11e8-aa19-0010e0734796  ONLINE
  f0614c0d-1583-11e8-aa49-0010e0734796  ONLINE

group_replication_exit_state_action operates on the M1 side, that is, on the member that does not hold a majority. Such a member will never propose or decide to expel any member.

Since the group_replication_exit_state_action default is ABORT_SERVER, in the unlikely scenario of a split-brain with group_replication_unreachable_majority_timeout > 0, all the servers will abort, which will consequently cause a full group shutdown. The DBA/operator will need to set group_replication_bootstrap_group=ON on one server and bootstrap the group; only then can the other servers join.
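The timeline and the split-brain case above both hinge on whether a member can still reach a majority of the group. The following is a minimal sketch of that check, not plugin code: the helper names holds_majority and exit_action_would_fire are hypothetical, and the sketch assumes that a majority means strictly more than half of the members (counting the member itself) and that a timeout of 0 means the member waits indefinitely.

```cpp
#include <cstddef>

// Hypothetical helper (not from the plugin source): a member holds a
// majority when it can reach strictly more than half of the group,
// counting itself among the reachable members.
constexpr bool holds_majority(std::size_t reachable_members,
                              std::size_t group_size) {
  return group_size > 0 && reachable_members * 2 > group_size;
}

// Hypothetical helper: the exit state action is only relevant for majority
// loss when the member has no majority and
// group_replication_unreachable_majority_timeout is greater than 0
// (assumption: 0 means "wait forever").
constexpr bool exit_action_would_fire(std::size_t reachable_members,
                                      std::size_t group_size,
                                      unsigned long timeout_seconds) {
  return timeout_seconds > 0 &&
         !holds_majority(reachable_members, group_size);
}

// Timeline example: M1 only reaches itself, M2 and M3 reach each other.
static_assert(!holds_majority(1, 3), "M1 alone is a minority of 3");
static_assert(holds_majority(2, 3), "M2 and M3 form a majority of 3");

// Split-brain example: an even 2/2 split of a 4-member group leaves no
// partition with a majority, so with a timeout > 0 every member would
// apply its exit state action.
static_assert(exit_action_would_fire(2, 4, 10), "no side holds a majority");
static_assert(!exit_action_would_fire(2, 4, 0), "timeout 0 never fires");
```

With the ABORT_SERVER default, the last case is the one that leads to the full group shutdown described above, since neither half can make progress or expel the other.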
Later, when WL#11648 is available, we will change the Group Replication plugin to notify the server that a critical failure happened and that the server's configured failure mode must be engaged.

[1] https://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html#sysvar_binlog_error_action
[2] https://dev.mysql.com/doc/refman/8.0/en/group-replication-options.html#sysvar_group_replication_unreachable_majority_timeout
SUMMARY OF CHANGES
==================

--------------------------------------------
1. New sysvar
--------------------------------------------

A new plugin sysvar will be added that will be settable via the command line, the SET statement, or a configuration file. This sysvar shall be named 'group_replication_exit_state_action' and shall be a string whose value shall be one of { READ_ONLY, ABORT_SERVER } (case-insensitive). The string value of the variable shall be mapped to the following enum:

  enum enum_exit_state_action {
    EXIT_STATE_ACTION_READ_ONLY = 0,
    EXIT_STATE_ACTION_ABORT
  };

The default value of the sysvar will be EXIT_STATE_ACTION_ABORT. This changes the default behaviour completely from previous versions, so we must be careful when configuring new instances of MySQL GR from now on. Furthermore, for testing purposes on MTR, the variable must be set at my.cnf level to EXIT_STATE_ACTION_READ_ONLY so we don't break the rest of the tests.

--------------------------------------------
2. Aborting the server when it involuntarily leaves the group
--------------------------------------------

This new variable shall be stored in the global namespace under the name 'exit_state_action_var' in order to be consumed by all the relevant components. These components are Applier_module and Group_partition_handling.

2.1 Aborting upon member expel or applier error

These scenarios can be handled by checking exit_state_action_var in Applier_module::kill_pending_transactions(), right after unblocking all pending transactions on the server that left the group. The unblocking is done first in order to allow the local transactions that are waiting for certification to roll back. If the state is EXIT_STATE_ACTION_ABORT, we call abort() right there. Otherwise, we set the read mode to super_read_only.

2.2 Aborting upon majority loss

This scenario is handled by checking exit_state_action_var in Group_partition_handling::kill_transactions_and_leave(), again right after unblocking all pending transactions on the server that left the group. We do this for the same reasons as described in scenario 2.1. If the state is EXIT_STATE_ACTION_ABORT, we call abort() right there. Otherwise, we set the read mode to super_read_only.
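The changes above can be sketched in a few lines of C++17. This is an illustration under stated assumptions, not the plugin implementation: only the enum comes from this worklog, while parse_exit_state_action, Exit_hooks and apply_exit_state_action are hypothetical names. The hooks stand in for plugin internals so that the ordering (unblock pending transactions first, then abort or switch to super_read_only) can be exercised without terminating the process.

```cpp
#include <algorithm>
#include <cctype>
#include <functional>
#include <optional>
#include <string>

// Enum as defined in this worklog.
enum enum_exit_state_action {
  EXIT_STATE_ACTION_READ_ONLY = 0,
  EXIT_STATE_ACTION_ABORT
};

// Hypothetical helper: case-insensitive mapping of the sysvar string value
// to the enum. Returns std::nullopt for any value outside
// { READ_ONLY, ABORT_SERVER }.
std::optional<enum_exit_state_action> parse_exit_state_action(
    std::string value) {
  std::transform(value.begin(), value.end(), value.begin(),
                 [](unsigned char c) {
                   return static_cast<char>(std::toupper(c));
                 });
  if (value == "READ_ONLY") return EXIT_STATE_ACTION_READ_ONLY;
  if (value == "ABORT_SERVER") return EXIT_STATE_ACTION_ABORT;
  return std::nullopt;
}

// Hypothetical hooks standing in for plugin internals. In the design above
// the abort hook corresponds to a direct call to abort().
struct Exit_hooks {
  std::function<void()> unblock_pending_transactions;
  std::function<void()> enable_super_read_only;
  std::function<void()> abort_server;
};

// Sketch of the check described for both
// Applier_module::kill_pending_transactions() and
// Group_partition_handling::kill_transactions_and_leave(): pending
// transactions are unblocked first so they can roll back, and only then is
// the configured exit state action applied.
void apply_exit_state_action(enum_exit_state_action action,
                             const Exit_hooks &hooks) {
  hooks.unblock_pending_transactions();
  if (action == EXIT_STATE_ACTION_ABORT) {
    hooks.abort_server();
  } else {
    hooks.enable_super_read_only();
  }
}
```

Passing the hooks in, rather than calling abort() directly, is a choice made for this sketch only; it keeps the example testable and makes the required ordering explicit.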
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.