WL#12895: Group Replication: Add OFFLINE_MODE to group_replication_exit_state_action
Affects: Server-8.0
—
Status: Complete
EXECUTIVE SUMMARY ================= The option group_replication_exit_state_action configures how Group Replication behaves when a server leaves the group unintentionally: 1. after encountering an applier error; 2. after encountering an recovery error; 3. in the case of a loss of majority; 4. when another member of the group expels it due to a suspicion timing out; 5. after an error on coordinated group changes[1]; 6. after a primary election error; 7. after a clone provisioning error; 8. when automatic rejoin is enable, after its attempts are exhausted unsuccessfully. This behavior can be: * READ_ONLY disable writes on the server, by enabling server super_read_only[2] option; * ABORT_SERVER shutdown the server. After this worklog is implemented, a new behavior is added: * OFFLINE_MODE close all connections and disallow new ones from users who do not have the CONNECTION_ADMIN or SUPER privilege, by enabling server offline_mode[3] option. OFFLINE_MODE includes the behaviour of READ_ONLY. This automatic connection closure will avoid user sessions to: 1) read stale data and 2) write data to orphans servers, without the need to shutdown the server. Proxy tools, like MySQL Router, will immediately detect when a connection is closed and update its pool of connections avoiding directing users request to failed servers. [1] https://dev.mysql.com/doc/refman/8.0/en/group-replication-configuring-online-group.html [2] https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_super_read_only [3] https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_offline_mode USER STORIES ============ - As a system developer using MySQL I want to always read data from a MySQL server that is connected through replication so I minimize my chances of reading stale data. - As a MySQL DBA I want my servers to automatically close users connections if they (the servers) involuntarily drop out of replication, so that other components in my system do not engage stale servers. - As a system builder, I want my system to react (e.g., close connections, remove server from the pool of "good" servers, etc) whenever a server goes involuntarily offline w.r.t. replication, so that I avoid to pro-actively polling the system to figure that out. - As a proxy tool routing connections to a MySQL server, I want to get my connections to stale servers automatically closed, so that these connections are evicted from my routing cache automatically.
Functional requirements ======================= FR1: In the case of applier error, the member will change to ERROR state and leave the group, consequently if group_replication_exit_state_action= OFFLINE_MODE then the plugin will set server offline_mode= ON [1]. FR2: In the case of recovery error, the member will change to ERROR state and leave the group, consequently if group_replication_exit_state_action= OFFLINE_MODE then the plugin will set server offline_mode= ON [1]. FR3: In the case of member expel from the group, the member will change to ERROR state, consequently if group_replication_exit_state_action= OFFLINE_MODE then the plugin will set server offline_mode= ON [1]. FR4: In the case of member is unable to contact a majority of the group members, after the timeout specified on group_replication_unreachable_majority_timeout[2] option, if group_replication_exit_state_action= OFFLINE_MODE then the plugin will set server offline_mode= ON [1]. FR5: In the case of error during a coordinated group change, like switch from single-primary to multi-primary mode, the member will change to ERROR state and leave the group, consequently if group_replication_exit_state_action= OFFLINE_MODE then the plugin will set server offline_mode= ON [1]. FR6: In the case of error during a primary election, the member will change to ERROR state and leave the group, consequently if group_replication_exit_state_action= OFFLINE_MODE then the plugin will set server offline_mode= ON [1]. FR7: In the case of error during cloning provisioning, the member will change to ERROR state and leave the group, consequently if group_replication_exit_state_action= OFFLINE_MODE then the plugin will set server offline_mode= ON [1]. FR8: When automatic rejoin is enable[3], after its attempts are exhausted unsuccessfully, the member will change to ERROR state and leave the group, consequently if group_replication_exit_state_action= OFFLINE_MODE then the plugin will set server offline_mode= ON [1]. FR9: group_replication_exit_state_action= OFFLINE_MODE includes the behaviour of group_replication_exit_state_action= READ_ONLY. [1] https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_offline_mode [2] https://dev.mysql.com/doc/refman/8.0/en/group-replication-options.html#sysvar_group_replication_unreachable_majority_timeout [3] https://dev.mysql.com/doc/refman/8.0/en/group-replication-options.html#sysvar_group_replication_autorejoin_tries Non-functional requirements =========================== None.
DEFINITIONS =========== super_read_only: server super_read_only option https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_super_read_only offline_mode: server offline_mode option https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_offline_mode READ_ONLY: value of group_replication_exit_state_action option https://dev.mysql.com/doc/refman/8.0/en/group-replication-options.html#sysvar_group_replication_exit_state_action OFFLINE_MODE: value of group_replication_exit_state_action option https://dev.mysql.com/doc/refman/8.0/en/group-replication-options.html#sysvar_group_replication_exit_state_action SUMMARY OF THE APPROACH ======================= The DBA can configure how Group Replication behaves when a server leaves the group unintentionally. A server leaves the group unintentionally: 1. after encountering an applier error; 2. after encountering an recovery error; 3. in the case of a loss of majority; 4. when another member of the group expels it due to a suspicion timing out; 5. after an error on coordinated group changes[1]; 6. after a primary election error; 7. after a clone provisioning error; 8. when automatic rejoin is enable, after its attempts are exhausted unsuccessfully. The behavior taken depends on the value of the option group_replication_exit_state_action: - READ_ONLY The default value, switches the server to super read only mode. RW (read-write) transactions are disallowed, RO (read-only) are allowed. - ABORT_SERVER (WL#11568) Shutdowns the server, thence closing all user connections. - OFFLINE_MODE Behavior added on this worklog, closes all connections and disallows new ones from users who do not have the CONNECTION_ADMIN or SUPER privilege. This is done by setting server offline_mode option to ON. OFFLINE_MODE includes the behaviour of READ_ONLY, that is, it also disallows RW transactions, even from users with CONNECTION_ADMIN or SUPER privilege. The full description of offline_mode can be read at https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_offline_mode [1] https://dev.mysql.com/doc/refman/8.0/en/group-replication-configuring-online-group.html USER INTERFACE ============== The DBA can specify the behaviour by setting the option: - name: group_replication_exit_state_action - values: { READ_ONLY, ABORT_SERVER, OFFLINE_MODE } - default: READ_ONLY - scope: global - dynamic: yes - replicated: no - persistable: PERSIST, PERSIST_ONLY - credentials: SYSTEM_VARIABLES_ADMIN - description: The action that is taken when the server leaves the group unintentionally. Possible values are READ_ONLY, ABORT_SERVER and OFFLINE_MODE. SECURITY CONTEXT ================ SYSTEM_VARIABLES_ADMIN privilege is required to change the option group_replication_exit_state_action. The mysql.session user used internally to access the SQL API requires SUPER or CONNECTION_ADMIN privilege to avoid having its connections closed when server offline_mode= ON. Currently the mysql.session has the SUPER privilege, this worklog will add the CONNECTION_ADMIN one. UPGRADE/DOWNGRADE AND CROSS-VERSION REPLICATION =============================================== There are no repercussions on upgrade scenarios. OBSERVABILITY ============= The member state after leaving is ERROR (no change on this worklog), which can be check on performance_schema.replication_group_members table: SELECT * FROM performance_schema.replication_group_members; The offline_mode state can be check with: SELECT @@GLOBAL.offline_mode; Existing connections which users do not have CONNECTION_ADMIN or SUPER privilege will be close and unable to establish new connections. DEPLOYMENT AND INSTALLATION =========================== There are no repercussions. PROTOCOL ======== There are no repercussions. FAILURE MODEL SPECIFICATION =========================== The failure model remains unchanged, this worklog introduces a new behaviour on how to handle the involuntary leaves.
SUMMARY OF CHANGES ================== Server core changes ------------------- - Add CONNECTION_ADMIN privilege to mysql.session user. Group Replication changes ------------------------- - Extend the option group_replication_exit_state_action with one more value: OFFLINE_MODE. - When a server leaves involuntary and group_replication_exit_state_action=OFFLINE_MODE: - set server super_read_only[1] to ON through the SQL API; - set server offline_mode[2] to ON through the SQL API; [1] https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_super_read_only [2] https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_offline_mode
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.