WL#11570: GR: options to defer member eviction after a suspicion
Affects: Server-8.0
—
Status: Complete
EXECUTIVE SUMMARY
=================

Automatically managing the cluster membership requires acknowledging not only that a server has joined the group, but also that a server has left it. A server may leave voluntarily (it informs the others that it is going away) or involuntarily (the others need to realize that it has gone away). Realizing that a server has gone away and adjusting the group membership requires monitoring the members' activity and reconfiguring the group when members are possibly dead.

Currently, the period of time that elapses between the suspected failure of a node and its eviction from the group, which causes a group membership reconfiguration, is fixed. This worklog separates the notion of a suspicion from the moment the node is expelled. As such, it creates suspicions for the nodes deemed to have failed, and expels them only when a certain time has elapsed. It also allows users to customize the timeout for these suspicions through the new group_replication_member_expel_timeout parameter. This makes the cluster tolerant to network and machine delays, as well as to suspended nodes, avoiding their eviction if communication resumes before the timeout elapses.

USER STORIES
============

- As a systems administrator, I want to be able to execute maintenance tasks without causing a node running mysqld to be evicted from the group.
- As a MySQL DBA, I want to deploy Group Replication across a flaky network prone to false suspicions (such as a WAN), so that I can have multi-site GR deployments for disaster recovery (DR) or proximity purposes.
- As a MySQL DBA, I want to add a new node to, or remove an active node from, the group. Since group reconfigurations are suspended while there are active suspicions, I want to be able to force the immediate expulsion of the suspect nodes in order to perform the desired group membership changes.
PROBLEMS
========

Issues:

- A user knows for sure that a mysqld will be silent for a period of time (not unavailable, simply slower), but has no way to instruct the GCS expel mechanism to wait longer than that before evicting the node. XCom suspects that a node has failed if it cannot communicate with it for more than 5 seconds.

Users/Customers want:

- A configurable failure detector window that allows for delays or suspension of previously active nodes in the group. Currently, these nodes are expelled immediately after being suspected.

NOTES
=====

N1. Even if one relaxes the expel mechanism so that it does not evict the server right away, one cannot wait indefinitely. While the server is muted but not evicted, the other members of the group have to keep messages in their buffers, so that they can relay the missing messages to it when it comes back. If the buffer is exhausted at some point, the lagging server must be evicted anyway.

N2. The user should ensure that all nodes in the group share the same value for the new parameter. Otherwise, the timeout used will be the one set on the "killer node", which might not be the one the user really wants, since the "killer node" is determined automatically from the node's position in the group.

N3. Systems administrators can use this feature to perform maintenance operations, such as taking snapshots or even migrating a virtual machine to another host, by preventing slow or paused nodes from being evicted from the group.
------------------------
Functional requirements:
------------------------

FR1. Suspicions must be created for all nodes deemed to have failed by XCom's failure detector.

FR2. The timeout for suspicions on previously active nodes takes the current value of the group_replication_member_expel_timeout option.

FR3. When a suspicion times out, it is destroyed. If the majority of the group members are active, the suspected member is evicted from the group.

FR4. If a suspect node becomes active before its suspicion times out, its suspicion is destroyed.

FR5. The value of the group_replication_member_expel_timeout option can be updated at any time.

FR6. No view changes caused by joins or leaves can occur while there are suspect members in the group.

----------------------------
Non-Functional requirements:
----------------------------

NFR1. This feature will not affect the behavior of the existing mechanism for expelling nodes that never became active in the group.
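FR3, FR4 and FR6 can be read as a small decision procedure. The following C++ sketch encodes them as pure functions; all names (SuspicionOutcome, check_suspicion, group_has_majority, membership_change_allowed) are hypothetical, and the strict-majority rule (2 * suspects < total nodes) is our assumption rather than something this WL specifies. It follows FR3 as written: a timed-out suspicion is destroyed, and the node is evicted only when the group has a majority.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical outcome of checking one suspicion (names are illustrative).
enum class SuspicionOutcome {
  kKeep,             // not timed out yet: keep waiting
  kDropResumed,      // FR4: node became active again before the timeout
  kDropAndExpel,     // FR3: timed out while the group has a majority
  kDropWithoutExpel  // FR3 as written: timed out, but no majority
};

// Assumption: "majority active" means strictly more than half of all known
// nodes are not suspects, i.e. 2 * suspects < total.
bool group_has_majority(std::size_t total_nodes, std::size_t suspect_nodes) {
  assert(suspect_nodes <= total_nodes);
  return 2 * suspect_nodes < total_nodes;
}

// Sketch of FR3 and FR4 for a single suspicion.
SuspicionOutcome check_suspicion(bool node_active_again, bool timed_out,
                                 bool has_majority) {
  if (node_active_again) return SuspicionOutcome::kDropResumed;
  if (!timed_out) return SuspicionOutcome::kKeep;
  return has_majority ? SuspicionOutcome::kDropAndExpel
                      : SuspicionOutcome::kDropWithoutExpel;
}

// Sketch of FR6: joins and leaves are allowed only with no active suspicions.
bool membership_change_allowed(std::size_t active_suspicions) {
  return active_suspicions == 0;
}
```

For instance, in a five-node group with two suspects the remaining three nodes still form a majority, so a timed-out suspicion leads to an eviction; with two suspects in a four-node group it does not.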
Introduction
============

There are several reasons to delay the eviction of a member from a group, e.g. a flaky or slow network, or host maintenance. This WL allows the user to define how long the group should wait for a non-responding node, which can ultimately avoid its eviction.

Currently, when a node does not respond for 5 seconds or more, it becomes suspected of having failed. If the node was previously active in the group, it is evicted from the group right away. Otherwise, if the node was joining the group, a suspicion is created, and the node is removed from the group only when that suspicion times out.

User Interface
==============

The following user interface is suggested:

- The period of time, in seconds, that a member waits before expelling from the group any member suspected of having failed.
  - name: group_replication_member_expel_timeout
  - unit: seconds
  - scope: group
  - values: 0 to LONG_TIMEOUT
  - default: 0 (current behavior)

Note: A member, i.e. a previously active node, is expelled when its suspicion timeout has elapsed. The timeout takes the value that the new option has at the moment the processing thread checks whether the suspicion has timed out. This value is added to the timestamp of the suspicion's creation, which happens when XCom suspects that the node failed (i.e. the node has been silent for 5 seconds or more). If that sum is less than the current timestamp, the suspicion has timed out and the node will eventually be expelled: by the node itself, if it created a suspicion about itself, or by the killer node, if the majority of the group members are active.

On slow networks, or when machine slowdowns are expected, users can increase the value of this option.
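The timeout check in the note above boils down to one comparison. The following C++ sketch assumes, like the suspicions manager fields described later in this WL, that timestamps and timeouts are kept in 100-nanosecond ticks; seconds_to_ticks and suspicion_timed_out are hypothetical helpers, not the GCS API. Because the timeout is supplied on every check, a value changed after the suspicion was created still applies to it.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: timestamps and timeouts share one unit, 100-ns ticks, matching
// the "100s of nanoseconds" unit used by the suspicions manager fields.
constexpr uint64_t kTicksPerSecond = 10000000ULL;  // 1 s = 10^7 ticks

constexpr uint64_t seconds_to_ticks(uint64_t seconds) {
  return seconds * kTicksPerSecond;
}

// Sketch of the check described in the note: the suspicion has timed out
// when creation timestamp + timeout is strictly less than the current time.
// The timeout is passed on every call, so it reflects the option's value at
// check time rather than at suspicion creation time.
bool suspicion_timed_out(uint64_t created_at, uint64_t timeout, uint64_t now) {
  return created_at + timeout < now;
}
```

For example, a suspicion checked 90 seconds after its creation has timed out with a 60-second timeout but not with a 120-second one, so raising the option while a node is suspected postpones its expulsion. With the default of 0, any check after the creation instant reports a timeout.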
The value of the new group_replication_member_expel_timeout option can be defined like any other Group Replication option: through the server's configuration file (my.cnf by default) or through the MySQL client command line interface, where it should be modified only by an administrator.

Component Modification
======================

In this WL, we will introduce a new configuration option, group_replication_member_expel_timeout, to define the time that elapses between the suspected failure of a group member and its eviction, which causes a group membership reconfiguration. Suspicions are created for nodes that do not respond for 5 seconds, which is XCom's failure detector timeout. The minimum allowed value for the option is 0, which corresponds to the current behavior, where active nodes that become suspected of having failed are immediately expelled.

The existing suspicions mechanism only expels nodes that take too long to join the group, when the corresponding suspicions time out. This mechanism will be modified to expel any node that is suspected of having failed, provided the group has the majority of its members active. For this purpose, the suspicion's placeholder will be modified to record the type of the suspect node, so that the adequate timeout value is used.

Security Context
================

The functionality introduced by this WL is vulnerable to inconsistent values of the new parameter across the servers of the cluster. Since it is not possible to configure which server is the current "killer node", that server will expel suspects according to the value of its own group_replication_member_expel_timeout parameter. If the servers in the group have different values, the behavior can become unpredictable: any server can be the "killer node" and apply its own timeout when expelling other suspect members while the group holds the majority, or even when expelling itself, if the server considers itself a suspect and that suspicion times out.
Upgrade/Downgrade and Cross-Version Replication
===============================================

This WL impacts cross-version replication because up-to-date servers may set the new group_replication_member_expel_timeout parameter to a non-default value. If this happens in a multi-version cluster where the "killer node" runs an older version, that node may remove suspect nodes from the group even if their delay is expected and accounted for on the servers running the latest version.

Another scenario is when a server running an older version is suspended or loses connectivity for a period shorter than the timeout configured on the "killer node". The server returns to the group before being expelled and, since it receives all the messages it missed, it also receives the view in which it was flagged as suspected of having failed. This causes the node to expel itself from the group, since it creates a suspicion about itself and its timeout is 0.
=======================
1. Implementation Steps
=======================

During the implementation of this WL, the Group Replication plugin will be changed to accept and process the new group_replication_member_expel_timeout option, which allows a user to define the waiting period before expelling nodes that were previously active in the group but have been suspected of being dead.

The existing suspicions mechanism expels nodes that take too long to join the group, when the corresponding timeout elapses. This mechanism will be modified to expel any node that is suspected of having failed. For this purpose, the suspicions manager will be modified to store the value of the newly introduced option, and the suspicion's placeholder will be modified to record the type of the suspect node, so that the adequate parameter value is used to determine whether the suspicion has timed out. For nodes that never became active in the group, the value of the existing internal parameter is used, whereas for previously active nodes the new option's value is used.

In either case, if a suspect node becomes active before the corresponding suspicion times out, it receives and applies all the messages that were buffered by the remaining members of the group, and its state should become ONLINE. The exception is a node whose group_replication_member_expel_timeout is 0 and that returns to the group before being expelled, which means that the other group members have a higher value set for the parameter. In this case, the returning node receives all the messages it missed, including the view in which it was considered a suspect, which causes the node to suspect itself and expel itself from the group.
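The choice of timeout by node type can be sketched as follows. This is a simplified C++ model, assuming both timeouts are stored in 100-ns ticks; ExpelTimeouts, Suspicion and has_timed_out here are hypothetical stand-ins for the suspicions manager fields and the suspicion's placeholder, not the actual GCS classes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of both timeouts, in 100-ns ticks.
struct ExpelTimeouts {
  uint64_t non_member_expel_timeout;  // existing internal parameter (joiners)
  uint64_t member_expel_timeout;      // group_replication_member_expel_timeout
};

// Hypothetical, simplified suspicion record.
struct Suspicion {
  uint64_t created_at;  // when XCom first suspected the node
  bool is_member;       // true if the node had already become active
};

// Sketch: choose the timeout that applies to the suspect node's type and
// report whether its suspicion has timed out at time `now`.
bool has_timed_out(const Suspicion &s, const ExpelTimeouts &t, uint64_t now) {
  const uint64_t timeout =
      s.is_member ? t.member_expel_timeout : t.non_member_expel_timeout;
  return s.created_at + timeout < now;
}
```

With the default member timeout of 0, a suspected member times out on the first check after its suspicion is created, while a joining node suspected at the same instant still waits for the non-member timeout.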
Breaking down what will be done under the scope of this WL, there are the following steps:

A - New parameter
  A.1 - Make GR accept parameter
  A.2 - GR conveys the parameter to GCS
B - Suspicions and behavior modification
  B.1 - Prevent the eviction of previously active members
  B.2 - Update new and existing suspicions with the member field
  B.3 - Create suspicions for members to expel
  B.4 - Modify suspicions processing thread
  B.5 - Make resumed node leave group if expelled
C - Update tests
  C.1 - Update unit tests
  C.2 - Update MTR tests
D - Limitations

================
A. New parameter
================

The new option, group_replication_member_expel_timeout, is central to this WL, as it allows the user to customize the waiting period between a suspicion on a node (i.e. when the node is suspected of having failed) and its eviction from the group. A suspicion on a node is created when XCom's failure detector liveness timeout elapses without any message being received from that node.

This option defines the suspicion timeout for previously active members of the group, in seconds. Its default and minimum value is 0 seconds, which means there is no waiting time before expelling this type of suspect node; this corresponds to the current behavior. We will refer to these nodes as members, in contrast to suspect joining nodes, which are non-members since they have not yet joined the group.

A.1 Make GR accept parameter
============================

The value of the new group_replication_member_expel_timeout option can be defined like other Group Replication options: through the server's configuration file (my.cnf by default) or through the MySQL client command line interface, where it should be modified only by an administrator.
my.cnf
------
[mysqld]
group_replication_member_expel_timeout=120
------

command line
-------
mysql> SET GLOBAL group_replication_member_expel_timeout=120;
-------

GR will need to accept the new group_replication_member_expel_timeout option and process it during the plugin's startup, or whenever its value is modified through the client command line interface. For this purpose, plugin.cc must be modified to add the option's corresponding variable, ulong member_expel_timeout_var, and to register it as a system variable.

plugin.cc
------------------------------------------------------------------------------
/* Group communication options */
...
ulong member_expel_timeout_var = 0;
...

// GCS module variables
...
static MYSQL_SYSVAR_ULONG(
    member_expel_timeout,     /* name */
    member_expel_timeout_var, /* var */
    PLUGIN_VAR_OPCMDARG | PLUGIN_VAR_PERSIST_AS_READ_ONLY, /* optional var */
    "The period of time, in seconds, that a member waits before "
    "expelling any member suspected of failing from the group.",
    check_sysvar_ulong_timeout,  /* check func. */
    update_member_expel_timeout, /* update func. */
    0,                           /* default */
    0,                           /* min */
    LONG_TIMEOUT,                /* max */
    0                            /* block */
);
...

static SYS_VAR *group_replication_system_vars[] = {
    ...
    MYSQL_SYSVAR(member_expel_timeout),
    NULL,
};
------------------------------------------------------------------------------

The existing check_sysvar_ulong_timeout function verifies that the received value is an integer within the allowed range, from 0 up to LONG_TIMEOUT. The new update_member_expel_timeout function updates the value of the corresponding parameter in GCS.

A.2 GR conveys the parameter to GCS
===================================

During its initialization, GR conveys the value of member_expel_timeout_var to GCS, like all other parameters. Its value is therefore set as the "member_expel_timeout" parameter of the gcs_module_parameters variable, which is then passed to the configure method of gcs_module.
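The validation and hand-off steps above can be modeled in a few lines of C++. This sketch assumes that LONG_TIMEOUT equals one year in seconds (check the server headers for the authoritative value) and replaces Gcs_interface_parameters with a plain string map; is_valid_member_expel_timeout, ParameterMap and add_member_expel_timeout are hypothetical names, not the plugin's code.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Assumption: LONG_TIMEOUT is one year in seconds; check the server headers
// for the authoritative definition.
constexpr unsigned long kLongTimeout = 3600UL * 24UL * 365UL;

// Hypothetical stand-in for the range check applied before accepting a new
// value: 0 (the default, immediate expulsion) up to LONG_TIMEOUT inclusive.
bool is_valid_member_expel_timeout(long long value) {
  return value >= 0 &&
         static_cast<unsigned long long>(value) <= kLongTimeout;
}

// Hypothetical stand-in for Gcs_interface_parameters: string keys and values.
using ParameterMap = std::map<std::string, std::string>;

// Sketch of A.2: serialize the option value and store it under the
// "member_expel_timeout" key that GCS reads later.
void add_member_expel_timeout(ParameterMap &params, unsigned long value) {
  std::stringstream buffer;
  buffer << value;
  params["member_expel_timeout"] = buffer.str();
}
```

Passing every option as a string keeps the GCS configuration interface uniform; the receiving side is then responsible for converting the text back to a number.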
plugin.cc
------------------------------------------------------------------------------
int configure_group_communication(st_server_ssl_variables *ssl_variables) {
  ...
  std::stringstream member_expel_timeout_stream_buffer;
  member_expel_timeout_stream_buffer << member_expel_timeout_var;
  gcs_module_parameters.add_parameter(
      "member_expel_timeout", member_expel_timeout_stream_buffer.str());
  ...
  LogPluginErr(INFORMATION_LEVEL, ER_GRP_RPL_GRP_COMMUNICATION_INIT_WITH_CONF,
               group_name_var, local_address_var, group_seeds_var,
               bootstrap_group_var ? "true" : "false", poll_spin_loops_var,
               compression_threshold_var, ip_whitelist_var,
               communication_debug_options_var, member_expel_timeout_var);
  ...
}
------------------------------------------------------------------------------

The aforementioned configure method of the Gcs_operations class conveys the parameters to the initialize method of Gcs_xcom_interface, which inherits from the Gcs_interface interface. Since these parameters are also conveyed to the Gcs_xcom_interface::configure_suspicions_mgr method, that method will be modified to retrieve the value of the member_expel_timeout parameter and set it, through its setter, on the new m_member_expel_timeout field of the Gcs_suspicions_manager object. This new field, m_member_expel_timeout, and the corresponding getter and setter will be declared in gcs_xcom_control_interface.h and implemented in gcs_xcom_control_interface.cc.

To better distinguish it from the new field, we will rename the existing suspicions_timeout parameter to non_member_expel_timeout, the SUSPICION_TIMEOUT macro to NON_MEMBER_EXPEL_TIMEOUT, and the field that stores this value from m_timeout to m_non_member_expel_timeout. The getters and setters for these two timeout fields and for the m_suspicions_processing_period field will be guarded by m_suspicions_parameters_mutex, and the values of these three fields can be updated through the Gcs_xcom_interface::configure_suspicions_mgr method.
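The parameter flow and the mutex-guarded accessors described above can be sketched as follows. This is a simplified model, assuming a std::map instead of Gcs_interface_parameters and std::mutex with std::lock_guard instead of My_xp_mutex_impl; SuspicionsParameters and configure_suspicions are hypothetical names. It also shows the seconds-to-ticks conversion used by the setters (1 s = 10,000,000 units of 100 ns), widened to 64 bits before multiplying.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <mutex>
#include <string>

// Hypothetical, simplified stand-in for the suspicions manager parameters.
// std::mutex plays the role of m_suspicions_parameters_mutex.
class SuspicionsParameters {
 public:
  static constexpr uint64_t kTicksPerSecond = 10000000ULL;  // 100-ns ticks

  void set_member_expel_timeout_seconds(unsigned long sec) {
    std::lock_guard<std::mutex> guard(m_mutex);
    // Widen before multiplying so a 32-bit unsigned long cannot overflow.
    m_member_expel_timeout = static_cast<uint64_t>(sec) * kTicksPerSecond;
  }
  uint64_t get_member_expel_timeout() {
    std::lock_guard<std::mutex> guard(m_mutex);
    return m_member_expel_timeout;
  }
  void set_non_member_expel_timeout_seconds(unsigned long sec) {
    std::lock_guard<std::mutex> guard(m_mutex);
    m_non_member_expel_timeout = static_cast<uint64_t>(sec) * kTicksPerSecond;
  }
  uint64_t get_non_member_expel_timeout() {
    std::lock_guard<std::mutex> guard(m_mutex);
    return m_non_member_expel_timeout;
  }

 private:
  std::mutex m_mutex;
  uint64_t m_member_expel_timeout = 0;
  uint64_t m_non_member_expel_timeout = 0;
};

// Sketch of configure_suspicions_mgr: apply each timeout parameter present
// in the map and report whether at least one was applied (GCS_OK in the WL).
bool configure_suspicions(const std::map<std::string, std::string> &params,
                          SuspicionsParameters &mgr) {
  bool applied = false;
  auto it = params.find("non_member_expel_timeout");
  if (it != params.end()) {
    mgr.set_non_member_expel_timeout_seconds(
        std::strtoul(it->second.c_str(), nullptr, 10));
    applied = true;
  }
  it = params.find("member_expel_timeout");
  if (it != params.end()) {
    mgr.set_member_expel_timeout_seconds(
        std::strtoul(it->second.c_str(), nullptr, 10));
    applied = true;
  }
  return applied;
}
```

Scoped locking through std::lock_guard releases the mutex on every return path, which is the same guarantee the explicit lock()/unlock() pairs in the getters and setters provide by hand.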
Some other fields will be added to Gcs_suspicions_manager: the m_suspicions_cond condition, used to signal the suspicions manager to process suspicions; m_is_killer_node, which tells the manager whether it is responsible for expelling nodes when their suspicions time out; m_my_info, which points to the node's own information; and m_has_majority, which tells the manager whether the majority of the group's members are active. The constructor of Gcs_suspicions_manager will be modified accordingly to initialize the new fields.

gcs_psi.h
------------------------------------------------------------------------------
extern PSI_mutex_key ...
    key_GCS_MUTEX_Gcs_suspicions_manager_m_suspicions_parameters_mutex;
------------------------------------------------------------------------------

gcs_psi.cc
------------------------------------------------------------------------------
PSI_mutex_key ...
    key_GCS_MUTEX_Gcs_suspicions_manager_m_suspicions_parameters_mutex, ...

static PSI_mutex_info all_gcs_psi_mutex_keys_info[] = {
    ...
    {&key_GCS_MUTEX_Gcs_suspicions_manager_m_suspicions_parameters_mutex,
     "GCS_Gcs_suspicions_manager::m_suspicions_parameters_mutex",
     PSI_FLAG_SINGLETON, 0, PSI_DOCUMENT_ME},
    ...
------------------------------------------------------------------------------

gcs_xcom_control_interface.h
------------------------------------------------------------------------------
class Gcs_suspicions_manager {
 public:
  ...
  /**
    Retrieves suspicion thread period in seconds.
  */
  unsigned int get_suspicions_processing_period();

  /**
    Sets the period or sleep time, between iterations, for the suspicion
    thread.
    @param[in] sec Suspicion thread period
  */
  void set_suspicions_processing_period(unsigned int sec);

  /**
    Retrieves non-member expel timeout in 100s of nanoseconds.
    @return Non-member expel timeout
  */
  uint64_t get_non_member_expel_timeout();

  /**
    Sets the time interval to wait before removing non-member nodes marked
    to be expelled from the cluster.
    @param[in] sec Suspicions timeout in seconds
  */
  void set_non_member_expel_timeout_seconds(unsigned long sec);

  /**
    Retrieves member expel timeout in 100s of nanoseconds.
    @return Member expel timeout
  */
  uint64_t get_member_expel_timeout();

  /**
    Sets the time interval to wait before removing member nodes marked to be
    expelled from the cluster.
    @param[in] sec Expel suspicions timeout in seconds
  */
  void set_member_expel_timeout_seconds(unsigned long sec);
  ...

 private:
  /* Suspicions processing thread period in seconds */
  unsigned int m_suspicions_processing_period;

  /* Non-member expel timeout stored in 100s of nanoseconds */
  uint64_t m_non_member_expel_timeout;

  /* Member expel timeout stored in 100s of nanoseconds */
  uint64_t m_member_expel_timeout;
  ...

  /* Condition used to wake up suspicions thread */
  My_xp_cond_impl m_suspicions_cond;

  /* Mutex to control access to suspicions parameters */
  My_xp_mutex_impl m_suspicions_parameters_mutex;

  /* Signals if node should remove suspect nodes from group. */
  bool m_is_killer_node;

  /* Pointer to this node's information */
  Gcs_xcom_node_information *m_my_info;

  /* Signals if group has a majority of alive nodes.
  */
  bool m_has_majority;
};
------------------------------------------------------------------------------

gcs_xcom_control_interface.cc
------------------------------------------------------------------------------
Gcs_suspicions_manager::Gcs_suspicions_manager(Gcs_xcom_proxy *proxy,
                                               Gcs_xcom_control *ctrl)
    : m_proxy(proxy),
      m_control_if(ctrl),
      m_suspicions_processing_period(SUSPICION_PROCESSING_THREAD_PERIOD),
      m_non_member_expel_timeout(NON_MEMBER_EXPEL_TIMEOUT),
      m_member_expel_timeout(0),
      m_gid_hash(0),
      m_suspicions(),
      m_suspicions_mutex(),
      m_suspicions_cond(),
      m_suspicions_parameters_mutex(),
      m_is_killer_node(false),
      m_my_info(NULL),
      m_has_majority(false) {
  m_suspicions_mutex.init(
      key_GCS_MUTEX_Gcs_suspicions_manager_m_suspicions_mutex, NULL);
  m_suspicions_cond.init(key_GCS_COND_Gcs_suspicions_manager_m_suspicions_cond);
  m_suspicions_parameters_mutex.init(
      key_GCS_MUTEX_Gcs_suspicions_manager_m_suspicions_parameters_mutex, NULL);
}
...

unsigned int Gcs_suspicions_manager::get_suspicions_processing_period() {
  unsigned int ret;
  m_suspicions_parameters_mutex.lock();
  ret = m_suspicions_processing_period;
  m_suspicions_parameters_mutex.unlock();
  return ret;
}

void Gcs_suspicions_manager::set_suspicions_processing_period(
    unsigned int sec) {
  m_suspicions_parameters_mutex.lock();
  m_suspicions_processing_period = sec;
  MYSQL_GCS_LOG_DEBUG("Set suspicions processing period to %u seconds.", sec);
  m_suspicions_parameters_mutex.unlock();
}

/* purecov: begin deadcode */
uint64_t Gcs_suspicions_manager::get_non_member_expel_timeout() {
  uint64_t ret;
  m_suspicions_parameters_mutex.lock();
  ret = m_non_member_expel_timeout;
  m_suspicions_parameters_mutex.unlock();
  return ret;
}
/* purecov: end */

void Gcs_suspicions_manager::set_non_member_expel_timeout_seconds(
    unsigned long sec) {
  m_suspicions_parameters_mutex.lock();
  /* Widen before multiplying so the result cannot overflow a 32-bit ulong. */
  m_non_member_expel_timeout = static_cast<uint64_t>(sec) * 10000000ULL;
  MYSQL_GCS_LOG_DEBUG(
      "Set non-member expel timeout to %lu seconds (%llu ns).", sec,
      static_cast<unsigned long long>(m_non_member_expel_timeout * 100));
  m_suspicions_parameters_mutex.unlock();
}
uint64_t Gcs_suspicions_manager::get_member_expel_timeout() {
  uint64_t ret;
  m_suspicions_parameters_mutex.lock();
  ret = m_member_expel_timeout;
  m_suspicions_parameters_mutex.unlock();
  return ret;
}

void Gcs_suspicions_manager::set_member_expel_timeout_seconds(
    unsigned long sec) {
  m_suspicions_parameters_mutex.lock();
  /* Widen before multiplying so the result cannot overflow a 32-bit ulong. */
  m_member_expel_timeout = static_cast<uint64_t>(sec) * 10000000ULL;
  MYSQL_GCS_LOG_DEBUG(
      "Set member expel timeout to %lu seconds (%llu ns).", sec,
      static_cast<unsigned long long>(m_member_expel_timeout * 100));
  m_suspicions_parameters_mutex.unlock();
}
------------------------------------------------------------------------------

gcs_xcom_interface.cc
------------------------------------------------------------------------------
enum_gcs_error Gcs_xcom_interface::configure_suspicions_mgr(
    Gcs_interface_parameters &p, Gcs_suspicions_manager *mgr) {
  enum_gcs_error ret = GCS_NOK;

  const std::string *non_member_expel_timeout_ptr =
      p.get_parameter("non_member_expel_timeout");
  if (non_member_expel_timeout_ptr != NULL) {
    mgr->set_non_member_expel_timeout_seconds(static_cast<unsigned long>(
        atoi(non_member_expel_timeout_ptr->c_str())));
    ret = GCS_OK;
    MYSQL_GCS_LOG_TRACE(
        "::configure_suspicions_mgr():: Set non-member expel timeout to %s "
        "seconds",
        non_member_expel_timeout_ptr->c_str());
  }

  const std::string *member_expel_timeout_ptr =
      p.get_parameter("member_expel_timeout");
  if (member_expel_timeout_ptr != NULL) {
    mgr->set_member_expel_timeout_seconds(static_cast<unsigned long>(
        atoi(member_expel_timeout_ptr->c_str())));
    ret = GCS_OK;
    MYSQL_GCS_LOG_TRACE(
        "::configure_suspicions_mgr():: Set member expel timeout to %s "
        "seconds",
        member_expel_timeout_ptr->c_str());
  }

  const std::string *suspicions_processing_period_ptr =
      p.get_parameter("suspicions_processing_period");
  if (suspicions_processing_period_ptr != NULL) {
    mgr->set_suspicions_processing_period(static_cast<unsigned int>(
        atoi(suspicions_processing_period_ptr->c_str())));
    ret = GCS_OK;
    MYSQL_GCS_LOG_TRACE(
        "::configure_suspicions_mgr():: Set suspicions processing period to %s "
        "seconds",
        suspicions_processing_period_ptr->c_str());
  }

  return ret;
}
------------------------------------------------------------------------------

When the value of the group_replication_member_expel_timeout option is updated through the command line, the update_member_expel_timeout function is invoked to convey the value to GCS and, ultimately, to the suspicions manager. This requires a new method for updating the parameters in GCS: Gcs_operations::reconfigure, which invokes the Gcs_interface::configure method, implemented by Gcs_xcom_interface::configure. That method ultimately invokes Gcs_xcom_interface::configure_suspicions_mgr, which updates the value of the m_member_expel_timeout field. Besides the new value of the member_expel_timeout parameter, the parameters passed to the reconfigure method include the group name and the new reconfigure_ip_whitelist parameter set to false, to prevent the whitelist from being reconfigured.

gcs_operations.h
------------------------------------------------------------------------------
class Gcs_operations {
 public:
  ...
  /**
    Reconfigure the GCS interface, i.e. update its configuration parameters.

    @param[in] parameters The configuration parameters

    @return the operation status
      @retval 0    OK
      @retval !=0  Error
  */
  enum enum_gcs_error reconfigure(const Gcs_interface_parameters parameters);
  ...
};
------------------------------------------------------------------------------

The value of the member_expel_timeout parameter will be validated in the is_parameters_syntax_correct function in Gcs_xcom_utils.cc, in the same way as the value of non_member_expel_timeout.

=======================================
B. Suspicions and behavior modification
=======================================

In this stage, we start by removing the mechanism for the immediate eviction of previously active nodes, or members, when a global view is processed.
Then, we modify the Gcs_xcom_node_information class to store a boolean indicating whether the node is already a member of the group, and update the mechanisms that create and process suspicions. The last step of this stage is the creation of suspicions, and of the corresponding monitoring, for members to expel.

B.1 Prevent the eviction of previously active members
=====================================================

Currently, there are two types of nodes that are expelled from the group:

1. Non-members, i.e. nodes that probably failed while entering the group, possibly during the initial state exchange. These nodes are not in the current set of group members and are marked as faulty in the view.
2. Members that were previously active in the group, but are considered faulty in the view.

All of these nodes are suspected of having failed by XCom's failure detector, since it has not received any message from them over the last 5 seconds.

The current behavior for dealing with nodes to expel is as follows. When a view is received and processed by the Gcs_xcom_control::xcom_receive_global_view method, both types of nodes are extracted from the failed_members list into the suspect_members and expel_members lists, by the Gcs_xcom_control::build_suspect_members and Gcs_xcom_control::build_expel_members methods, respectively. Then, the Gcs_suspicions_manager::process_view method is invoked with these two lists, as well as the alive_members list, as parameters. It deletes existing suspicions for nodes that are in the alive_members or expel_members lists, and creates new suspicions for non-members (the nodes in the suspect_members list) if they do not exist yet. After all this, if there are any nodes in the expel_members list, they are removed from the group, in the Gcs_xcom_control::xcom_receive_global_view method, by the "killer node", which is the node that currently holds the lowest position in the group.
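The split of failed nodes and the choice of the killer node can be sketched as follows. This is a simplified C++ model, not the GCS code: NodeInfo, SuspectLists, split_failed_nodes and is_killer_node are hypothetical, nodes are identified by address strings, and we assume that "lowest position in the group" is taken among the nodes that are still alive.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified node record; the real code works with XCom node
// information and member identifiers instead.
struct NodeInfo {
  std::string address;
  unsigned position;  // position of the node in the group configuration
  bool is_member;     // true if the node had already become active
};

struct SuspectLists {
  std::vector<std::string> expel_members;    // previously active members
  std::vector<std::string> suspect_members;  // joining nodes (non-members)
};

// Sketch: split the failed nodes reported in a view by membership status,
// as build_expel_members and build_suspect_members do.
SuspectLists split_failed_nodes(const std::vector<NodeInfo> &failed) {
  SuspectLists lists;
  for (const NodeInfo &node : failed) {
    if (node.is_member)
      lists.expel_members.push_back(node.address);
    else
      lists.suspect_members.push_back(node.address);
  }
  return lists;
}

// Sketch: the killer node is the alive node with the lowest position.
// Returns false when no node is alive.
bool is_killer_node(const NodeInfo &me, const std::vector<NodeInfo> &alive) {
  auto lowest = std::min_element(
      alive.begin(), alive.end(), [](const NodeInfo &a, const NodeInfo &b) {
        return a.position < b.position;
      });
  return lowest != alive.end() && lowest->address == me.address;
}
```

Because every alive node evaluates the same rule over the same view, exactly one of them should conclude that it is the killer node, without any extra coordination.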
In this WL, instead of expelling the members in the expel_members list (which will be renamed to member_suspect_nodes) right away, we create suspicions for them later on, as already happens for the non-members in the suspect_members list (which will be renamed to non_member_suspect_nodes). Therefore, we eliminate the removal of the nodes in the expel_members list from the Gcs_xcom_control::xcom_receive_global_view method.

B.2 Update new and existing suspicions with the member field
============================================================

Since different types of nodes can be suspected, the Gcs_xcom_node_information class will be modified with a new field, m_member, which indicates whether the suspect node is a current member of the group, along with the corresponding getter and setter methods. The Gcs_xcom_node_information::has_timed_out method will not be modified, since we invoke it with the current timestamp and the timeout value that corresponds to the type of the suspect node.

gcs_xcom_group_member_information.h
------------------------------------------------------------------------------
class Gcs_xcom_node_information {
 public:
  ...
  /**
    Compares the object's timestamp with the received one, in order to check
    if the suspicion has timed out and the suspect node must be removed.

    @param[in] ts      Provided timestamp
    @param[in] timeout Provided timeout

    @return Indicates if the suspicion has timed out
  */
  bool has_timed_out(uint64_t ts, uint64_t timeout);
  ...

  /**
    Get whether the node is already a member of the group or not.
  */
  bool is_member() const;

  /**
    Set whether the node is already a member of the group or not.
  */
  void set_member(bool m);
  ...

 private:
  ...
  /**
    Whether the node is a member of the group or not.
  */
  bool m_member;
};
------------------------------------------------------------------------------

B.3 Create suspicions for members to expel
==========================================

To create suspicions for the nodes in the member_suspect_nodes list, as already happens for the nodes in the non_member_suspect_nodes list, the member_suspect_nodes list (formerly expel_members) will be passed as a parameter to the Gcs_suspicions_manager::add_suspicions method when it is invoked from Gcs_suspicions_manager::process_view. The add_suspicions method will also be modified to record whether each suspect node is a group member.

gcs_xcom_control_interface.h
------------------------------------------------------------------------------
  /**
    Invoked by Gcs_suspicions_manager::process_view, it adds suspicions for
    the nodes received as argument if they aren't already suspects.

    @param[in] xcom_nodes               List of all nodes (i.e. alive or dead)
                                        with low level information such as
                                        timestamp, unique identifier, etc
    @param[in] non_member_suspect_nodes List of joining nodes to add to
                                        m_suspicions
    @param[in] member_suspect_nodes     List of previously active nodes to add
                                        to m_suspicions
    @return Indicates if new suspicions were added
  */
  bool add_suspicions(
      Gcs_xcom_nodes *xcom_nodes,
      std::vector non_member_suspect_nodes,
      std::vector member_suspect_nodes);
------------------------------------------------------------------------------

B.4 Modify suspicions processing thread
=======================================

To maintain the current behavior when dealing with members to expel, we have to change the suspicions processing thread, since it only runs periodically. We want to wake it up as soon as suspicions are created for members to expel. For this purpose, we will add a new condition, m_suspicions_cond, to the Gcs_suspicions_manager class; instead of sleeping, the thread waits on it for the suspicions processing period. This allows the process_view method to wake the thread if the member_suspect_nodes list is not empty and the value of the group_replication_member_expel_timeout option is shorter than the suspicions processing period.

gcs_psi.h
------------------------------------------------------------------------------
extern PSI_cond_key ...
    key_GCS_COND_Gcs_suspicions_manager_m_suspicions_cond;
------------------------------------------------------------------------------

gcs_psi.cc
------------------------------------------------------------------------------
PSI_cond_key ... key_GCS_COND_Gcs_suspicions_manager_m_suspicions_cond;

static PSI_cond_info all_gcs_psi_cond_keys_info[] = {
    ...
    {&key_GCS_COND_Gcs_suspicions_manager_m_suspicions_cond,
     "GCS_Gcs_suspicions_manager::m_suspicions_cond", PSI_FLAG_SINGLETON, 0,
     PSI_DOCUMENT_ME}};
------------------------------------------------------------------------------

gcs_xcom_control_interface.h
------------------------------------------------------------------------------
class Gcs_suspicions_manager {
 public:
  ...
  /**
    Do a timed wait on m_suspicions_cond for the suspicions processing
    period.
  */
  void timedwait();
};
------------------------------------------------------------------------------

Another change we will introduce in the Gcs_suspicions_manager::add_suspicions method is to invoke the signal method on m_suspicions_cond if suspicions were added for members to expel and the member expel timeout is shorter than the suspicions processing period. This avoids any delay in evicting this type of suspect node, mimicking the current behavior.

Finally, we will modify the Gcs_suspicions_manager::process_suspicions method so that nodes can evict themselves from the group if their own suspicion times out, and so that the killer node is solely responsible for evicting the nodes whose suspicions time out, provided the majority of the group members are active.
These two conditions will be stored in two new fields of the Gcs_suspicions_manager class, m_is_killer_node and m_has_majority. To update m_is_killer_node every time a view is processed, an additional boolean parameter will be added to the Gcs_suspicions_manager::process_view method, whereas m_has_majority is updated by comparing the sizes of the non_member_suspect_nodes and member_suspect_nodes lists with the number of elements in xcom_nodes.

Part of the Gcs_suspicions_manager::process_suspicions method is refactored into the new Gcs_suspicions_manager::run_process_suspicions method, so that the unit tests can trigger suspicions processing deterministically.

gcs_xcom_control_interface.h
------------------------------------------------------------------------------
class Gcs_suspicions_manager {
 public:
  ...
  /**
    Invoked by Gcs_xcom_control::xcom_receive_global_view, it invokes the
    remove_suspicions method for the alive_nodes and left_nodes parameters,
    if neither they nor m_suspicions are empty. It also invokes the
    add_suspicions method if the non_member_suspect_nodes and
    member_suspect_nodes parameters aren't empty.

    @param[in] xcom_nodes List of all nodes (i.e. alive or dead) with low
                          level information such as timestamp, unique
                          identifier, etc
    @param[in] alive_nodes List of the nodes that currently belong to the group
    @param[in] left_nodes List of the nodes that have left the group
    @param[in] non_member_suspect_nodes List of joining nodes to add to
                                        m_suspicions
    @param[in] member_suspect_nodes List of previously active nodes to add to
                                    m_suspicions
    @param[in] is_killer_node Indicates if node should remove suspect members
                              from the group
  */
  void process_view(Gcs_xcom_nodes *xcom_nodes,
                    std::vector alive_nodes,
                    std::vector left_nodes,
                    std::vector member_suspect_nodes,
                    std::vector non_member_suspect_nodes,
                    bool is_killer_node);
  ...
  /**
    Invoked periodically by the suspicions processing thread, it picks a
    timestamp and verifies which suspect nodes should be removed as they
    have timed out.

    @param[in] lock Whether lock should be acquired or not
  */
  void run_process_suspicions(bool lock);
  ...
};
------------------------------------------------------------------------------

B.5 Make resumed node leave group if expelled
=============================================

When a suspect node manages to return to the group before its suspicion times out, it resumes normal operation. If, on returning, it detects that it has already been expelled, it switches to the ERROR state and installs a leave view to avoid conflicting with the view currently installed on the remainder of the group.

For this purpose, we renamed the Gcs_xcom_control::do_leave_gcs method to Gcs_xcom_control::do_leave_view and removed its parameter, since the method now invokes the existing Gcs_xcom_control::install_leave_view method with the appropriate error code by itself.

We convey to the Gcs_suspicions_manager constructor a pointer to the Gcs_xcom_control object so that, when the node's own suspicion times out, the suspicions manager can install a leave view by invoking the Gcs_xcom_control::install_leave_view method, whose access was changed from private to public.

Two new fields, m_leave_view_requested and m_leave_view_delivered, are added to the Gcs_xcom_control class to record whether a leave view was requested and whether it was delivered. The value of m_leave_view_requested determines the error code conveyed to the Gcs_xcom_control::install_leave_view method: Gcs_view::OK if it is set, and Gcs_view::MEMBER_EXPELLED otherwise.

gcs_xcom_control_interface.h
------------------------------------------------------------------------------
class Gcs_suspicions_manager {
 public:
  /**
    Constructor for Gcs_suspicions_manager, which stores the received proxy
    and control pointers.

    @param[in] proxy Pointer to Gcs_xcom_proxy
    @param[in] ctrl Pointer to Gcs_xcom_control
  */
  explicit Gcs_suspicions_manager(Gcs_xcom_proxy *proxy,
                                  Gcs_xcom_control *ctrl);
  ...
};

...

class Gcs_xcom_control : public Gcs_control_interface {
 public:
  ...
  /**
    Sends a leave view message to inform that XCOM has already exited or is
    about to do so.
  */
  void do_leave_view();
  ...
  /**
    Notify that the current member has left the group and whether it left
    gracefully or not.

    @param[in] error_code that identifies whether there was any error when
               the view was received.
  */
  void install_leave_view(Gcs_view::Gcs_view_error_code error_code);
  ...
 protected:
  ...
  /*
    Whether it was requested to make the node leave the group or not.
  */
  bool m_leave_view_requested;

  /*
    Whether a view saying that the node has voluntarily left the group was
    delivered or not.
  */
  bool m_leave_view_delivered;
  ...
};
------------------------------------------------------------------------------

A new notification class, Expel_notification, was added to process the node's eviction from the group. It includes the m_functor field, which points to the function to be executed when the Expel_notification::do_execute method is invoked.

gcs_xcom_notification.h
------------------------------------------------------------------------------
...
typedef void(xcom_expel_functor)(void);

/**
  Notification used to inform that the node has been expelled or is about to
  be.
*/
class Expel_notification : public Parameterized_notification {
 public:
  /**
    Constructor for Expel_notification.

    @param functor Pointer to a function that contains the actual core of the
                   execution.
  */
  explicit Expel_notification(xcom_expel_functor *functor);

  /**
    Destructor for Expel_notification.
  */
  ~Expel_notification();

 private:
  /**
    Task implemented by this notification.
  */
  void do_execute();

  /*
    Pointer to a function that contains the actual core of the execution.
  */
  xcom_expel_functor *m_functor;

  /*
    Disabling the copy constructor and assignment operator.
  */
  Expel_notification(Expel_notification const &);
  Expel_notification &operator=(Expel_notification const &);
};
------------------------------------------------------------------------------

The existing xcom_fatal_error_cb function pointer is renamed to xcom_expel_cb, and the invocation of this callback was moved from the dispatch_op function to the terminate_and_exit function to let GCS know that XCom has stopped, which occurs, for instance, when the node returns to the group and detects that it was expelled. The setter for this pointer is invoked in the Gcs_xcom_interface::initialize_xcom method with the cb_xcom_expel callback function, defined in the gcs_xcom_interface.cc file, as its parameter. This function is invoked by XCom to signal that the node left the group, either because it received a view in which it is not part of the group or because a die_op was triggered.

The existing cb_xcom_fatal_error function is renamed to cb_xcom_expel. It creates a new Expel_notification and pushes it into the gcs_engine, so that the node eventually processes it and leaves the group.

gcs_xcom_interface.cc
------------------------------------------------------------------------------
...
void cb_xcom_expel(int status);
void do_cb_xcom_expel();
...
------------------------------------------------------------------------------

===============
C. Update Tests
===============

C.1 Update Unit Tests
=====================

We must update the affected unit tests to keep them consistent with the changes introduced by this WL. We therefore modify the gcs-parameters-t test file to verify that the new member_expel_timeout parameter and the renamed non_member_expel_timeout parameter are processed and conveyed correctly by GCS. We also modify the gcs_xcom_group_member_information-t test file to replace the invocations of set_timestamp and get_timestamp with set_suspicion_creation_timestamp and get_suspicion_creation_timestamp, respectively.
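These unit tests, and the ones described next, exercise the decisions Gcs_suspicions_manager takes when it processes suspicions. As an illustration only, the following standalone C++ model condenses those decisions from sections B.3 and B.4; it is not the GCS code. Suspicion, Action, process_suspicion and has_majority are hypothetical names, timestamps are plain nanosecond counters, the majority rule (all suspects together fewer than half of the nodes) is our assumed reading of how m_has_majority compares the suspect lists with xcom_nodes, and we assume that a node leaving because of its own timed-out suspicion does not depend on the majority check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical, simplified model of one suspicion; not a GCS type.
struct Suspicion {
  std::string node_id;          // hypothetical node identifier
  bool is_member;               // plays the role of the m_member flag
  std::uint64_t created_at_ns;  // suspicion creation timestamp
};

enum class Action { kKeepWaiting, kLeaveGroup, kExpel };

// Assumed majority rule: all suspects together must be fewer than half of
// the nodes known to XCom.
bool has_majority(std::size_t member_suspects, std::size_t non_member_suspects,
                  std::size_t all_nodes) {
  return 2 * (member_suspects + non_member_suspects) < all_nodes;
}

// Decides what this node does about one suspicion when processing at now_ns.
Action process_suspicion(const Suspicion &s, const std::string &self_id,
                         std::uint64_t now_ns,
                         std::uint64_t member_expel_timeout_ns,
                         std::uint64_t non_member_expel_timeout_ns,
                         bool is_killer_node, bool majority) {
  // Previously active members and joining nodes use different timeouts.
  const std::uint64_t timeout =
      s.is_member ? member_expel_timeout_ns : non_member_expel_timeout_ns;
  // The suspicion times out only once the elapsed time exceeds the timeout.
  if (now_ns < s.created_at_ns || now_ns - s.created_at_ns <= timeout) {
    return Action::kKeepWaiting;
  }
  // A node whose own suspicion timed out leaves the group by itself
  // (assumed here not to depend on the majority check).
  if (s.node_id == self_id) return Action::kLeaveGroup;
  // Only the killer node expels other nodes, and only with a majority.
  if (is_killer_node && majority) return Action::kExpel;
  return Action::kKeepWaiting;
}
```

Keeping separate timeouts for members and non-members is what lets group_replication_member_expel_timeout delay the eviction of previously active members without changing how suspicions of joining nodes are handled; resetting the member timeout to 0 makes any pending member suspicion time out on the next processing round.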
In the gcs_xcom_control_interface-t.cc file, we removed the usage of a sync point and added the following tests:

- SuspectMembersRemoval, which verifies that evicting previously active nodes with the default value of the new option keeps the current behavior;
- ThreeSuspectNodesRemovalAfterTimeoutReset, which resets the non_member_expel_timeout and member_expel_timeout parameters to 0 and verifies that the suspicions that had not yet timed out finally do;
- SuspectMemberFailedRemovalDueToMajorityLoss, which verifies that a node is not evicted when only a minority of the group members are active.

We will also update some existing tests to reflect the behavior changes, namely FailedNodeRemovalTest, FailedNodeGlobalViewTest, ThreeSuspectNodesRemoval, FalseThreeSuspectNodesWithdrawn and ThreeSuspectNodesRemovalAndWithdrawn.

C.2 Update MTR Tests
====================

The gr_gcs_psi_mutex_cond, gr_persist_only_variables, gr_persist_variables, gr_show_global_and_session_variables and gr_variables_default_values test and result files, which belong to MTR's Group Replication suite, must be updated to account for the new group_replication_member_expel_timeout option and the new instrumented condition on Gcs_suspicions_manager.

New MTR tests will verify the behavior of a suspect member that resumes before being expelled (gr_suspect_member_resumes), after being expelled (gr_suspect_member_resumes_after_expel) or after crashing (gr_suspect_member_resumes_after_crash), as well as of a suspect member that is expelled (gr_suspect_member_expelled).

Two new MTR tests, gr_join_with_suspect_member and gr_leave_with_suspect_member, verify the behavior of membership changes while there is a suspect member in the group.

A new MTR test, gr_majority_loss_restored_after_timeout, verifies the behavior of the suspicions mechanism when suspicions time out while the group does not have a majority of its members active.

==============
D. Limitations
==============

D.1 Inability to reconfigure the group with suspect members
===========================================================

While there are suspect members in the group, regular group membership changes do not lead to new views being installed, and no new leader is elected if the current leader is a suspect member. This is because views containing any failed member, which includes suspect members, are discarded.

In this scenario, i.e. when the group members suspect at least one member of the group:

- when a regular node leaves the group by issuing the STOP GROUP_REPLICATION command, it exits successfully. However, the remaining nodes in the group do not install the corresponding view.
- when a new node tries to join the group, the START GROUP_REPLICATION command times out.

To allow such group membership changes to succeed and be propagated to all the group members, all the current suspect members must either return to the group, if the user is able to make that happen, or be expelled. If expelling a node is taking too long, users can reset this timeout by setting the group_replication_member_expel_timeout parameter to a smaller value, such as 0, on all alive members.

This limitation will be dealt with in a future WL and should be highlighted in the documentation so that users are aware of the expected behavior.
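The rule behind this limitation fits in a few lines. The sketch below is a hypothetical model (view_can_be_installed is not a GCS function, and nodes are plain strings) that captures only the condition stated above: a proposed view that includes any failed member, suspects included, is discarded.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of the D.1 rule: a proposed view is discarded if any of
// its members is failed, which includes suspect members.
bool view_can_be_installed(const std::set<std::string> &proposed_view,
                           const std::set<std::string> &failed_or_suspect) {
  for (const std::string &node : proposed_view) {
    if (failed_or_suspect.count(node) != 0) {
      return false;  // view contains a failed or suspect member: discard it
    }
  }
  return true;  // no failed or suspect member: the view can be installed
}
```

For instance, while n3 is suspect, a join by n4 proposes the view {n1, n2, n3, n4}, which the model discards. Once n3 has been expelled, the proposed view {n1, n2, n4} passes the check; if n3 returns instead, its suspicion is withdrawn and the four-member view passes as well.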
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.