WL#9050: Group Replication: MySQL GCS majority loss handling integration
Affects: Server-5.7
—
Status: Complete
MySQL Group Replication provides multi-master, update-everywhere replication to MySQL. Clients can connect to any group server and, after conflict detection, write changes that will be propagated to all group members.

The multi-master behaviour is enabled by a group communication system, XCom, which requires consensus between group members in order to agree on which messages are delivered to all group members and in which order. Group communication consensus requires a majority of group members to agree on a given decision in order to fulfil its correctness and liveness properties. When that majority of group members is lost, the group is unable to progress and blocks, to avoid breaking the correctness property.

This group block happens in the following scenarios:
1) A 5-member group in which 3 members crash.
2) A split brain: a 6-member group is split into two groups of 3 members each due to a network partition.

In both scenarios the following rule no longer holds: the number of online members (O) must be higher than half of the group members (M):
  O > M/2

When a situation like this happens, the group is unable to heal itself and requires manual intervention to decide whether the group should be reconfigured to only consider the online members (1) or which partition should be considered the primary one (2).

A DBA, through a new option group_replication_force_peer_addresses, will be able to reconfigure a blocked group to a subset of its members.
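The majority rule O > M/2 can be illustrated with a minimal sketch. The helper name has_majority is hypothetical and is not part of the plugin or of XCom; it only restates the rule as 2*O > M so that no division is needed.

```cpp
#include <cstddef>

// Hypothetical helper (not plugin code): returns true when the number of
// online members (O) is a strict majority of the group members (M),
// i.e. O > M/2, written here as 2*O > M.
bool has_majority(std::size_t online_members, std::size_t group_members)
{
  return group_members > 0 && 2 * online_members > group_members;
}
```

With this sketch, has_majority(2, 5) and has_majority(3, 6) are both false, which corresponds to scenarios 1 and 2: in neither case can the remaining members make progress.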
FR-1: One must be able to unblock a group that loses majority, through the group_replication_force_peer_addresses option on one ONLINE member.
FR-2: One must be able to set group members in a way that a majority is not needed.
FR-3: The DBA will provide a complete list of the members that she/he wants to belong to the reconfigured group.
FR-4: The DBA will need to manually shut down the members not included in the new group membership.
FR-5: The members included in the new group membership must be fully functional.
FR-6: The members included in the new membership that are in RECOVERING state will fail over to another donor if the current donor is not present in the new membership.
FR-7: It must be possible to set an empty value on the group_replication_force_peer_addresses option, to clear its value.
FR-8: Setting the group_replication_force_peer_addresses option to any non-empty value on a member that is not ONLINE will return an error.
FR-9: When the group_replication_force_peer_addresses option is already set before Group Replication starts, for example when it is set in the configuration file, the start will error out.
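The interaction between FR-1, FR-7 and FR-8 can be summarized as a small decision function. This is only a sketch under stated assumptions: the MemberState and ForceResult enums and the function name are hypothetical and do not exist in the plugin, which uses its own types and error reporting.

```cpp
#include <string>

// Hypothetical member states for this sketch; the plugin has its own types.
enum class MemberState { OFFLINE, RECOVERING, ONLINE };

// Hypothetical outcomes of setting group_replication_force_peer_addresses.
enum class ForceResult { CLEARED, INJECT_MEMBERSHIP, REJECTED };

// Decides what happens when the option is set to `value` on a member that
// is currently in `state`.
ForceResult evaluate_force_peer_addresses(MemberState state,
                                          const std::string& value)
{
  // FR-7: an empty value is always accepted and only clears the option.
  if (value.empty())
    return ForceResult::CLEARED;

  // FR-8: a non-empty value on a member that is not ONLINE is an error.
  if (state != MemberState::ONLINE)
    return ForceResult::REJECTED;

  // FR-1: on an ONLINE member the value is injected as the new membership.
  return ForceResult::INJECT_MEMBERSHIP;
}
```

FR-9 is not covered here, since it concerns the start of Group Replication rather than a SET on a running member.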
1. Context
In the context of Group Communication, the technology under the hood of Group Replication, a typical scenario that might happen is the loss of majority in a group. This is highly relevant, since our typical Group Communication System (GCS) runs a consensus algorithm to make decisions among the group members. Without a majority, no decisions can be made and the group blocks for any operation.

This can have different causes:
* Catastrophic failure of several members;
* A split brain scenario in which none of the partitions retains a majority to proceed.

Like any other GCS toolkit, and all the more because it supports heterogeneous types of GCSs underneath, MySQL GCS must provide means to unblock this situation when it happens. In the particular case of the default implementation, XCom, the Paxos nature makes it block naturally if a majority is not reached for a proposed message.

This action path must be enabled at Group Replication level, that is, the DBA can force a blocked group to be reconfigured to a subset of its members by setting the option group_replication_force_peer_addresses, on one of the alive members of the group, to the new list of members.

Example:
Group with 5 members:
  192.168.0.1:10000,192.168.0.2:10000,192.168.0.3:10000,192.168.0.4:10000,192.168.0.5:10000
Three members crash:
  192.168.0.3:10000,192.168.0.4:10000,192.168.0.5:10000
The DBA can force the group to shrink to:
  192.168.0.1:10000,192.168.0.2:10000
and unblock it.
SQL command:
  SET GLOBAL group_replication_force_peer_addresses= "192.168.0.1:10000,192.168.0.2:10000";

2. Building blocks
Let's consider a system that is configured with 5 members, in which 3 members crash in a catastrophic way, leaving the system:
* Able to detect the crash, but;
* Without means to propose a new configuration, since it does not hold a majority of configured members.
2.1 New functionality
Currently, any client of the MySQL GCS API, like Group Replication, only has access to the dynamic part of a group, the Control interface, through which one can join, leave and receive notifications about the status of a Group. It lacks the notion of a Configuration, meaning what was planned for a certain Group. Such a notion would allow a clear distinction between the Plan, e.g., "I want a group with 6 members", and the State, e.g., "my group has 3 nodes alive and 3 nodes down".

MySQL GCS now makes available to its clients a new interface, called Configuration Management. For now it has a single operation, which allows group reconfiguration, to be used in this type of scenario.

This new functionality will be accessible on Group Replication through the option group_replication_force_peer_addresses.

2.2 Behaviour/Use Case
This functionality applies when the group is already stalled, unable to accomplish anything. One must go to one of the members of the part of the group that we want to keep alive and set the option group_replication_force_peer_addresses, injecting into the group the configuration that we want it to have, in the form of a list of the group members that are to be considered alive. This is a way of creating an implicit Primary Partition, which unblocks the situation.

The list of members that should belong to the unblocked version of the group is the full responsibility of the DBA. This means that human intervention is needed to determine which members are best suited to proceed with group operations. It is not an automated operation.

When this operation is performed, one must make sure that no group-related operation, like a join or a leave, is in progress at the same time on:
* The member from which the reconfiguration operation is triggered;
* Any member that we decide to include in the new configuration.

A restriction that must apply is that, when one reconfigures the group, it shall not be possible to add new members.
The operation must be done using only nodes that were present in the configuration that stalled.

Whoever triggers this must also keep in mind that she/he needs to act upon the members that were left out of the configuration, taking them down or shutting down the service. In a scenario in which the members were dropped because of a split, they might come back and try to perform operations that they should not perform, like joining a group or sending old data through the network.

Members that are not present in the new configuration will remain blocked and will not be able to proceed, either by sending messages or by making regular configuration modifications.

2.3 Crash Scenarios
Several crash scenarios can be handled by this reconfiguration procedure. The following sections describe some of them and how one should proceed.

2.3.1 Crash-and-recover scenario
The simplest scenario is an actual crash. Consider members A to F. A, B and C crash. One must go to either D, E or F and inject a new configuration containing D, E and F.

2.3.2 Split Brain Scenario
Consider the same set of members A to F. At some point in time, a network partition occurs: {A,B,C} and {D,E,F}. The DBA decides that {D,E,F} are the eligible members to continue. She/he proceeds as in the previous scenario and injects {D,E,F} in any of those members.

Above all, she/he must make sure that all members that belonged to the old configuration and are excluded from the new one are indeed offline and cannot reach their old counterparts. But she/he can't just call STOP GROUP_REPLICATION, since the member is blocked. Instead she/he needs to stop the process as a whole, performing one of the following alternatives:

A1:
1. Kill all client connections to that member;
2. Remove clients from that member;
3. Shut down the server.

A2:
1. Crash the server.

2.3.3 Partial Group Definition
Consider the same set of members A to F.
At some point in time, a network partition occurs or a crash happens: {A,B,C} are down somehow and {D,E,F} are deemed the best candidates to proceed. But the DBA, for some reason, wants to exclude D and proceed with E and F. One must inject the new configuration in either E or F and, like in the previous scenario, make sure that A, B, C and D are not reachable.

3. Future Work
3.1 Future work in MySQL GCS and Group Replication
In the near future, one should consider a full-fledged solution for Group Management, in which a client, like Group Replication, provides a group definition representing a static plan of what the members should be, and the View is the current status of that planned group.

In case of failures or split brain, and working together with a Failure Detector provided by the MySQL GCS implementation, we could then augment this unblocking feature by just going to the partition that we want to promote and stating "unblock the members that are online in this partition", instead of enumerating all members that we want to be active.

4. Side effects
Since the new group membership is injected through the group_replication_force_peer_addresses option, this action has side effects both on blocked and on fully operational groups. In both situations, setting group_replication_force_peer_addresses will change the group membership. The members not included in the new group membership will not receive a new view and will be blocked, that is, they will not receive data from or send data to the new group. The DBA will need to kill the excluded servers.

5. Documentation heads up
The manual must be clear that the members excluded from a new group membership injected through the group_replication_force_peer_addresses option must be killed manually by the DBA. The excluded members will be blocked, but on a future group membership reconfiguration they may be allowed into the group again, which may cause problems since their data is outdated.
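The restriction from section 2.2 (no new members may be added) and the DBA's obligation to take down the excluded members (sections 2.3 and 4) can be sketched with plain set operations. All names below are hypothetical helpers for illustration only; they are not MySQL GCS or Group Replication APIs, and the parsing does no trimming or address validation.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <sstream>
#include <string>

using MemberSet = std::set<std::string>;

// Splits a comma-separated list such as "host1:port1,host2:port2" into a
// set of addresses. Empty entries are skipped.
MemberSet parse_members(const std::string& list)
{
  MemberSet members;
  std::istringstream input(list);
  std::string address;
  while (std::getline(input, address, ','))
    if (!address.empty())
      members.insert(address);
  return members;
}

// Restriction from 2.2: the forced membership may only contain members
// that were present in the configuration that stalled.
bool is_valid_forced_membership(const MemberSet& stalled,
                                const MemberSet& forced)
{
  return std::includes(stalled.begin(), stalled.end(),
                       forced.begin(), forced.end());
}

// Members of the stalled configuration that are left out of the forced
// membership; these are the ones the DBA must shut down manually.
MemberSet members_to_shut_down(const MemberSet& stalled,
                               const MemberSet& forced)
{
  MemberSet excluded;
  std::set_difference(stalled.begin(), stalled.end(),
                      forced.begin(), forced.end(),
                      std::inserter(excluded, excluded.end()));
  return excluded;
}
```

For the Partial Group Definition scenario, a stalled configuration of A to F with a forced membership of E and F passes the check, and members_to_shut_down yields A, B, C and D.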
SUMMARY OF CHANGES
==================

1. New group_replication_force_peer_addresses option added.
   When it is set on an ONLINE member, its value is injected into
   the group communication layer as its new membership.
   When it is set on an OFFLINE or RECOVERING member, an error is
   returned.
   Empty is always accepted and only clears the option value.

SUMMARY OF CODE CHANGES
=======================

--- a/rapid/plugin/group_replication/include/gcs_event_handlers.h
+++ b/rapid/plugin/group_replication/include/gcs_event_handlers.h
@@ -71,6 +71,21 @@ public:
   void start_view_modification();
 
   /**
+    Signals that a injected view modification, like unblock a
+    group that did lost majority, is about to start.
+  */
+  void start_injected_view_modification();
+
+  /**
+    Checks if the view modification is a injected one.
+
+    @return
+      @retval true  if the current view modification is a injected one
+      @retval false otherwise
+  */
+  bool is_injected_view_modification();
+
+  /**
     Signals that a view modification has ended
   */
   void end_view_modification();
@@ -96,6 +111,7 @@ public:
 private:
   bool view_changing;
   bool cancelled_view_change;
+  bool injected_view_modification;
 
   mysql_cond_t wait_for_view_cond;
   mysql_mutex_t wait_for_view_mutex;
--- a/rapid/plugin/group_replication/src/gcs_event_handlers.cc
+++ b/rapid/plugin/group_replication/src/gcs_event_handlers.cc
@@ -190,6 +190,10 @@ Plugin_gcs_events_handler::on_view_changed(const Gcs_view& new_view,
 
   //Handle joining members
   this->handle_joining_members(new_view, is_joining, is_leaving);
+
+  //Signal that the injected view was delivered
+  if (view_change_notifier->is_injected_view_modification())
+    view_change_notifier->end_view_modification();
 }
 
 void Plugin_gcs_events_handler::update_group_info_manager(const Gcs_view& new_view,
@@ -618,7 +622,8 @@ Plugin_gcs_events_handler::check_compatibility_with_group() const
 }
 
 Plugin_gcs_view_modification_notifier::Plugin_gcs_view_modification_notifier()
-  :view_changing(false), cancelled_view_change(false)
+  :view_changing(false), cancelled_view_change(false),
+   injected_view_modification(false)
 {
 
 #ifdef HAVE_PSI_INTERFACE
@@ -652,10 +657,30 @@ Plugin_gcs_view_modification_notifier::start_view_modification()
   mysql_mutex_lock(&wait_for_view_mutex);
   view_changing= true;
   cancelled_view_change= false;
+  injected_view_modification= false;
   mysql_mutex_unlock(&wait_for_view_mutex);
 }
 
 void
+Plugin_gcs_view_modification_notifier::start_injected_view_modification()
+{
+  mysql_mutex_lock(&wait_for_view_mutex);
+  view_changing= true;
+  cancelled_view_change= false;
+  injected_view_modification= true;
+  mysql_mutex_unlock(&wait_for_view_mutex);
+}
+
+bool
+Plugin_gcs_view_modification_notifier::is_injected_view_modification()
+{
+  mysql_mutex_lock(&wait_for_view_mutex);
+  bool result= injected_view_modification;
+  mysql_mutex_unlock(&wait_for_view_mutex);
+  return result;
+}
+
+void
 Plugin_gcs_view_modification_notifier::end_view_modification()
 {
   mysql_mutex_lock(&wait_for_view_mutex);
--- a/rapid/plugin/group_replication/src/plugin.cc
+++ b/rapid/plugin/group_replication/src/plugin.cc
@@ -54,6 +54,7 @@ Read_mode_handler *read_mode_handler= NULL;
 char *gcs_engine_var;
 char *local_address_var;
 char *peer_addresses_var;
+char *force_peer_addresses_var;
 my_bool bootstrap_group_var= false;
 
 //The plugin auto increment handler
@@ -1434,6 +1435,92 @@ static void update_auto_increment_increment(MYSQL_THD thd, SYS_VAR *var,
   DBUG_VOID_RETURN;
 }
 
+//Communication layer options.
+
+static int check_force_peer_addresses(MYSQL_THD thd, SYS_VAR *var,
+                                      void* save,
+                                      struct st_mysql_value *value)
+{
+  DBUG_ENTER("check_force_peer_addresses");
+
+  char buff[STRING_BUFFER_USUAL_SIZE];
+  const char *str= NULL;
+  (*(const char **) save)= NULL;
+
+  int length= sizeof(buff);
+  if ((str= value->val_str(value, buff, &length)))
+    str= thd->strmake(str, length);
+  else
+    DBUG_RETURN(1);
+
+  // If option value is empty string, just update its value.
+  if (length == 0)
+    goto update_value;
+
+  if (gcs_module == NULL || !gcs_module->is_initialized())
+  {
+    log_message(MY_ERROR_LEVEL,
+                "Member is OFFLINE, it is not possible to force a "
+                "new group membership");
+    DBUG_RETURN(1);
+  }
+
+  if (local_member_info->get_recovery_status() == Group_member_info::MEMBER_ONLINE)
+  {
+    string group_id_str(group_name_var);
+    Gcs_group_identifier group_id(group_id_str);
+    Gcs_group_management_interface* gcs_management=
+        gcs_module->get_management_session(group_id);
+
+    if (gcs_management == NULL)
+    {
+      log_message(MY_ERROR_LEVEL,
+                  "Error calling group communication interfaces");
+      DBUG_RETURN(1);
+    }
+
+    view_change_notifier->start_injected_view_modification();
+
+    Gcs_interface_parameters gcs_module_parameters;
+    gcs_module_parameters.add_parameter("peer_nodes",
+                                        std::string(str));
+    enum_gcs_error result=
+        gcs_management->modify_configuration(gcs_module_parameters);
+    if (result != GCS_OK)
+    {
+      log_message(MY_ERROR_LEVEL,
+                  "Error setting group_replication_force_peer_addresses "
+                  "value '%s' on group communication interfaces", str);
+      DBUG_RETURN(1);
+    }
+    log_message(MY_INFORMATION_LEVEL,
+                "Set group_replication_force_peer_addresses value '%s' "
+                "into group communication interfaces", str);
+    Wait_for_view_modification_result wait_for_view_modification_result=
+        view_change_notifier->wait_for_view_modification(VIEW_MODIFICATION_TIMEOUT);
+    if (wait_for_view_modification_result.first)
+    {
+      log_message(MY_ERROR_LEVEL,
+                  "Timeout on wait for view after setting "
+                  "group_replication_force_peer_addresses value '%s' "
+                  "into group communication interfaces", str);
+      DBUG_RETURN(1);
+    }
+  }
+  else
+  {
+    log_message(MY_ERROR_LEVEL,
+                "Member is not ONLINE, it is not possible to force a "
+                "new group membership");
+    DBUG_RETURN(1);
+  }
+
+update_value:
+  *(const char**)save= str;
+
+  DBUG_RETURN(0);
+}
+
 //Base plugin variables
 
 static MYSQL_SYSVAR_STR(
@@ -1483,6 +1570,18 @@ static MYSQL_SYSVAR_STR(
   NULL,                                       /* update func*/
   "");                                        /* default*/
 
+static MYSQL_SYSVAR_STR(
+  force_peer_addresses,                       /* name */
+  force_peer_addresses_var,                   /* var */
+  PLUGIN_VAR_OPCMDARG | PLUGIN_VAR_MEMALLOC,  /* optional var | malloc string*/
+  "The list of peer addresses, comma separated. E.g., host1:port1,host2:port2. "
+  "This option is used to inject a new group membership, being the excluded "
+  "members from the new membership expelled from the group. The expelled "
+  "members will be blocked and do need to be shutdown by the DBA.",
+  check_force_peer_addresses,                 /* check func*/
+  NULL,                                       /* update func*/
+  "");                                        /* default*/
+
 static MYSQL_SYSVAR_BOOL(
   bootstrap_group,                            /* name */
   bootstrap_group_var,                        /* var */
@@ -1694,6 +1793,7 @@ static SYS_VAR* group_replication_system_vars[]= {
   MYSQL_SYSVAR(gcs_engine),
   MYSQL_SYSVAR(local_address),
   MYSQL_SYSVAR(peer_addresses),
+  MYSQL_SYSVAR(force_peer_addresses),
   MYSQL_SYSVAR(bootstrap_group),
   MYSQL_SYSVAR(recovery_retry_count),
   MYSQL_SYSVAR(recovery_use_ssl),
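To make the role of the injected_view_modification flag easier to follow outside the server tree, here is a portable sketch of the same idea using the C++ standard library instead of mysql_mutex_t and mysql_cond_t. It is an illustration under assumptions, not the plugin class: the class name is hypothetical, cancellation is omitted, and wait_for_view_modification returns a plain bool (true on timeout) rather than the plugin's Wait_for_view_modification_result.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical, simplified stand-in for the plugin's view modification
// notifier. Assumption: ending a view modification does not reset the
// injected flag; only the next start_* call does.
class ViewModificationNotifier
{
public:
  // A regular view modification (join, leave) is about to start.
  void start_view_modification() { start(false); }

  // An injected view modification (forced membership) is about to start.
  void start_injected_view_modification() { start(true); }

  bool is_injected_view_modification()
  {
    std::lock_guard<std::mutex> guard(mutex_);
    return injected_;
  }

  // Called when the new view is delivered; wakes up any waiter.
  void end_view_modification()
  {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      view_changing_= false;
    }
    cond_.notify_all();
  }

  // Waits until the view modification ends. Returns true on timeout.
  bool wait_for_view_modification(std::chrono::milliseconds timeout)
  {
    std::unique_lock<std::mutex> lock(mutex_);
    return !cond_.wait_for(lock, timeout,
                           [this] { return !view_changing_; });
  }

private:
  void start(bool injected)
  {
    std::lock_guard<std::mutex> guard(mutex_);
    view_changing_= true;
    injected_= injected;
  }

  std::mutex mutex_;
  std::condition_variable cond_;
  bool view_changing_= false;
  bool injected_= false;
};
```

In the patch, the thread that sets the option calls start_injected_view_modification() and then waits, while on_view_changed() checks is_injected_view_modification() and calls end_view_modification() once the forced view is delivered.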
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.