WL#11570: GR: options to defer member eviction after a suspicion

Affects: Server-8.0   —   Status: Complete   —   Priority: Medium

EXECUTIVE SUMMARY
=================

Automatically managing the cluster membership requires acknowledging that a
server has joined the group, but also that a server has left. The latter may
mean that a server left voluntarily (informed others that it was going away)
or involuntarily (others need to realize that it has gone away). Realizing
that a server has gone away and adjusting the group membership requires
monitoring of members' activity and reconfigurations when members are
possibly dead.
Currently, the period of time that elapses between the suspected failure of a
node and its eviction from the group, which causes a group membership
reconfiguration, is fixed and cannot be configured.
This worklog separates the notion of a suspicion from the moment the node is
expelled. It creates suspicions for the nodes deemed to have failed and expels
them only after a certain time has elapsed. It also allows users to customize
the timeout for these suspicions through the new
group_replication_member_expel_timeout parameter. This makes the cluster
tolerant to network and machine delays, as well as to suspended nodes, whose
eviction is avoided if communication resumes before that timeout elapses.


USER STORIES
============

- As a systems administrator, I want to be able to execute maintenance tasks
  without causing a node running mysqld to be evicted from the group.

- As a MySQL DBA, I want to deploy group replication across a flaky
  network prone to false suspicions (such as WAN), so that I can have
  multi-site GR deployments for DR or proximity purposes.

- As a MySQL DBA, I want to add a new node to the group or remove an active
  node from it. Since group reconfigurations are suspended while there are
  active suspicions, I want to be able to force the immediate expulsion of the
  suspect nodes in order to perform the desired group membership changes.


PROBLEMS
========

Issues:

- A user knows for sure that a mysqld will be silent for a period of time
  (not unavailable, simply slower) but has no way to instruct the GCS expel
  mechanism to wait longer than that before evicting the node. XCom suspects
  that a node has failed if it cannot communicate with it for more than 5
  seconds.

Users/Customers want:

- A configurable failure detector window to allow for delays or suspension
  of previously active nodes in the group. Currently, these nodes are expelled
  immediately after being suspected.


NOTES
=====

N1. Note that even if the expel mechanism is relaxed so that it does not
    evict the server right away, the group cannot wait indefinitely. While
    the server is silent and not evicted, the other members of the group
    have to keep messages in their buffers, so that when the server comes
    back they can relay the missing messages to it. If the buffer is
    exhausted at some point, the lagging server must be evicted anyway.

N2. The user should ensure that all nodes in the group share the same value
    for the new parameter. Otherwise, the timeout that applies is the one set
    on the "killer node", which might not be the value the user intends,
    since the "killer node" is determined automatically from the node's
    position in the group.

N3. This feature can be used by systems administrators to perform maintenance
    operations, such as taking snapshots or migrating a virtual machine to
    another host, by preventing slow or paused nodes from being evicted from
    the group.

------------------------
Functional requirements:
------------------------

FR1. Suspicions must be created for all nodes deemed to have failed by XCom's
failure detector.

FR2. The timeout for suspicions on previously active nodes assumes the
current value of the group_replication_member_expel_timeout option.

FR3. When a suspicion times out, it is destroyed. If the majority of the group 
members are active, the suspected member is evicted from the group.

FR4. If a suspect node becomes active before its suspicion timeout elapses,
its suspicion is destroyed.

FR5. It is possible to update the value of the 
group_replication_member_expel_timeout option at any time.

FR6. No view changes caused by joins or leaves can occur while there are any 
suspect members in the group.


----------------------------
Non-Functional requirements:
----------------------------
NFR1. This feature will not affect the behavior of the existing mechanism
for expelling nodes that never became active in the group.

Introduction
============

There are several reasons to delay the eviction of a member from a group,
e.g. a flaky or slow network, or host maintenance.
This WL allows the user to define the time period that the group should wait
for a non-responding node, which can ultimately avoid its eviction.

Currently, when a node does not respond for 5 seconds or more, it becomes
suspected of having failed. If the node was previously active in the group,
it is evicted from the group right away. Otherwise, if the node was joining
the group, a suspicion is created and the node is removed from the group
only when that suspicion times out.


User Interface
==============

The following user interface is suggested:

- The period of time, in seconds, that a member waits before expelling from
  the group any member suspected of having failed.

  - name: group_replication_member_expel_timeout
  - unit: seconds
  - scope: group
  - range: 0 to LONG_TIMEOUT
  - default: 0 (current behavior)

Note: A member, i.e. a previously active node, is expelled when its suspicion
timeout has elapsed. This timeout takes the value that the new option has at
the moment the processing thread checks whether the suspicion has timed out.
That value is added to the timestamp at which the suspicion was created,
which is when XCom suspects that the node failed, i.e. after the node has been
silent for 5 seconds or more. If that sum is lower than the current
timestamp, the suspicion has timed out and the node will eventually be
expelled: by the node itself, if it created a suspicion about itself, or by
the killer node, if a majority of the group's members are active. On slow
networks, or when machine slowdowns are expected, users can increase the
value of this option.
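The expiry rule in the note above reduces to a single comparison. The following is a minimal sketch, assuming (as the setters in the low-level design suggest) that timestamps and timeouts share one unit of 100 nanoseconds; suspicion_has_timed_out and kTicksPerSecond are hypothetical names used only for illustration, not the plugin's code.

```cpp
#include <cstdint>

// One second expressed in 100 ns ticks (assumed internal unit).
constexpr uint64_t kTicksPerSecond = 10000000ULL;

// Hypothetical helper illustrating the expiry rule: a suspicion created at
// creation_ts has timed out once creation_ts + timeout is lower than now.
// All three values are assumed to use the same unit (100 ns ticks).
bool suspicion_has_timed_out(uint64_t creation_ts, uint64_t timeout,
                             uint64_t now) {
  return creation_ts + timeout < now;
}
```

For example, a suspicion with a 120-second timeout is still pending 60 seconds after its creation and has expired 121 seconds after it. With the default timeout of 0, it expires at the first check made strictly after its creation, which matches the current behavior of expelling suspected members right away.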

The value of the new group_replication_member_expel_timeout option can be
defined like any other Group Replication option, through the server's
configuration file (my.cnf by default) or through the MySQL client command
line interface. It should be modified exclusively by an administrator.


Component Modification
======================

In this WL, we will introduce a new configuration option,
group_replication_member_expel_timeout, to define the time that elapses
between the suspected failure of a group member and its eviction, which
causes a group membership reconfiguration.
Suspicions are created for nodes that do not respond for 5 seconds, which
is XCom's failure detector timeout.
The minimum allowed value for a suspicion timeout is 0, which corresponds
to the current behavior, where active nodes that become suspected of having
failed are expelled immediately.

The existing suspicions mechanism expels only nodes that take too long to
join the group, when the corresponding suspicions time out. This mechanism
will be modified to expel any node that is suspected of having failed,
provided that a majority of the group's members are active. For this purpose,
the suspicion's placeholder will be modified to record the type of the
suspect node, so that the adequate timeout value is used.


Security Context
================
The functionality introduced by this WL is sensitive to inconsistent values of
the new parameter across the servers of the cluster. Since it is not possible
to configure which server is the current "killer node", that server will expel
suspects according to its own group_replication_member_expel_timeout value.
If the servers in the group have different values, the behavior becomes
unpredictable: any server can become the "killer node" and apply its own
timeout when expelling other suspect members while the group holds a
majority, or when expelling itself, if it considers itself a suspect and that
suspicion times out.


Upgrade/Downgrade and Cross-Version Replication
===============================================

This WL impacts cross-version replication in the sense that up-to-date
servers may set the new group_replication_member_expel_timeout parameter to a
non-default value. If this occurs in a multi-version cluster whose "killer
node" runs an older version, that node may try to remove suspect nodes from
the group even if their delay is expected and accounted for on the servers
running the latest version.
Another scenario is when a server running an older version is suspended, or
loses connectivity, for a period shorter than the timeout configured on the
"killer node". When it returns to the group, it receives all the messages it
missed, including a view in which it is marked as suspected of having
failed. This causes the node to expel itself from the group, since it creates
a suspicion about itself and, on the older version, that timeout is 0.

=======================
1. Implementation Steps
=======================

During the implementation of this WL, there will be changes to
the Group Replication plugin to make it accept and process the new
group_replication_member_expel_timeout option, which allows a user to
define the waiting period before expelling nodes that were previously active
in the group but have been suspected of being dead.

The existing suspicions mechanism expels nodes that take too long to join
the group, when the corresponding timeout elapses. This mechanism will
be modified to expel any node that is suspected of having failed.
For this purpose, the suspicions manager will be modified to store the
value of the newly introduced option, and the suspicion's placeholder will
be modified to record the type of the suspect node, so that the adequate
parameter value is used to determine whether the suspicion has timed out. For
nodes that never became active in the group, the value of the existing
internal parameter is used, whereas for previously active nodes the new
option's value is used.

In either case, if a suspect node becomes active before the corresponding
suspicion times out, it receives and applies all the messages that were
buffered by the remaining members of the group, and its state should become
ONLINE. The exception is a node whose group_replication_member_expel_timeout
is 0 but that returns to the group before being expelled, which means the
other group members have a higher value set for the parameter. In this case,
the returning node receives all the messages it missed, including the view
in which it was considered a suspect, which causes the node to suspect and
expel itself from the group.
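To make the per-type rule concrete, here is a minimal sketch of how a timeout could be selected from the suspect node's type and then compared against the current time. SuspectNode, timeout_for and should_expel are hypothetical simplifications (the real logic lives in Gcs_xcom_node_information and Gcs_suspicions_manager), and all values are assumed to share one time unit.

```cpp
#include <cstdint>

// Hypothetical, simplified view of a suspect node: only the fields needed
// to pick a timeout. The real Gcs_xcom_node_information holds much more.
struct SuspectNode {
  uint64_t suspicion_creation_ts;  // same unit as the timeouts
  bool is_member;                  // true if previously active in the group
};

// Members use the member expel timeout; joining nodes use the non-member one.
uint64_t timeout_for(const SuspectNode &node, uint64_t member_expel_timeout,
                     uint64_t non_member_expel_timeout) {
  return node.is_member ? member_expel_timeout : non_member_expel_timeout;
}

// True once the type-specific timeout has elapsed since suspicion creation.
bool should_expel(const SuspectNode &node, uint64_t now,
                  uint64_t member_expel_timeout,
                  uint64_t non_member_expel_timeout) {
  uint64_t timeout =
      timeout_for(node, member_expel_timeout, non_member_expel_timeout);
  return node.suspicion_creation_ts + timeout < now;
}
```

The sketch deliberately leaves out the killer-node and majority conditions from FR3; it only shows which timeout applies to which kind of suspect.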

Breaking down what will be done under the scope of this WL, there are the
following steps:
A-New parameter
A.1-Make GR accept parameter
A.2-GR conveys the parameter to GCS

B-Suspicions and behavior modification
B.1-Prevent the eviction of previously active members
B.2-Update new and existing suspicions with the member field
B.3-Create suspicions for members to expel
B.4-Modify suspicions processing thread
B.5-Make resumed node leave group if expelled

C-Update tests
C.1-Update unit tests
C.2-Update MTR tests

D-Limitations

================
A. New parameter
================

The introduction of a new option, group_replication_member_expel_timeout,
is central to this WL, as it allows the user to customize the waiting
period between a suspicion on a node, namely when it is suspected of having
failed, and its eviction from the group.
A suspicion on a node is created when XCom's failure detector liveness
timeout elapses without any message being received from that node.
This option defines the suspicion timeout for previously active members
of the group, and its value is defined in seconds. Its default and minimum
value is 0 seconds, which means there is no waiting time before expelling
suspected members, which corresponds to the current behavior. We
will refer to these nodes as members, in contrast to the suspected joining
nodes, which are non-members, since they have not yet joined the group.


A.1 Make GR accept parameter
============================

The value of the new group_replication_member_expel_timeout option can
be defined like other Group Replication options through the server's
configuration file, my.cnf by default, or using the MySQL client
command line interface, where it should be modified exclusively by an
administrator.

my.cnf
------
[mysqld]
group_replication_member_expel_timeout=120
------

command line
-------
mysql> SET GLOBAL group_replication_member_expel_timeout=120;
-------


GR will need to accept the new group_replication_member_expel_timeout
option, and it will be processed during the plugin's startup or when its
value is modified through the client command line interface.
For this purpose, plugin.cc must be modified to add the option's
corresponding variable, ulong member_expel_timeout_var, and to register it
as a system variable.


plugin.cc
------------------------------------------------------------------------------
/* Group communication options */
...
ulong member_expel_timeout_var = 0;

...

// GCS module variables
...
static MYSQL_SYSVAR_ULONG(
    member_expel_timeout,                                  /* name */
    member_expel_timeout_var,                              /* var */
    PLUGIN_VAR_OPCMDARG | PLUGIN_VAR_PERSIST_AS_READ_ONLY, /* optional var */
    "The period of time, in seconds, that a member waits before "
    "expelling any member suspected of failing from the group.",
    check_sysvar_ulong_timeout,  /* check func. */
    update_member_expel_timeout, /* update func. */
    0,                           /* default */
    0,                           /* min */
    LONG_TIMEOUT,                /* max */
    0                            /* block */
);

...

static SYS_VAR *group_replication_system_vars[] = {
...
    MYSQL_SYSVAR(member_expel_timeout),
    NULL,
};
------------------------------------------------------------------------------

The existing check_sysvar_ulong_timeout function is used to verify that the
received value is a non-negative integer within the allowed range, whose
upper bound is LONG_TIMEOUT.
The new update_member_expel_timeout function updates the value of the
corresponding parameter in GCS.
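The range rule can be sketched as follows. This is not check_sysvar_ulong_timeout itself, whose plugin callback signature is not reproduced here; validate_expel_timeout is a hypothetical helper, and the value of LONG_TIMEOUT (one year in seconds) is an assumption made for the example.

```cpp
// Assumed value: LONG_TIMEOUT is taken here to be one year in seconds.
constexpr long long kLongTimeout = 3600LL * 24 * 365;

// Hypothetical validator: accepts 0 <= in_val <= kLongTimeout, storing the
// accepted value in *out; rejects anything outside that range.
bool validate_expel_timeout(long long in_val, unsigned long *out) {
  if (in_val < 0 || in_val > kLongTimeout) return false;
  *out = static_cast<unsigned long>(in_val);
  return true;
}
```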


A.2 GR conveys the parameter to GCS
===================================

GR will convey the value of member_expel_timeout_var to GCS during its
initialization, like all other parameters. Its value is set on the
"member_expel_timeout" parameter of the gcs_module_parameters variable,
which is then conveyed to the configure method of gcs_module.

plugin.cc
------------------------------------------------------------------------------
int configure_group_communication(st_server_ssl_variables *ssl_variables) {
...
  std::stringstream member_expel_timeout_stream_buffer;
  member_expel_timeout_stream_buffer << member_expel_timeout_var;
  gcs_module_parameters.add_parameter("member_expel_timeout",
                                      member_expel_timeout_stream_buffer.str());
...
  LogPluginErr(INFORMATION_LEVEL, ER_GRP_RPL_GRP_COMMUNICATION_INIT_WITH_CONF,
               group_name_var, local_address_var, group_seeds_var,
               bootstrap_group_var ? "true" : "false", poll_spin_loops_var,
               compression_threshold_var, ip_whitelist_var,
               communication_debug_options_var, member_expel_timeout_var);
...
}
------------------------------------------------------------------------------

The aforementioned configure method of the Gcs_operations class conveys the
parameters to the initialize method of Gcs_xcom_interface, which inherits
from the Gcs_interface interface.
Since these parameters are also conveyed to the
Gcs_xcom_interface::configure_suspicions_mgr method, that method will be
modified to retrieve the value of the member_expel_timeout parameter and set
it on the new m_member_expel_timeout field of the Gcs_suspicions_manager
object, using its setter method.
This new field, m_member_expel_timeout, and the corresponding getter and
setter, will be defined in gcs_xcom_control_interface.h and implemented in
gcs_xcom_control_interface.cc.

To better distinguish it from the new field, we will rename the existing
suspicions_timeout parameter to non_member_expel_timeout, the
SUSPICION_TIMEOUT macro to NON_MEMBER_EXPEL_TIMEOUT, and the field that
stores this value from m_timeout to m_non_member_expel_timeout.
The getters and setters for these two timeout fields and for the
m_suspicions_processing_period field will be guarded by
m_suspicions_parameters_mutex, and the values of these three fields can be
updated through the Gcs_xcom_interface::configure_suspicions_mgr method.

Other fields will also be added to Gcs_suspicions_manager: m_suspicions_cond,
a condition used to signal the suspicions manager to process suspicions;
m_is_killer_node, which tells the manager whether this node is responsible
for expelling nodes when their suspicions time out; m_my_info, which points
to the node's own information; and m_has_majority, which tells the manager
whether a majority of the group's members are active.
The constructor of Gcs_suspicions_manager will be modified accordingly to
initialize the new fields.


gcs_psi.h
------------------------------------------------------------------------------
extern PSI_mutex_key
...
    key_GCS_MUTEX_Gcs_suspicions_manager_m_suspicions_parameters_mutex;

------------------------------------------------------------------------------


gcs_psi.cc
------------------------------------------------------------------------------
PSI_mutex_key
...
key_GCS_MUTEX_Gcs_suspicions_manager_m_suspicions_parameters_mutex,
...


static PSI_mutex_info all_gcs_psi_mutex_keys_info[] = {
...
    {&key_GCS_MUTEX_Gcs_suspicions_manager_m_suspicions_parameters_mutex,
     "GCS_Gcs_suspicions_manager::m_suspicions_parameters_mutex",
     PSI_FLAG_SINGLETON, 0, PSI_DOCUMENT_ME},
...
------------------------------------------------------------------------------


gcs_xcom_control_interface.h
------------------------------------------------------------------------------
class Gcs_suspicions_manager {
 public:
...

  /**
    Retrieves suspicion thread period in seconds.
  */

  unsigned int get_suspicions_processing_period();

  /**
    Sets the period or sleep time, between iterations, for the suspicion
    thread.
    @param[in] sec Suspicion thread period
  */

  void set_suspicions_processing_period(unsigned int sec);

  /**
    Retrieves non-member expel timeout in 100s of nanoseconds.
    @return Non-member expel timeout
  */

  uint64_t get_non_member_expel_timeout();

  /**
    Sets the time interval to wait before removing non-member nodes marked to
    be expelled from the cluster.
    @param[in] sec Suspicions timeout in seconds
  */

  void set_non_member_expel_timeout_seconds(unsigned long sec);

  /**
    Retrieves member expel timeout in 100s of nanoseconds.
    @return Member expel timeout
  */

  uint64_t get_member_expel_timeout();

  /**
    Sets the time interval to wait before removing member nodes marked to be
    expelled from the cluster.
    @param[in] sec Expel suspicions timeout in seconds
  */

  void set_member_expel_timeout_seconds(unsigned long sec);

...
 private:

  /*
    Suspicions processing thread period in seconds
  */
  unsigned int m_suspicions_processing_period;

  /*
    Non-member expel timeout stored in 100s of nanoseconds
  */
  uint64_t m_non_member_expel_timeout;

  /*
    Member expel timeout stored in 100s of nanoseconds
  */
  uint64_t m_member_expel_timeout;
...
  /*
    Condition used to wake up suspicions thread
  */
  My_xp_cond_impl m_suspicions_cond;

  /*
    Mutex to control access to suspicions parameters
  */
  My_xp_mutex_impl m_suspicions_parameters_mutex;

  /*
    Signals if node should remove suspect nodes from group.
  */
  bool m_is_killer_node;

  /*
    Pointer to this node's information
  */
  Gcs_xcom_node_information *m_my_info;

  /*
    Signals if group has a majority of alive nodes.
  */
  bool m_has_majority;
};
------------------------------------------------------------------------------


gcs_xcom_control_interface.cc
------------------------------------------------------------------------------
Gcs_suspicions_manager::Gcs_suspicions_manager(Gcs_xcom_proxy *proxy,
                                               Gcs_xcom_control *ctrl)
    : m_proxy(proxy),
      m_control_if(ctrl),
      m_suspicions_processing_period(SUSPICION_PROCESSING_THREAD_PERIOD),
      m_non_member_expel_timeout(NON_MEMBER_EXPEL_TIMEOUT),
      m_member_expel_timeout(0),
      m_gid_hash(0),
      m_suspicions(),
      m_suspicions_mutex(),
      m_suspicions_cond(),
      m_suspicions_parameters_mutex(),
      m_is_killer_node(false),
      m_my_info(NULL),
      m_has_majority(false) {
  m_suspicions_mutex.init(
      key_GCS_MUTEX_Gcs_suspicions_manager_m_suspicions_mutex, NULL);
  m_suspicions_cond.init(key_GCS_COND_Gcs_suspicions_manager_m_suspicions_cond);
  m_suspicions_parameters_mutex.init(
      key_GCS_MUTEX_Gcs_suspicions_manager_m_suspicions_parameters_mutex, NULL);
}
...

unsigned int Gcs_suspicions_manager::get_suspicions_processing_period() {
  unsigned int ret;
  m_suspicions_parameters_mutex.lock();
  ret = m_suspicions_processing_period;
  m_suspicions_parameters_mutex.unlock();
  return ret;
}

void Gcs_suspicions_manager::set_suspicions_processing_period(
    unsigned int sec) {
  m_suspicions_parameters_mutex.lock();
  m_suspicions_processing_period = sec;
  MYSQL_GCS_LOG_DEBUG("Set suspicions processing period to %u seconds.", sec)
  m_suspicions_parameters_mutex.unlock();
}

/* purecov: begin deadcode */
uint64_t Gcs_suspicions_manager::get_non_member_expel_timeout() {
  uint64_t ret;
  m_suspicions_parameters_mutex.lock();
  ret = m_non_member_expel_timeout;
  m_suspicions_parameters_mutex.unlock();
  return ret;
}
/* purecov: end */

void Gcs_suspicions_manager::set_non_member_expel_timeout_seconds(
    unsigned long sec) {
  m_suspicions_parameters_mutex.lock();
  // Widen before multiplying: unsigned long may be only 32 bits wide.
  m_non_member_expel_timeout = static_cast<uint64_t>(sec) * 10000000ULL;
  MYSQL_GCS_LOG_DEBUG("Set non-member expel timeout to %lu seconds (%llu ns).",
                      sec,
                      static_cast<unsigned long long>(
                          m_non_member_expel_timeout * 100))
  m_suspicions_parameters_mutex.unlock();
}

uint64_t Gcs_suspicions_manager::get_member_expel_timeout() {
  uint64_t ret;
  m_suspicions_parameters_mutex.lock();
  ret = m_member_expel_timeout;
  m_suspicions_parameters_mutex.unlock();
  return ret;
}

void Gcs_suspicions_manager::set_member_expel_timeout_seconds(
    unsigned long sec) {
  m_suspicions_parameters_mutex.lock();
  // Widen before multiplying: unsigned long may be only 32 bits wide.
  m_member_expel_timeout = static_cast<uint64_t>(sec) * 10000000ULL;
  MYSQL_GCS_LOG_DEBUG("Set member expel timeout to %lu seconds (%llu ns).",
                      sec,
                      static_cast<unsigned long long>(
                          m_member_expel_timeout * 100))
  m_suspicions_parameters_mutex.unlock();
}
------------------------------------------------------------------------------
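Both setters convert seconds to the 100-nanosecond unit used internally by multiplying by 10,000,000. The short sketch below isolates that arithmetic; seconds_to_ticks and ticks_to_nanoseconds are hypothetical helper names.

```cpp
#include <cstdint>

// 1 s = 1e9 ns = 1e7 * 100 ns, so one second is 10,000,000 ticks.
constexpr uint64_t kTicksPerSecond = 10000000ULL;

// Converts seconds to 100 ns ticks, widening first to avoid 32-bit overflow.
uint64_t seconds_to_ticks(unsigned long sec) {
  return static_cast<uint64_t>(sec) * kTicksPerSecond;
}

// Converts 100 ns ticks to nanoseconds, as done for the debug log messages.
uint64_t ticks_to_nanoseconds(uint64_t ticks) { return ticks * 100; }
```

Widening sec to uint64_t before the multiplication matters on platforms where unsigned long is 32 bits: a one-year timeout (31,536,000 seconds) corresponds to about 3.15e14 ticks, which does not fit in 32 bits.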


gcs_xcom_interface.cc
------------------------------------------------------------------------------

enum_gcs_error Gcs_xcom_interface::configure_suspicions_mgr(
    Gcs_interface_parameters &p, Gcs_suspicions_manager *mgr) {
  enum_gcs_error ret = GCS_NOK;
  const std::string *non_member_expel_timeout_ptr =
      p.get_parameter("non_member_expel_timeout");
  if (non_member_expel_timeout_ptr != NULL) {
    mgr->set_non_member_expel_timeout_seconds(static_cast<unsigned long>(
        atoi(non_member_expel_timeout_ptr->c_str())));
    ret = GCS_OK;
    MYSQL_GCS_LOG_TRACE(
        "::configure_suspicions_mgr():: Set non-member expel timeout to %s "
        "seconds",
        non_member_expel_timeout_ptr->c_str())
  }

  const std::string *member_expel_timeout_ptr =
      p.get_parameter("member_expel_timeout");
  if (member_expel_timeout_ptr != NULL) {
    mgr->set_member_expel_timeout_seconds(
        static_cast<unsigned long>(atoi(member_expel_timeout_ptr->c_str())));
    ret = GCS_OK;
    MYSQL_GCS_LOG_TRACE(
        "::configure_suspicions_mgr():: Set member expel timeout to %s "
        "seconds",
        member_expel_timeout_ptr->c_str())
  }

  const std::string *suspicions_processing_period_ptr =
      p.get_parameter("suspicions_processing_period");
  if (suspicions_processing_period_ptr != NULL) {
    mgr->set_suspicions_processing_period(static_cast<unsigned int>(
        atoi(suspicions_processing_period_ptr->c_str())));
    ret = GCS_OK;
    MYSQL_GCS_LOG_TRACE(
        "::configure_suspicions_mgr():: Set suspicions processing period to %s "
        "seconds",
        suspicions_processing_period_ptr->c_str());
  }
  return ret;
}
------------------------------------------------------------------------------


When the value of the group_replication_member_expel_timeout option is
updated through the command line, the update_member_expel_timeout function
is invoked to convey the value to GCS and, ultimately, to the suspicions
manager. For this purpose, a new method for updating the parameters in GCS
is required: Gcs_operations::reconfigure, which invokes the
Gcs_interface::configure method, implemented by Gcs_xcom_interface::configure.
This method ultimately invokes the Gcs_xcom_interface::configure_suspicions_mgr
method, which updates the value of the m_member_expel_timeout field.
The parameters passed as argument to the reconfigure method will include,
besides the new value of the member_expel_timeout parameter, the group name
and the new reconfigure_ip_whitelist parameter set to false, so that the
whitelist is not reconfigured.
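As an illustration of the parameter set described above, the following sketch assembles it as a plain string map. The Parameters alias stands in for Gcs_interface_parameters, and the "group_name" key and the build_reconfigure_parameters helper are assumptions; only member_expel_timeout and reconfigure_ip_whitelist are named in this WL.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for Gcs_interface_parameters: a plain string map.
using Parameters = std::map<std::string, std::string>;

// Builds the parameters passed to reconfigure when the option changes.
// The helper and the "group_name" key are illustrative assumptions.
Parameters build_reconfigure_parameters(const std::string &group_name,
                                        unsigned long member_expel_timeout) {
  Parameters params;
  params["group_name"] = group_name;
  params["member_expel_timeout"] = std::to_string(member_expel_timeout);
  // Keep the IP whitelist untouched during this reconfiguration.
  params["reconfigure_ip_whitelist"] = "false";
  return params;
}
```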

gcs_operations.h
------------------------------------------------------------------------------
class Gcs_operations {
 public:
...
  /**
    Reconfigure the GCS interface, i.e. update its configuration parameters.

    @param[in] parameters The configuration parameters

    @return the operation status
      @retval 0      OK
      @retval !=0    Error
  */
  enum enum_gcs_error reconfigure(const Gcs_interface_parameters parameters);
...
};
------------------------------------------------------------------------------


The value of the member_expel_timeout parameter will be verified in the
is_parameters_syntax_correct function present in Gcs_xcom_utils.cc, in a
similar way to the validation of non_member_expel_timeout.
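A minimal sketch of the kind of syntax check meant here, assuming the timeout must be a non-empty string of decimal digits (which also rules out negative values); is_valid_timeout_value is a hypothetical name, not the actual helper in Gcs_xcom_utils.cc.

```cpp
#include <cctype>
#include <string>

// Hypothetical check in the spirit of is_parameters_syntax_correct: a timeout
// parameter, if present, must be a non-empty string of decimal digits.
bool is_valid_timeout_value(const std::string &value) {
  if (value.empty()) return false;
  for (unsigned char c : value) {
    if (!std::isdigit(c)) return false;
  }
  return true;
}
```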


=======================================
B. Suspicions and behavior modification
=======================================

In this stage, we start by removing the mechanism for the immediate eviction
of previously active nodes, or members, when a global view is processed.
Then, we modify the Gcs_xcom_node_information class to store a boolean
indicating whether the node is already a member of the group, and update the
creation and processing mechanisms of suspicions.
The last step in this stage is the creation of suspicions for members to
expel, and of the corresponding monitoring period.


B.1 Prevent the eviction of previously active members
=====================================================

Currently, there are two types of nodes that are expelled from the group:
1. Non-members, which correspond to the nodes that probably failed while
entering the group, possibly during the initial state exchange. These nodes
aren't in the current set of group members and are marked as faulty in
the view.

2. Members that were previously active in the group, but are considered to
be faulty in the view.

All of these nodes are suspected of having failed by XCom's failure
detector, as it didn't receive any type of message from them over the last
5 seconds.

The current behavior for dealing with nodes to expel is as follows. When a
view is received and processed by the
Gcs_xcom_control::xcom_receive_global_view method, both types of nodes are
extracted from the failed_members list into the suspect_members and
expel_members lists by the Gcs_xcom_control::build_suspect_members and
Gcs_xcom_control::build_expel_members methods, respectively. Then, the
Gcs_suspicions_manager::process_view method is invoked with these two lists,
as well as the alive_members list, as parameters. It deletes existing
suspicions for nodes that are in the alive_members or expel_members lists,
and creates new suspicions for non-members, i.e. the nodes in the
suspect_members list, if they do not already exist.
After all this, if there are any nodes in the expel_members list, they are
removed from the group in the Gcs_xcom_control::xcom_receive_global_view
method by the "killer node", which is the node that currently holds the
lowest position in the group.

In this WL, instead of expelling members immediately, suspicions are created
for the nodes in the expel_members list, which will be renamed to
member_suspect_nodes, just as currently happens for non-members, i.e. the
nodes in the suspect_members list, which will be renamed to
non_member_suspect_nodes. Therefore, we eliminate the removal of the nodes
in the expel_members list from the
Gcs_xcom_control::xcom_receive_global_view method.
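The split between the two renamed lists can be sketched as follows, under the simplifying assumption that node identifiers are plain strings. classify_failed_nodes and SuspectLists are hypothetical; in the plugin this work is done by the Gcs_xcom_control build methods on Gcs_member_identifier objects.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical simplification: node identifiers are plain strings.
struct SuspectLists {
  std::vector<std::string> member_suspect_nodes;      // previously active
  std::vector<std::string> non_member_suspect_nodes;  // still joining
};

// Splits the failed nodes of a global view: nodes already in the current
// membership become member suspects, the rest become non-member suspects.
SuspectLists classify_failed_nodes(
    const std::vector<std::string> &failed_nodes,
    const std::vector<std::string> &current_members) {
  SuspectLists lists;
  for (const std::string &node : failed_nodes) {
    bool is_member = std::find(current_members.begin(), current_members.end(),
                               node) != current_members.end();
    if (is_member)
      lists.member_suspect_nodes.push_back(node);
    else
      lists.non_member_suspect_nodes.push_back(node);
  }
  return lists;
}
```

Under this WL, neither list triggers an expulsion at this point; both only feed suspicion creation in process_view.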


B.2 Update new and existing suspicions with the member field
============================================================

Because different types of nodes can be suspected, the
Gcs_xcom_node_information class will be modified with the addition of a
new field, m_member, indicating whether the suspect node is a current
member of the group, together with the corresponding getter and setter. The
Gcs_xcom_node_information::has_timed_out method will not be modified, as we
invoke it with the current timestamp and the timeout value that corresponds
to the type of the suspect node.


gcs_xcom_group_member_information.h
------------------------------------------------------------------------------
class Gcs_xcom_node_information {
 public:
...
  /**
    Compares the object's timestamp with the received one, in order
    to check if the suspicion has timed out and the suspect node
    must be removed.

    @param[in] ts Provided timestamp
    @param[in] timeout Provided timeout
    @return Indicates if the suspicion has timed out
  */

  bool has_timed_out(uint64_t ts, uint64_t timeout);
...

  /**
    Get whether the node is already a member of the group or not.
  */

  bool is_member() const;

  /**
    Set whether the node is already a member of the group or not.
  */

  void set_member(bool m);
...

 private:
...

  /**
    Whether the node is a member of the group or not.
  */
  bool m_member;
};
------------------------------------------------------------------------------


B.3 Create suspicions for members to expel
==========================================

To create suspicions for the nodes in the member_suspect_nodes list,
similarly to what happens to nodes in the non_member_suspect_nodes
list, the member_suspect_nodes list will be conveyed as a parameter to the
Gcs_suspicions_manager::add_suspicions method, upon its invocation in the
Gcs_suspicions_manager::process_view method. This method will be further
modified to record whether each suspect node is a group member.

gcs_xcom_control_interface.h
------------------------------------------------------------------------------
 /**
    Invoked by Gcs_suspicions_manager::process_view, it adds suspicions
    for the nodes received as argument if they aren't already suspects.

    @param[in] xcom_nodes List of all nodes (i.e. alive or dead) with low level
                          information such as timestamp, unique identifier, etc
    @param[in] non_member_suspect_nodes List of joining nodes to add to
                                        m_suspicions
    @param[in] member_suspect_nodes List of previously active nodes to add to
                                    m_suspicions
    @return Indicates if new suspicions were added
  */

  bool add_suspicions(
      Gcs_xcom_nodes *xcom_nodes,
      std::vector<Gcs_member_identifier *> non_member_suspect_nodes,
      std::vector<Gcs_member_identifier *> member_suspect_nodes);
------------------------------------------------------------------------------
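
The intended add_suspicions semantics can be sketched as shown below. Only
nodes that are not yet suspects get a new suspicion, and each suspicion
records whether the node was a member. This is a hypothetical model, not the
GCS code. Node identifiers are plain strings instead of
Gcs_member_identifier, and the suspicions table is a std::map. The
Suspicion_sketch type and the function name are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical suspicion record: creation time and membership flag.
struct Suspicion_sketch {
  uint64_t creation_ts;
  bool is_member;
};

using Suspicions_sketch = std::map<std::string, Suspicion_sketch>;

// Adds a suspicion for every node that is not already a suspect and returns
// true if at least one new suspicion was created. emplace() leaves existing
// entries untouched, so a re-suspected node keeps its original timestamp.
bool add_suspicions_sketch(
    Suspicions_sketch &suspicions, uint64_t now,
    const std::vector<std::string> &non_member_suspect_nodes,
    const std::vector<std::string> &member_suspect_nodes) {
  bool added = false;
  for (const auto &id : non_member_suspect_nodes) {
    if (suspicions.emplace(id, Suspicion_sketch{now, false}).second)
      added = true;
  }
  for (const auto &id : member_suspect_nodes) {
    if (suspicions.emplace(id, Suspicion_sketch{now, true}).second)
      added = true;
  }
  return added;
}

// Usage: suspecting m1 again later neither adds an entry nor resets its timer.
inline void example_add_suspicions() {
  Suspicions_sketch s;
  assert(add_suspicions_sketch(s, 10, {"joiner"}, {"m1"}));
  assert(s.at("m1").is_member && !s.at("joiner").is_member);
  assert(!add_suspicions_sketch(s, 20, {}, {"m1"}));
  assert(s.at("m1").creation_ts == 10);
}
```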


B.4 Modify suspicions processing thread
=======================================

To keep the current behavior when dealing with members to expel, we have to
change the suspicions processing thread, because it currently only runs
periodically. We want to wake it up as soon as suspicions are created for
members to expel.
For this purpose, we will add to the Gcs_suspicions_manager class a new
condition variable, m_suspicions_cond, on which the thread waits for up to
m_period seconds instead of sleeping. This allows the process_view method to
wake the thread if the member_suspect_nodes list is not empty and the value
of the group_replication_member_expel_timeout option is smaller than
m_period.


gcs_psi.h
------------------------------------------------------------------------------
extern PSI_cond_key
...
    key_GCS_COND_Gcs_suspicions_manager_m_suspicions_cond;

------------------------------------------------------------------------------


gcs_psi.cc
------------------------------------------------------------------------------
PSI_cond_key
...
key_GCS_COND_Gcs_suspicions_manager_m_suspicions_cond;


static PSI_cond_info all_gcs_psi_cond_keys_info[] = {
...
    {&key_GCS_COND_Gcs_suspicions_manager_m_suspicions_cond,
     "GCS_Gcs_suspicions_manager::m_suspicions_cond", PSI_FLAG_SINGLETON, 0,
     PSI_DOCUMENT_ME}};
------------------------------------------------------------------------------


gcs_xcom_control_interface.h
------------------------------------------------------------------------------
class Gcs_suspicions_manager {
 public:
...
  /**
    Do a timed wait on m_suspicions_cond for m_period seconds.
  */

  void timedwait();
};
------------------------------------------------------------------------------
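
The wait-and-signal pattern described above can be sketched with
std::condition_variable. This is an illustration under several assumptions.
GCS uses its own instrumented condition and mutex wrappers rather than the
standard library. The class, the on_suspicions_added method and the
m_pending flag are hypothetical. The flag records a pending wake-up, so a
signal sent before the thread starts waiting is not lost and spurious
wake-ups are ignored.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical sketch of the suspicions thread's timed wait and early wake-up.
class Suspicions_waiter_sketch {
 public:
  Suspicions_waiter_sketch(std::chrono::milliseconds period,
                           std::chrono::milliseconds expel_timeout)
      : m_period(period), m_expel_timeout(expel_timeout) {}

  // Processing thread: waits for up to m_period instead of sleeping.
  // Returns true if woken by a signal, false if the period elapsed.
  bool timedwait() {
    std::unique_lock<std::mutex> lock(m_mutex);
    bool signaled = m_suspicions_cond.wait_for(lock, m_period,
                                               [this] { return m_pending; });
    m_pending = false;
    return signaled;
  }

  // Called after suspicions are created: wake the thread only if member
  // suspicions were added and they expire before the next periodic run.
  void on_suspicions_added(bool member_suspects_added) {
    if (!member_suspects_added || m_expel_timeout >= m_period) return;
    {
      std::lock_guard<std::mutex> guard(m_mutex);
      m_pending = true;
    }
    m_suspicions_cond.notify_one();
  }

 private:
  std::mutex m_mutex;
  std::condition_variable m_suspicions_cond;
  bool m_pending{false};  // set by the signaller, consumed by timedwait()
  const std::chrono::milliseconds m_period;
  const std::chrono::milliseconds m_expel_timeout;
};

// Usage: with expel timeout 0 and period 20 ms, adding a member suspicion
// wakes the thread at once; without a signal, the wait runs the full period.
inline void example_waiter() {
  using std::chrono::milliseconds;
  Suspicions_waiter_sketch waiter(milliseconds(20), milliseconds(0));
  waiter.on_suspicions_added(true);
  assert(waiter.timedwait());
  assert(!waiter.timedwait());
}
```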


Another change will be introduced in the Gcs_suspicions_manager::add_suspicions
method. It will signal m_suspicions_cond if suspicions for members to expel
were added and m_expel_timeout is smaller than m_period. This avoids delays in
evicting such suspect nodes, mimicking the current behavior.
Finally, we will modify the Gcs_suspicions_manager::process_suspicions
method for two purposes. A node can evict itself from the group if its own
suspicion times out. The killer node is solely responsible for evicting the
nodes whose suspicions time out, provided that a majority of the group members
are active. These two conditions will be stored in two new fields,
m_is_killer_node and m_has_majority, in the Gcs_suspicions_manager class.
To update the value of m_is_killer_node every time a view is processed, an
additional boolean parameter will be added to the
Gcs_suspicions_manager::process_view method. The value of m_has_majority is
updated by comparing the sizes of the non_member_suspect_nodes and
member_suspect_nodes lists with the number of elements in xcom_nodes.
Part of the Gcs_suspicions_manager::process_suspicions method is refactored
into the new Gcs_suspicions_manager::run_process_suspicions method, so that
the unit tests can trigger the suspicions processing deterministically.
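
The two eviction rules can be summarised in a small C++ sketch. The function
names are hypothetical, and two details are assumptions rather than
statements from this WL. The first is that the majority check counts all
suspects, members and non-members alike, against the size of xcom_nodes. The
second is that a node evicting itself does not depend on that check. With n
nodes and s suspects, a majority is active when n - s > n/2, which is
equivalent to 2s < n.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical: the group has a majority of active nodes when the suspects
// are a strict minority of all known nodes, i.e. 2 * suspects < total.
bool has_majority_sketch(std::size_t non_member_suspects,
                         std::size_t member_suspects,
                         std::size_t total_nodes) {
  return 2 * (non_member_suspects + member_suspects) < total_nodes;
}

// Hypothetical: decides whether this node removes a suspect whose suspicion
// has already timed out.
bool should_expel_sketch(bool suspect_is_self, bool is_killer_node,
                         bool has_majority) {
  if (suspect_is_self) return true;       // a node may evict itself
  return is_killer_node && has_majority;  // others: only the killer, with a majority
}

// Usage: in a 5-node group with 2 suspects (3 active), the killer node expels
// them. In a 4-node group with 2 suspects no majority is active, so it waits.
inline void example_eviction_rules() {
  assert(has_majority_sketch(0, 2, 5));
  assert(!has_majority_sketch(1, 1, 4));
  assert(should_expel_sketch(false, true, has_majority_sketch(0, 2, 5)));
  assert(!should_expel_sketch(false, true, has_majority_sketch(1, 1, 4)));
  assert(!should_expel_sketch(false, false, true));
}
```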

gcs_xcom_control_interface.h
------------------------------------------------------------------------------
class Gcs_suspicions_manager {
 public:
...
  /**
    Invoked by Gcs_xcom_control::xcom_receive_global_view, it invokes the
    remove_suspicions method for the alive_nodes and left_nodes parameters,
    provided that these lists and m_suspicions are not empty. It also
    invokes the add_suspicions method if the non_member_suspect_nodes and
    member_suspect_nodes parameters aren't empty.

    @param[in] xcom_nodes List of all nodes (i.e. alive or dead) with low level
                          information such as timestamp, unique identifier, etc
    @param[in] alive_nodes List of the nodes that currently belong to the group
    @param[in] left_nodes List of the nodes that have left the group
    @param[in] member_suspect_nodes List of previously active nodes to add to
                                    m_suspicions
    @param[in] non_member_suspect_nodes List of joining nodes to add to
                                        m_suspicions
    @param[in] is_killer_node Indicates if node should remove suspect members
                              from the group
  */

  void process_view(
      Gcs_xcom_nodes *xcom_nodes,
      std::vector<Gcs_member_identifier *> alive_nodes,
      std::vector<Gcs_member_identifier *> left_nodes,
      std::vector<Gcs_member_identifier *> member_suspect_nodes,
      std::vector<Gcs_member_identifier *> non_member_suspect_nodes,
      bool is_killer_node);
...
  /**
    Invoked periodically by the suspicions processing thread, it picks a
    timestamp and verifies which suspect nodes should be removed as they
    have timed out.

    @param[in] lock Whether lock should be acquired or not
  */

  void run_process_suspicions(bool lock);
...
};
------------------------------------------------------------------------------
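
The lock parameter of run_process_suspicions lets the same processing pass
run in two contexts. The background thread must acquire the manager's mutex.
A unit test may already control the locking. A minimal sketch of this
optional-locking pattern follows. The class, its mutex accessor and the pass
counter are hypothetical. The real pass would take a timestamp and remove the
suspects whose suspicions have timed out.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical sketch of the optional-locking refactoring.
class Suspicions_manager_sketch {
 public:
  // One processing pass; acquires the mutex only when asked to.
  void run_process_suspicions(bool lock) {
    std::unique_lock<std::mutex> guard(m_mutex, std::defer_lock);
    if (lock) guard.lock();
    // Placeholder for: pick a timestamp and remove timed-out suspects.
    ++m_passes;
  }

  std::mutex &get_mutex() { return m_mutex; }
  int passes() const { return m_passes; }

 private:
  std::mutex m_mutex;
  int m_passes{0};
};

// Usage: the periodic thread passes true. A test that already holds the
// mutex passes false to run exactly one deterministic pass.
inline void example_run_process_suspicions() {
  Suspicions_manager_sketch manager;
  manager.run_process_suspicions(true);
  {
    std::lock_guard<std::mutex> held(manager.get_mutex());
    manager.run_process_suspicions(false);
  }
  assert(manager.passes() == 2);
}
```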


B.5 Make resumed node leave group if expelled
==============================================

When a suspect node is able to return to the group before its suspicion
times out, it resumes normal operation. If, upon returning, it detects that
it was already expelled, it switches to the ERROR state and installs a leave
view to prevent conflicts with the view that is currently installed on the
remainder of the group.
For this purpose, the Gcs_xcom_control::do_leave_gcs method is renamed to
Gcs_xcom_control::do_leave_view and loses its parameter, since it now invokes
the existing Gcs_xcom_control::install_leave_view method with the appropriate
error code itself.
We pass to the Gcs_suspicions_manager constructor a pointer to the
Gcs_xcom_control object. When the node's own suspicion times out, the
suspicions manager can then install a leave view by invoking the
Gcs_xcom_control::install_leave_view method, whose access is changed from
private to public.

Two new fields, m_leave_view_requested and m_leave_view_delivered, are added
to the Gcs_xcom_control class to record whether a leave view was requested
and whether it was delivered.
The value of m_leave_view_requested determines the error code passed to the
Gcs_xcom_control::install_leave_view method. It is Gcs_view::OK if the field
is true, that is, if the node was asked to leave, and Gcs_view::MEMBER_EXPELLED
otherwise.
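
The error code selection can be sketched as follows. The enum only mirrors
the two Gcs_view codes named above, and the class is hypothetical. Using
m_leave_view_delivered to avoid delivering a second leave view is an
assumption about how that field is consulted, not something this WL
specifies.

```cpp
#include <cassert>

// Hypothetical mirror of the two Gcs_view error codes used here.
enum class View_error_code_sketch { OK, MEMBER_EXPELLED };

class Leave_view_sketch {
 public:
  // Called when the node is asked to leave voluntarily.
  void request_leave() { m_leave_view_requested = true; }

  // Delivers at most one leave view and returns false if one was already
  // delivered. On success, *code holds the error code that was used.
  bool do_leave_view(View_error_code_sketch *code) {
    if (m_leave_view_delivered) return false;
    *code = m_leave_view_requested ? View_error_code_sketch::OK
                                   : View_error_code_sketch::MEMBER_EXPELLED;
    m_leave_view_delivered = true;
    return true;
  }

 private:
  bool m_leave_view_requested{false};
  bool m_leave_view_delivered{false};
};

// Usage: a node that never asked to leave gets MEMBER_EXPELLED, only once.
inline void example_leave_view() {
  Leave_view_sketch expelled;
  View_error_code_sketch code{};
  assert(expelled.do_leave_view(&code));
  assert(code == View_error_code_sketch::MEMBER_EXPELLED);
  assert(!expelled.do_leave_view(&code));
}
```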


gcs_xcom_control_interface.h
------------------------------------------------------------------------------
class Gcs_suspicions_manager {
 public:
  /**
    Constructor for Gcs_suspicions_manager, which stores the received
    Gcs_xcom_proxy and Gcs_xcom_control pointers (the former in m_proxy).
    @param[in] proxy Pointer to Gcs_xcom_proxy
    @param[in] ctrl Pointer to Gcs_xcom_control
  */

  explicit Gcs_suspicions_manager(Gcs_xcom_proxy *proxy,
                                  Gcs_xcom_control *ctrl);
...
};
...
class Gcs_xcom_control : public Gcs_control_interface {
 public:
...
  /**
    Sends a leave view message to inform that XCOM has already exited or
    is about to do so.
  */
  void do_leave_view();
...
  /**
    Notify that the current member has left the group and whether it left
    gracefully or not.

    @param[in] error_code that identifies whether there was any error
               when the view was received.
  */
  void install_leave_view(Gcs_view::Gcs_view_error_code error_code);
...
 protected:
...
  /*
    Whether it was requested to make the node leave the group or not.
  */
  bool m_leave_view_requested;

  /*
    Whether a view saying that the node has voluntarily left the group
    was delivered or not.
  */
  bool m_leave_view_delivered;
...
};
------------------------------------------------------------------------------


A new notification class, Expel_notification, is added to process the node's
eviction from the group. It includes the m_functor field, which points to
the function to be executed when the Expel_notification::do_execute method
is invoked.

gcs_xcom_notification.h
------------------------------------------------------------------------------
...
typedef void(xcom_expel_functor)(void);
/**
  Notification used to inform that the node has been expelled or is about
  to be.
*/
class Expel_notification : public Parameterized_notification<false> {
 public:
  /**
    Constructor for Expel_notification.

    @param functor Pointer to a function that contains the actual
    core of the execution.
  */

  explicit Expel_notification(xcom_expel_functor *functor);

  /**
    Destructor for Expel_notification.
  */

  ~Expel_notification();

 private:
  /**
    Task implemented by this notification.
  */

  void do_execute();

  /*
    Pointer to a function that contains the actual core of the
    execution.
  */
  xcom_expel_functor *m_functor;

  /*
    Disabling the copy constructor and assignment operator.
  */
  Expel_notification(Expel_notification const &);
  Expel_notification &operator=(Expel_notification const &);
};
------------------------------------------------------------------------------
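
The idea behind Expel_notification is deferred execution. A function pointer
is stored when the notification is created and is called only when the engine
processes the notification. The single-threaded sketch below shows this. All
class names are hypothetical stand-ins. The real Gcs_xcom_engine processes
notifications on its own thread, and the real base class,
Parameterized_notification<false>, is not modelled here.

```cpp
#include <cassert>
#include <memory>
#include <queue>
#include <utility>

typedef void(xcom_expel_functor_sketch)(void);

// Hypothetical base class: subclasses implement do_execute().
class Notification_sketch {
 public:
  virtual ~Notification_sketch() = default;
  void execute() { do_execute(); }

 private:
  virtual void do_execute() = 0;
};

class Expel_notification_sketch : public Notification_sketch {
 public:
  explicit Expel_notification_sketch(xcom_expel_functor_sketch *functor)
      : m_functor(functor) {}
  Expel_notification_sketch(const Expel_notification_sketch &) = delete;
  Expel_notification_sketch &operator=(const Expel_notification_sketch &) =
      delete;

 private:
  // Runs the stored function, e.g. one that makes the node leave the group.
  void do_execute() override { m_functor(); }

  xcom_expel_functor_sketch *m_functor;
};

// Hypothetical stand-in for the engine: a FIFO of pending notifications.
class Engine_sketch {
 public:
  void push(std::unique_ptr<Notification_sketch> notification) {
    m_queue.push(std::move(notification));
  }

  void process_all() {
    while (!m_queue.empty()) {
      m_queue.front()->execute();
      m_queue.pop();
    }
  }

 private:
  std::queue<std::unique_ptr<Notification_sketch>> m_queue;
};

namespace {
int g_expel_calls = 0;
void count_expel() { ++g_expel_calls; }
}  // namespace

// Usage: the functor runs only when the engine processes the queue.
inline void example_expel_notification() {
  Engine_sketch engine;
  engine.push(std::make_unique<Expel_notification_sketch>(&count_expel));
  assert(g_expel_calls == 0);
  engine.process_all();
  assert(g_expel_calls == 1);
}
```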


The existing xcom_fatal_error_cb function pointer is renamed to xcom_expel_cb,
and the invocation of this callback is moved from the dispatch_op function
to the terminate_and_exit function. This lets GCS know that XCom has stopped,
which occurs, for instance, when the node returns to the group and detects
that it was expelled.
The setter for this pointer is invoked in the
Gcs_xcom_interface::initialize_xcom method. It receives as its parameter the
cb_xcom_expel callback function, defined in the gcs_xcom_interface.cc file.
XCom invokes this function to signal that the node left the group, either
because it received a view in which it is not part of the group or because
a die_op was triggered. The existing cb_xcom_fatal_error function is renamed
to cb_xcom_expel. It creates a new Expel_notification and pushes it into the
gcs_engine, so that the node eventually processes it and leaves the group.

gcs_xcom_interface.cc
------------------------------------------------------------------------------
...
void cb_xcom_expel(int status);
void do_cb_xcom_expel();
...
------------------------------------------------------------------------------
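
The callback wiring can be illustrated with plain function pointers. The
typedef, setter and function names below are hypothetical sketches of the
described roles: the renamed xcom_expel_cb pointer, its setter,
terminate_and_exit and cb_xcom_expel. They are not the actual XCom or GCS
signatures. In particular, the real cb_xcom_expel pushes an
Expel_notification instead of recording the status.

```cpp
#include <cassert>

// Hypothetical callback type: receives a status code from XCom.
typedef void (*xcom_expel_cb_type_sketch)(int status);

// Hypothetical stand-in for the renamed xcom_expel_cb pointer.
static xcom_expel_cb_type_sketch xcom_expel_cb_sketch = nullptr;

// Hypothetical setter, called once while initializing XCom.
void set_xcom_expel_cb_sketch(xcom_expel_cb_type_sketch cb) {
  xcom_expel_cb_sketch = cb;
}

// XCom side: when XCom stops, notify GCS through the callback, if one is set.
void terminate_and_exit_sketch(int status) {
  if (xcom_expel_cb_sketch != nullptr) xcom_expel_cb_sketch(status);
}

// GCS side: the real callback would push an Expel_notification into the
// engine; this sketch only records the status it received.
static int g_last_expel_status = -1;
void cb_xcom_expel_sketch(int status) { g_last_expel_status = status; }

// Usage: without a registered callback, terminating XCom notifies nobody.
inline void example_expel_cb() {
  terminate_and_exit_sketch(3);
  assert(g_last_expel_status == -1);
  set_xcom_expel_cb_sketch(&cb_xcom_expel_sketch);
  terminate_and_exit_sketch(0);
  assert(g_last_expel_status == 0);
}
```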


===============
C. Update Tests
===============


C.1 Update Unit Tests
=====================

We must update the affected unit tests to be consistent with the changes
introduced by this WL.
Therefore, we modify the gcs-parameters-t test file in order to verify
that the new member_expel_timeout and the renamed non_member_expel_timeout
parameters are processed and conveyed correctly by GCS.
We also modify the gcs_xcom_group_member_information-t test file
to replace the invocations of set_timestamp and get_timestamp with
set_suspicion_creation_timestamp and get_suspicion_creation_timestamp,
respectively.

In the gcs_xcom_control_interface-t.cc file, we removed the usage of a sync
point and added the following tests:

- SuspectMembersRemoval, to verify that, with the new option at its default
  value, the eviction of previously active nodes keeps the current behavior;
- ThreeSuspectNodesRemovalAfterTimeoutReset, where we reset the
  non_member_expel_timeout and member_expel_timeout parameters to 0 and
  verify that the suspicions that had not yet timed out finally do;
- SuspectMemberFailedRemovalDueToMajorityLoss, where we verify that the
  eviction of a node is prevented because only a minority of the group
  members are active.

We will also update some of the tests according to the behavior
modifications, namely FailedNodeRemovalTest, FailedNodeGlobalViewTest,
ThreeSuspectNodesRemoval, FalseThreeSuspectNodesWithdrawn and
ThreeSuspectNodesRemovalAndWithdrawn.


C.2 Update MTR Tests
====================

The gr_gcs_psi_mutex_cond, gr_persist_only_variables, gr_persist_variables,
gr_show_global_and_session_variables and gr_variables_default_values test
and result files, which belong to MTR's Group Replication suite, must be
updated to take into account the new group_replication_member_expel_timeout
option and the new instrumented condition on Gcs_suspicions_manager.

New MTR tests will be implemented to verify the behavior of a suspect member
that resumes before being expelled (gr_suspect_member_resumes), resumes after
being expelled (gr_suspect_member_resumes_after_expel), resumes after
crashing (gr_suspect_member_resumes_after_crash), or is expelled
(gr_suspect_member_expelled).
Two new MTR tests, gr_join_with_suspect_member and
gr_leave_with_suspect_member, verify the behavior of membership changes
while there is a suspect member.
A new MTR test, gr_majority_loss_restored_after_timeout, verifies the
behavior of the suspicions mechanism when suspicions time out while the
group does not have a majority of its members active.


==============
D. Limitations
==============

D.1 Inability to reconfigure the group with suspect members
===========================================================

While there are suspect members in the group, regular group membership
changes do not lead to new views being installed. Likewise, no new leader is
elected if the current leader is a suspect member.
This is because views containing any failed member, which includes suspect
members, are discarded.

In this scenario, i.e. where the group members suspect at least one member
of the group:

- when a regular node wants to leave the group, it exits successfully by
  issuing the STOP GROUP_REPLICATION command; however, the corresponding view
  is not installed by the remaining nodes in the group;
- when a new node wants to join the group, the START GROUP_REPLICATION
  command times out.

To allow such group membership changes to succeed and be propagated to all
the group members, all the current suspect members must either return to the
group, if the user is able to make that happen, or be expelled. If it is
taking too long to expel a node, users can reset this timeout by setting
the group_replication_member_expel_timeout parameter to a different value
on all alive members.


This limitation will be dealt with in a future WL and should be highlighted
in the documentation so users are aware of the expected behavior.