WL#8443: Group Replication: Plugin version handshake on member join

Affects: Server-5.7   —   Status: Complete

Focused on Group Replication, this WL shall present a solution to the
problem of having members with different versions in the same group.
It shall present a Version Handshake algorithm that determines which
versions can belong to the same group and whether upgrade and downgrade
paths are allowed.

The outcome of this WL shall be:
- A version handshake algorithm implementation
- A foundation for declaring the compatibility between versions.

FR1: A member shall use the version of the server on which it is installed
to determine its compatibility.

FR2: When a new member joins an existing group, it shall be aware of
the other members' versions.

FR3: A new joining member shall be able to automatically act upon the
fact that it is not compatible with the rest of the group.

FR4: If a member has the same major version as the group, it shall join
the group.

FR5: If a member with a lower major version than the group joins, it shall
leave the group.

FR6: If a member has a higher major version than the group, it shall join the
group but shall not be able to write to it.

FR7: There should be a way for a developer to introduce exceptions
to the rule defined in FR4.

FR8: The exception mechanism defined in FR7 shall not be changeable by an End
User.

FR9: The rule in FR5 can be disabled by a user-defined option.

With the continuous development of Group Replication, there is a need to
manage the different versions that can coexist in the same environment.
This shall address the classical DBA problem of interoperability, in
which one states which versions can operate with one another.

For this to happen, this WL has two major tasks:
- Define HOW one should state version interoperability.
- Define WHEN this verification should happen and WHAT behavior the
  system should have.

Regarding how this should be defined, there should be a clear and fixed
rule stating the paths along which versions can interoperate. A mechanism
should also be in place for a developer to explicitly declare version
interoperability exceptions. The main requirement for both verification
paths is that they cannot be changed by an End-User/DBA.

This verification should occur every time a new member joins a group.
This means that the joining member shall be responsible for checking its
interoperability status with the rest of the group and acting upon it,
if needed. This avoids the need to implement eviction policies
in the Group Communication framework.

In terms of software architecture, a new component shall be created to
store:
- The Compatibility algorithm, with its rules and exceptions;
- All operations that can be done upon it.

This section shall detail the architecture and implementation
topics discussed in the HLS: how to define compatibility and
how to enforce it in the execution flow.

1. Compatibility Definition

Versions are compatible if members running them can exchange messages
that both sides understand. This means that we must define
what constitutes a breach in compatibility. Interoperability shall be deemed
impossible when the messages exchanged between members become
incompatible. This can happen in the following scenarios:
- Message format changed: The messages that are exchanged between
  members, or their encoding, changed in an incompatible way.
- Event format changed: The events that are exchanged between members
  changed in an incompatible way.
- Message Protocol changed: The sequence of messages that are exchanged
  changed in an incompatible way, through a different order or new messages.
  This means that a member needs to send and receive different messages
  when it belongs to a group.

We should have two ways to deduce this incompatibility:
- Via a generic rule (Compatibility Rule) for all members;
- Via static rules.

The Compatibility Rule is based on the premise that all versions within the
same major version shall be compatible, unless something catastrophic happens.
This rule can be stated as:
- A member with the same major version can enter and work in a group;
- A member with a higher major version than the ones in the group can enter
  but can only listen to the group in a Read Only mode (WL#TBD). This
  shall be an enabler for an Upgrade process (WL#TBD);
- A member with a lower major version than the ones in the group shall not
  enter that group. The possibility to enter that group shall be detailed in
  a downgrade worklog. This does not apply to minor and patch versions, i.e.,
  if two versions differ only in their minor/patch version, they are
  always compatible.
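
The generic rule above can be sketched as a small function. This is only
an illustration under stated assumptions: the names Member_version,
Compatibility and apply_generic_rule are hypothetical and are not taken
from the plugin code.

```cpp
#include <cstdint>

// Hypothetical types for illustration only; not the plugin's actual API.
struct Member_version {
  std::uint32_t major_version;
  std::uint32_t minor_version;
  std::uint32_t patch_version;
};

enum class Compatibility { COMPATIBLE, READ_ONLY, INCOMPATIBLE };

// Generic Compatibility Rule, evaluated from the joiner's point of view.
// Only major versions are compared; minor and patch versions never matter.
inline Compatibility apply_generic_rule(const Member_version &joiner,
                                        const Member_version &group_member) {
  if (joiner.major_version == group_member.major_version)
    return Compatibility::COMPATIBLE;
  if (joiner.major_version > group_member.major_version)
    return Compatibility::READ_ONLY;   // may enter, but only listens
  return Compatibility::INCOMPATIBLE;  // lower major version: shall not enter
}
```

With this sketch, a 2.3.4 joiner is compatible with a 2.9.0 member, a
3.0.0 joiner enters a 2.x group in Read Only mode, and a 2.3.4 joiner is
rejected by a 3.0.0 group.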

Although this incompatibility can be deduced automatically with the previous
algorithm, that might not always be enough, since one could need to
declare an explicit incompatibility with a version that would
otherwise be approved using only the Compatibility Rule. For
instance, from version 2.3.3 to version 2.3.4, a message field could be
deleted, rendering even Read-Only operations useless.

For that, one must maintain a list in which a developer can explicitly
state that a given version A is incompatible with the current version.
Such declarations are always made relative to the current version, so the
structure shall only contain versions that are incompatible with the
local member version. This static check must happen before the
Compatibility Rule.
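
A minimal sketch of this ordering follows, using the 2.3.3/2.3.4 example
from above. The class name Compatibility_module and its methods are
hypothetical. The sketch also assumes that the exception list is filled
in by code at build time and is never exposed as a user option.

```cpp
#include <cstdint>
#include <set>
#include <tuple>

// Hypothetical types for illustration only; not the plugin's actual API.
struct Member_version {
  std::uint32_t major_version;
  std::uint32_t minor_version;
  std::uint32_t patch_version;
};

// Strict weak ordering so that versions can be stored in a std::set.
inline bool operator<(const Member_version &a, const Member_version &b) {
  return std::tie(a.major_version, a.minor_version, a.patch_version) <
         std::tie(b.major_version, b.minor_version, b.patch_version);
}

enum class Compatibility { COMPATIBLE, READ_ONLY, INCOMPATIBLE };

class Compatibility_module {
 public:
  explicit Compatibility_module(const Member_version &local)
      : m_local(local) {}

  // Declares that `other` cannot interoperate with the local version.
  // Assumption: only developers call this, never an end user.
  void add_incompatibility(const Member_version &other) {
    m_incompatible.insert(other);
  }

  Compatibility check(const Member_version &other) const {
    // Static check first: an explicit exception wins over the generic rule.
    if (m_incompatible.count(other) != 0) return Compatibility::INCOMPATIBLE;
    // Generic rule: compare major versions only.
    if (m_local.major_version == other.major_version)
      return Compatibility::COMPATIBLE;
    if (m_local.major_version > other.major_version)
      return Compatibility::READ_ONLY;
    return Compatibility::INCOMPATIBLE;
  }

 private:
  Member_version m_local;                   // version of the local member
  std::set<Member_version> m_incompatible;  // explicit exceptions
};
```

Here a local member at 2.3.4 that lists 2.3.3 as an exception rejects
2.3.3, while 2.3.2 still passes through the generic rule.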

2. Compatibility Algorithm

As described in the HLS, it is easier to implement this on the joiner
side. A rough joining algorithm would be:
- A new member joins the group
- At a low-level, the State Exchange occurs.
  - If the State Exchange fails, the member is deemed incompatible at
    that level and the join procedure must fail. This can be caused
    by a similar mechanism implemented in the GCS Layer.
- The new member receives the new View and, consequently, the Cluster
  Member Info of every member, which must now include Version Information.
- The joiner checks if it is compatible with all members in the group.
  - First it checks the table of explicit exceptions;
  - Then it checks the generic rule.
- If it deems itself fully incompatible, it voluntarily leaves the group.
- If it deems itself partially incompatible, it voluntarily enters
  read-only mode.
- The previous two steps must happen before starting the Recovery
  algorithm.
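
The decision step of the joiner can be sketched as a reduction over the
per-member check results, in which the most restrictive result wins. The
names Join_action and decide_join_action are hypothetical, and the sketch
assumes that each member of the View has already been checked
individually.

```cpp
#include <vector>

// Hypothetical types for illustration only; not the plugin's actual API.
enum class Compatibility { COMPATIBLE, READ_ONLY, INCOMPATIBLE };
enum class Join_action { JOIN, JOIN_READ_ONLY, LEAVE_GROUP };

// Reduces the per-member results to a single action for the joiner.
// One fully incompatible member is enough to make the joiner leave;
// otherwise one partially incompatible member forces read-only mode.
inline Join_action decide_join_action(
    const std::vector<Compatibility> &per_member_results) {
  Join_action action = Join_action::JOIN;
  for (Compatibility result : per_member_results) {
    if (result == Compatibility::INCOMPATIBLE) return Join_action::LEAVE_GROUP;
    if (result == Compatibility::READ_ONLY)
      action = Join_action::JOIN_READ_ONLY;
  }
  return action;
}
```

The returned action would be acted upon before the Recovery algorithm
starts. An empty result list, as seen by the first member of a group,
yields a plain join.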

From the point of view of the existing group, when a new
member evicts itself, the members will see a new View being delivered and
will install it; shortly afterwards, another View will arrive in which
the member has left.

This is the simplest option for now. A possible improvement is to have
the already existing members run the algorithm and decide whether
to proceed with the View installation, but this can be addressed
later as an algorithm improvement.

3. Forced Entry in the Group

Even if a member is declared incompatible with the group due to the general rule
that states that lower major versions are incompatible, the user can still force
its entry.
The plugin shall provide a user option that allows a lower version to join
the group.
While this is dangerous, the versions may in fact be compatible, or
incompatible only in some corner case, so the choice is given to the
DBA.
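
One possible way to apply such an option is sketched here. The flag name
allow_lower_version_join and the Check_result structure are hypothetical.
The sketch assumes that the option only lifts a rejection produced by the
general rule, never one produced by an explicit exception, and that a
forced member joins as a regular member.

```cpp
// Hypothetical types for illustration only; not the plugin's actual API.
enum class Compatibility { COMPATIBLE, READ_ONLY, INCOMPATIBLE };

// Result of the joiner's check, plus where a rejection came from.
struct Check_result {
  Compatibility status;
  bool from_explicit_exception;  // true if the static list rejected it
};

// Applies the user option on top of the result of the compatibility check.
inline Compatibility apply_forced_entry(const Check_result &result,
                                        bool allow_lower_version_join) {
  const bool rejected_by_general_rule =
      result.status == Compatibility::INCOMPATIBLE &&
      !result.from_explicit_exception;
  if (rejected_by_general_rule && allow_lower_version_join)
    return Compatibility::COMPATIBLE;  // the DBA forced the entry
  return result.status;                // everything else is unchanged
}
```

Keeping the override separate from the check itself leaves the
developer-defined exceptions out of the end user's reach, as required by
FR8.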

4. Code Improvements

In terms of code, one needs to create new modules and augment existing
ones.

Regarding new features, the plugin now needs to know:
- The server version
- How to broadcast its version
- How to inform other members and the end user about its version.

During its development process, the Group Replication plugin is associated
with a server version. This is the version used for compatibility purposes.

One also needs to broadcast and receive information about all members' versions.
For that, Cluster Member Info shall hold an extra field stating the
member's version. It can be broadcast each time a new member joins, along with
the existing information.
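
As an illustration of how the extra field could travel along with the
existing information, the version can be packed into a single 32-bit
integer. The layout (one byte per component) and the names
encode_version and decode_version are assumptions made for this sketch,
not a description of the actual wire format.

```cpp
#include <cstdint>

// Hypothetical type for illustration only; not the plugin's actual API.
struct Member_version {
  std::uint32_t major_version;
  std::uint32_t minor_version;
  std::uint32_t patch_version;
};

// Packs the version as 0x00MMmmpp.
// Assumption: every component fits in 8 bits; higher bits are dropped.
inline std::uint32_t encode_version(const Member_version &v) {
  return ((v.major_version & 0xFFu) << 16) |
         ((v.minor_version & 0xFFu) << 8) |
         (v.patch_version & 0xFFu);
}

// Inverse of encode_version().
inline Member_version decode_version(std::uint32_t encoded) {
  Member_version v;
  v.major_version = (encoded >> 16) & 0xFFu;
  v.minor_version = (encoded >> 8) & 0xFFu;
  v.patch_version = encoded & 0xFFu;
  return v;
}
```

Under this layout, version 5.7.9 is encoded as 0x050709, and the
receiving member recovers the three components by decoding the field.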

A new module (Group_Replication_Versioning) must be created to hold:
- Compatibility Matrix and support structures
- Compatibility Algorithm
- Method(s) that allow one to check version compatibility,
  e.g.: bool is_compatible_with(Group_Replication_Version v);

One should also consider augmenting the P_S interface in order to expose
the version of each member in the Group_Replication_Members table.