WL#8443: Group Replication: Plugin version handshake on member join
Affects: Server-5.7
—
Status: Complete
Focused on Group Replication, this WL presents a solution to the problem of having members with different versions in the same group. It presents a version handshake algorithm that determines which versions can belong to the same group and whether upgrade and downgrade paths are allowed.

The outcome of this WL shall be:
- A version handshake algorithm implementation
- A foundation for declaring the compatibility between versions.
FR1: A member shall use the version of the server in which the plugin is installed to determine its compatibility.
FR2: When a new member joins an existing group, it should be aware of the other members' versions.
FR3: A new joining member shall be able to act automatically upon the fact that it is not compatible with the rest of the group.
FR4: If a member has a major version equal to that of the group, it shall join the group.
FR5: If a member with a lower major version than the group joins, it shall leave the group.
FR6: If a member has a higher major version than the group, it shall join the group but cannot write to it.
FR7: There should be a way for a developer to introduce exceptions to the rule defined in FR4.
FR8: The exception mechanism defined in FR7 shall not be changeable by an End User.
FR9: The rule in FR5 can be disabled by a user-defined option.
With the continuous development of Group Replication, there is a need to manage the different versions that can coexist in the same environment. This addresses the classical DBA problem of interoperability, in which one states which versions can operate with one another.

For this to happen, this WL has two major tasks:
- Define HOW one should state version interoperability.
- Define WHEN this verification should happen and WHAT behavior the system should have.

Regarding how this should be defined, there should be a clear and fixed rule stating the paths in which versions can operate together. A mechanism should also be in place for a developer to explicitly declare version interoperability exceptions. The main requirement for both verification paths is that they cannot be changed by an End User/DBA.

This verification should occur every time a new member joins a group. This means that the joining member is responsible for checking its interoperability status with the rest of the group and acting upon it, if needed. This avoids the implementation of eviction policies in the Group Communication framework.

In terms of software architecture, a new component shall be created to store:
- The Compatibility algorithm, with its rules and exceptions;
- All operations that can be done upon it.
This section details the architecture and implementation topics discussed in the HLS: how to define compatibility and how to enforce it in the execution flow.

1. Compatibility Definition

Versions are compatible if they are able to talk with each other in a compatible way. This means that we must define what constitutes a breach in compatibility. Interoperability shall be deemed impossible when the messages exchanged between members become incompatible. This can happen in the following scenarios:
- Message format changed: The messages that are exchanged between members, or their encoding, changed in an incompatible way.
- Event format changed: The events that are exchanged between members changed.
- Message protocol changed: The messages that are exchanged changed in an incompatible way, either in their order or through the introduction of new messages. This means that a member needs to send and receive different messages when it belongs to a group.

We should have two ways to deduce this incompatibility:
- Via a generic rule (Compatibility Rule) for all members;
- Via static rules.

The Compatibility Rule rests on the assumption that all versions inside the same major version are compatible, unless something catastrophic happens. The rule can be stated as:
- A member with the same major version can enter and work in a group;
- A member with a higher major version than the ones in the group can enter, but can only listen to the group in a Read Only mode (WL#TBD). This shall be an enabler for an Upgrade process (WL#TBD);
- A member with a lower major version than the ones in the group shall not enter that group. The possibility to enter that group shall be detailed in a downgrade worklog.

The rule does not apply to minor and patch versions, i.e., if two versions differ only in their minor/patch version, the generic rule always treats them as compatible.
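The generic Compatibility Rule above can be sketched as a single comparison on the major version. This is a minimal illustration, not the plugin's implementation: the names `Member_version`, `Compatibility` and `check_generic_rule` are hypothetical, and static exceptions are deliberately left out.

```cpp
// Hypothetical types: the worklog does not name them.
struct Member_version {
  unsigned major_version;
  unsigned minor_version;
  unsigned patch_version;
};

enum class Compatibility {
  COMPATIBLE,   // same major version: join and work normally
  READ_ONLY,    // joiner has a higher major version: join, but do not write
  INCOMPATIBLE  // joiner has a lower major version: must not join
};

// Applies the generic rule from the joiner's point of view.
// Only the major version is compared; minor and patch are ignored.
inline Compatibility check_generic_rule(const Member_version &joiner,
                                        const Member_version &member) {
  if (joiner.major_version == member.major_version)
    return Compatibility::COMPATIBLE;
  if (joiner.major_version > member.major_version)
    return Compatibility::READ_ONLY;
  return Compatibility::INCOMPATIBLE;
}
```

Under this sketch, a 2.3.4 joiner is compatible with a 2.0.1 member, read-only against a 1.9.9 member, and incompatible with a 3.0.0 member.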
Although this incompatibility can be deduced automatically with the previous algorithm, that may not always be enough: one may need to declare an explicit incompatibility with a version that would otherwise be approved using only the Compatibility Rule. For instance, from version 2.3.3 to version 2.3.4, a message field could be deleted, rendering even Read-Only operations useless.

For that, one must maintain a list in which a developer can explicitly state that version A is incompatible. This shall always be declared relative to the current version, so the structure should only contain versions that are incompatible with the local member version. This static check must happen before the Compatibility Rule.

2. Compatibility Algorithm

As described in the HLS, it is easier to implement this on the joiner side. A rough joining algorithm would be:
- A new member joins the group.
- At a low level, the State Exchange occurs.
- If the State Exchange fails, the member is deemed incompatible at that level and the join procedure must fail. This can be caused by a similar mechanism implemented in the GCS Layer.
- The new member receives the new View and, consequently, all Cluster Member Infos, which must now include Version Information.
- The joiner checks if it is compatible with all members in the group.
  - First it checks the table of explicit exceptions;
  - Then it checks the generic rule.
- If it deems itself fully incompatible, it voluntarily leaves the group.
- If it deems itself partially incompatible, it voluntarily enters read-only mode.
- The previous two steps must happen before starting the Recovery algorithm.

From the point of view of the existing group, when a new member evicts itself, the members see a View delivered that includes the joiner and install it; shortly afterwards, a second View arrives in which the joiner has left. This is the simplest option for now.
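The joiner-side check can be sketched as follows. This is an illustration under stated assumptions, not the actual plugin code: all names (`Member_version`, `Join_decision`, `decide_on_join`) are hypothetical, and a hit in the exception list is assumed to mean full incompatibility, in line with the 2.3.3/2.3.4 example where even Read-Only operation is useless.

```cpp
#include <vector>

// All names below are hypothetical; the worklog only describes the steps.
struct Member_version {
  unsigned major_version;
  unsigned minor_version;
  unsigned patch_version;
};

inline bool operator==(const Member_version &a, const Member_version &b) {
  return a.major_version == b.major_version &&
         a.minor_version == b.minor_version &&
         a.patch_version == b.patch_version;
}

enum class Join_decision { JOIN, JOIN_READ_ONLY, LEAVE };

// Decides what the joiner does once the View with all member versions arrives.
// `incompatible_with_local` is the developer-maintained exception list: the
// versions explicitly declared incompatible with the joiner's own version.
inline Join_decision decide_on_join(
    const Member_version &local,
    const std::vector<Member_version> &incompatible_with_local,
    const std::vector<Member_version> &group_members) {
  Join_decision decision = Join_decision::JOIN;
  for (const Member_version &member : group_members) {
    // Step 1: explicit exceptions are checked before the generic rule.
    for (const Member_version &excluded : incompatible_with_local) {
      if (excluded == member) return Join_decision::LEAVE;
    }
    // Step 2: generic rule on the major version.
    if (local.major_version < member.major_version)
      return Join_decision::LEAVE;  // fully incompatible
    if (local.major_version > member.major_version)
      decision = Join_decision::JOIN_READ_ONLY;  // partially incompatible
  }
  return decision;
}
```

The decision is computed over every member of the View, so one fully incompatible member is enough to make the joiner leave, and the result is available before Recovery would start.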
The joiner-side approach could be improved by having the already existing members run the algorithm and decide whether to proceed with the View installation, but this can be addressed later as an algorithm improvement.

3. Forced entry in the group

Even if a member is declared incompatible with the group by the general rule that lower major versions are incompatible, the user can still force its entry. The plugin shall provide a user option that allows a lower version to join the group. Although this is dangerous, the versions may not actually be incompatible, or may be incompatible only in some corner case, so the choice is left to the DBA.

4. Code Improvements

In terms of code, one needs to create new modules and augment existing ones. Regarding new features, the plugin now needs to know:
- The server version
- How to broadcast its version
- How to inform other members and the end user about its version.

During its development, the Group Replication plugin is associated with a server version. This is the version used for compatibility purposes. In addition, one needs to broadcast and receive information about all members' versions. For that, Cluster Member Info shall hold an extra field stating each member's version. It can be broadcast each time a new member joins, along with the existing information.

A new module (Group_Replication_Versioning) must be created to hold:
- Compatibility Matrix and support structures
- Compatibility Algorithm
- Method(s) that allow one to check version compatibility, e.g.: bool is_compatible_with(Group_Replication_Version v);

One should also consider augmenting the P_S interface in order to state the version of each member in the Group_Replication_Members table.
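A possible shape for the version field and for the forced-entry option is sketched below. Only `is_compatible_with` and the type name `Group_Replication_Version` come from the text above. Everything else is an assumption made for illustration: the packing of the version into one 32-bit integer (with minor and patch limited to 0..255), the accessor names, and the `allow_lower_version_join` flag standing in for the user option.

```cpp
#include <cstdint>

// Hypothetical sketch: the single-integer encoding is an assumption chosen
// so that the version fits in one extra Cluster Member Info field.
class Group_Replication_Version {
 public:
  Group_Replication_Version(std::uint32_t major_v, std::uint32_t minor_v,
                            std::uint32_t patch_v)
      : encoded_((major_v << 16) | ((minor_v & 0xFF) << 8) |
                 (patch_v & 0xFF)) {}

  // Rebuilds a version from the integer received from another member.
  explicit Group_Replication_Version(std::uint32_t encoded)
      : encoded_(encoded) {}

  std::uint32_t encoded() const { return encoded_; }
  std::uint32_t major_version() const { return encoded_ >> 16; }
  std::uint32_t minor_version() const { return (encoded_ >> 8) & 0xFF; }
  std::uint32_t patch_version() const { return encoded_ & 0xFF; }

 private:
  std::uint32_t encoded_;
};

class Group_Replication_Versioning {
 public:
  // allow_lower_version_join is a hypothetical name for the user option
  // that lets a member with a lower major version force its entry.
  Group_Replication_Versioning(Group_Replication_Version local,
                               bool allow_lower_version_join)
      : local_(local), allow_lower_version_join_(allow_lower_version_join) {}

  // True when the local member may stay in a group containing version v
  // (possibly in read-only mode). A lower local major version is accepted
  // only when the DBA has forced it.
  bool is_compatible_with(Group_Replication_Version v) const {
    if (local_.major_version() >= v.major_version()) return true;
    return allow_lower_version_join_;
  }

 private:
  Group_Replication_Version local_;
  bool allow_lower_version_join_;
};
```

Keeping the override inside the versioning module means the joiner's flow does not need to know whether entry was granted by the rule or forced by the DBA. The exception list is omitted here to keep the sketch short.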
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.