WL#10378: Group Replication: group single/multi primary mode change and primary election

Affects: Server-8.0   —   Status: Complete   —   Priority: Medium

Executive Summary

This worklog implements a framework for group-wide configuration changes. Once this worklog is done, the user will be able to change single_primary_mode without stopping group replication, and will also be able to trigger the election of a specific member as the new primary.

Background

Group Replication can be configured to run in multi-primary or single-primary mode. Both modes have their use cases, but situations may occur where the user wants to go from one to the other with no downtime. Currently such a change requires a rolling shutdown of the group members.

This worklog aims to make the change from multi-primary to single primary possible with a simple function invocation. The group should coordinate to elect a primary, enable or disable the read only modes on the correct members, and execute any other necessary step.

This worklog should also provide the end user with the much requested option to force the election of a new primary member of their choice. Until now the primary would be the first member of the group or, when it died or stopped, the member with the greatest member weight or, with equal weights, the lowest UUID.

This last feature is included in this worklog as it is a derivation of the coordination process that must be in place for the above live changes from multi-primary to single-primary mode and vice-versa.

User Stories

  • As a MySQL DBA I want to change the primary from member A to member B without having to remove member A from the group, so I can demote member A to secondary.

  • As a MySQL DBA I want to change from single primary mode to multi primary mode without stopping group replication, so I can from now on write to multiple servers.

  • As a MySQL DBA I want to change from multi primary mode to single primary, so I adjust the deployment mode now that I have figured that multi primary is not really fitting my use case.

Functional requirements

  • FR1: To execute a coordinated group configuration change the member must be on ONLINE state and belong to a reachable group majority.

  • FR2: To execute a coordinated group configuration change, the user must have GROUP_REPLICATION_ADMIN privileges.

  • FR3: If the coordinated group configuration change is invoked with a server UUID, that value shall be valid and must belong to a member of the group. An error shall be returned otherwise.

  • FR4: When the group is in multi-primary mode and the user causes a change to single-primary mode the group must:

    • FR 4.1: Elect a primary. If one is selected by the user, that member must be chosen.
    • FR 4.2: The primary shall be writable after processing the local backlog
    • FR 4.3: Secondaries shall enable the server super_read_only mode.
    • FR 4.4: Update everywhere checks is set to False but only after all transactions from the old primary are applied.
  • FR5: When the group is in single-primary mode and the user causes a change to multi-primary mode the group must:

    • FR 5.1: Update everywhere checks must be set to True on all members.
    • FR 5.2: All members must be writable, so read_only mode should be False
    • FR 5.3: When all members are writable, any transactional conflict must abort
  • FR6: When primary member is proposed

    • FR 6.1: A new election shall happen in all members appointing the proposed member as the new primary
    • FR 6.2: While all updates from the old primary previous to the election are not applied, the new primary must stay in read mode.
    • FR 6.3: While updates from the old and new primaries are in the group, any transactional conflict between them must abort.
  • FR7: When changing to multi primary mode the auto increment values of the server shall change to the plugin automatic values according to the group_replication_auto_increment_increment variable if no user set value is present.

  • FR8: When changing to single primary mode the auto increment values of the server shall return to the base values if no user set value is present.

  • FR9: No members can join the group while a coordinated group configuration change is occurring

  • FR10: No coordinated group configuration change can happen if one of the members is in recovery mode.

  • FR11: No more than one coordinated group configuration change can happen at the same time.

  • FR12: No coordinated group configuration changes are allowed if the group contains a member of a previous version that does not support it.

  • FR13: When electing a primary server, P, if any other member than P contains running slave channels, the configuration change shall abort.

  • FR14: When changing to single primary mode, if more than one member contains running slave channels, the configuration change shall abort.

  • FR15: When changing to single primary mode with no appointed primary, if a solo member exists with running slave channels, that member shall be the elected primary.

  • FR16: When a coordinated group configuration change involving primary election is running, no slave channels can be started in the group members.

  • FR17: Any change to multi-primary when already in multi-primary is a no-op.

  • FR18: Any change to single-primary when already in single-primary is a no-op.

  • FR19: An attempt to elect a primary member when in multi primary is not a valid operation.
    An error saying to use the primary switch command is issued.

  • FR20: An attempt to elect a member as primary that is already the group primary member is a no-op.

  • FR21: Coordinated group configuration changes can be invoked in any member despite its primary or secondary role.

  • FR22: All changes to the primary mode shall be recorded with SET PERSIST meaning they will have effect even after a member restart.

  • FR23: When changing to single primary mode with no appointed primary, and no restrictions with slave channels exist, the new primary member shall be elected using weights or lexicographic order when all weights are equal.

  • FR24: When a coordinated group configuration change is accepted, even if the invoking member leaves or fails under a majority, the action will be executed in all online members.

  • FR25: Primary elections or change to multi-primary will be delayed until all transactions forbidden by enforce_update_everywhere_checks terminate.

  • FR26: When switching to a primary server or changing mode to single primary with an appointed primary, P, if P leaves or fails under a majority, before the election starts, the configuration change must abort.

  • FR27: When changing mode to single primary with an appointed primary, P, if P leaves or fails under a majority, when the primary election began but is not yet over, the change will not abort and adapt to the new elected primary throwing a warning.

  • FR28: When switching to a primary server, P, if P leaves or fails under a majority, when the primary election began but is not yet over, the configuration change will abort and the old primary will be elected if available. If not another member will be elected.

  • FR29: When switching to a primary server or changing mode to single primary with an appointed primary, P, if P leaves or fails under a majority, after the election finalizes, change terminates and the group elects a new primary. A warning is thrown to the user.

  • FR30: When electing a primary server, P, if any server S leaves or fails under a majority, the procedure shall not be affected and will resume.

  • FR31: Any member exit or failure under a majority shall not affect the process of changing to multi-primary mode.

  • FR32: After a coordinated group configuration change returns successfully to the user in the invoking member, its effects should be visible in all members.

  • FR33: If the user kills the query thread then the action and query threads shall be terminated.

  • FR34: If the group change coordination thread is killed but the distributed execution has already gone beyond a point where all servers agreed (cannot be canceled) then the action will complete.
    A warning shall be returned by the executing query stating the kill had no effect.

  • FR35: If the group change coordination thread is killed but the configuration process still has major tasks to complete the member shall leave the group and go into ERROR mode or abort.

  • FR36: When the plugin is stopped or leaves in error, while changing from single primary mode to multi primary mode, if the member did not set the single primary mode flag to false, then update everywhere checks shall remain false.

  • FR37: When the plugin is stopped or leaves in error, while changing from single primary mode to multi primary mode, if the member did already set the single primary mode flag to false, then update everywhere checks shall be true after stop.

  • FR38: When the plugin is stopped or leaves in error, while changing from multi primary mode to single primary mode, if the member did not set the single primary mode flag to true, then update everywhere checks shall remain true.

  • FR39: When the plugin is stopped or leaves in error, while changing from multi primary mode to single primary mode, if the member did already set the single primary mode flag to true, then update everywhere checks shall be false after stop.

  • FR40: When the plugin is stopped or leaves in error, plugin configurations when the configuration change terminates must be valid, even if not persisted with SET PERSIST.

  • FR41: All coordinated group configuration changes shall allow the DBA to check its progress.

  • FR42: Functions to execute coordinated group configuration changes are only present when the plugin is installed.

  • FR43: Any local failure in a coordinated group configuration change that prevents its progress shall make the server leave the group as its configuration may have deviated from the group.

  • FR44: Errors in the election process that prevent its progress shall make the server leave the group or abort, as its configuration may have deviated from the group.

  • FR45: Any failure to enable the read mode in the server for data protection shall result in a server abort.

  • FR46: Outside the scope of coordinated group configuration changes, if a primary member fails, the new primary will not be writable until it executes all the transactions from the old primary.

  • FR47: Member weights for primary election cannot be changed when a coordinated group configuration change is occurring.

  • FR48: When a primary election is running, no coordinated group configuration change can be executed in the group.

  • FR49: The coordinated group configuration changes proposed on this worklog cannot be executed when there is an active table lock in the session.
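
FR26 to FR29 describe what happens when an appointed primary P leaves the group or fails at different moments of the change. The decision table can be sketched as below. This is only an illustration of those four requirements: the enum and function names are hypothetical and are not part of the plugin.

```python
from enum import Enum


class Operation(Enum):
    # Hypothetical labels for the two operations that can appoint a primary P.
    SET_AS_PRIMARY = "switch to a primary server"
    SWITCH_TO_SINGLE_PRIMARY = "change mode to single primary with appointed primary"


class Phase(Enum):
    BEFORE_ELECTION = 1   # the election has not started yet
    DURING_ELECTION = 2   # the election began but is not yet over
    AFTER_ELECTION = 3    # the election has finalized


def outcome_when_appointed_primary_leaves(operation, phase):
    """Return (aborts, note) for the case where the appointed primary P leaves or fails."""
    if phase is Phase.BEFORE_ELECTION:
        # FR26: both operations abort.
        return True, "configuration change aborts"
    if phase is Phase.DURING_ELECTION:
        if operation is Operation.SWITCH_TO_SINGLE_PRIMARY:
            # FR27: the mode change goes on with whoever gets elected.
            return False, "adapts to the newly elected primary, with a warning"
        # FR28: a plain primary switch aborts.
        return True, "old primary is elected if available, otherwise another member"
    # FR29: the change itself has finished; a normal failover follows.
    return False, "change terminates, group elects a new primary, with a warning"
```

For example, a primary switch whose target fails in the middle of the election aborts, while a mode change in the same situation continues with a warning.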

Non functional feature requests:

  • NFR1: This WL must have no impact on transaction execution performance when no coordinated group configuration changes are being executed.

  • NFR2: This WL must only have a minor overhead during transaction commit when executing coordinated group configuration changes that depend on transaction monitoring.

1. Some definitions and considerations

  • Single primary mode: When only one member in the group accepts writes and all other members are in read mode.
    As the primary is the source of truth in the group, certification information is updated but not used for commit decisions.
    Restrictions around foreign keys and other multi-primary limitations described below do not apply to single primary mode.

  • Primary: The writable member in the single primary mode. If it dies or exits the group a new primary is elected.

  • Secondary(ies): The non writable members in the group. These members receive transactions executed in the primary from the group.

  • Multi primary mode: Also called multi-primary in the text, is when all members are writable.
    All transactions are certified and can rollback if they are concurrent and update the same data as other transaction committed in the group.
    Multi primary mode is subjected to some restrictions described below.

  • Update everywhere checks: Controlled by a plugin variable: enforce_update_everywhere_checks.
    With this variable, group replication can prevent the execution of transactions that cause updates to cascading foreign keys or use the serializable commit mode.

  • Election: The process where a member is appointed as the new group primary.
    Note that under this worklog we call it election even if there is no algorithmic selection of a member and the member is directly chosen externally.
    In either case, the distributed appointment of a member, the changes to the read modes and the certification related tasks make up what we call an election.

  • Certification: Certification is the process where the group decides which concurrently executed transactions, at different servers, are conflicting.
    If the output is negative, the transaction will rollback in all members.
    Throughout the worklog we mention points where we say certification is enabled or disabled, so a clarification here:
    Certification keeps collecting information about transactions when enabled or disabled.
    Being ON or OFF refers only to whether the certification output is considered during each transaction's commit process.

  • Coordinated Group Configuration Change: the group of coordinated steps needed to execute a change to the group. These are often referred to simply as group actions or coordinated group actions throughout the worklog.

  • The action coordinator: The central block that coordinates the execution of actions in the group. It guarantees that only one action can execute at a time.

  • UDF: These are User Defined Functions, a mechanism that allows us to add functionality to the plugin without coding new parser commands.
    Installed by us at plugin install, they allow the user to invoke a new action coded by us in the plugin.

  • Group Replication and slave channels:
    As described, when in single primary mode it is assumed that only one member in the group is the source of all updates to the group's data.
    This has some implications for the election process described in this worklog.
    If one member has an active slave channel receiving data from an external source, this member must be the primary in the group, and no primary switches are allowed.
    Such switches would mean that two different sources of updates would now exist in the group.

Note: All plugin variables in this worklog are often referred to in the text without the prefix group_replication_ to decrease verbosity.
Example: enforce_update_everywhere_checks


2. Coordinated Group Configuration Changes - the basics

Such changes as the ones proposed here, where one dynamically alters the single/multi primary mode are operations that require the coordination of all group members in the execution of a set of steps to achieve the desired result.

So what this worklog intends to implement is a configuration module for task coordination but also a set of well defined operations that are used to achieve the wanted configuration changes.

The idea is that this coordinator and operations could be used on the implementation of new requests in the long run.


Coordinated Group Configuration Changes: The Coordinator

The coordinator shall work based on 3 phases

  1. Coordinate the start of the action

  2. Execute the action

  3. Return the status of the execution to the user and declare the end of the action.

1) and 3) are coordinator steps, common to all actions. 2) is specific to each action.

On the first phase, the coordinator shall send a message to the group stating that action A is to be executed.
If a concurrent action B is already taking place, then A, the latest one, is aborted before execution.
The order of execution is established by the total order delivery guarantee of the GCS (Paxos).

This first phase also makes use of this total order delivery guarantees to check some general validations like: there is no member of an older version, there is no member on recovery, etc.

Much like the first phase, the third phase also needs to be coordinated between all members by means of sending a message to the group.

Otherwise, since the operations are asynchronous, members that are still executing action A would refuse a new action B while others that had finished A already would accept and start B.

Validations and execution vary with each action though, so each action has its own implementation.
We shall call these group actions, described below.

In summary, the coordinator shall coordinate the start and finish of an action, invoking the corresponding action block and only returning when that action is finished.
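
The start and end phases described above can be sketched as a small model of one member's view of the coordinator. The class and method names below are hypothetical. The sketch assumes that every member processes the same start and end messages in the same total order, which is what lets all members reach the same accept or refuse decision. It does not model members leaving the group while an action is running.

```python
class ActionCoordinator:
    """Toy model of the coordinator on one member (hypothetical names)."""

    def __init__(self, members):
        self.members = set(members)
        self.running = None        # name of the action in progress, if any
        self.pending_end = set()   # members that have not yet sent an end message

    def on_start_message(self, action_name):
        """Phase 1: accept the action unless another one is still running."""
        if self.running is not None:
            return False           # the later, concurrent action is refused
        self.running = action_name
        self.pending_end = set(self.members)
        return True

    def on_end_message(self, member):
        """Phase 3: the action is over only when every member has reported."""
        self.pending_end.discard(member)
        if self.running is not None and not self.pending_end:
            self.running = None
            return True            # action declared finished on this member
        return False
```

Because the end of an action is also decided from group messages, a member that finished action A early still refuses action B until it has seen the end message of every other member.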


Coordinated Group Configuration Changes: Group Actions

A group action shall then contain the basic method for execution.
This method shall be implemented for all actions.

execute_action()

Group Actions will also contain two methods

get_action_message(Message)

and

process_action_message(Message)

The idea here is that each action will encode and decode its own parameters.
It is up to the coordinator to get this message when the action is invoked and give it to all members when it is accepted.

Stopping an action is also a key operation, needed for failure and plugin stop situations.

stop_action_execution()

Also, for debug and identification purposes these classes should expose their names.

get_action_name()

For now the coordinator will handle 2 actions

  • Multi primary mode migration

  • Single Primary election

The first shall handle changes from single-primary mode to multi-primary setups.

The second should handle the inverse conversion but also handle the primary election of a specific member.
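
The interface described in this section can be sketched as an abstract class plus one toy action. Only the five method names come from the text above. The Python shape, the byte encoding of the message and the PrimaryElectionAction class are assumptions made for illustration. In particular, the worklog passes a Message object to get_action_message, while this sketch simply returns bytes.

```python
from abc import ABC, abstractmethod


class GroupAction(ABC):
    """Hypothetical rendering of the group action interface."""

    @abstractmethod
    def get_action_name(self): ...

    @abstractmethod
    def get_action_message(self): ...

    @abstractmethod
    def process_action_message(self, message): ...

    @abstractmethod
    def execute_action(self): ...

    @abstractmethod
    def stop_action_execution(self): ...


class PrimaryElectionAction(GroupAction):
    """Toy action whose only parameter is the appointed server UUID."""

    def __init__(self, appointed_uuid=""):
        self.appointed_uuid = appointed_uuid
        self.stopped = False

    def get_action_name(self):
        return "Single Primary election"

    def get_action_message(self):
        # Encode this action's parameters (assumed encoding: UTF-8 bytes).
        return self.appointed_uuid.encode("utf-8")

    def process_action_message(self, message):
        # Decode the parameters on a member that received the start message.
        self.appointed_uuid = message.decode("utf-8")

    def execute_action(self):
        # Placeholder for the real election steps.
        return "stopped" if self.stopped else "done"

    def stop_action_execution(self):
        self.stopped = True
```

The invoking member builds the action from the user's parameters, and every other member builds an empty instance and fills it from the message handed over by the coordinator.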


Coordinated Group Configuration Changes: Actions invocation

First, note that the option group_replication_single_primary_mode is still in effect and the DBA can still configure the member to start in one mode or the other.

The changes from one mode to the other in a live group do not depend on variables though, but on newly introduced user defined functions.
This way the change is made through a function that denotes an implicit action and not through a variable change.

These functions are:

  • Changes from multi-primary to single primary

Base command:

 SELECT group_replication_switch_to_single_primary_mode();

The above command shall be invoked by the user to change to single primary mode, with the election controlled by the configured election weights.
If the user wants to appoint a primary in the process, they execute:

 SELECT group_replication_switch_to_single_primary_mode(server_uuid);

Any invocation of these functions in a group already in this mode will cause no visible changes.

  • Changes from single-primary to multi-primary

Base command:

 SELECT group_replication_switch_to_multi_primary_mode();

This function has no parameters.
Any invocation in a group already in multi-primary mode will cause no changes.

  • Election of a new primary

Base command:

 SELECT group_replication_set_as_primary(server_uuid);

This function will not cause changes to single primary mode if the group is running on multi-primary mode.
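
The no-op and error rules for the three functions (FR17 to FR20) can be summarised in a small classifier. This is a sketch with hypothetical names and string labels. It covers only those four rules and none of the other validations.

```python
def classify_invocation(function, current_mode, current_primary=None, target_uuid=None):
    """Return 'no-op', 'error' or 'execute' for a function call.

    function is the function name without the group_replication_ prefix,
    current_mode is 'single' or 'multi'.
    """
    if function == "switch_to_multi_primary_mode":
        # FR17: already in multi-primary mode means nothing to do.
        return "no-op" if current_mode == "multi" else "execute"
    if function == "switch_to_single_primary_mode":
        # FR18: already in single-primary mode means nothing to do.
        return "no-op" if current_mode == "single" else "execute"
    if function == "set_as_primary":
        if current_mode == "multi":
            # FR19: electing a primary is not valid in multi-primary mode.
            return "error"
        if target_uuid == current_primary:
            # FR20: the target is already the primary.
            return "no-op"
        return "execute"
    raise ValueError("unknown function: " + function)
```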


Configuration changes: Algorithm components

To switch from the single primary to multi-primary mode and vice-versa the following steps/code units are necessary.

A. primary election: invoke primary election in a member

B. Disable/Enable certification

C. primary validation: check if the selected member is valid. Reasons to reject it may include:
   - the old primary has running slave channels
   - the user is selecting a member with version N+1 in a group with members of version N.

D. Wait for execution of the current set of local transactions

E. Set/Get plugin vars. This includes:
   - single_primary_mode
   - enforce_update_everywhere_checks

F. Wait for the execution of current relay log transactions

G. Message sending / reception

H. Enable/Disable the super read only mode

From this list:

  • A) needs to be refactored in terms of code and message flow for safety reasons.

  • B) needs minor refactors in order to be reusable

  • C) and D) are new utilities that we need to build from scratch

  • E) requires a new code module as we want to use SET PERSIST for these variables.

  • F) We enhance this code with the ability to wait for the consumption of the group replication applier module queue before waiting for the execution of the transactions.

  • The rest can be used out of the box or by using current plugin methods. We do add the option to kill read mode queries in some situations though.


Configuration changes: How it works - a summary

To sum up this section, here is a summary of how it all comes together.

  1. The user triggers an action in the plugin (Using UDFs in this worklog)

  2. The plugin parses the parameters and creates the corresponding Group Action instance.

  3. The group action is submitted to the coordinator.

  4. The coordinator gets the action message from the group action class and sends it to all members

  5. If accepted, all members (except the invoking one) instantiate the same group action class.
    The action message is given to the group action object for parsing.

  6. All members execute the action

  7. All members send a termination message when over.

  8. All members declare the action as finished when everyone terminates.
    The invoking member returns the result to the client.


Configuration changes: How it works - message Diagram

+-------------------+                  +-------------------------+                  +--------------------+
| ..member 1 (m1).. |                  | .....member 2 (m2)..... |                  | ...member3 (m3)... |
|                   |                  |                         |                  |                    |
|                   |                  | UDF function execution  |                  |                    |
|                   |   Group action   |     new Group_Action    |   Group action   |                    |
|                   |  start message   |           .             |  start message   |                    |
| new Group_action  | <--------------- |    send start message   | ---------------> |  new Group_action  |
| execution         |             \--> |       execution         |                  |     execution      |
|        +          |                  |           *             |                  |          +         |
|        +          |                  |           +             |                  |          +         |
|        +          |                  |           +             |                  |          +         |
|        +          |  Group action    |           +             |   Group action   |          +         |
| send end message  | end message (m1) |           +             | end message (m1) |          +         |
|        .          | ---------------> |           +             | ---------------> |          +         |
|        .          | <--/             |           +             |                  |          +         |
|        .          |                  |           +             |                  |          +         |
|        .          |                  |           +             |                  |          +         |
|        .          |   Group action   |           +             |   Group action   |          +         |
|        .          | end message (m3) |           +             |  end message(m3) |          +         |
|        .          | <--------------  |           +             | <--------------- |  send end message  |
|        .          |                  |           +             |           \----> |          .         |
|        .          |                  |           +             |                  |          .         |
|        .          |   Group action   |           +             |   Group action   |          .         |
|        .          | end message(m2)  |           +             |  end message(m2) |          .         |
|   declare action  | <--------------  |    send end message     | -------------->  |   declare action   |
|     finished      |             \--> | declare action finished |                  |     finished       |
|                   |                  |      UDF returns        |                  |                    |
|                   |                  |                         |                  |                    |


3. From single to multi primary

The first on the list is the change from single primary mode to multi-primary mode.
We start the design with this one, as the inverse change is more complex.

In this change, all members become writable, but there are also restrictions on the allowed transactions in the group that must be enforced.

The HLD for this operation is then:

  1. A message is sent to all members, including the invoking member, starting the configuration change.

  2. All members set enforce_update_everywhere_checks to true.

  3. The primary waits for all transactions currently running to be processed by GR.
    These transactions can have updates to tables with cascading FK for example, something that can cause issues in a multi-primary environment.

  4. A message is sent to all members meaning: "I executed all running transactions, from now on, all transactions are safe."

  5. When members receive this message they queue a packet in the plugin pipeline that will activate certification.

  6. In the secondaries we extract the current GTIDs queued in the applier relay log and wait for its application.

  7. Every member can set the single_primary_mode to false.
    Members invoke a SET PERSIST instruction to make the option persistent.
    The enforce_update_everywhere_checks is also made persistent here.

  8. All members change the auto increment settings to the automatic values to avoid transaction collision.
    Previous values are cached.

  9. All secondaries can disable the read only mode when they complete step 6.

  10. All members send a message when the action terminates.
    When N messages are received, the action terminates.
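
Step 8 above, together with FR7 and FR8, implies that each member caches its auto increment settings when moving to multi primary mode and restores them when moving back. A minimal sketch of that bookkeeping follows. It assumes that "no user set value" means both server variables still have their default value of 1, and that the plugin offset is the server id. Both rules, and all names in the code, are assumptions of this sketch and are not stated in this worklog.

```python
class AutoIncrement:
    """Toy model of one member's auto increment settings (hypothetical names)."""

    def __init__(self, increment=1, offset=1):
        self.increment = increment
        self.offset = offset
        self.cached = None   # values saved before the plugin values were applied

    def to_multi_primary(self, group_increment, server_id):
        # Assumption: defaults (1, 1) mean the user did not set a value.
        if (self.increment, self.offset) == (1, 1):
            self.cached = (self.increment, self.offset)
            self.increment, self.offset = group_increment, server_id

    def to_single_primary(self):
        # Restore only what the plugin itself changed.
        if self.cached is not None:
            self.increment, self.offset = self.cached
            self.cached = None
```

A member whose values were set explicitly by the user is left untouched in both directions.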


4. From multi-primary to primary / primary election

In this change, there is an election of a new primary, either selected by the internal election algorithm or appointed by the user.
So, whether the user switches the group to single primary mode or appoints a new primary, the same set of tasks will be executed in the group.

In this change, only one member becomes writable and the transaction limitations that are enforced on multi-primary are no longer needed.

The HLD for this operation is then:

  1. A message is sent to all members, including the invoking member, starting the configuration change.

  2. A validation phase is executed:
    The candidate must be of the lowest version present in the group; otherwise error out.
    The same applies to an invalid UUID passed as an argument, or to a member that is no longer in the group.
    Everyone sends a message stating the existence of slave channels.
    If more than one member has slave channels: error out
    If slave channels exist in a member that is not the selected primary: error out
    If no primary is appointed and a sole member exists with slave channels, force that member to be the new primary.

  3. [Extra step] If a primary already exists:
    All members set enforce_update_everywhere_checks to true. The primary waits for all transactions currently running to be processed by GR.
    This means the old primary, if present, is the one that sends the primary election request message.

  4. Run primary election on all members, using either the present election algorithm or choosing the user appointed member.
    Member roles change as a result of the selection process. Under the new election algorithm (section 5) the new appointed primary will wait for messages from the old primary (if one exists) up to this point.

  5. The new primary will send a message to all members that election can continue and members also update the read mode status at this point.
    Note that setting read mode to true will wait for executing transactions meaning this must be done in a spawned thread.

  6. Everyone queues a message in the applier pipeline to re-enable the certifier (when migrating from multi-primary it is already enabled).

  7. The new primary waits for a message from all members stating when they are on read mode. The primary also states that it set its read mode to false.

  8. When all members receive N messages they can set single_primary_mode to true. SET PERSIST is used here to make the mode persistent.
    Secondaries can set enforce_update_everywhere_checks to false.
    This step can be skipped if already on single primary mode.

  9. When the primary receives N messages move to step 10

  10. The primary shall wait for all the applier relay log to be consumed.
    It sends a packet informing about this change.
    Only then it can set enforce_update_everywhere_checks to false to avoid concurrency between local and remote transactions.

  11. If the mode is changing, the primary should restore the auto increment values to the cached user values.
    These were stored before the multi primary values were set to avoid collisions.

  12. All members disable the certifier when they receive the packet from the primary. Secondaries also change the auto increment settings for easy future primary failovers.

  13. All members send a message when the action terminates. When N messages are received, the action terminates.

Note: Steps like 6), the first part of step 10) and 12) are already part of the current primary election process. They are only placed here for clarity.
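
The slave channel rules of the validation phase (step 2, FR13 to FR15) and the default election order (FR23) can be sketched as a single function. All names are hypothetical. The sketch assumes that a higher weight wins and that ties are broken by the lowest UUID in lexicographic order. It leaves out the version check.

```python
class ValidationError(Exception):
    """Raised when the configuration change must error out."""


def choose_primary(members, appointed_uuid=None):
    """Pick the new primary.

    members maps server UUID -> {"weight": int, "has_slave_channels": bool}.
    """
    if appointed_uuid is not None and appointed_uuid not in members:
        raise ValidationError("appointed member is not in the group")

    with_channels = sorted(u for u, m in members.items() if m["has_slave_channels"])
    if len(with_channels) > 1:
        raise ValidationError("more than one member has running slave channels")
    if with_channels:
        sole = with_channels[0]
        if appointed_uuid is not None and appointed_uuid != sole:
            raise ValidationError("slave channels run on a member that is not the selected primary")
        return sole   # the sole member with slave channels must be the primary

    if appointed_uuid is not None:
        return appointed_uuid

    # Assumed order: highest weight first, lowest UUID breaks ties.
    return min(members, key=lambda u: (-members[u]["weight"], u))
```

Since every member evaluates the same inputs, received in the same order, the function returns the same primary everywhere.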


5. Changes to Primary election

In order to solve an old safety issue surrounding primary election and the algorithms presented here we propose a change to the election mechanism.

Delving into it, the base of the issue is that single primary mode allows the execution of transactions that could lead to data divergence in multi primary setups.
One example is transactions that have foreign keys and cascading side effects that could lead to different execution results in different members.
In theory such transactions are safe because the primary is the only source of truth in the group at all times.

Until now, when changing from one primary to the other, the new primary would accept new transactions the moment it was elected.
This meant that, for a window of time, it was possible for such transactions coming from both the old and the new primary to be executed in concurrency.
This breaks the assumption that a sole source of truth exists at each moment in time.

Another point of concern is that we must wait for old primaries to be in read mode (not applicable to crash situations).
This is something that can take time, and certification must be ON during this period.

For these reasons this algorithm will now change and elections will have 5 stages whose invocation depends on the context of the election.

  1. When the old primary dies or there is a change of the primary member, every member does an election and chooses the new primary.
    The algorithm, or the appointed server parameter, makes the election have the same outcome on all members.

  2. The new appointed primary waits for its relay log queue to be consumed totally or in part.
    If the old primary failed we wait for the relay log to be fully consumed as no more messages will arrive.
    If this is a switch from the old to the new primary then we should wait for the transactions of all messages up to this point.
    This prevents the local vs remote conflicts that would lead to data divergence.

  3. When these transactions are consumed, the elected member sends a message to all members; on receiving it, they all change their read modes according to their roles.
    Enabling the read mode on members must be done in a spawned thread, as it would otherwise deadlock with ongoing transaction messages in the GCS layer.

  4. When receiving this message the members might or might not enable certification.

  5. If one or more old primaries are still present, the new primary must wait for all members to send a message stating they are in read mode.
    When it has received these messages it will wait for its current relay log backlog to be executed, and then instruct the members to disable certification.
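
Step 1 only works if every member reaches the same result independently. A minimal sketch of such a deterministic choice follows, assuming the criteria named in the background of this worklog (an appointed member, otherwise member weight and UUID). The `Member` type, the default weight value and the exact tie-breaking order are assumptions made for illustration, not the plugin's actual implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Member:
    uuid: str
    weight: int = 50  # illustrative default, assumed for this sketch


def elect_primary(members: Iterable[Member],
                  appointed_uuid: Optional[str] = None) -> Member:
    """Return the same primary on every member, given the same view."""
    candidates = list(members)
    if not candidates:
        raise ValueError("cannot elect a primary in an empty view")
    if appointed_uuid is not None:
        # An appointed member overrides the automatic criteria.
        for member in candidates:
            if member.uuid == appointed_uuid:
                return member
        raise ValueError("appointed member is not in the view")
    # Assumed ordering: highest weight first, ties broken by lowest UUID.
    return min(candidates, key=lambda m: (-m.weight, m.uuid))
```

Because the result depends only on the view contents and not on their order, two members holding the same view cannot disagree on the outcome.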

Note that this is a choice of safety over availability in failure scenarios, so it may result in write downtime for the end user.
On the other hand, during live primary switches we have the option to restrict and monitor user transactions to preserve availability.

In terms of version coexistence, if a member running version 5.X, or any version prior to the release of this WL, is present in the group, the old primary election algorithm will be the one executed.


Primary election: Brief look into the primary election scenarios.

If the old primary dies:

In failure cases the old primary dies, and when the new one is elected there may still be some old transactions being applied on the new primary.
For this reason we need to wait for the execution of the transactions from the old primary before declaring the new primary writable.

So in this situation steps 1, 2 and 3 are executed to ensure safety.

One particularity of this case is that there is only one source of truth at a time, i.e., we ensure a member does not accept writes when applying updates from another member.
In practice this means there is no wait for the old primary to be in read mode and no certification activation is needed.

| ....member 1 (m1).... |             | .....member 2 (m2)..... |                          | .....member3 (m3)..... |
|     (Old Primary)     |             |       (Secondary)       |                          |     (New Primary)      |
|  (read mode is OFF)   |             |     (read mode is ON)   |                          |   (read mode is ON)    |
|                       |             |                         |                          |                        |
|       Failure         | View change |                         |       View change        |                        |
|                       | ########### |       View change       | ######################## |      View change       |
|                       |             |     elect a primary     |                          |    elect a primary     |
|                       |             |       m3 elected        |                          |      m3 elected        |
|                       |             |                         |                          |   wait for queue = 0   |
|                       |             |                         |                          |  wait for transaction  |
|                       |             |                         |                          |       execution        |
|                       |             |                         |                          |          +             |
|                       |             |                         | Primary election message |          +             |
|                       |             |                         | primary is ready         |          +             |
|                       |             |                         | <----------------------- |    backlog executed    |
|                       |             |     Read mode = ON      |                  \       |          .             |
|                       |             |                         |                   \----> |    Read mode = OFF     |
|                       |             |                         |                          |                        |
|                       |             |                         |                          |                        |

If we switch from one primary to another:

This scenario is the most complex one, as it must preserve both safety and availability, something we cannot do in the above case.

Since we are focusing on the primary election part, let's recall that under the full primary change algorithm there is a first phase where the old primary enables enforce_update_everywhere_checks.
So when election is invoked step 2 is executed to ensure all transactions executed before changing this variable are processed in the new primary.

Note that the old primary is still accepting requests until step 3 is invoked.
Hence, when the read mode is set, it must be done outside the GCS framework or else it could deadlock against running transactions waiting for certification messages.

It is also for this reason that, when the switch happens, updates from both the new and the old primary are being executed, so we need to execute step 4.

So now the new primary will wait for all members to be in read mode.
When that happens it will wait on its backlog and, when finished, instruct all members that certification can be turned off.

| ....member 1 (m1).... | ........................ | .....member 2 (m2)..... | ........................ | .....member3 (m3)..... |
|     (Old Primary)     |                          |       (Secondary)       |                          |     (New Primary)      |
|  (read mode is OFF)   |                          |     (read mode is ON)   |                          |   (read mode is ON)    |
|                       |                          |                         |                          |                        |
|   Action Invocation   |                          |                         |                          |                        |
|      validations      |                          |                         |                          |                        |
| update checks = true  |                          |                         |                          |                        |
|   wait for ongoing    |                          |                         |                          |                        |
|    transactions       |                          |                         |                          |                        |
|         +             |                          |                         |                          |                        |
|         +             |                          |                         |                          |                        |
|   primary election    |                          |                         |                          |                        |
|                       |                          |                         |                          |                        |
/////////////////////////////////////////////////////// Primary Election /////////////////////////////////////////////////////////
|                       |                          |                         |                          |                        |
| Invoke an election    | Primary election message |                         | Primary election message |                        |
|    Send message       |  elect a new member (m3) |                         |  elect a new member (m3) |                        |
|                       | -----------------------> |                         | -----------------------> |                        |
|   elect a primary     | <----/                   |     elect a primary     |                          |     elect a primary    |
|     m3 elected        |                          |       m3 elected        |                          |       m3 elected       |
|                       |                          |                         |                          |  [wait for queue = 0]  |
|                       |                          |                         |                          | [wait for transaction  |
|                       |                          |                         |                          |      execution]        |
|                       |                          |                         |                          |          +             |
|                       |                          |                         |                          |          +             |
|                       | Primary election message |                         | Primary election message |          +             |
|                       |     primary is ready     |                         |     primary is ready     |          +             |
|                       | <----------------------- |                         | <----------------------- |   backlog executed     |
|  enable certification |                          |   enable certification  |                  \       |                        |
|  [Set Read mode = ON] |                          |   [Set Read mode = ON]  |                   \----> |  enable certification  |
|          +            |                          |            +            |                          | [Set Read mode = OFF]  |
|          +            | Primary election message |    Read mode = true     | Primary election message |                        |
|          +            | member in read mode (m2) |                         | member in read mode (m2) |                        |
|          +            | <----------------------- |                         | -----------------------> |                        |
|          +            |                          |                         | <----/                   |                        |
|          +            |                          |                         |                          |                        |
|          +            |                          |                         |                          |                        |
|          +            |                          |                         |                          |                        |
|   Read mode = ON      | Primary election message |                         | Primary election message |                        |
|                       | member in read mode (m1) |                         | member in read mode (m1) |                        |
|                       | -----------------------> |                         | -----------------------> |   [Wait for backlog]   |
|                       | <----/                   |                         |                          |           +            |
|                       |                          |                         |                          |           +            |
|                       | Single primary message   |                         | Single primary message   |           +            |
| disable certification | <----------------------- | disable certification   | <----------------------- |    backlog executed    |
|                       |                          |                         |                   \----> |  disable certification |
|                       |                          |                         |                          |                        |
|                       |                          |                         |                          |                        |

Operations surrounded by "[]" are executed in a spawned thread and not on the GCS stack.
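
The wait for the "member in read mode" messages shown in the diagram can be sketched as a small bookkeeping class. This is only an illustration of the idea: the class and method names are hypothetical, and members are identified by plain strings.

```python
from typing import Iterable, Set


class ReadModeTracker:
    """Tracks which members still owe a 'member in read mode' message."""

    def __init__(self, members: Iterable[str], new_primary: str) -> None:
        # The new primary becomes writable, so it is not awaited.
        self._pending: Set[str] = set(members) - {new_primary}

    def on_read_mode_message(self, member: str) -> None:
        # Duplicate or unexpected messages are simply ignored.
        self._pending.discard(member)

    @property
    def pending(self) -> int:
        return len(self._pending)

    def all_in_read_mode(self) -> bool:
        # Only when this is true may the backlog wait begin.
        return not self._pending
```

With the three members of the diagram, the tracker starts with two pending members (m1 and m2) and reports completion once both messages have been delivered; the pending count is the kind of value a progress stage could expose.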

If we switch from multi primary to a single primary:

When electing a primary coming from a multi primary group, one thing to keep in mind is that enforce_update_everywhere_checks was already true before the election.
In practice this means that there is no need to wait for the execution of transactions from the old primary.
So the above described step 2 is skipped here.
Apart from that, the algorithm remains the same.

| ....member 1 (m1).... | ........................ | .....member 2 (m2)..... | ........................ | .....member3 (m3)..... |
|   (Multi Primary)     |                          |     (Multi Primary)     |                          |   (Appointed primary)  |
|  (read mode is ON)    |                          |    (read mode is ON)    |                          |    (read mode is ON)   |
|                       |                          |                         |                          |                        |
|   Action Invocation   |                          |                         |                          |                        |
|      validations      |                          |                         |                          |                        |
|   primary election    |                          |                         |                          |                        |
|                       |                          |                         |                          |                        |
/////////////////////////////////////////////////////// Primary Election /////////////////////////////////////////////////////////
|                       |                          |                         |                          |                        |
| Invoke an election    | Primary election message |                         | Primary election message |                        |
|    Send message       |  elect a new member (m3) |                         |  elect a new member (m3) |                        |
|                       | -----------------------> |                         | -----------------------> |                        |
|   elect a primary     | <----/                   |     elect a primary     |                          |     elect a primary    |
|     m3 elected        |                          |       m3 elected        |                          |       m3 elected       |
|                       |                          |                         |                          |         /              |
|                       | Primary election message |                         | Primary election message |        /               |
|                       |     primary is ready     |                         |     primary is ready     |       /                |
|                       | <----------------------- |                         | <----------------------- | ------                 |
|  enable certification |                          |   enable certification  |                  \       |                        |
|  [Set Read mode = ON] |                          |   [Set Read mode = ON]  |                   \----> |  enable certification  |
|          +            |                          |            +            |                          | [Set Read mode = OFF]  |
|          +            | Primary election message |    Read mode = true     | Primary election message |                        |
|          +            | member in read mode (m2) |                         | member in read mode (m2) |                        |
|          +            | <----------------------- |                         | -----------------------> |                        |
|          +            |                          |                         | <----/                   |                        |
|          +            |                          |                         |                          |                        |
|          +            |                          |                         |                          |                        |
|          +            |                          |                         |                          |                        |
|   Read mode = ON      | Primary election message |                         | Primary election message |                        |
|                       | member in read mode (m1) |                         | member in read mode (m1) |                        |
|                       | -----------------------> |                         | -----------------------> |   [Wait for backlog]   |
|                       | <----/                   |                         |                          |           +            |
|                       |                          |                         |                          |           +            |
|                       | Single primary message   |                         | Single primary message   |           +            |
| disable certification | <----------------------- | disable certification   | <----------------------- |    backlog executed    |
|                       |                          |                         |                   \----> |  disable certification |
|                       |                          |                         |                          |                        |
|                       |                          |                         |                          |                        |
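
The three scenarios above differ only in which of the five election steps they run. The mapping below is a reading of this section expressed as a lookup, with hypothetical names; it is a summary aid, not code from the plugin.

```python
from enum import Enum
from typing import List


class ElectionContext(Enum):
    PRIMARY_FAILURE = "the old primary dies"
    PRIMARY_SWITCH = "switch from one primary to another"
    MULTI_TO_SINGLE = "switch from multi primary to single primary"


def steps_for(context: ElectionContext) -> List[int]:
    """Return the step numbers (1 to 5) executed in the given context."""
    if context is ElectionContext.PRIMARY_FAILURE:
        # Single source of truth: no certification and no read mode wait.
        return [1, 2, 3]
    if context is ElectionContext.PRIMARY_SWITCH:
        # The old primary keeps accepting writes until step 3.
        return [1, 2, 3, 4, 5]
    # Update everywhere checks were already on, so step 2 is skipped.
    return [1, 3, 4, 5]
```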


6. Facing member failures or stops

All is nice if there are no problems while the process is running.
One issue that might happen is that some member can leave during the process.
This can be an intentional leave as the DBA stopped the member or a server/machine/network failure that made the group expel the member.
In this section we handle exits under a majority; for partitions, see section 7.


How is it handled: Failures at the coordination level

  • The invoking member fails: The way this WL is structured, the invoking member only plays a key role at the start of the action and at its end, when the result is returned to the user.

In practice what this means is that once an action is accepted by the group any failure on the invoking member will not stop the action progress on the group.
The group action will continue its work on other members until all declare its end.

If the coordinator dies before sending the action then nothing happens.

  • Any member fails:

When a member leaves or fails, all the running action coordinators in the other members need to wait for one fewer member to declare the action as terminated.
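
A minimal sketch of this bookkeeping, covering both the invoking member and any other member. The class name and its callbacks are hypothetical; the sketch only shows that a leave shrinks the set of awaited members and that losing the invoker does not stop the action.

```python
from typing import Iterable, Set


class GroupActionCoordinator:
    """Per-member view of who still has to declare the action finished."""

    def __init__(self, members: Iterable[str], invoker: str) -> None:
        self._awaiting: Set[str] = set(members)
        self._invoker = invoker
        self.invoker_present = True

    def on_member_finished(self, member: str) -> None:
        self._awaiting.discard(member)

    def on_member_left(self, member: str) -> None:
        # One fewer member has to declare the action as terminated.
        self._awaiting.discard(member)
        if member == self._invoker:
            # The action goes on; only the result delivery is lost.
            self.invoker_present = False

    def is_terminated(self) -> bool:
        return not self._awaiting
```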


How it is handled: Single primary -> Multi-primary

  • If primary fails:
    We have to break the wait on the secondaries if they are waiting.
    Note that this means no more transactions will come from the old primary so all transactions from this point on are safe.

There is however a question here of what to do with concurrent primary elections.
We have 2 options:
  A. We elect a new primary, causing a secondary to be writable and activating certification before applying all pending transactions from the old primary.
     The upside here is that the group write downtime is smaller; the downside is that there is a window for transaction divergence.

  B. We don't elect a new primary and the process waits for the secondaries to be up to date.
     This option, while safer, may mean the group won't be writable for a period of time.

For now we go with B for safety, as explained in section 5.

  • If secondary fails:
    If a secondary member fails the algorithm will not be affected and no action is needed in the remaining members.


How it is handled: Multi-primary -> Single primary / Primary election

  • If the old primary fails: the process must break any waits for the old primary.
    This means that if we are waiting for the old primary to be safe we can invoke a new primary election at this point.
    If we are still validating the parameters and the action execution then it must select a new member to invoke the primary election. On the other hand, if the process is already waiting for this member to be in read mode, then we can skip waiting for it.

  • If the new primary fails: the process must abort if the member is not yet elected.
    If the primary was elected already, then this failure does not affect the group action and is handled as a traditional primary failure.
    If the election is still ongoing, it depends on the action. Primary member changes will abort and try to elect the old primary again.
    If the election is still ongoing and we are changing to single primary mode, the action will output a warning but won't fail.

  • If secondary fails: the waiting members should be adjusted if waiting for messages.
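
The rules for a failing new primary can be condensed into a small decision function. This is one interpretation of the bullet above: splitting the timeline into three phases (election not started, ongoing, finished) is an assumption, and all names are hypothetical.

```python
from enum import Enum, auto


class Action(Enum):
    PRIMARY_CHANGE = auto()
    SWITCH_TO_SINGLE_PRIMARY = auto()


class ElectionPhase(Enum):
    NOT_STARTED = auto()
    ONGOING = auto()
    ELECTED = auto()


class Reaction(Enum):
    ABORT_ACTION = auto()
    ABORT_AND_REELECT_OLD_PRIMARY = auto()
    WARN_AND_CONTINUE = auto()
    HANDLE_AS_PRIMARY_FAILURE = auto()


def on_new_primary_failure(action: Action, phase: ElectionPhase) -> Reaction:
    if phase is ElectionPhase.ELECTED:
        # The group action is unaffected; regular failover applies.
        return Reaction.HANDLE_AS_PRIMARY_FAILURE
    if phase is ElectionPhase.NOT_STARTED:
        return Reaction.ABORT_ACTION
    if action is Action.PRIMARY_CHANGE:
        return Reaction.ABORT_AND_REELECT_OLD_PRIMARY
    # Switching to single primary mode: warn, but do not fail.
    return Reaction.WARN_AND_CONTINUE
```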


How we register leaves and messages

For handling events such as the member exits and primary elections mentioned above, two options existed:

  1. Methods placed at each point that are directed to the coordinator class and then directed at each executing group action.

  2. Observers at each point that can be used by actions if needed.

We went with 2; while it is a bit more complex than 1, it allows for more versatility.

Another reason we went for 2 is that we used the same pattern for messages, as we needed to add new behaviors to old messages.

This way new plugin components can emerge reading events from the group and messages without changes to the gcs_event_handler code.
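
The observer option can be illustrated with a minimal registry. The event names and the `GroupEventRegistry` class are hypothetical and do not mirror the plugin's real classes; the sketch only shows why new listeners need no change at the point where events are raised.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Observer = Callable[[str], None]


class GroupEventRegistry:
    """Lets actions subscribe to group events such as leaves or elections."""

    def __init__(self) -> None:
        self._observers: DefaultDict[str, List[Observer]] = defaultdict(list)

    def register(self, event: str, observer: Observer) -> None:
        self._observers[event].append(observer)

    def unregister(self, event: str, observer: Observer) -> None:
        self._observers[event].remove(observer)

    def notify(self, event: str, member: str) -> None:
        # Iterate over a copy so observers may unregister while notified.
        for observer in list(self._observers[event]):
            observer(member)
```

A running action would register for, say, a member-left event when it starts and unregister when it ends, while the code raising the event only ever calls `notify`.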


7. Other scenarios - partitions and joins

Partitions

Partitions are an orthogonal issue in this WL.
As this is a distributed algorithm, it is normal for actions to block on a minority partition, just as any transaction message would.
The question remains how the DBA can handle it, and how the system will react.

Let's reason about the two different types of partitions:

Asymmetric partitions

In this case the group still has a majority and some members will be expelled.
To the majority, the way it handles these exits falls under section 6.

On the minority, in the absence of a network connection, the executing group actions will block.
On these members the DBA may stop group replication, and the group action process will terminate on that member.

If a value for group_replication_unreachable_majority_timeout is defined, these members in a minority partition will eventually error out.
The members can also eventually be expelled by the group and error out when they are again able to contact it.

When this happens the action is stopped by the plugin error handling code.

Symmetric partitions

On symmetric partitions or multi group partitions where no sub-group holds a majority, the DBA's options are:

  1. The DBA manages to restore the network, and if so the process should continue normally.
  2. The DBA forces a new group membership. You can check:
    https://dev.mysql.com/doc/refman/8.0/en/group-replication-network-partitioning.html

If the DBA goes with option 2 then a new majority will be formed where the group action will unblock.
Again, how this new majority handles the leaving members is described in section 6.

Example:
Is the old primary among the unreachable members, and were you waiting for a message from it? When a majority is formed and the old primary is expelled, the wait will unblock.

On expelled members, the view will mark them as leaving members, making them terminate their actions locally.


A new member joins

The way this WL was designed, no concurrent joins are allowed during a group action as it would make them even more complex.

The frontier is the reception of the action starting message.
At this point if there are members in recovery the action will abort.
If there are no joiners, then after this point all joins will fail, meaning the joiner will leave the group when it sees a running action.

Note that joins, recovery state changes, and action starts all rely on GCS events and are for that reason not concurrent.
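
The frontier described above can be sketched as a simple gate. The `JoinGate` class is hypothetical; it relies on the stated property that the events it receives arrive one at a time through GCS, so no locking is shown.

```python
class JoinGate:
    """Decides whether an action may start and whether a joiner may stay."""

    def __init__(self) -> None:
        self.action_running = False

    def on_action_start(self, members_in_recovery: int) -> bool:
        # Members still in recovery make the action abort.
        if members_in_recovery > 0:
            return False
        self.action_running = True
        return True

    def on_action_end(self) -> None:
        self.action_running = False

    def may_join(self) -> bool:
        # A joiner that sees a running action leaves the group.
        return not self.action_running
```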


8. Facing process failures

When an unexpected failure happens while executing a step, the process must also behave accordingly.
Many of the execution steps are simple though, like setting a plugin variable, so they should not fail.
Others are critical and threaten the consistency of the group, like failing to disallow writes on a secondary member.

So the strategy may vary and the process should accept that:

  • Message sending failures: Message retry is already something inside GCS.
    This means that sending errors are serious and not easy to handle.
    When a group action fails to send messages how can it say to abort?
    If the action already is running, the only option is to leave the group (and enable read mode).

  • Enabling the read mode: all failures enabling the read mode will leave the member in an undesired state, so the server process shall abort. For this, a service shall be implemented, as described in the low level design.

  • Disabling the read mode: Failures when disabling the read mode can be handled by the DBA, so a logged error should be sufficient.

  • Failures in assessing the number of transactions running: When an operation needs to know the number of transactions running to ensure safety, any failure should make the process abort.

  • Failures on SET PERSIST or other critical failures: Whenever there is a failure that prevents the algorithm from progressing, the member should exclude itself from the group (and enable read mode).
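
The strategies listed above amount to a policy table. The sketch below restates them with hypothetical names. One reading is an assumption: for failures in counting running transactions, "the process" is taken to mean the group action rather than the server.

```python
from enum import Enum, auto


class Failure(Enum):
    MESSAGE_SEND = auto()
    ENABLE_READ_MODE = auto()
    DISABLE_READ_MODE = auto()
    COUNT_RUNNING_TRANSACTIONS = auto()
    SET_PERSIST_OR_OTHER_CRITICAL = auto()


class Response(Enum):
    LEAVE_GROUP_AND_ENABLE_READ_MODE = auto()
    ABORT_SERVER_PROCESS = auto()
    LOG_ERROR = auto()
    ABORT_ACTION = auto()


POLICY = {
    # Applies once the action is already running.
    Failure.MESSAGE_SEND: Response.LEAVE_GROUP_AND_ENABLE_READ_MODE,
    Failure.ENABLE_READ_MODE: Response.ABORT_SERVER_PROCESS,
    Failure.DISABLE_READ_MODE: Response.LOG_ERROR,
    # Assumption: "the process" means the group action here.
    Failure.COUNT_RUNNING_TRANSACTIONS: Response.ABORT_ACTION,
    Failure.SET_PERSIST_OR_OTHER_CRITICAL: Response.LEAVE_GROUP_AND_ENABLE_READ_MODE,
}


def response_to(failure: Failure) -> Response:
    return POLICY[failure]
```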


9. Monitoring and error reporting

One point not mentioned until now is how the DBA can check the progress and status of coordinated configuration changes.

First of all, we must answer the question: what does the user want to know?

A. Which action is running? B. What is the progress of that action? C. Is there an error and, if so, what was it?

Let's start with A and B. To not overload our current performance schema tables with fields that have an unknown lifetime and evolution, we went for an alternative.

Thus, monitoring will be based on the current stage event table:

performance_schema.events_stages_current

How this works is that under

performance_schema.setup_instruments

we have new instruments for monitoring.


Stages

The planned stages correspond to the steps where the algorithm will probably wait and for which we can give some progress information.
Hence we skip here singular steps like action acceptance or primary election invocation.

Multi primary switch stages

stage/group_rpl/Multi-primary Switch: waiting for pending transactions to finish.

The old primary updated the enforce_update_everywhere_checks variable and collected the set of currently ongoing transactions.
In this stage we show the progress of how many transactions are left for execution.

stage/group_rpl/Multi-primary Switch: waiting on another member step completion

While the above step runs, the old secondaries are in wait state.
This stage shows we are waiting on a message from the old primary member.

stage/group_rpl/Multi-primary Switch: applying buffered transactions.

The old primary executed all the above transactions.
The secondaries must also wait for them to be executed locally.
This stage reports when the process is over.

stage/group_rpl/Multi-primary Switch: waiting for operation to complete on all members.

Due to the asynchronous nature of the algorithm, completion time in different members can differ.
This stage reports how many members have finished vs the ones that are still missing.

Single primary switch stages

stage/group_rpl/Single-primary Switch: checking group pre-conditions.

Check if the group has running channel slaves or members of an invalid version.
This stage reports the completion of the several verification steps.

stage/group_rpl/Single-primary Switch: executing Primary election

Primary election is invoked and runs in all members. This stage means the algorithm is waiting on the election.

stage/group_rpl/Single-primary Switch: waiting for operation to complete on all members.

Due to the asynchronous nature of the algorithm, completion time in different members can differ.
This stage reports how many members have finished vs the ones that are still missing.

Primary switch stages

stage/group_rpl/Primary switch: checking current primary pre-conditions.

Check if the old primary has running channel slaves. This stage reports the completion of the several verification steps.

stage/group_rpl/Primary Switch: waiting for pending transactions to finish.

The old primary updated the enforce_update_everywhere_checks variable and collected the set of currently ongoing transactions.
In this stage we show the progress of how many transactions are left for execution.

stage/group_rpl/Primary Switch: waiting on another member step completion

While the above step runs, the old secondaries are in wait state.
This stage shows we are waiting for the execution of the above transactions, which will lead to the primary election phase.

stage/group_rpl/Primary Switch: executing Primary election

Primary election is invoked and runs in all members. This stage means the algorithm is waiting on the election outcome.

stage/group_rpl/Primary Switch: waiting for operation to complete on all members.

Due to the asynchronous nature of the algorithm, completion time in different members can differ.
This stage reports how many members have finished vs the ones that are still missing.

Primary election stages

stage/group_rpl/Primary Election: applying buffered transactions.

The old primary executed all the above transactions.
The secondaries must also wait for them to be executed locally.
This stage reports the progress of that execution.

stage/group_rpl/Primary Election: waiting on current primary transaction execution

The secondaries are waiting for the primary to finish the above stage.

stage/group_rpl/Primary Election: waiting for members to turn on super_read_only

When primary election is invoked all members shall wait for the other members to be in read mode. This stage reports how many servers are not yet in read mode.

stage/group_rpl/Primary Election: stabilizing transactions from former primaries.

Once all servers are in read mode the new primary shall consume its backlog and then declare that certification can be disabled on all members.
This stage shows how many transactions are left to apply.


How to use it

So when an action is running, the DBA can check the stages table for something like:

 SELECT event_name, source, work_completed, work_estimated FROM performance_schema.events_stages_current WHERE event_name LIKE "%stage/group_rpl%";
 EVENT_NAME                                                                         SOURCE              WORK_COMPLETED  WORK_ESTIMATED
 stage/group_rpl/Multi-primary Switch: waiting for pending transactions to finish.  stage_monitor.h:73      3               10  

When no action is running the query returns empty.
So this solves A and B.

This information can be checked on all members running an action; the progress reported is the local one.
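
For tooling that polls this table, the two work columns can be turned into a percentage. The helper below is a hypothetical convenience, not part of the server; it assumes the column values have already been fetched by some client and that either may be NULL (None) when no progress is available.

```python
from typing import Optional


def progress_percent(work_completed: Optional[int],
                     work_estimated: Optional[int]) -> Optional[float]:
    """Convert WORK_COMPLETED / WORK_ESTIMATED into a percentage."""
    if work_completed is None or not work_estimated:
        # No estimate (NULL or zero) means progress cannot be computed.
        return None
    return 100.0 * work_completed / work_estimated
```

For the sample row above (3 of 10) this yields 30.0, and since the values are local to each member, the percentage is too.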

For C, we rely on the server error primitives of the UDF mechanism.
If a function is executed with invalid parameters the UDF API will return an error with our custom error message like:

 ERROR 1123 (HY000): Can't initialize function 'group_replication_set_as_primary'; Member is in multi-primary mode.

If the function is validated and there is an error during execution then we return a custom error ourselves.
For that we will add the error:

 ER_GRP_RPL_UDF_ERROR

As there is no number associated with this error for now, we will use NNNN.

 ERROR NNNN (HY000): The function 'group_replication_set_as_primary' failed. There is a member joining the group. 


10. User interface - a summary

To DBAs in general, here is the summary part.

  • Scenario A:

If you are multi-primary and want to change to single primary.
Just execute

 SELECT group_replication_switch_to_single_primary_mode();

And if you have a primary in mind you can opt to

 SELECT group_replication_switch_to_single_primary_mode(primary_uuid);
 Mode switched to single-primary successfully 

While the action runs, you can check its progress with

 SELECT event_name, work_completed, work_estimated FROM performance_schema.events_stages_current WHERE event_name LIKE "%stage/group_rpl%";
 EVENT_NAME                                                           WORK_COMPLETED  WORK_ESTIMATED
 stage/group_rpl/Single-primary Switch: checking group pre-conditions       0               1   


  • Scenario B:

If you are in single-primary mode and want to change to multi-primary mode, just execute

 SELECT group_replication_switch_to_multi_primary_mode();
 Mode switched to multi-primary successfully 

While the action runs, you can check its progress with

 SELECT event_name, work_completed, work_estimated FROM performance_schema.events_stages_current WHERE event_name LIKE "%stage/group_rpl%";
 EVENT_NAME                                                                        WORK_COMPLETED  WORK_ESTIMATED
 stage/group_rpl/Multi-primary Switch: waiting for pending transactions to finish.      2               10  


  • Scenario C:

If you are in single-primary mode and want to change the primary, just execute

 SELECT group_replication_set_as_primary(server_uuid);
 Primary server switched to: UUID

While the action runs, you can check its progress with

 SELECT event_name, work_completed, work_estimated FROM performance_schema.events_stages_current WHERE event_name LIKE "%stage/group_rpl%";
 EVENT_NAME                                                                  WORK_COMPLETED  WORK_ESTIMATED
 stage/group_rpl/Primary Switch: waiting on another member step completion        0               1 

Note that when the primary election algorithm kicks in, you can also monitor that in another stage:

 SELECT event_name, work_completed, work_estimated FROM performance_schema.events_stages_current WHERE event_name LIKE "%stage/group_rpl%";
 EVENT_NAME                                                                         WORK_COMPLETED  WORK_ESTIMATED
 stage/group_rpl/Primary Election: Waiting for members to turn on super_read_only        3               6  


  • Error cases 1: State changes under the same mode

You are in single-primary or multi-primary mode and you execute a migration to the mode the system is already in.

 SELECT group_replication_switch_to_multi_primary_mode();
 The system is already on multi-primary mode

The function just returns a string stating that. No error.


  • Error cases 2: Primary switches in multi-primary mode

You are in multi primary and you execute a primary election

 SELECT group_replication_set_as_primary(server_uuid);
 ERROR HY000: Can't initialize function 'group_replication_set_as_primary'; In multi-primary mode. Use group_replication_switch_to_single_primary_mode.


  • Error cases 3: Generic Validation errors

You execute a function with an improper argument, or with none at all where one is required. Some of the possible errors are:

SELECT group_replication_switch_to_single_primary_mode(____);
ERROR HY000: Can't initialize function 'group_replication_switch_to_single_primary_mode'; Wrong arguments: This function either takes no arguments or a single server uuid
ERROR HY000: Can't initialize function 'group_replication_switch_to_single_primary_mode'; Wrong arguments: The server uuid is not valid.
ERROR HY000: Can't initialize function 'group_replication_switch_to_single_primary_mode'; The requested uuid is not a member of the group.


SELECT group_replication_set_as_primary(____);
ERROR HY000: Can't initialize function 'group_replication_set_as_primary'; Wrong arguments: You need to specify a server uuid.
ERROR HY000: Can't initialize function 'group_replication_set_as_primary'; Wrong arguments: The server uuid is not valid.
ERROR HY000: Can't initialize function 'group_replication_set_as_primary'; The requested uuid is not a member of the group.

SELECT group_replication_switch_to_multi_primary_mode(____);
ERROR HY000: Can't initialize function 'group_replication_switch_to_multi_primary_mode'; Wrong arguments: This function takes no arguments.


  • Error cases 4: The DBA doesn't have privileges

You try to execute an action with no privileges

 SELECT group_replication_set_as_primary(server_uuid);
 ERROR 1123 (HY000): Can't initialize function 'group_replication_set_as_primary';  User 'group_rpl_user'@'%'. needs SUPER or GROUP_REPLICATION_ADMIN privileges.


  • Error cases 5: Member is not in a valid state

The member is in error state or unreachable.

 SELECT group_replication_set_as_primary(server_uuid);
 ERROR 1123 (HY000): Can't initialize function 'group_replication_set_as_primary'; The member needs to be ONLINE and in a reachable partition.


  • Error cases 6: An action is already running

There is already a group action running.
This is a runtime error, not validation error:

 SELECT group_replication_set_as_primary(server_uuid);
 ERROR NNNN (HY000): The function 'group_replication_set_as_primary' failed; There is already a configuration action being executed. Wait for it to finish.


  • Error cases 7: A member is joining

There is a member joining.
This is a runtime error, not validation error:

 SELECT group_replication_set_as_primary(server_uuid);
 ERROR NNNN (HY000): The function 'group_replication_set_as_primary' failed; A member is joining the group, wait for it to be ONLINE.


  • Error cases 8: A member of a lower version is present

A member that has a lower version and cannot execute these actions is present in the group.
This is a runtime error, not validation error:

 SELECT group_replication_set_as_primary(server_uuid);
 ERROR NNNN (HY000): The function 'group_replication_set_as_primary' failed; The group has a member with a version that does not support group coordinated operations.


  • Error cases 9: The primary fails before election

We are electing a primary or changing to single primary mode with an appointed primary and it fails.
This is a runtime error, not validation error:

 SELECT group_replication_set_as_primary(server_uuid);
 ERROR NNNN (HY000): The function 'group_replication_set_as_primary' failed; The appointed primary for election left the group, this operation will be aborted. No primary election was invoked under this operation.


  • Error cases 10: The primary fails during election - Change to Single Primary mode

We are changing to single primary mode with an appointed primary and it fails during election.
The operation completes but there is a warning.

 SELECT group_replication_switch_to_single_primary_mode(server_uuid);
 Mode switched to single-primary with reported warnings: The appointed primary being elected exited the group. Check the group member list to see who is the primary
 Warnings:
 Warning    NNNN    The appointed primary being elected exited the group. Check the group member list to see who is the primary. There were warnings detected also on other members, check their logs.


  • Error cases 11: The primary fails during election - Change of Primary Member

We are changing the primary member and the appointed primary fails during election.
This is a runtime error, not validation error:

 SELECT group_replication_set_as_primary(server_uuid);
 ERROR NNNN (HY000): The function 'group_replication_set_as_primary' failed; Primary assigned for election left the group, this operation will be aborted and if present the old primary member will be re-elected. Check the group member list to see who is the primary.


  • Error cases 12: The primary fails after election

We are electing a primary or changing to single primary mode with an appointed primary and it fails when the member was already elected.
Only a warning is thrown.

 SELECT group_replication_switch_to_single_primary_mode("MEMBER1_UUID")
 Mode switched to single-primary with reported warnings: The appointed primary left the group as the operation is terminating. Check the group member list to see who is the primary
 Warnings:
 Warning NNNN   The appointed primary left the group as the operation is terminating. Check the group member list to see who is the primary


  • Error cases 13: A slave channel prevents the operation

When you execute a function but a channel presence prevents it

 SELECT group_replication_switch_to_single_primary_mode("MEMBER2_UUID");
 ERROR HY000: The function 'group_replication_switch_to_single_primary_mode' failed. The requested primary is not valid as a slave channel is running on member MEMBER1_UUID

 SELECT group_replication_switch_to_single_primary_mode();
 ERROR HY000: The function 'group_replication_switch_to_single_primary_mode' failed. There is more than a member in the group with running slave channels so no primary can be elected.

 SELECT group_replication_set_as_primary("MEMBER2_UUID");
 ERROR HY000: The function 'group_replication_set_as_primary' failed. There is a slave channel running in the group's current primary member.


  • Error cases 14: There is a member in group from an older version

When you execute a function but there is a member that does not have this feature.

 SELECT group_replication_switch_to_single_primary_mode();
 ERROR HY000: The function 'group_replication_switch_to_single_primary_mode' failed. The group has a member with a version that does not support group coordinated operations.


  • Error cases 15: When you kill a coordinated change

When you execute a coordinated change and you kill it.
Note that the messages can differ depending on how far the action has progressed.

 SELECT group_replication_switch_to_single_primary_mode();
 ERROR HY000: The function 'group_replication_switch_to_single_primary_mode' failed. This operation was locally killed and for that reason terminated. The member will now leave the group.

 SELECT group_replication_switch_to_single_primary_mode();
 ERROR HY000: The function 'group_replication_switch_to_single_primary_mode' failed. This operation was locally killed and for that reason terminated. However the member is already configured to run in single primary mode, but the configuration was not persisted. The member will now leave the group.


  • Error cases 16: Critical failures

When you execute a function and some critical error occurs

 SELECT group_replication_set_as_primary(server_uuid);
 ERROR HY000: The function 'group_replication_set_as_primary' failed. A critical error occurred during the local execution of this action. The member will now leave the group.


  • Error cases 17: Other failures

We are electing a primary or executing a migration and something fails.

 SELECT group_replication_set_as_primary(server_uuid);
 ERROR NNNN (HY000): The function 'group_replication_set_as_primary' failed; Error message here


11. Other points

Upgrade/Downgrade

This WL does not depend on any upgrade, nor does it bring restrictions to downgrades.

There are however some behaviors that are associated to upgrade/downgrade processes where the group is formed by mixed versions.

A mix of 5.7 and 8.0 members

In such a group:

If an action is invoked on 8.0 members the action shall error out when it checks that 5.7 version members are present.

About 5.7 members there is no issue as actions cannot be invoked from such members.
One point about 5.7 members is that the old election algorithm shall be used whenever a member of this version is present.

A mix of 8.0 and 9.0 members

This applies to any version difference between members from 8.0 onwards.

In a mixed group where actions are supported by all members, the primary election shall only allow the selection of a primary that runs the lowest version present in the group.
Selecting a higher version as primary could lead to the dissemination of unknown messages to the lower version members.
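
The lowest-version restriction above could be checked along the lines of the sketch below. It is only an illustration under stated assumptions: Member_info, its numeric version field and can_be_appointed_primary are hypothetical names, and versions are assumed to be comparable as plain integers.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical member descriptor, for illustration only.
struct Member_info {
  std::string uuid;
  unsigned int version;  // assumed to compare as a plain integer
};

// Returns true only if the candidate is in the group and runs the
// lowest version present in the group.
bool can_be_appointed_primary(const std::vector<Member_info> &members,
                              const std::string &candidate_uuid) {
  if (members.empty()) return false;

  auto lowest = std::min_element(
      members.begin(), members.end(),
      [](const Member_info &a, const Member_info &b) {
        return a.version < b.version;
      });

  auto candidate = std::find_if(
      members.begin(), members.end(),
      [&candidate_uuid](const Member_info &m) {
        return m.uuid == candidate_uuid;
      });

  return candidate != members.end() && candidate->version == lowest->version;
}
```

Several members may share the lowest version, in which case any of them is an acceptable candidate in this sketch.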


Security

Users must have GROUP_REPLICATION_ADMIN privileges as for START and STOP commands.

Users that have this privilege can also stop group replication in all members interrupting service so no new angle of attack is introduced.

In terms of operation abuse, no two commands can run at the same time so malicious DBAs could only cause sequential elections or changes of mode.
These can cause some delay in the cluster, but overall there is no effect in terms of availability.


Query life cycle and killing behaviors

The way query execution is envisioned in this WL is that a DBA executes a query in the form of a UDF, and that will block until an instantiated group action is concluded.
The query process will remain waiting for a result while a newly spawned thread executes the action process.

The question here is how this relates to query termination by the DBA.

In terms of kill semantics, we should start with the basics of how query kills work in MySQL.
The basic idea is that the process in execution should check at different stages whether the thread was given a kill signal.

The same principle applies here: actions should regularly check whether their thread was killed and, if so, abort.
The notion of not committing or rolling back locally does not apply here, though.
This has 2 major consequences:

  1. Killing an accepted group coordinated change means this member is diverging in configuration from the group.
    Hence, the member shall leave the group and move to error mode.

  2. There is a point in time where the kill signal may no longer mean anything, as only trivial operations remain in the process.
    So, the DBA may kill a group action only to find out that it completed successfully.

    In practical terms this can be tracked through stages.
    When the final stage kicks in, it means the action is now completed locally and is only waiting for other members to finish. This means that any kill request after this will not cancel the operation.

Another point here is that the DBA may kill the stuck query when it is actually the underlying action execution thread that is taking too much time.
This design also takes that into account, guaranteeing that upon detecting it was killed the query process shall kill the local action process.


Deployment / Install

This worklog brings no changes in terms of install and deploy.
Just some notes:

  1. Performance schema needs to be enabled for monitoring.
    Also, some of the above setup instruments may have to be enabled if needed.

  2. UDFs are installed automatically, so no user action is needed here.

  3. The use of SET PERSIST means some user defined settings in the configuration files may be ignored as newer settings take precedence over them.


12. Points not considered in the Worklog

Cancellation:
We do not handle cancellation of requests in this worklog due to its complexity.

1. Coordinator

Coordinator - Code Skeleton

//The coordinator class where actions are submitted
class Group_action_coordinator

public:
  //Proposes a group action
  int coordinate_action_execution(Group_action* action);

  /*
    Asks the coordinator to stop any ongoing action
    @param coordinator_stop is the coordinator terminating
  */ 
  int stop_coordinator_process(bool coordinator_stop);

  //Handle incoming  action message (start or stop)
  int handle_action_message(Group_action_message *msg);

  //Queue notification (primary change, message received,.. )
  int queue_notification(Action_notification notification);

  //Returns if there is a group action running
  bool is_group_action_running();

  /*
    Adapts the coordinator
    @param number_members   the current number of members
    @param is_leaving       is this member leaving?
  */ 
  void handle_leaving_members(int number_members, bool is_leaving);

private:

  //Handle incoming start action message
  int handle_action_start_message(Group_action_message *msg);

  //Handle incoming stop action message
  int handle_action_stop_message(Group_action_message* msg);

  //Declare this action as terminated to other members
  // @param message_type for the sent message
  int signal_action_terminated(enum_action_message_type);

  // Leave the group and change state to error 
  int leave_on_action_error(); 

  //Handle the termination of current action
  void terminate_action();

  //The id of the action that is currently running
  enum_group_action_type current_action_id;

  //The action that is currently being executed
  Group_action* executing_action;

  //The queue of notifications delivered to the running action
  Queue<Action_notifications> notifications;

  //The uuids of the members known for the current action
  list<uuid> known_messages_uuids;

  //The lock to coordinate start and stop requests
  lock coordinator_process_lock;

  //The flag to avoid concurrent action start requests
  bool action_ongoing;

  //The flag to avoid action starts post stop
  bool coordinator_terminating;

  //Is the action terminating
  bool action_terminating;

  //The handler where actions can report progress through stages
  Plugin_stage_monitor_handler* monitoring_stage_handler;


Coordinator - Method logic

  • General idea

->user action

Group_action* action_X = new Custom_action(parameters_from_user);
error= group_action_coordinator.coordinate_action_execution(action_X);
return error;

  • coordinate_action_execution(Group_action action)

  1. Lock coordinator_process_lock
  2. [action_ongoing == true || coordinator_terminating == true]
    Then abort (fail early)
  3. Set action_ongoing to true
  4. Get action message with Group_action::get_action_message
  5. Send the message to all members
    If it fails to send, return error to the user
  6. Unlock coordinator_process_lock
  7. Create a Plugin_stage_monitor_handler instance and set a stage;
  8. Wait for response from action execution (execution is on another thread).
  9. Set action_ongoing to false.
  10. [If return value is either GROUP_ACTION_KILLED or GROUP_ACTION_ERROR]
    Execute leave_on_action_error()
  11. End the stage on the Plugin_stage_monitor_handler instance.
  12. Check the response and return either success or error to the client

See the Killing queries section below for the code path taken when the query is killed.
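
The fail-early guard in steps 1 to 3 and the reset in step 9 can be pictured with the small sketch below. It is an assumption-laden illustration, not the plugin code: Action_gate and its method names are hypothetical, and in the design these flags live inside Group_action_coordinator.

```cpp
#include <mutex>

// Hypothetical helper isolating the two flags used by the coordinator.
class Action_gate {
 public:
  // Steps 1-3: take the lock, fail early if an action is ongoing or the
  // coordinator is terminating, otherwise claim the slot.
  bool try_start_action() {
    std::lock_guard<std::mutex> guard(coordinator_process_lock);
    if (action_ongoing || coordinator_terminating) return false;
    action_ongoing = true;
    return true;
  }

  // Step 9: the action returned, so another one may be proposed.
  void finish_action() {
    std::lock_guard<std::mutex> guard(coordinator_process_lock);
    action_ongoing = false;
  }

  // Step 2 of stop_coordinator_process: block any later start request.
  void set_terminating(bool coordinator_stop) {
    std::lock_guard<std::mutex> guard(coordinator_process_lock);
    coordinator_terminating = coordinator_stop;
  }

 private:
  std::mutex coordinator_process_lock;
  bool action_ongoing = false;
  bool coordinator_terminating = false;
};
```

Because both flags are read and written under the same lock, a start request either observes the stop and fails, or claims the slot before the stop and is ended by it later.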

  • stop_coordinator_process(bool coordinator_stop)

    [Is an action running]

  1. Lock coordinator_process_lock
  2. Set coordinator_terminating= coordinator_stop
  3. Invoke Group_action::stop_action_execution(false).
  4. Wait for the thread executing the action to finish.
  5. Invoke Group_action_coordinator::terminate_action()
  6. Unlock coordinator_process_lock

  • handle_action_message(Group_action_message *msg)

  1. [coordinator_terminating == true]
    Return
  2. [If it is a start message]
    Invoke: Group_action_coordinator::handle_start_action_message
    If there is an error on handling, return
    If local, awake the action coordination so it aborts.
  3. [If it is a stop message]
    Invoke: Group_action_coordinator::handle_stop_action_message
  4. [If it is an abort message]
    Invoke: Group_action_coordinator::stop_coordinator_process(false)

  • is_group_action_running()

  1. Return true if there is a running action.
    The current idea is to test whether there is a defined action id.
    Using action_ongoing can also be an option but this is set before the action is accepted.
    So using action_ongoing can cause unnecessary join failures from new members.

  • handle_leaving_members(int number_members, bool is_leaving)

  1. [is_leaving == true]
    Invoke Group_action_coordinator::stop_coordinator_process(true)
    Return
  2. Update known_messages_uuids.
  3. [Are the termination messages == known_messages_uuids]
    Invoke Group_action_coordinator::terminate_action();

  • handle_start_action_message(Group_action_message *msg)

  1. [If an action is already/still running]
    Then abort
    If local, awake the action coordination so it aborts.
    [No action running]
    Go to 2)
  2. [Are there any members of the group in recovery]
    Then abort
    If local, awake the action coordination so it aborts.
    [No members in recovery]
    Set known_messages_uuids (this is a logical consistent moment)
    Go to 3)
  3. Get the local action coordinator.
    Set the current action id.
    [If it is the sender]
    Get the Group_action object for this message
    [If remote]
    Instantiate a new Group_action object.
    Set action_terminating to false;
  4. Give the message to the Group_action object for processing.
    Group_action::process_action_message(Group_action_message msg)
  5. Instantiate a new Plugin_stage_monitor_handler and set monitoring_stage_handler
  6. Invoke the execution method of the Group_action class in a spawned new thread
    Group_action::execute_action
  7. Return

  • handle_start_action_message - Execution thread

  1. Execute Group_action::execute_action()
  2. If the method returns a GROUP_ACTION_RESTART signal, re-execute the action.
  3. When the thread job finishes, execute Group_action_coordinator::signal_action_terminated()
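
The restart behavior of the execution thread amounts to a simple loop, sketched below. The enum values are taken from the Group Action skeleton; run_action_until_done and the std::function wrapper are hypothetical stand-ins for the call to Group_action::execute_action().

```cpp
#include <functional>

// Result values as listed in the Group Action skeleton.
enum enum_action_execution_result {
  GROUP_ACTION_TERMINATED,
  GROUP_ACTION_ERROR,
  GROUP_ACTION_RESTART,
  GROUP_ACTION_ABORTED,
  GROUP_ACTION_KILLED,
  GROUP_ACTION_NOT_KILLED
};

// Hypothetical thread body: re-executes the action for as long as it asks
// for a restart and returns the first non-restart result.
enum_action_execution_result run_action_until_done(
    const std::function<enum_action_execution_result()> &execute_action) {
  enum_action_execution_result result;
  do {
    result = execute_action();
  } while (result == GROUP_ACTION_RESTART);
  // The caller would now invoke signal_action_terminated().
  return result;
}
```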

  • handle_stop_action_message(Group_action_message *msg)

  1. Update the completed work on monitoring_stage_handler
  2. [are the termination messages uuids == known_messages_uuids]
    Then declare the action as terminated, go to 3)
  3. Update known_messages_uuids to remove the received member uuid.
  4. Use the end stage method on monitoring_stage_handler
  5. Invoke Group_action_coordinator::terminate_action()
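
One way to picture the bookkeeping shared by handle_stop_action_message and handle_leaving_members is the sketch below. It assumes a set of member uuids still expected to report; Termination_tracker and its methods are hypothetical names. The two counters correspond to the estimated and completed work reported on the stage.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>

// Hypothetical tracker of which members still have to report termination.
class Termination_tracker {
 public:
  explicit Termination_tracker(std::set<std::string> known_members)
      : pending(std::move(known_members)), estimated(pending.size()) {}

  // A stop message arrived from a member.
  // Returns true when no member is left to wait for.
  bool on_stop_message(const std::string &uuid) {
    if (pending.erase(uuid) > 0) ++completed;
    return pending.empty();
  }

  // A member left the group before reporting, so it is no longer expected.
  // Returns true when no member is left to wait for.
  bool on_member_left(const std::string &uuid) {
    if (pending.erase(uuid) > 0) --estimated;
    return pending.empty();
  }

  std::size_t work_estimated() const { return estimated; }
  std::size_t work_completed() const { return completed; }

 private:
  std::set<std::string> pending;  // declared first: estimated depends on it
  std::size_t estimated;
  std::size_t completed = 0;
};
```

When either method returns true the coordinator would end the stage and invoke terminate_action().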

  • signal_action_terminated(enum_action_message_type)

  1. Set action_terminating to false;
  2. Use get_termination_key() from the Group_Action class.
    Set the stage on the monitoring_stage_handler
    Set the estimated work to the number of known members, and the completed to the number of received messages.
  3. Instantiate message of type Group_action_message.
    Use the given message type and use ACTION_END_PHASE.
  4. Send the message.

  • terminate_action()

  1. Delete any notification not used by the current action.
  2. Awake coordinate_action_execution method.
  3. Unset the current_action_id.

  • leave_on_action_error()

  1. Change member state to error
  2. Leave the group
  3. Cancel pending transactions
  4. Set read mode to true.


Coordinator - Code related changes

  • Plugin_gcs_events_handler::check_group_compatibility(

As it is stipulated in the requirements, new members cannot join during a group action.

To accomplish this we need to change this method and add a check for:
Group_action_coordinator::is_group_action_running()

  • Plugin_gcs_events_handler::handle_joining_members(

In a continuation of the above requirement, but with the intent of improving the user experience, we propose to add the same check here.

The idea is that the joining member will error out, but the other members should also log the reason why the member was expelled.

So, on the code branch

else if (number_of_joining_members > 0 ||
        (number_of_joining_members == 0 && number_of_leaving_members == 0))
{

We shall add a check and a print to the error log if the member joined while an action was ongoing.

  • Plugin_gcs_events_handler::handle_leaving_members(

Similar to the recovery call for update, we also invoke here:
Group_action_coordinator::handle_leaving_members(int number_members, is_leaving)


Coordinator - Concurrency notes

  • Start vs Start scenarios

The coordinator will stop several start messages from being sent at the same time.
Only when an action returns can you send another, due to the coordinator_process_lock.
As for concurrency between requests from other members, we rely on the sequential nature of GCS.
The first received start action message is the one executed; the others fail.

  • Start vs Stop scenarios

Stop happens on plugin.cc method terminate_plugin_modules().
This means that it happens when the member left the group so in theory there are no more messages starting actions.

There are however possible requests for actions in parallel.
Due to the coordinator locks, either:
The stop goes first and sets coordinator_terminating, so all requests will fail.
The start goes first, but stop ends it.


Coordinator - Life cycle

  • Initialization

This class is initialized on plugin.cc on plugin_group_replication_start.
Since it does not depend on any server service it does not rely on the delayed thread class.

  • Termination

This class is terminated and deleted on terminate_plugin_modules().
The method Group_action_coordinator::stop_coordinator_process(true) is invoked

  • Invocation

See section 8, UDF functions


Coordinator - Killing queries

As described on the functional requirements and high level design this WL shall be implemented with a responsive behavior to DBA kill requests.

One key point is that query kill signals shall also kill the underlying action process.

So on coordinate_action_execution(Group_action action) it is assumed that step 8 is not a hard wait but a timed one that will periodically check if the request was killed.

If killed, the process shall:

  1. Invoke Group_action::stop_action_execution(true).
  2. Wait for the thread executing the action to finish.
  3. Invoke Group_action_coordinator::terminate_action()
  4. [Group Action result is GROUP_ACTION_NOT_KILLED]
    Send a warning stating the action finished in spite of the kill
  5. [Group Action result is GROUP_ACTION_KILLED]
    Send an error stating that the action was killed and the member will leave the group.
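
The timed wait described above could look like the following sketch. It is a minimal illustration that assumes the kill check is available as a callback; Action_waiter, Wait_outcome and the poll interval are hypothetical and not part of the plugin.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>

enum class Wait_outcome { ACTION_FINISHED, QUERY_KILLED };

// Hypothetical helper for the waiting step in coordinate_action_execution.
class Action_waiter {
 public:
  // Called by the action thread when the action is concluded.
  void signal_finished() {
    {
      std::lock_guard<std::mutex> guard(mutex);
      finished = true;
    }
    cond.notify_all();
  }

  // Waits in short slices so that a kill request is noticed promptly.
  Wait_outcome wait(
      const std::function<bool()> &is_killed,
      std::chrono::milliseconds poll = std::chrono::milliseconds(50)) {
    std::unique_lock<std::mutex> lock(mutex);
    while (!finished) {
      if (is_killed()) return Wait_outcome::QUERY_KILLED;
      cond.wait_for(lock, poll, [this] { return finished; });
    }
    return Wait_outcome::ACTION_FINISHED;
  }

 private:
  std::mutex mutex;
  std::condition_variable cond;
  bool finished = false;
};
```

On QUERY_KILLED the caller would continue with the kill steps listed above, starting with Group_action::stop_action_execution(true).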


2. Group Actions : Parent class

The parent class for all actions

Group Action - Code Skeleton

//The base class that each action implements
class Group_action

  // Enum for the existing group action classes
  enum_group_action_type {
    GROUP_ACTION_MULTI_PRIMARY,    //change to multi primary
    GROUP_ACTION_PRIMARY_ELECTION, //primary election
    NO_GROUP_ACTION
  }

  // Enum for the end results of an action execution
  enum_action_execution_result {
    GROUP_ACTION_TERMINATED, // Terminated with success
    GROUP_ACTION_ERROR,      // Error on execution
    GROUP_ACTION_RESTART,    // Due to an error the action shall be restarted
    GROUP_ACTION_ABORTED,    // Was aborted due to some internal check
    GROUP_ACTION_KILLED,     // Action was killed
    GROUP_ACTION_NOT_KILLED  // Action was killed but finished
  }

  //Constructor giving the class access to notifications
  Group_action(Queue<Action_notifications> notifications);

  /*
    Get the message with parameters to this action
    @param message  [out] the message to start the action
  */
  virtual void get_action_message(Group_action_message** message)=0;

  /*
    Process the message with the parameters for this action
    @param message  [in]  the message to start the action
  */
  virtual int process_action_message(Group_action_message& message)=0;

  /*
    Execute the action
    @param invoking_member is the member that invoked it
    @param stage_handler the stage handler to report progress

    @returns the execution result 
  */
  virtual enum_action_execution_result 
      execute_action(bool invoking_member,
                     Plugin_stage_monitor_handler stage_handler)=0;

  /*
    Get the error message in case of error
    @param [out] error_msg

    @returns the execution result 
  */
  virtual enum_action_execution_result get_error_message(string& error_msg)=0;

  /*
    Terminate the execution process
    @param killed are we killing the action. 
  */
  virtual int stop_action_execution(bool killed)=0;

  //Returns the action identifier
  virtual int get_action_id()=0;

  // Returns the action name (for debug)
  virtual int get_action_name()=0;

  //Allow each class to have its own end stage key/message
  virtual PSI_stage_key get_termination_key()=0;


Group Action - Method logic and ideas

  • get_action_message(Group_action_message** message)

[Method extended by child classes]

This method should return the class that contains the parameters for execution.
The idea here is that each class can define what parameters it has and how to encode them.

  • process_action_message(Group_action_message& message)

[Method extended by child classes]

Each action class reacts in its own way to its message/parameters.
There is however another side to this method.
This method is executed upon message reception and that means it is processed at the same logical moment in all members.
You can use this method to check something about the current group view, for example.

  • execute_action(bool invoking_member, Plugin_stage_monitor_handler stage_handler)

[Method extended by child classes]

Each action executes its logic on this method.
This method is executed in a spawned thread.
This method can self repeat if you return GROUP_ACTION_RESTART.

  • stop_action_execution(bool killed)

[Method extended by child classes]

This method should simply unblock any wait and make the execution method return faster.

  • get_error_message(string& error_msg)

[Method extended by child classes]

This method is omitted in the below classes but basically we assume error messages are stored and can be retrieved.


Group Action - Killing queries

Without going into the details of the child classes, let us outline here the basics of killing a Group Action process.

The basic idea is that execute_action shall contain points where it checks whether the thread was killed, or whether stop_action_execution was invoked with a true flag.
If the thread was killed and we detect it at one of these points, a flag like action_was_killed is set.

When the action terminates, if there was an attempt to kill the query and it was killed at one of these points, we output GROUP_ACTION_KILLED.

If there was an attempt to kill it, but it failed, it outputs GROUP_ACTION_NOT_KILLED.
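
The distinction between the two kill results can be sketched as follows. Only the result names and the action_was_killed flag come from the text; Kill_aware_action, check_killed and final_result are hypothetical names, and the enum is reduced to the three values relevant here.

```cpp
#include <atomic>

// Reduced, illustrative subset of enum_action_execution_result.
enum enum_action_execution_result {
  GROUP_ACTION_TERMINATED,
  GROUP_ACTION_KILLED,
  GROUP_ACTION_NOT_KILLED
};

// Hypothetical action fragment showing only the kill bookkeeping.
class Kill_aware_action {
 public:
  // Invoked from the query thread when the DBA kills the query.
  void stop_action_execution(bool killed) {
    if (killed) kill_requested = true;
  }

  // Invoked by execute_action between stages.
  // Returns true when the action must abort.
  bool check_killed() {
    if (kill_requested) {
      action_was_killed = true;
      return true;
    }
    return false;
  }

  // Result reported once execute_action returns.
  enum_action_execution_result final_result() const {
    if (action_was_killed) return GROUP_ACTION_KILLED;
    if (kill_requested) return GROUP_ACTION_NOT_KILLED;
    return GROUP_ACTION_TERMINATED;
  }

 private:
  std::atomic<bool> kill_requested{false};
  std::atomic<bool> action_was_killed{false};
};
```

A kill that arrives after the last check point is never observed by check_killed, which is exactly the GROUP_ACTION_NOT_KILLED case.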


2.1 Multi Primary migrations

The action block to do a migration from single primary setups to multi-primary setups.

Multi-primary migration - Code Skeleton

//Class for multi primary migrations
class Multi_primary_migration_action : public Group_action

  virtual void get_action_message(Group_action_message** msg)

  virtual int process_action_message(Group_action_message& msg)

  virtual enum_action_execution_result 
      execute_action(bool invoking_member,
                     Plugin_stage_monitor_handler stage_handler)

  virtual int stop_action_execution(bool killed)

  virtual int get_action_id()

  virtual int get_action_name()

  // Listener:  React to view changes
  after_view_change(joining, leaving, group, *skip_election)

  // Listener: React to messages
  before_message_handling(message, *skip_message)

private:

  // The current primary member
  string primary_member

  // If the action was aborted
  bool action_aborted


Multi primary migration - Method logic

  • get_action_message(Group_action_message** msg)

  1. Instantiate message of type Group_action_message.
    Use ACTION_MULTI_PRIMARY_MESSAGE and use ACTION_START_PHASE.
    No need for a custom message as this action has no parameters.

  • process_action_message(Group_action_message& msg)

  1. Get what is the current primary and set primary_member.
  2. Register listener on Group_events_observation_manager

  • execute_action(invoking_member, monitoring_stage_handler)

  1. Set enforce_update_everywhere_checks to true
  2. [If primary member]
    Use a Server_query_execution_handler instance.
    Extract a list of the current executing server transactions.
    When all transactions are executed we can proceed to 3.
  3. [If primary member]
    Send a message to all members stating that all transactions are now safe.
    Use a Single_primary_message with type SINGLE_PRIMARY_NO_RESTRICTED_TRANSACTIONS.
    If there is an error when checking transactions execution send an Action Message:
    Use phase ACTION_ABORT_PHASE and return.
  4. When all members receive the above said message the method before_message_handling is executed.
    See Multi_primary_migration_action::before_message_handling below.
    Here the process polls the notification queue to move to step 5.
  5. When awaken by the Queue_checkpoint_packet:
    [If secondary member]
    Use the channel_get_retrieved_gtid_set method from the channel interface to get the current applier retrieved set.
    Loop until the server GTID executed contains all the retrieved transactions.
  6. Set the plugin.cc var single_primary_mode to false.
    Use Persistent_variables_handler::set_persistent_variable(
  7. [If secondary member]
    Disable read mode.
    Use methods on read_mode_handler.h
  8. Unregister listener on Group_events_observation_manager
  9. return
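
The wait in step 5 can be pictured as a containment check that is repeated until it succeeds. The sketch below is only illustrative: it models GTID sets as plain sets of transaction numbers, and Gtid_set_sketch, contains_all and wait_until_applied are hypothetical names, not the server's GTID API.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <set>

// Hypothetical stand-in for a GTID set: ordered transaction numbers.
using Gtid_set_sketch = std::set<long>;

// True when every retrieved transaction is already in the executed set.
bool contains_all(const Gtid_set_sketch &executed,
                  const Gtid_set_sketch &retrieved) {
  return std::includes(executed.begin(), executed.end(),
                       retrieved.begin(), retrieved.end());
}

// Polls the executed set until it covers the retrieved set or the action is
// aborted. Returns true on success, false on abort. A real loop would sleep
// between polls; this sketch spins for brevity.
bool wait_until_applied(const Gtid_set_sketch &retrieved,
                        const std::function<Gtid_set_sketch()> &read_executed,
                        const std::function<bool()> &is_aborted) {
  while (!contains_all(read_executed(), retrieved)) {
    if (is_aborted()) return false;  // the regular action_aborted check
  }
  return true;
}
```

The abort callback stands in for the termination checks that the method logic omits for simplicity.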

  • stop_action_execution(bool killed)

For simplicity we omit in the execution methods the termination checks.
It is assumed that a regular check for the action_aborted flag is made.
Same thing for checks on notifications of type TERMINATE_EXECUTION_NOTIFICATION.

  1. Set action_aborted to true.
  2. Invoke Server_query_execution_handler::abort_waiting_process()
  3. Queue a TERMINATE_EXECUTION_NOTIFICATION notification to unblock any waits

  • after_view_change(joining, leaving, group, *skip_election)

  1. [If secondary and the old primary died ]
    Queue a notification DEAD_PRIMARY_NOTIFICATION that will unblock the wait for message from the primary.
  2. Execute the method Applier_module::queue_certification_enabling_packet(true).
    Queue a Queue_checkpoint_packet that will awake the main process when queue is empty.
  3. Set the out parameter skip_election to true.

  • before_message_handling(message, *skip_message)

  1. [If message type = SINGLE_PRIMARY_NO_RESTRICTED_TRANSACTIONS]
    Queue a notification TRANSACTIONS_SAFE_NOTIFICATION in the notification queue.
    Execute the method Applier_module::queue_certification_enabling_packet(true).
    Check the section below for notes on this method.
    Queue a Queue_checkpoint_packet that will awake the main process when the queue is processed up to this point.


Multi primary migration - Code related changes

To enable the certification in the applier we need some tweaks to this class.
The main issue is that we must prevent the execution of the check_single_primary_queue_status() method.
This method, used for old elections, will turn off certification after the "new primary" SQL thread is idle.

Even on primary elections, we want finer control of the moment when the primary declares that certification is no longer needed.

class Applier_module

  /*
    Queues a Single_primary_action_packet in the applier queue
    @param multi_primary_context is there more than one primary
  */
  + int queue_certification_enabling_packet(bool multi_primary_context)

  /*
    Signals that 
  */
  + void end_multi_primary_period()

 private:

  // Is the member in a situation where more than one member does updates
  + bool multi_primary_context

  • queue_certification_enabling_packet(bool multi_primary_context)

  1. Create Single_primary_action_packet with NEW_PRIMARY
  2. Set multi_primary_context to the passed parameter

  • end_multi_primary_period()

  1. Set multi_primary_context to false

  • check_single_primary_queue_status()

Add a new check here for multi_primary_context
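
A minimal sketch of how the new flag could gate the legacy check. Applier_module_sketch and its boolean certification state are hypothetical simplifications; the real class queues packets rather than flipping a flag directly.

```cpp
#include <cassert>

// Hypothetical, reduced model of the Applier_module additions.
class Applier_module_sketch {
 public:
  // Stands in for queueing a Single_primary_action_packet (NEW_PRIMARY):
  // certification is turned on and the context flag is recorded.
  int queue_certification_enabling_packet(bool multi_primary_context) {
    certification_enabled_ = true;
    multi_primary_context_ = multi_primary_context;
    return 0;
  }

  void end_multi_primary_period() { multi_primary_context_ = false; }

  // Legacy check: turns certification off once the applier is idle,
  // unless more than one member may still be writing.
  void check_single_primary_queue_status(bool applier_idle) {
    if (multi_primary_context_) return;  // the new check
    if (applier_idle) certification_enabled_ = false;
  }

  bool certification_enabled() const { return certification_enabled_; }

 private:
  bool certification_enabled_ = false;
  bool multi_primary_context_ = false;
};
```

With the flag set, an idle applier no longer disables certification; it only does so after end_multi_primary_period() is called.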


Multi primary migration - Concurrency notes

  • Multi primary changes and primary elections

As stated in the HLD, the primary elections during multi primary migrations are disabled in case of failure.
Any election during the mode change will be skipped, so there could be some write downtime.


Multi primary migration - Monitoring notes

Here we describe when process stages change and how we do the monitoring of progress.

Here, we will use the steps from Multi primary migration - Method logic, in particular those of execute_action(invoking_member, monitoring_stage_handler).

  • Step 2 on the primary

The stage is set to

Multi-primary Switch: waiting for buffered transactions to finish.

The progress is set in the handler, passing into it the Plugin_stage_monitor_handler object.

  • Step 2 on the secondary members

The stage is set to

Multi-primary Switch: waiting on another member step completion

Completed work is set to 0, estimated work is set to 1.

  • Step 5 on secondaries

The stage is set to

Multi-primary Switch: applying buffered transactions.

The progress is tracked in the form of what is the initial difference between the retrieved and executed sets and how it evolves in time.
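
One possible way to turn that difference into work estimates is sketched below. Stage_progress_sketch and compute_progress are hypothetical names; the counts would come from comparing the retrieved and executed GTID sets.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical progress record for the "applying buffered transactions" stage.
struct Stage_progress_sketch {
  std::size_t estimated_work;  // initial retrieved-minus-executed difference
  std::size_t completed_work;  // how much of that difference is gone
};

// initial_missing: transactions missing when the stage started.
// current_missing: transactions still missing now.
Stage_progress_sketch compute_progress(std::size_t initial_missing,
                                       std::size_t current_missing) {
  // The backlog should only shrink; clamp defensively if it does not.
  if (current_missing > initial_missing) current_missing = initial_missing;
  return {initial_missing, initial_missing - current_missing};
}
```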


2.2 Single primary election

The action block to elect a primary chosen by the user or in a migration from multi-primary.

Single primary election - Code Skeleton

// Class for primary election / migration
class Primary_election_action

  // Enum for the phases of an action execution
  enum_primary_election_state{
   PRIMARY_VALIDATION_PHASE   // Check if primary is valid
   PRIMARY_SAFETY_CHECK_PHASE // Make the change safe
   PRIMARY_ELECTION_PHASE     // Invoke primary election
  }

  virtual void get_action_message(Action_message** messg)

  virtual int process_action_message(Action_message& messg)

  virtual enum_action_execution_result 
      execute_action(bool invoking_member,
                     Plugin_stage_monitor_handler stage_handler)

  virtual int stop_action_execution(bool killed)

  virtual int get_action_id()

  virtual int get_action_name()

  // Listener:  React to view changes
  after_view_change(joining, leaving, group, *skip_election)

  // Listener: React to messages
  before_message_handling(message, *skip_message)

  // Listener: After primary election
  after_primary_election(primary_uuid, error)

private:

  // Changes the phase where the action is currently
  void change_action_phase(enum_primary_election_state s)
  // The current phase
  enum_primary_election_state current_action_phase
  // Lock for the phase change
  lock phase_lock

  // The member that invokes primary election
  string invoking_member_uuid

  // The selected primary uuid to change to
  string selected_primary_uuid

  // If the action was aborted
  bool action_aborted


Single primary election - Method logic

  • get_action_message(Action_message** msg)

  1. Instantiate message of type Primary_election_action_message.
    Use ACTION_START_PHASE and the chosen primary uuid.

  • process_action_message(Action_message& msg)

  1. Cast the Action_message object to Primary_election_action_message.
  2. If a uuid is selected, set selected_primary_uuid
  3. [If primary candidate is defined]
    validate_primary_uuid(primary_uuid, error_message)
  4. validate_primary_version(error_message).
    Step 3 and 4 are executed here for the consistent view of the group.
  5. Get what is the current primary and set invoking_member_uuid.
    If no previous primary exists, define the invoking member as the member that invoked the action.
  6. Register listener on Group_events_observation_manager
  7. Set current_action_phase to PRIMARY_VALIDATION_PHASE

  

  • execute_action(invoking_member, monitoring_stage_handler)

  1. Invoke Primary_election_validation_handler::validate_election(uuid, valid_uuid, String& error_msg)
    [result = INVALID_PRIMARY || CURRENT_PRIMARY ]
       return GROUP_ACTION_ABORT.
    [result = GROUP_SOLO_PRIMARY]
       set selected_primary_uuid to valid_uuid and go to 2
    [result = VALID_PRIMARY]
       go to 2
  2. Set current_action_phase to PRIMARY_SAFETY_CHECK_PHASE
  3. [If previous primary member exists]
    Set enforce_update_everywhere_checks to true on all members.
    [If (old) primary member]
    Use the Server_query_execution_handler.
    When all transactions are executed we can proceed to 4.
    If there is an error when checking transactions execution send an Action Message:
    Use phase ACTION_ABORT_PHASE and return.
  4. The invoking_member_uuid invokes:
    Primary_election_handler::request_group_primary_election(primary_uuid,mode)
    [If coming from multi primary]
    mode=SAFE_OLD_PRIMARY
    [If primary switch]
    mode=UNSAFE_OLD_PRIMARY
    This sends a message and handles the election on all members.
    Primary election handles the certification enabling.
    It also handles the read mode settings.
  5. Wait for PRIMARY_ELECTED_NOTIFICATION notification in the queue.
    Pop the notification from the queue.
    Set the plugin.cc var single_primary_mode to true.
    Use Persistent_variables_handler::set_persistent_variable(
    [If secondary member]
    Set enforce_update_everywhere_checks to false;
    Use Persistent_variables_handler::set_persistent_variable(
  6. When the message
    Single_primary_message::SINGLE_PRIMARY_QUEUE_APPLIED_MESSAGE
    arrives, queue a notification.
  7. [If primary member]
    In the action process wait for above notification.
    Set enforce_update_everywhere_checks to false;
    Use Persistent_variables_handler::set_persistent_variable(
  8. return.
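
The two decision points in steps 1 and 4 can be sketched as small pure functions. The enum values are the ones used in this worklog; apply_validation_result and choose_election_mode are hypothetical helper names introduced only for illustration.

```cpp
#include <cassert>
#include <string>

enum enum_primary_validation_result {
  VALID_PRIMARY, INVALID_PRIMARY, CURRENT_PRIMARY, GROUP_SOLO_PRIMARY
};
enum enum_primary_election_mode {
  SAFE_OLD_PRIMARY, UNSAFE_OLD_PRIMARY, DEAD_OLD_PRIMARY
};

// Step 1: returns false when the action must abort (GROUP_ACTION_ABORT).
// For GROUP_SOLO_PRIMARY the only valid member replaces the selection.
bool apply_validation_result(enum_primary_validation_result result,
                             const std::string &valid_uuid,
                             std::string &selected_primary_uuid) {
  switch (result) {
    case INVALID_PRIMARY:
    case CURRENT_PRIMARY:
      return false;
    case GROUP_SOLO_PRIMARY:
      selected_primary_uuid = valid_uuid;
      return true;
    case VALID_PRIMARY:
      return true;
  }
  return false;  // unknown value: be conservative and abort
}

// Step 4: the election mode depends on where the group is coming from.
enum_primary_election_mode choose_election_mode(bool coming_from_multi_primary) {
  return coming_from_multi_primary ? SAFE_OLD_PRIMARY : UNSAFE_OLD_PRIMARY;
}
```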

  • stop_action_execution(bool killed)

For simplicity we omit in the execution methods the termination checks.
It is assumed that a regular check for the action_aborted flag is made.
Same thing for checks on notifications of type TERMINATE_EXECUTION_NOTIFICATION.

  1. Set action_aborted to true.
  2. Invoke Server_query_execution_handler::abort_waiting_process()
  3. Queue a notification of type TERMINATE_EXECUTION_NOTIFICATION to unblock any waits

  • change_action_phase(enum_primary_election_state phase_var)

  1. Lock phase_lock
  2. Change current_action_phase to phase_var
  3. Unlock phase_lock
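
In C++ the lock/change/unlock sequence maps naturally onto a scoped guard. This is a sketch assuming std::mutex as the lock type; Election_phase_sketch is a hypothetical holder class, and the plugin may well use its own mutex wrappers.

```cpp
#include <cassert>
#include <mutex>

enum enum_primary_election_state {
  PRIMARY_VALIDATION_PHASE, PRIMARY_SAFETY_CHECK_PHASE, PRIMARY_ELECTION_PHASE
};

// Hypothetical holder for the phase and its lock.
class Election_phase_sketch {
 public:
  void change_action_phase(enum_primary_election_state phase_var) {
    std::lock_guard<std::mutex> guard(phase_lock_);  // 1. lock
    current_action_phase_ = phase_var;               // 2. change
  }                                                  // 3. unlock on scope exit

  enum_primary_election_state current_phase() const {
    std::lock_guard<std::mutex> guard(phase_lock_);
    return current_action_phase_;
  }

 private:
  mutable std::mutex phase_lock_;
  enum_primary_election_state current_action_phase_ = PRIMARY_VALIDATION_PHASE;
};
```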

  • after_view_change(joining, leaving, group, *skip_election)

  1. Lock phase_lock
  2. [If the old primary died]
    Set the out parameter skip_election to true
    [Is current_action_phase == PRIMARY_VALIDATION_PHASE]
       Change invoking_member_uuid from the dead primary to the member that invoked the action.
       If no invoking member exists, select the lowest uuid member.
    [Is current_action_phase == PRIMARY_SAFETY_CHECK_PHASE]
       Set current_action_phase to PRIMARY_ELECTION_PHASE
       Invoke:
       Primary_election_handler::execute_primary_election(primary_uuid, DEAD_OLD_PRIMARY)
  3. [If the selected_primary_uuid died]
    [Is current_action_phase == PRIMARY_VALIDATION_PHASE || PRIMARY_SAFETY_CHECK_PHASE]
       Abort the action. Return GROUP_ACTION_ERROR
    [Is current_action_phase == PRIMARY_ELECTION_PHASE]
       Do nothing.
       The algorithm will remain waiting for a primary to be elected.
       Secondaries only have to disable transaction checks.
  4. Unlock phase_lock
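
The reaction to a view change depends on the phase and on which member left. A compact way to read the rules above is as a pure decision function; View_change_reaction_sketch and react_to_view_change are hypothetical names, and locking is left out of the sketch.

```cpp
#include <cassert>

enum enum_primary_election_state {
  PRIMARY_VALIDATION_PHASE, PRIMARY_SAFETY_CHECK_PHASE, PRIMARY_ELECTION_PHASE
};

// What the listener decides to do for one view change.
struct View_change_reaction_sketch {
  bool skip_election = false;              // the out parameter
  bool reassign_invoker = false;           // validation phase, old primary died
  bool run_dead_primary_election = false;  // safety phase, old primary died
  bool abort_action = false;               // selected primary died too early
  enum_primary_election_state phase_after = PRIMARY_VALIDATION_PHASE;
};

View_change_reaction_sketch react_to_view_change(
    enum_primary_election_state phase, bool old_primary_died,
    bool selected_primary_died) {
  View_change_reaction_sketch reaction;
  if (old_primary_died) {
    reaction.skip_election = true;
    if (phase == PRIMARY_VALIDATION_PHASE) {
      reaction.reassign_invoker = true;
    } else if (phase == PRIMARY_SAFETY_CHECK_PHASE) {
      phase = PRIMARY_ELECTION_PHASE;
      reaction.run_dead_primary_election = true;
    }
  }
  // In the election phase nothing is done: the action keeps waiting.
  if (selected_primary_died && phase != PRIMARY_ELECTION_PHASE)
    reaction.abort_action = true;
  reaction.phase_after = phase;
  return reaction;
}
```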

  • after_primary_election(string uuid, int error)

  1. [error !=0]
    Queue a notification TERMINATE_EXECUTION_NOTIFICATION
    return;
  2. Queue a notification PRIMARY_ELECTED_NOTIFICATION

  • before_message_handling(message, *skip_message)

  1. [If message type = Single_primary_message::SINGLE_PRIMARY_QUEUE_APPLIED_MESSAGE]
    Queue a notification PRIMARY_QUEUE_APPLIED_NOTIFICATION
  2. [If message type = Single_primary_message::SINGLE_PRIMARY_ELECTION]
    Set current_action_phase to PRIMARY_ELECTION_PHASE


Single primary election - Concurrency notes

  • Single primary changes and primary elections

The main issue here is when members leave, in particular the old or the new primary.
The reaction to these events depends, however, on the current state of the action. That is why the phase lock is important.

A question that can also arise: what if the new member fails when one member is on phase 1 and another on phase 2?
That is why the post primary election phase is triggered using a GCS event.
Same as why the single primary mode is set to true at this point.
This guarantees all the members make the same decision.

When the new primary dies before the election, the process aborts in all members with no coordination needed.

The primary election is a black box to the action algorithm.
If a new primary dies, the election algorithm will elect a new member.
The action algorithm only waits for a valid primary and the group to be ready and reacts to that.

When the new primary dies after election, this process is already over from its own point of view.
Secondaries simply set update checks to false and the action terminates.
The election will be handled by the group.

A note here also about how we skip elections in some cases.
When the old primary dies this means there is no safety wait for currently executing transactions.


Single primary election - Monitoring notes

Here we describe when process stages change and how we do the monitoring of progress.

Here, we will use the steps from Single primary election - Method logic, in particular those of execute_action(invoking_member, monitoring_stage_handler).

Some of the stages depend on the context of the operation. Simple primary changes use stages that begin with

Primary switch:

Changes from multi primary to single primary use stages that begin with:

Single-primary switch:

  • Step 1 on all members

The stage is set to

Single-primary switch: checking group pre-conditions.

or

Primary switch: checking current primary pre-conditions.

Work completed/estimated can be the number of messages to receive. For simplicity, however, we can consider this a single step.

  • Step 3 on old primary

Set the stage to

Primary Switch: waiting for buffered transactions to finish.

The progress is set in the transaction handler, passing into it the Plugin_stage_monitor_handler object.

  • Step 3 on old secondaries

Set the stage to

Primary Switch: waiting on another member step completion

They will remain in this state until primary election is invoked.

  • Between steps 4 and 5 on all members

When the leader election message comes, change the stage to

Primary Switch: executing primary election

or

Single-primary Switch: executing primary election

These are single step stages, all progress on primary election is reported on its own process.

Also, from this point on, no more stages are invoked under the context of these actions.


3. Notifications

The basic idea behind notifications is to warn a group action when a message is received or something happens in the plugin.
Notifications are important in cases where a group action wants to know about something that might happen before or after a point P.

Currently used notifications

// Enum for the notification types
enum_action_notification_types{
 DEAD_PRIMARY_NOTIFICATION          // The current primary is dead
 DEAD_MEMBER_NOTIFICATION           // One of the members is dead
 CHANNEL_VALIDATION_NOTIFICATION    // member has channels?
 PRIMARY_ELECTED_NOTIFICATION       // a new primary was elected
 PRIMARY_QUEUE_APPLIED_NOTIFICATION // queue consumed on primary
 TRANSACTIONS_SAFE_NOTIFICATION     // all transactions are now safe
 TERMINATE_EXECUTION_NOTIFICATION   // terminate current process
}

Code Skeleton - Base class

// Notification events to alert actions of some event
class Action_notification

public:

  enum_action_notification_types action_type

Dead_member_notification

// Notification about a member that left or died
class Dead_member_notification : public Action_notification

private:

  //The dead member uuid
  string uuid

Channel_validation_notification

// Notifications with information about channels on members
class Channel_validation_notification : public Action_notification

private:

  bool has_slave_channels
  string uuid
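
The action methods rely on a queue where listeners deposit notifications and the main process waits for them. A minimal sketch of such a queue follows, assuming std::mutex and std::condition_variable; Notification_queue_sketch and wait_for are hypothetical names, and payloads such as the dead member uuid are omitted.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>

enum enum_action_notification_types {
  DEAD_PRIMARY_NOTIFICATION, DEAD_MEMBER_NOTIFICATION,
  CHANNEL_VALIDATION_NOTIFICATION, PRIMARY_ELECTED_NOTIFICATION,
  PRIMARY_QUEUE_APPLIED_NOTIFICATION, TRANSACTIONS_SAFE_NOTIFICATION,
  TERMINATE_EXECUTION_NOTIFICATION
};

// Hypothetical blocking queue; only the notification type is stored.
class Notification_queue_sketch {
 public:
  void push(enum_action_notification_types type) {
    {
      std::lock_guard<std::mutex> guard(lock_);
      queue_.push_back(type);
    }
    not_empty_.notify_one();  // wake one waiting action thread
  }

  // Blocks until a notification is available, then removes and returns it.
  enum_action_notification_types pop() {
    std::unique_lock<std::mutex> guard(lock_);
    not_empty_.wait(guard, [this] { return !queue_.empty(); });
    enum_action_notification_types type = queue_.front();
    queue_.pop_front();
    return type;
  }

 private:
  std::mutex lock_;
  std::condition_variable not_empty_;
  std::deque<enum_action_notification_types> queue_;
};

// Waits for `expected`; returns false if the action is told to terminate.
// Other notification types are simply dropped in this sketch.
bool wait_for(Notification_queue_sketch &queue,
              enum_action_notification_types expected) {
  for (;;) {
    enum_action_notification_types type = queue.pop();
    if (type == expected) return true;
    if (type == TERMINATE_EXECUTION_NOTIFICATION) return false;
  }
}
```

Queueing TERMINATE_EXECUTION_NOTIFICATION is what lets stop_action_execution unblock a waiting action.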


4. Observers/Listeners - Group Events

In order to have a notification mechanism that is more extensible to future actions we chose an observer/listener pattern.
Besides being possibly useful outside this worklog, this pattern also allows the user to add behaviors to old messages without changing the existing code.

The Listeners:

// Listener class for events like view changes
class Group_event_listener

  /*
    Executed before view install
    @param joining            members joining the group
    @param leaving            members leaving the group
    @param group              members in the group
    @param skip_election[out] skip primary election on view
  */
  int after_view_change(joining, leaving, group, *skip_election)

  /*
    Executed after primary election
    @param primary  the elected primary
    @param error    if there was an error on the process
  */
  int after_primary_election(primary_uuid, int error)

  /*
    Executed before the message is processed
    @param message             The GCS message
    @param skip_message[out]   skip message handling if true
  */
  int before_message_handling(message, *skip_message);

The Manager:

// The class that registers and alerts listeners
class Group_events_observation_manager

 /*
   The method to register new observers
   @param observer   An observer class to register
 */
 void register_channel_observer(Group_event_listener* observer)

 /*
   The method to unregister observers
   @param obsvr      An observer class to unregister
 */
 void unregister_channel_observer(Group_event_listener* obsvr)

 /*
   Executed before view install
   @param joining            members joining the group
   @param leaving            members leaving the group
   @param group              members in the group
   @param skip_election[out] skip primary election on view
 */
 int after_view_change(joining, leaving, group, bool *skip_election)

 /*
   Executed after primary election
   @param primary  the elected primary
   @param error    if there was an error on the process
 */
 int after_primary_election(primary_uuid, int error=0)

 /*
   Executed before the message is processed
   @param message             The GCS message
   @param skip_message[out]   skip message handling if true
 */
 int before_message_handling(message, bool *skip_message);

private:
  list<Group_event_listener*> group_events_listeners;
  readwritelock channel_list_lock;

The Manager: Method logic:

  • register_channel_observer(Group_event_listener* observer)

  1. Lock channel_list_lock for writing
  2. Add listener to group_events_listeners
  3. Unlock

  • unregister_channel_observer(Group_event_listener* obsvr)

  1. Lock channel_list_lock for writing
  2. Remove listener from group_events_listeners
  3. Unlock

  • after_view_change(joining, leaving, group, *skip_election)

  1. Lock channel_list_lock for reading
  2. For each member in group_events_listeners:
    execute after_view_change(joining, leaving, group, *skip_election)
    skip_election is set to true if any listener requested it
  3. Unlock
  4. Return the sum of the error values from the invocation

  • after_primary_election(primary_uuid, error)

  1. Lock channel_list_lock for reading
  2. For each member in group_events_listeners:
    execute after_primary_election(primary_uuid, error)
  3. Unlock
  4. Return the sum of the error values from the invocation

  • before_message_handling(message, *skip_message)

  1. Lock channel_list_lock for reading
  2. For each member in group_events_listeners:
    execute before_message_handling(message, *skip_message)
    skip_message is set to true if any listener requested it
  3. Unlock
  4. Return the sum of the error values from the invocation
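
A reduced sketch of the manager logic, limited to after_view_change with only the skip_election parameter. All class and method names here are hypothetical, and std::shared_timed_mutex is assumed as the read/write lock; the sketch treats the accumulated skip_election as a logical OR over all listeners.

```cpp
#include <cassert>
#include <list>
#include <mutex>
#include <shared_mutex>

// Hypothetical listener interface reduced to one callback.
class Group_event_listener_sketch {
 public:
  virtual ~Group_event_listener_sketch() = default;
  virtual int after_view_change(bool *skip_election) = 0;
};

// Sample listener that returns a fixed error code and skip request.
class Fixed_listener_sketch : public Group_event_listener_sketch {
 public:
  Fixed_listener_sketch(int error, bool skip) : error_(error), skip_(skip) {}
  int after_view_change(bool *skip_election) override {
    *skip_election = skip_;
    return error_;
  }

 private:
  int error_;
  bool skip_;
};

class Group_events_manager_sketch {
 public:
  void register_observer(Group_event_listener_sketch *observer) {
    std::unique_lock<std::shared_timed_mutex> guard(list_lock_);  // write
    listeners_.push_back(observer);
  }

  void unregister_observer(Group_event_listener_sketch *observer) {
    std::unique_lock<std::shared_timed_mutex> guard(list_lock_);  // write
    listeners_.remove(observer);
  }

  // Returns the sum of listener errors; *skip_election ends up true if any
  // listener asked to skip the election.
  int after_view_change(bool *skip_election) {
    std::shared_lock<std::shared_timed_mutex> guard(list_lock_);  // read
    int error = 0;
    *skip_election = false;
    for (Group_event_listener_sketch *listener : listeners_) {
      bool listener_skip = false;
      error += listener->after_view_change(&listener_skip);
      *skip_election = *skip_election || listener_skip;
    }
    return error;
  }

 private:
  std::list<Group_event_listener_sketch *> listeners_;
  std::shared_timed_mutex list_lock_;
};
```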

The Manager: Initialization

The manager class is initialized on plugin.cc after

channel_observation_manager= new Channel_observation_manager(plugin_info);

The Manager: Invocation

  • after_view_change(joining, leaving, group, *skip_election)

Before the following snippet on
Plugin_gcs_events_handler::on_view_changed

// Handle primary election if needed 
this->handle_leader_election_if_needed();

This snippet will only be executed if skip_election is false

  • after_primary_election(primary_uuid, error)

This method should be executed in the context of the primary election process.
Check sections 6.2.1 and 6.2.2 for details on the invocation.

This ensures the method is executed on automatic and invoked primary elections.

  • before_message_handling(message, *skip_message)

Executed on
Plugin_gcs_events_handler::on_message_received(const Gcs_message& message)
before:

switch (message_type)
{

The message is scrapped if skip_message is true.
For performance reasons we may skip transactional messages here.


5. Observers/Listeners - Transactions

Same idea as section 4, but now for transactions.

The Listeners:

// Listener for transaction life cycle events
class Group_transaction_listener

  // Enum for transaction origins
  enum_group_transaction_origin{
   GROUP_APPLIER_TRANSACTION  // Group applier transaction
   GROUP_RECOVERY_TRANSACTION // Distributed recovery transaction
   GROUP_LOCAL_TRANSACTION    // Local transaction
  }

  /*
    Executed before commit
    @param thread id          the transaction thread id
    @param enum_group_transaction_origin who applied it
  */  
  int before_commit(thread_id, enum_group_transaction_origin)

  /*
    Executed before rollback
    @param thread id          the transaction thread id
    @param enum_group_transaction_origin who applied it
  */
  int before_rollback(thread_id, enum_group_transaction_origin)

  /*
    Executed after commit
    @param thread id          the transaction thread id
    @param enum_group_transaction_origin who applied it
  */ 
  int after_commit(thread_id, enum_group_transaction_origin)

  /*
    Executed after rollback
    @param thread id          the transaction thread id
    @param enum_group_transaction_origin who applied it
  */
  int after_rollback(thread_id, enum_group_transaction_origin)

The Manager:

// The class that registers and alerts listeners
class Group_transaction_observation_manager

  /*
    The method to register new observers
    @param observer   An observer class to register
  */
  void register_transaction_observer(Group_transaction_listener *obsvr)

  /*
    The method to unregister new observers
    @param obsvr      An observer class to unregister
  */
  void unregister_transaction_observer(Group_transaction_listener *obsvr)

  /*
    Executed before commit
    @param thread id          the transaction thread id
    @param enum_group_transaction_origin who applied it
  */
  int before_commit(thread_id, enum_group_transaction_origin)

  /*
    Executed before rollback
    @param thread id          the transaction thread id
    @param enum_group_transaction_origin who applied it
  */
  int before_rollback(thread_id, enum_group_transaction_origin)

  /*
    Executed after commit
    @param thread id          the transaction thread id
    @param enum_group_transaction_origin who applied it
  */
  int after_commit(thread_id, enum_group_transaction_origin)

  /*
    Executed after rollback
    @param thread id          the transaction thread id
    @param enum_group_transaction_origin who applied it
  */
  int after_rollback(thread_id, enum_group_transaction_origin)

  // Are there any observers present
  bool is_any_observer_present()

private:

  //List of observers
  list<Group_transaction_listener*> group_transaction_listeners;

  //The lock to protect the list
  readwritelock channel_list_lock;

  //Flag that indicates that there are observers (for performance)
  bool registered_observers;

The Manager: Method logic:

  • register_transaction_observer(Group_transaction_listener* observer)

  1. Lock channel_list_lock for writing
  2. Add listener to group_transaction_listeners
  3. registered_observers = true
  4. Unlock

  • unregister_transaction_observer(Group_transaction_listener* obsvr)

  1. Lock channel_list_lock for writing
  2. Remove listener from group_transaction_listeners
  3. registered_observers = (group_transaction_listeners is not empty)
  4. Unlock

  • before/after_commit(thread_id, enum_group_transaction_origin)

  1. Lock channel_list_lock for reading
  2. For each member in group_transaction_listeners:
    execute before/after_commit(thread_id, enum_group_transaction_origin)
  3. Unlock

  • before/after_rollback(thread_id, enum_group_transaction_origin)

  1. Lock channel_list_lock for reading
  2. For each member in group_transaction_listeners:
    execute before/after_rollback(thread_id, enum_group_transaction_origin)
  3. Unlock

  • is_any_observer_present()

  1. Return registered_observers;

The Manager: Initialization

The manager class is initialized on plugin.cc after

channel_observation_manager= new Channel_observation_manager(plugin_info);

The Manager: Invocation

  • before_commit(thread_id, enum_group_transaction_origin)

Executed in the group_replication_trans_before_commit.
Since we don't have a use for this method now, we may skip its implementation for now.

  • before_rollback(thread_id, enum_group_transaction_origin)

Executed in the group_replication_trans_before_rollback.
Since we don't have a use for this method now, we may skip its implementation for now.

  • after_rollback(thread_id, enum_group_transaction_origin)

  1. shared_plugin_stop_lock->grab_read_lock();
  2. [Group_transaction_observation_manager::is_any_observer_present() == false]
    return
  3. [channel_interface.is_own_event_applier(param->thread_id,"group_replication_applier")]
    type= GROUP_APPLIER_TRANSACTION
  4. [channel_interface.is_own_event_applier(param->thread_id,"group_replication_recovery")]
    type= GROUP_RECOVERY_TRANSACTION
  5. [else]
    type = GROUP_LOCAL_TRANSACTION
  6. Group_transaction_observation_manager::after_rollback(thread_id, type);

  • after_commit(thread_id, enum_group_transaction_origin)

  1. shared_plugin_stop_lock->grab_read_lock();
  2. [Group_transaction_observation_manager::is_any_observer_present() == false]
    return
  3. [channel_interface.is_own_event_applier(param->thread_id,"group_replication_applier")]
    type= GROUP_APPLIER_TRANSACTION
  4. [channel_interface.is_own_event_applier(param->thread_id,"group_replication_recovery")]
    type= GROUP_RECOVERY_TRANSACTION
  5. [else]
    type = GROUP_LOCAL_TRANSACTION
  6. Group_transaction_observation_manager::after_commit(thread_id, type);
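
Steps 3 to 5 classify the transaction by asking whether the committing thread is an applier for one of the plugin channels. The sketch below isolates that classification; Applier_check_sketch is a hypothetical stand-in for channel_interface.is_own_event_applier(), passed in as a callback so the function has no server dependencies.

```cpp
#include <cassert>
#include <functional>
#include <string>

enum enum_group_transaction_origin {
  GROUP_APPLIER_TRANSACTION,
  GROUP_RECOVERY_TRANSACTION,
  GROUP_LOCAL_TRANSACTION
};

// (thread id, channel name) -> does the thread apply events for the channel?
using Applier_check_sketch =
    std::function<bool(unsigned long, const std::string &)>;

enum_group_transaction_origin classify_origin(
    unsigned long thread_id, const Applier_check_sketch &is_own_event_applier) {
  if (is_own_event_applier(thread_id, "group_replication_applier"))
    return GROUP_APPLIER_TRANSACTION;
  if (is_own_event_applier(thread_id, "group_replication_recovery"))
    return GROUP_RECOVERY_TRANSACTION;
  return GROUP_LOCAL_TRANSACTION;  // neither channel owns the thread
}
```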


6.1 Utility class: Validate Primary Member

This class contains the logic to check if a chosen member is valid to be a new primary.
If no member is selected, it validates that the group can change into a primary mode setup.

Code Skeleton

// The class that validates a primary election
class Primary_election_validation_handler
 : public Group_event_listener

 // Enum for the end results of validation
 enum_primary_validation_result{
  VALID_PRIMARY       // Primary / Group is valid
  INVALID_PRIMARY     // Primary is invalid
  CURRENT_PRIMARY     // Primary is the current one
  GROUP_SOLO_PRIMARY  // Only one member can become primary
 }

public:

  //Constructor
  Primary_election_validation_handler(Queue<notifications>)

  /*
   * Validate group for election
   * @param uuid[in]   member to validate
   * @param valid_uuid[out] only member valid for election
   * @param error_msg[out] error message
   * @returns VALID_PRIMARY if valid
   * @returns INVALID_PRIMARY if not valid
   * @returns CURRENT_PRIMARY if it is already the primary
   * @returns GROUP_SOLO_PRIMARY only one member is valid
  */
  int validate_election(uuid, valid_uuid, String& error_msg)

  /* 
    Check that the UUID is valid and present in the group
    @param uuid[in]   member to validate
  */
  int validate_primary_uuid(uuid, error_message)

  /*
    Check that the group members have valid versions
    @param error_message[out]   the error message, if any
  */
  int validate_primary_version(error_message)

private:

  //Check that the old primary doesn't have channels
  int validate_old_primary_channels(error_message)

  //Check which members have slave channels
  int validate_group_slave_channels(uuid, error_message)

  //Listener: React to view changes
  after_view_change(joining, leaving, group, *skip_election)

  //Listener: React to messages
  before_message_handling(message, *skip_message)

  //Number of known members uuids
  int known_messages_uuids;

  Queue<Action_notifications> notifications;

Method logic

  • Primary_election_validation_handler(Queue notifications)

  1. Register listener on Group_events_observation_manager
  2. Set the local notifications queue to the parameter.

  • validate_primary_uuid(primary_uuid, error_message)

  1. Check the uuid is valid
  2. Check if the uuid is equal to the current primary
    If so, set error message, return CURRENT_PRIMARY
  3. Check the group member manager and check the uuid exists
    If not, set error message, return INVALID_PRIMARY

  • validate_primary_version(error_message)

  1. Loop in the group, is there a member with a lower version than 8.0?
    If so, abort. Set error message, return INVALID_PRIMARY
  2. Are there members in the group with a lower major version than the appointed primary?
    If so, abort. Set error message, return INVALID_PRIMARY
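
The two version checks can be sketched as a single pass over the group. Member_version_sketch and validate_versions are hypothetical names, the appointed primary's version is passed in explicitly for the sake of the example, and the 8.0 floor is taken literally from step 1.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical version record; the minor part is not needed for an 8.0 floor.
struct Member_version_sketch {
  unsigned major_version;
  unsigned minor_version;
};

// Returns true when the group passes both checks; otherwise fills
// error_message (the caller would then return INVALID_PRIMARY).
bool validate_versions(const std::vector<Member_version_sketch> &group,
                       const Member_version_sketch &appointed_primary,
                       std::string &error_message) {
  for (const Member_version_sketch &member : group) {
    if (member.major_version < 8) {  // step 1: lower than 8.0
      error_message = "A member runs a version lower than 8.0.";
      return false;
    }
    if (member.major_version < appointed_primary.major_version) {  // step 2
      error_message =
          "A member has a lower major version than the appointed primary.";
      return false;
    }
  }
  return true;
}
```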

  • int validate_old_primary_channels(error_message)

This method assumes it is called in all members at the same logical time.
A version of this method without this assumption would require a query message and a response message.

  1. [If primary]
    Use the method is_any_slave_channel_running.
    Create a Group_validation_message with the response and send it.
  2. Poll the notification list for messages
    If it is a notification about a dead primary (DEAD_PRIMARY_NOTIFICATION) return VALID_PRIMARY
  3. See if the old primary has running channels.
    If so, set error message, return INVALID_PRIMARY.

  • validate_group_slave_channels(valid_uuid, error_message)

This method assumes it is called in all members at the same logical time.
A version of this method without this assumption would require a query message and a response message.

  1. Use the method is_any_slave_channel_running.
    Create a Group_validation_message with the response and send it.
  2. Poll the notification list for messages
    If it is a notification about a dead member (DEAD_MEMBER_NOTIFICATION), skip it.
    Do it until we have known_messages_uuids messages
  3. Count the number of members with slave channels.
    [If 0] the group is valid, return VALID_PRIMARY
    [If 1] there is only one option, so set the valid_uuid param and return GROUP_SOLO_PRIMARY
    [If >1] the group cannot be run in primary mode, return INVALID_PRIMARY
    Set error message accordingly
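
Step 3 boils down to counting members that reported running slave channels. A minimal sketch, assuming the collected CHANNEL_VALIDATION_NOTIFICATION data has already been gathered into a map from member uuid to a boolean; that map shape and the name count_slave_channels are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

enum enum_primary_validation_result {
  VALID_PRIMARY, INVALID_PRIMARY, CURRENT_PRIMARY, GROUP_SOLO_PRIMARY
};

// channels_by_member: member uuid -> has running slave channels.
enum_primary_validation_result count_slave_channels(
    const std::map<std::string, bool> &channels_by_member,
    std::string &valid_uuid, std::string &error_message) {
  int members_with_channels = 0;
  for (const auto &entry : channels_by_member) {
    if (entry.second) {
      ++members_with_channels;
      valid_uuid = entry.first;  // only meaningful when the count is 1
    }
  }
  if (members_with_channels == 0) return VALID_PRIMARY;
  if (members_with_channels == 1) return GROUP_SOLO_PRIMARY;
  valid_uuid.clear();
  error_message = "More than one member has running slave channels.";
  return INVALID_PRIMARY;
}
```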

  • int validate_election(uuid, valid_uuid, String& error_msg)

  1. [If in Single primary mode]
      [Is there a primary member]
       return validate_old_primary_channels()
      [Else]
       return VALID_PRIMARY
  2. [If in multi primary mode]
    result= validate_group_slave_channels(valid_uuid, error_message)
      [If result=GROUP_SOLO_PRIMARY && uuid is defined && uuid != valid_uuid]
       return INVALID_PRIMARY
      [If result=GROUP_SOLO_PRIMARY && uuid is defined && uuid == valid_uuid]
       return VALID_PRIMARY
      [If result=GROUP_SOLO_PRIMARY && uuid is not defined]
       return GROUP_SOLO_PRIMARY
       valid_uuid was already set in the method invocation
      [Else]
       return result
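
The multi primary branch combines the channel validation result with the member requested by the user. The sketch below models "uuid is defined" as a non-empty string, which is an assumption made for illustration; combine_multi_primary_result is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>

enum enum_primary_validation_result {
  VALID_PRIMARY, INVALID_PRIMARY, CURRENT_PRIMARY, GROUP_SOLO_PRIMARY
};

// result:     outcome of validate_group_slave_channels()
// uuid:       member requested by the user, empty when none was given
// valid_uuid: the only member allowed to be primary, set by the validation
enum_primary_validation_result combine_multi_primary_result(
    enum_primary_validation_result result, const std::string &uuid,
    const std::string &valid_uuid) {
  if (result != GROUP_SOLO_PRIMARY) return result;  // the [Else] branch
  if (uuid.empty()) return GROUP_SOLO_PRIMARY;      // caller adopts valid_uuid
  return uuid == valid_uuid ? VALID_PRIMARY : INVALID_PRIMARY;
}
```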

  • before_message_handling(message, *skip_message)

  1. [If message type = CT_GROUP_VALIDATION_MESSAGE]
    Update known_messages_uuids as this is a logically consistent moment
    Extract the result and uuid from the message
    Queue a notification CHANNEL_VALIDATION_NOTIFICATION with this info

  • after_view_change(joining, leaving, group, *skip_election)

  1. Update known_messages_uuids as this is a logically consistent moment
  2. [If current primary is dead]
    Queue a notification into the queue: DEAD_PRIMARY_NOTIFICATION
  3. [If a non-primary is dead]
    Queue a notification into the queue: DEAD_MEMBER_NOTIFICATION
    Use class Dead_member_notification.

Other related changes

On plugin.cc we can see the method initialize_asynchronous_channels_observer().
This method assumes that this observer only needs to be initialized if the server is in single primary mode.

This needs to be changed: the check for single primary mode must move into the observer methods.


6.2 Utility class: Invoke Primary Election

This class will be used to invoke a primary election locally or send a message to do it on all members.

Code Skeleton

// The base class to request and execute an election
class Primary_election_handler

  // Enum for election types 
  enum_primary_election_mode{
    SAFE_OLD_PRIMARY   // Migrating from multi primary
    UNSAFE_OLD_PRIMARY // Changing from one primary to another
    DEAD_OLD_PRIMARY   // Old primary died
  }

  // Send a message to all members requesting an election
  int request_group_primary_election(primary_uuid,  enum_primary_election_mode);

  // Get the election message and parameters
  int handle_primary_election_message(Primary_message);

  // Elect a new primary
  int execute_primary_election(primary_uuid, enum_primary_election_mode);

  // End any running election process. 
  int terminate_election_process();

private:

  // Set the status and start certification
  int internal_primary_election(primary_uuid, enum_primary_election_mode);

  // Executes the old primary election algorithm. 
  int legacy_primary_election();

  /* The handler to handle the election on the primary member  */
  Primary_election_primary_process*   primary_election_handler;

  /* The handler to handle the election in the secondary members */
  Primary_election_secondary_process* secondary_election_handler;


Primary Election: Method logic

  • request_group_primary_election(primary_uuid, enum_primary_election_mode)

  1. Create a Single_primary_message with the given uuid and mode.
  2. Send message to group

  • handle_primary_election_message(Primary_message)

  1. Extract parameters if any
  2. Invoke the execute_primary_election() method

  • execute_primary_election(primary_uuid, enum_primary_election_mode)

This method shall be derived from the gcs_event_handlers method

Plugin_gcs_events_handler::handle_primary_election_if_needed()

The idea should be to copy the method and invoke it from the handler file.
The file is overloaded at the moment, so this is a plus.

Currently the method is structured as

  1. Sort members and get the valid version "frontier" iterator
  2. Check if an old primary exists, also if the member is leaving
  3. Init a sql command interface
  4. Select a new primary
  5. If the primary changed, update member roles and set read mode. Also queue a packet to activate certification.
  6. If there is no valid primary, then log a warning and set read only mode.

    First, due to recent code refactoring, point 3 is no longer needed.

    Also, if a primary was already chosen, only point 5 is needed, so that point should be moved to another method.

    We then have the 2 following methods:

    • execute_primary_election(primary_uuid, enum_primary_election_mode)

  1. [no primary uuid is given]
    Executes points 1 to 4 and 6 if no valid primary is found.
    [If lowest version > 5.7]
    Invoke internal_primary_election.
    [Else]
    Invoke legacy_primary_election();
  2. [primary uuid is given] Invoke internal_primary_election
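
The branch selection above can be sketched as follows. This is only an illustration under stated assumptions: Election_path, choose_election_path and the encoding of versions as major * 100 + minor are hypothetical names that do not exist in the plugin, and points 1 to 4 are assumed to have run already when no uuid is given.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical outcome of the dispatch in execute_primary_election().
enum class Election_path { INTERNAL, LEGACY };

// Versions are encoded as major * 100 + minor for this sketch (5.7 -> 507).
Election_path choose_election_path(const std::string &primary_uuid,
                                   const std::vector<int> &member_versions) {
  // A given uuid always goes through internal_primary_election.
  if (!primary_uuid.empty()) return Election_path::INTERNAL;

  assert(!member_versions.empty());
  const int lowest =
      *std::min_element(member_versions.begin(), member_versions.end());
  // Members at or below 5.7 do not contain this worklog code.
  return lowest > 507 ? Election_path::INTERNAL : Election_path::LEGACY;
}
```

With this shape the version check stays in one place, so the legacy path is only reachable when no primary uuid was requested.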

  • internal_primary_election(primary_uuid,enum_primary_election_mode mode)

Here we have the old point 5 but with heavy rework as mentioned on HLD. The steps are now reduced to:

  1. [Primary_election_secondary_process::is_election_process_running()]
    Invoke Primary_election_secondary_process::terminate_election_process()
  2. [If member uuid = primary uuid to elect]
    Invoke Primary_election_primary_process::launch_primary_election_process()
  3. [Else]
    Invoke Primary_election_secondary_process::launch_primary_election_process()

We only check the secondary process before starting as no valid case exists where a primary process is running and a new election begins.
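
A minimal sketch of this dispatch, assuming a stub type in place of the two sub process classes. Election_process_stub and the local_uuid parameter are hypothetical; only the calls named in the steps above are modelled.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for Primary_election_primary_process and
// Primary_election_secondary_process.
struct Election_process_stub {
  bool running = false;
  int launches = 0;
  bool is_election_process_running() const { return running; }
  int terminate_election_process() {
    running = false;
    return 0;
  }
  int launch_primary_election_process() {
    running = true;
    ++launches;
    return 0;
  }
};

// Stops a stale secondary process, then starts the process that matches
// the role of the local member in the new election.
int internal_primary_election(const std::string &local_uuid,
                              const std::string &primary_uuid,
                              Election_process_stub &primary_process,
                              Election_process_stub &secondary_process) {
  assert(!primary_uuid.empty());  // a primary was already chosen at this point

  if (secondary_process.is_election_process_running())
    secondary_process.terminate_election_process();

  if (local_uuid == primary_uuid)
    return primary_process.launch_primary_election_process();
  return secondary_process.launch_primary_election_process();
}
```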

  • legacy_primary_election()

This method preserves the old version of step 5.
This method is used when there are members in the group whose version does not contain this worklog code.

  • terminate_election_process()

  1. [Primary_election_secondary_process::is_election_process_running()]
    Invoke Primary_election_secondary_process::terminate_election_process()
  2. [Primary_election_primary_process::is_election_process_running()]
    Invoke Primary_election_primary_process::terminate_election_process()


Primary Election: Life cycle

  • Initialization

This class is initialized on plugin.cc on start_group_communication().

  • Termination

This class is terminated and deleted on terminate_plugin_modules().
The method Primary_election_handler::terminate_election_process() is invoked


Primary Election: Related code changes (Primary member message)

  • Plugin_gcs_events_handler::on_message_received(

Since this is a more generic handler that we want to be stateless, we won't rely on listeners here.

So on this method we now invoke
handle_primary_election_message(Primary_message)

  • Plugin_gcs_events_handler::handle_leader_election_if_needed(

This method will now consist of:

  1. [Member not in primary mode && there is no running election process]
    Return
  2. [Is the primary dead]
    Invoke execute_primary_election(NULL, DEAD_OLD_PRIMARY)


6.2.1 Utility class: Invoke Primary Election - The primary sub process

This class will be used to control the election process on the newly appointed member.

The primary process: Code Skeleton

// The class that controls the election from the primary perspective. 
class Primary_election_primary_process 
  : public Group_event_listener
  /*
    Launch the local process on the primary member for primary election

    @param election_mode the context on which election is occurring 

    @returns 0 in case of success, or 1 otherwise
  */
  int launch_primary_election_process(enum_primary_election_mode election_mode);

  /*
    Is the election process running? 
    @returns  election_process_running
  */
  bool is_election_process_running()

  /*
    Terminate the election process on shutdown
  */
  int terminate_election_process()

private:

  /*
    Internal thread execution method with the election process 
  */
  int primary_election_process_handler();

  //Listener: React to view changes
  after_view_change(joining, leaving, group, *skip_election)

  //Listener: React to messages
  before_message_handling(message, *skip_message)

  /* Is the election process running */
  bool election_process_running;
  /* Is the process aborted */
  bool election_process_aborted;
  /* Waiting for old primary transaction execution */
  bool waiting_on_old_primary_transactions;

  /* The election invocation context */
  enum_primary_election_mode election_mode;

  //The number of members known for the current action
  list<uuid> known_members_uuids;

  /* The stage handler for progress reporting*/
  Plugin_stage_monitor_handler* stage_handler;

  mysql_mutex_t election_lock;
  mysql_cond_t  election_cond;


The primary process: Method logic

int launch_primary_election_process(enum_primary_election_mode election_mode)

  1. Set the election_mode field
  2. Set the list of known member uuids: known_members_uuids
    Must be done here as this step is executed under the GCS serial process
  3. Register the listeners for group events.
  4. Instantiate the stage_handler
  5. Launch a thread that will call primary_election_process_handler();
  6. Check that the thread was launched and is running.

int primary_election_process_handler()

  1. Set election_process_running = true
  2. [election_mode == SAFE_OLD_PRIMARY]
    Go to step 5
  3. Submit a Queue_checkpoint_packet on the applier module and wait for it to be consumed
  4. Use the channel_get_retrieved_gtid_set method from the channel interface to get the current applier retrieved set.
    Loop until the server GTID executed contains all the retrieved GTIDs.
  5. Send a message stating that the primary is now ready for election
    Use a Single_primary_message with type SINGLE_PRIMARY_PRIMARY_READY.
  6. [election_mode != DEAD_OLD_PRIMARY]
    Execute Applier_module::queue_certification_enabling_packet(true).
  7. Set the server super read only mode to false.
    Use disable_server_read_mode.
  8. [election_mode == DEAD_OLD_PRIMARY]
    return
  9. Wait for all members to be in read mode
    lock election lock
    while(known_members_uuids is not empty) wait on election condition
    unlock election lock
  10. Set waiting_on_old_primary_transactions to true
    Execute Applier_module::end_multi_primary_period()
  11. The certification disabling process follows the old algorithm from this point.
  12. Wait for all transactions of old primary to be executed
    lock election lock
    while(waiting_on_old_primary_transactions) wait on election condition
    unlock election lock
  13. End the stage on Plugin_stage_monitor_handler;
  14. Unregister the group event listeners
  15. Declare election_process_running = false;

after_view_change(joining, leaving, group, *skip_election)

  1. Lock election lock
  2. Remove the leaving members from known_members_uuids
  3. [known_members_uuids is empty]
    Awake the election_condition
  4. Unlock the election lock

before_message_handling(message, *skip_message)

  1. Lock election lock
  2. [If message type = SINGLE_PRIMARY_READ_MODE_SET]
    Remove the received uuid from the known_members_uuids list
    [known_members_uuids is empty]
    Execute the observer after_primary_election(primary_uuid, 0)
    Awake the election condition
  3. [If message type = SINGLE_PRIMARY_QUEUE_APPLIED_MESSAGE]
    waiting_on_old_primary_transactions = false
    Awake the election condition
  4. Unlock the election lock

int terminate_election_process()

It is assumed here that steps 4, 9 and 12 of primary_election_process_handler() have termination flags for election_process_aborted.

  1. Set election_process_aborted to true;
  2. Execute event_is_consumed() for the Queue_checkpoint_packet
  3. Awake the election condition.
  4. Wait for election_process_running = false
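
The interplay between the listeners, the wait in step 9 and the abort flag can be sketched with standard C++ primitives. Member_ack_tracker is a hypothetical helper that uses std::mutex and std::condition_variable in place of mysql_mutex_t and mysql_cond_t; it is an illustration, not the plugin implementation.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <set>
#include <string>
#include <utility>

// Mirrors known_members_uuids, election_lock, election_cond and
// election_process_aborted from the skeleton above.
class Member_ack_tracker {
 public:
  explicit Member_ack_tracker(std::set<std::string> members)
      : pending_(std::move(members)) {}

  // Called from the listeners: a member reported read mode or left the group.
  void remove_member(const std::string &uuid) {
    std::lock_guard<std::mutex> guard(lock_);
    pending_.erase(uuid);
    if (pending_.empty()) cond_.notify_all();
  }

  // Called from terminate_election_process().
  void abort() {
    std::lock_guard<std::mutex> guard(lock_);
    aborted_ = true;
    cond_.notify_all();
  }

  // Blocks like step 9. Returns true when all members answered and
  // false when the wait ended because of an abort.
  bool wait_for_all_members() {
    std::unique_lock<std::mutex> guard(lock_);
    cond_.wait(guard, [this] { return pending_.empty() || aborted_; });
    return pending_.empty();
  }

  std::size_t pending_count() {
    std::lock_guard<std::mutex> guard(lock_);
    return pending_.size();
  }

 private:
  std::mutex lock_;
  std::condition_variable cond_;
  std::set<std::string> pending_;
  bool aborted_ = false;
};
```

Checking the abort flag inside the wait predicate is what lets terminate_election_process() unblock the thread with a single notification.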


The primary process: Monitoring

Here we describe when process stages change and how we do the monitoring of progress.

Here, we will use the steps from primary_election_process_handler()

  • Step 2

The stage is set to

Primary Election: applying buffered transactions

The progress is set in the handler, passing into it the Plugin_stage_monitor_handler object.

  • Step 8

The stage is set to

Primary Election: Waiting for members to turn on super_read_only

The estimated work is the size of known_members_uuids. Progress is reported when the array changes.

  • Step 9

The stage is set to

Primary Election: Stabilizing transactions from former primaries. 

We can either do:
1) A sleep loop on the thread process checking the difference between received and executed GTIDs
2) Just set estimated work to 1 and set progress when we see a message of type SINGLE_PRIMARY_QUEUE_APPLIED_MESSAGE.
We might go for 2 given the time restrictions on implementation.


The primary process: Error handling

It is assumed that when the thread errors out for some reason, the process will leave the group and the plugin will enable the read mode on the server.
The hook after_primary_election will be invoked with an error value to alert possible listeners.


6.2.2 Utility class: Invoke Primary Election - The secondary sub process

This class will be used to control the election process on the secondary members of the election.

The secondary process: Code Skeleton

// The class that controls the election from the secondary perspective. 
class Primary_election_secondary_process 
  : public Group_event_listener

  /*
    Launch the local process on the secondary members for primary election

    @param election_mode the context on which election is occurring 

    @returns 0 in case of success, or 1 otherwise
  */
  int launch_primary_election_process(enum_primary_election_mode election_mode);

  /*
    Is the election process running? 
    @returns  election_process_running
  */
  bool is_election_process_running()

  /*
    Terminate the election process on shutdown
  */
  int terminate_election_process()

private:

  /*
    Internal thread execution method with the election process 
  */
  int primary_election_process_handler();

  //Listener: React to messages
  before_message_handling(message, *skip_message)

  //Listener: React to view changes
  after_view_change(joining, leaving, group, *skip_election)

  /* The stage handler for progress reporting*/
  Plugin_stage_monitor_handler* stage_handler;

  /* Is the election process running */
  bool election_process_running;
  /* Is the process aborted */
  bool election_process_aborted;
  /* Waiting for old primary transaction execution */
  bool waiting_on_old_primary_transactions;

  //The number of members known for the current action
  list<uuid> known_members_uuids;

  /* Is the primary ready? */
  bool primary_ready;
  mysql_mutex_t election_lock;
  mysql_cond_t  election_cond;


The secondary process: Method logic

int launch_primary_election_process(enum_primary_election_mode election_mode)

  1. Set the election_mode field
  2. Set the list of known member uuids: known_members_uuids
  3. Register the listeners for group events.
  4. Instantiate the stage_handler
  5. Launch a thread that will call primary_election_process_handler();
  6. Check that the thread was launched and is running.

int primary_election_process_handler()

  1. election_process_running = true
  2. Wait for primary ready message.
    lock election lock
    while(!primary_ready) wait on election condition
    unlock election lock
  3. [election_mode != DEAD_OLD_PRIMARY]
    Set waiting_on_old_primary_transactions to true
    Execute Applier_module::queue_certification_enabling_packet(false).
  4. Set the server super read only mode to true.
    On failure (if not aborted) invoke abort_server_process()
  5. Send message as the member is on read mode
    Use a Single_primary_message with type SINGLE_PRIMARY_READ_MODE_SET.
  6. The certification disabling process follows the old algorithm from this point.
  7. Wait for all transactions of old primary to be executed
    lock election lock
    while(waiting_on_old_primary_transactions) wait on election condition
    unlock election lock
  8. End the stage on Plugin_stage_monitor_handler;
  9. Unregister the group event listeners
  10. Declare election_process_running = false;

before_message_handling(message, *skip_message)

  1. Lock election lock
  2. [If message type = SINGLE_PRIMARY_PRIMARY_READY]
    Set primary_ready to true
    Awake the election condition
  3. [If message type = SINGLE_PRIMARY_READ_MODE_SET]
    Remove the received uuid from the known_members_uuids list
  4. [known_members_uuids is empty]
    Execute the observer after_primary_election(primary_uuid, 0)
  5. Unlock the election lock

after_view_change(joining, leaving, group, *skip_election)

  1. Remove the leaving members from known_members_uuids
  2. Set election_process_aborted to true; (Accelerate the termination process)

int terminate_election_process()

It is assumed here that steps 2 and 7 of primary_election_process_handler() have termination flags for election_process_aborted.

  1. Set election_process_aborted to true;
  2. Instantiate a new SQL session and issue a KILL QUERY to the read mode query.
  3. Awake the election condition.
  4. Wait for election_process_running = false


The secondary process: Monitoring

Here we describe when process stages change and how we do the monitoring of progress.

Here, we will use the steps from primary_election_process_handler()

  • Step 1

The stage is set to

Primary Election: Waiting on current primary transaction execution

Estimated work is 1, and progress is incremented when the message comes.

  • Step 3

The stage is set to

Primary Election: Waiting for members to turn on super_read_only

The estimated work is the size of known_members_uuids. Progress is reported when the array changes.

  • Step 5

The stage is set to

Primary Election: Stabilizing transactions from former primaries. 

There is no good way to track progress here.
So we just set estimated work to 1 and progress is set when the message from the primary comes.


The secondary process: Error handling

It is assumed that when the thread errors out for some reason, the process will leave the group and the plugin will enable the read mode on the server.
The hook after_primary_election will be invoked with an error value to alert possible listeners.

We don't include errors when enabling read mode here, as they will lead to a server abort (as pointed out in the HLD).


6.3 Utility class: Check Server query execution

This class will be used to extract the number of running transactions in the server.

Code Skeleton

// Class to query about what transactions are running
class Server_query_execution_handler :
 public Group_transaction_listener

public:

  /*
    Get the list of running transactions from the server
    @param ids[out] an array of thread ids
    @returns 0 in case of success, 1 in case of error
  */
  int get_server_running_transactions(my_thread_id** ids)

  /*
    Gets running transactions and waits for its end
    @returns 0 in case of success, 1 in case of error
  */
  int wait_for_current_transaction_load_execution()

  // Abort any running waiting process
  void abort_waiting_process();

  after_rollback(thread_id, enum_group_transaction_origin)

  after_commit(thread_id, enum_group_transaction_origin)

private:

  queue<thread_id> thread_ids_finished;

  lock query_wait_lock;

  bool wait_process_aborted;

Method logic

  • get_server_running_transactions(my_thread_id** ids)

  1. Get server service for transaction querying
  2. If valid, get all current running transactions.
  3. Discard the invoking thread id if in the list

  • wait_for_current_transaction_load_execution(Plugin_stage_monitor_handler stage_handler=NULL)

  1. Lock query_wait_lock
  2. Register itself on Group_transaction_observation_manager::register_channel_observer. This allows the code to receive notifications for commits and aborts.
  3. Invoke get_server_running_transactions(list_of_thread_ids)
  4. Unlock query_wait_lock
  5. while (list_of_thread_ids.size != 0 && !wait_process_aborted)
      remove all members from list_of_thread_ids that match thread_ids_finished
      execute get_server_running_transactions(new_list_of_thread_ids)
      remove any entry that is on list_of_thread_ids and not new_list_of_thread_ids
      sleep 1 second

In terms of monitoring, i.e., if a Plugin_stage_monitor_handler is given

  1. Set the estimated work to the number of transactions in the list_of_thread_ids
  2. Whenever the code loops, set the completed work to the initial total minus the remaining transactions.
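
One iteration of the wait loop and the progress calculation can be sketched as below. The names prune_waiting_list, completed_work and thread_id_t are hypothetical, and plain standard containers are used in place of the plugin types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using thread_id_t = unsigned long;

// Drops ids reported as finished by the commit/rollback hooks, then drops
// ids that the server no longer reports as running.
void prune_waiting_list(std::vector<thread_id_t> &waiting,
                        const std::set<thread_id_t> &finished,
                        const std::set<thread_id_t> &still_running) {
  waiting.erase(std::remove_if(waiting.begin(), waiting.end(),
                               [&](thread_id_t id) {
                                 return finished.count(id) != 0 ||
                                        still_running.count(id) == 0;
                               }),
                waiting.end());
}

// Completed work as reported to the stage handler on each loop.
unsigned long long completed_work(std::size_t initial_total,
                                  std::size_t remaining) {
  assert(remaining <= initial_total);
  return initial_total - remaining;
}
```

Because new transactions are never added to the waiting list, the list can only shrink and the reported progress never goes backwards.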

  • abort_waiting_process()

  1. wait_process_aborted = true;

  • after_rollback(thread_id, enum_group_transaction_origin)

  1. Lock query_wait_lock
  2. Add thread id to thread_ids_finished
  3. Unlock query_wait_lock

  • after_commit(thread_id, enum_group_transaction_origin)

  1. Lock query_wait_lock
  2. Add thread id to thread_ids_finished
  3. Unlock query_wait_lock

Server Service

The query for how many transactions are running in the server is made through a server service.

BEGIN_SERVICE_DEFINITION(transactional_querying_service)
                         DECLARE_METHOD(size_t,
                         get_server_transactions,
                         (unsigned long** ids));
END_SERVICE_DEFINITION(transactional_querying_service)

This service is then added to the server components in

components/mysql_server/server_component.cc
components/mysql_server/server_component.h

The plan for the method is: first create a class

class Get_running_transactions : public Do_THD_Impl

public:

  /*
    Method executed for each thread
  */
  virtual void operator()(THD *thd)

Then when the service is invoked do

 Get_running_transactions trx_counter;
 Global_THD_manager::get_instance()->do_for_all_thd(&trx_counter);
 trx_counter.get_transaction_ids();

About the operator method, the idea is for each thread check

  1. Has the thread a query plan?
    If it is running, it has one.
    We can also filter DML queries here, since we don't care for DDL

  2. If there is no query plan, then maybe the transaction is in between statements.
    If that is true, then the method
    -in_active_multi_stmt_transaction()
    will return true.
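
The per-thread check can be sketched as a plain predicate. Thread_snapshot and both function names are hypothetical simplifications of the THD state, and treating the DML filter as mandatory is an assumption; the text above only says it can be applied.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified view of the state the functor inspects.
struct Thread_snapshot {
  unsigned long id;
  bool has_query_plan;     // a statement is executing right now
  bool is_dml;             // the executing statement is DML
  bool in_multi_stmt_trx;  // result of in_active_multi_stmt_transaction()
};

bool counts_as_running_transaction(const Thread_snapshot &thread) {
  // A running statement counts only if it is DML; DDL is ignored.
  if (thread.has_query_plan) return thread.is_dml;
  // No query plan: the transaction may be in between statements.
  return thread.in_multi_stmt_trx;
}

// Collects the ids the service would return, skipping the invoking thread.
std::vector<unsigned long> collect_transaction_ids(
    const std::vector<Thread_snapshot> &threads,
    unsigned long invoking_thread_id) {
  std::vector<unsigned long> ids;
  for (const Thread_snapshot &thread : threads) {
    if (thread.id == invoking_thread_id) continue;
    if (counts_as_running_transaction(thread)) ids.push_back(thread.id);
  }
  return ids;
}
```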

Some considerations about this service.
This service may return a transaction that has just finished, or fail to return a transaction that has just started.
Consider the context where it is needed, though.
A number of transactions may have started and still be running that are now incompatible because of some reason R.
Since we only want to wait for these to end, and the service is executed after R has changed, new transactions that start afterwards don't matter.
As for the ones that have already ended, they only mean less work for us.


6.4 Utility class: SET PERSIST

This class will be used to persist system variables using the session API to call SET PERSIST commands.

These commands will make some of the changes made in the plugin persistent across restarts.

Code Skeleton

// Class to execute SET PERSIST queries 
class Persistent_variables_handler

public:

  /*
    Persist a system variable using a new server connection
    @param name the name of the variable
    @param value the value to set in the variable
    @param session_isolation what isolation the server connection must have 

    @note use this method when there is not an open server connection

    @returns 0 in case of success, or the error value from the query
  */
  int set_persistent_variable(string name, string value, enum_plugin_con_isolation session_isolation)

  /*
    Persist a system variable using an existing server connection
    @param name the name of the variable
    @param value the value to set in the variable
    @param command_interface the interface to the session API 

    @note use this method when there is already an open server connection 

    @returns 0 in case of success, or the error value from the query
  */
  int set_persistent_variable(string name, string value, Sql_service_command_interface *command_interface)

Method logic

  • set_persistent_variable(string name, string value, enum_plugin_con_isolation session_isolation)

  1. Create a Sql_service_command_interface instance.
  2. Invoke set_persistent_variable(string name, string value, Sql_service_command_interface *command_interface)

  • set_persistent_variable(string name, string value, Sql_service_command_interface *command_interface)

  1. Construct the set persist query with the given parameters.
  2. Execute the query and extract the return result. Throw an error if needed
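
Step 1 of the second method can be sketched as a small string builder. build_set_persist_query is a hypothetical helper, and the name validation is an added assumption that the method logic above does not require.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Returns the statement text, or an empty string when the variable name
// contains characters other than letters, digits and underscores.
// The value is inserted verbatim: the caller is assumed to pass an already
// valid SQL literal (ON, 3, 'text', ...).
std::string build_set_persist_query(const std::string &name,
                                    const std::string &value) {
  const bool valid_name =
      !name.empty() &&
      std::all_of(name.begin(), name.end(), [](unsigned char c) {
        return std::isalnum(c) != 0 || c == '_';
      });
  if (!valid_name || value.empty()) return std::string();
  return "SET PERSIST " + name + " = " + value;
}
```

The resulting text would then be handed to the Sql_service_command_interface for execution, as described in step 2.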


6.5 Utility class: Abort server mechanism

Code Skeleton

No need for a class here, just add a method to plugin_utils.h/cc

int abort_server_process()

Method logic

  • abort_server_process()

  1. Set a registry reference extracted from mysql_plugin_registry_acquire
  2. Fetch the server_abort_service service from the registry
  3. Invoke the abort_server_process in the service

The Service

The idea behind this method is to use a service that will encapsulate an abort procedure.

So we need a new service

BEGIN_SERVICE_DEFINITION(server_abort_service)
    DECLARE_BOOL_METHOD(abort_server_process, const char* message);
END_SERVICE_DEFINITION(server_abort_service)

This service is then added to the server components.

The implementation of such a method would be similar to the current implementation of exec_binlog_error_action_abort.

  1. [Is THD present]
    Try to send an error to the client about the fatal error
    [else]
    Print an error to the log.
  2. Invoke abort()

This also means a new error should be added like

ER_SERVICE_ABORT: A component aborted the mysql server: %s

since the basic ER_ABORTING doesn't allow generic messages.


6.6 Utility class: Plugin stages for monitoring

An important part of this WL is the monitoring of actions currently being executed.
As described in the High Level Design, the idea is to use thread stages to express the step the group action is currently in and its progress.

Let's start with the base class that takes inspiration from the clone plugin clone_monitor.h.

Code Skeleton

// Class to report plugin stages and their progress
class Plugin_stage_monitor_handler

public:

  /* The class constructor */
  Plugin_stage_monitor_handler();

  /* The class destructor */
  ~Plugin_stage_monitor_handler();

  /*
    Set that a new stage is now in progress. 
    @param key The PSI key for the stage
    @param file the file for this stage
    @param line the line of the file for this stage
    @param estimated_work what work is estimated for this stage
    @param work_completed what work is already completed for this stage

    @returns 0 in case of success, or 1 otherwise
  */
  int set_stage(PSI_stage_key key, string file, int line,
                ulonglong estimated_work, ulonglong work_completed)

  /*
    Set the currently estimated work for this stage
  */
  int set_estimated_work(ulonglong estimated_work)

  /*
    Set the currently completed work for this stage
  */
  int set_completed_work(ulonglong completed_work)

  //get methods

  /*
    End the current stage
  */
  int end_stage();

private:
  SERVICE_TYPE(registry) *registry;
  my_service<SERVICE_TYPE(psi_stage_v1)> stage_service;
  PSI_stage_progress* stage_progress_handler;

Method logic

  • Plugin_stage_monitor_handler()

  1. Set the registry field with a reference extracted from mysql_plugin_registry_acquire
  2. Fetch the psi_stage_v1 service from the registry and set stage_service

  • ~Plugin_stage_monitor_handler()

  1. Delete stage_service
  2. Use mysql_plugin_registry_release to release the registry field;

  • set_stage(PSI_stage_key key, string file, int line, ulonglong estimated_work, ulonglong work_completed)

  1. Invoke the start_stage method in the service with the given key, file and line.
  2. Set stage_progress_handler with the PSI_stage_progress object returned in step 1
  3. Set the estimated work and completed work on stage_progress_handler

  • set_estimated_work(ulonglong estimated_work)

  1. Set the current work being estimated on stage_progress_handler

  • set_completed_work(ulonglong completed_work)

  1. Set the current completed work on stage_progress_handler

  • end_stage

  1. Just invoke end_stage on the service
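
The call order of the handler can be illustrated with an in-memory stand-in. Stage_monitor_sketch is hypothetical: it keeps the counters locally instead of forwarding them to the psi_stage_v1 service, and the choice to fail the setters when no stage is running is an assumption.

```cpp
#include <cassert>
#include <string>

// In-memory stand-in used only to illustrate the call order.
class Stage_monitor_sketch {
 public:
  // Starts a stage and records its initial progress. Returns 0 on success.
  int set_stage(const std::string &name, unsigned long long estimated_work,
                unsigned long long work_completed) {
    stage_name_ = name;
    estimated_work_ = estimated_work;
    completed_work_ = work_completed;
    stage_running_ = true;
    return 0;
  }

  // Both setters return 1 when no stage is in progress.
  int set_estimated_work(unsigned long long estimated_work) {
    if (!stage_running_) return 1;
    estimated_work_ = estimated_work;
    return 0;
  }

  int set_completed_work(unsigned long long completed_work) {
    if (!stage_running_) return 1;
    completed_work_ = completed_work;
    return 0;
  }

  int end_stage() {
    stage_running_ = false;
    return 0;
  }

  unsigned long long estimated_work() const { return estimated_work_; }
  unsigned long long completed_work() const { return completed_work_; }

 private:
  std::string stage_name_;
  unsigned long long estimated_work_ = 0;
  unsigned long long completed_work_ = 0;
  bool stage_running_ = false;
};
```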


Life-cycle

Under this worklog, this utility makes sense in the context of a group action execution.
Hence, it makes sense that an instance is created every time an action is accepted.
The service is then only used while the action is running.
This does not invalidate that other server parts may use this handler for other purposes with a different life cycle.
Such an example is the primary election algorithm that will use stages even outside its invocation through group actions.


Stage keys

One of the key parts of this stage instrumentation is the keys. They shall be registered in plugin_psi.h/cc in the form

PSI_stage_info gr_stage_group_action_running=
 {0, "Executing some group stage", PSI_FLAG_STAGE_PROGRESS};

As described in the HLD the stage keys are:

Multi-primary Switch: waiting for pending transactions to finish.

Multi-primary Switch: waiting on another member step completion

Multi-primary Switch: applying buffered transactions.

Multi-primary Switch: waiting for operation to complete on all members.


Single-primary Switch: checking group pre-conditions.

Single-primary Switch: executing primary election

Single-primary Switch: waiting for operation to complete on all members.


Primary Switch: checking current primary pre-conditions.

Primary Switch: waiting for pending transactions to finish.

Primary Switch: waiting on another member step completion

Primary Switch: executing primary election

Primary Switch: waiting for operation to complete on all members.


Primary Election: applying buffered transactions.

Primary Election: Waiting on current primary transaction execution

Primary Election: Waiting for members to turn on super_read_only

Primary Election: Stabilizing transactions from former primaries. 


7. Messages

7.1 New Message: Action message

The messages used by actions must be extensible as new actions might emerge.

Message type

On gcs_plugin_messages.h add to
enum_cargo_type
the new type
CT_GROUP_ACTION_MESSAGE


Group_action_message - Code Skeleton

//The base message for action messages
class Group_action_message : public Plugin_gcs_message

  // Enum for message payload
  enum_payload_item_type{
   PIT_UNKNOWN= 0,      // Not used
   PIT_ACTION_TYPE=1,   // The action type
   PIT_ACTION_PHASE=2,  // The action phase
   PIT_ACTION_DATA=3,   // The action data
   PIT_MAX
  }

  // Enum for the types of message / actions
  enum_action_message_type{
   ACTION_MULTI_PRIMARY_MESSAGE      // Change to multi primary
   ACTION_PRIMARY_ELECTION_MESSAGE  // Elect a primary member
  }

 // Enum for the phase of the action in the message
 enum_action_message_phase{
  ACTION_START_PHASE  // Start a new action
  ACTION_END_PHASE    // The action was ended
  ACTION_ABORT_PHASE  // The action was aborted
 }

public:

  // Constructor
  Group_action_message(enum_action_message_type, enum_action_message_phase)

  // Get the action type for this message
  enum_action_message_type get_action_type()

  // Get the action phase for this message
  enum_action_message_phase get_action_phase()

protected:

  /*
    The inherited encode method
    @param buffer  [out]  the message encoded
  */
  void encode_payload(buffer);

  /*
    The inherited decode method
    @param[in] buffer the received data
    @param[in] end    the end pointer
  */
  void decode_payload(buffer, end)

  /*
    Encode the data associated to the action if existent
    @param buffer  [out]  the message encoded
  */
  virtual void encode_action_data(buffer);

  /*
    Decode the data associated to the action if existent
    @param[in] buffer the received data
    @param[in] end    the end pointer
  */
  virtual void decode_action_data(buffer, end);

 private:

  // The action type for this message
  enum_action_message_type action_type

  // If it is a start or stop message
  enum_action_message_phase action_phase

  // The potential payload this action class has
  const uchar * action_data


Group_action_message - Method logic

  • Group_action_message(enum_action_message_type, enum_action_message_phase)

  1. Set action type
  2. Set action phase
  3. action_data remains empty

  • encode_payload(buffer)

  1. Encode message type
  2. Encode action phase
  3. Invoke encode_action_data(buffer);

  • decode_payload(buffer)

  1. Decode and set message type
  2. Decode and set action phase
  3. Invoke decode_action_data(buffer);

  • encode_action_data(buffer)

Since this is the default implementation of the method, nothing is done here.

  • decode_action_data(buffer)

The default implementation of this method copies the remaining payload to action_data
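
The encode and decode order can be checked with a round trip on a plain byte buffer. This sketch uses a hypothetical fixed layout (two little-endian 16 bit fields followed by the raw action data); the real Plugin_gcs_message payload items carry their own type and length headers, which are not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Buffer = std::vector<unsigned char>;

// Hypothetical flat view of the Group_action_message fields.
struct Action_message_sketch {
  std::uint16_t action_type = 0;
  std::uint16_t action_phase = 0;
  Buffer action_data;
};

static void put_u16(Buffer &buffer, std::uint16_t value) {
  buffer.push_back(static_cast<unsigned char>(value & 0xff));
  buffer.push_back(static_cast<unsigned char>(value >> 8));
}

static std::uint16_t get_u16(const Buffer &buffer, std::size_t offset) {
  return static_cast<std::uint16_t>(buffer[offset] |
                                    (buffer[offset + 1] << 8));
}

// Same order as encode_payload(): type, phase, then the action data.
Buffer encode_payload(const Action_message_sketch &message) {
  Buffer buffer;
  put_u16(buffer, message.action_type);
  put_u16(buffer, message.action_phase);
  buffer.insert(buffer.end(), message.action_data.begin(),
                message.action_data.end());
  return buffer;
}

// Default decode: the remaining bytes are copied into action_data.
bool decode_payload(const Buffer &buffer, Action_message_sketch *message) {
  if (buffer.size() < 4) return false;
  message->action_type = get_u16(buffer, 0);
  message->action_phase = get_u16(buffer, 2);
  message->action_data.assign(buffer.begin() + 4, buffer.end());
  return true;
}
```

A subclass such as Primary_election_action_message would then interpret action_data, for instance as the primary uuid.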


Group_action_message - Code related changes

  • Plugin_gcs_events_handler::on_message_received(const Gcs_message& message)

  1. Add another case for CT_GROUP_ACTION_MESSAGE.
  2. Get the Group_action_coordinator instance.
  3. Invoke Group_action_coordinator::handle_action_message()


Primary_election_action_message - Code Skeleton

//The class for primary election message
class Primary_election_action_message : public Group_action_message

public:

  // Constructor
  Primary_election_action_message(enum_action_message_phase, uuid)

  // Constructor
  Primary_election_action_message(Group_action_message)

protected:

  /*
   Encode the data associated to the action if existent
    @param buffer  [out]  the message encoded
  */
  virtual void encode_action_data(buffer);

  /*
   Decode the data associated to the action if existent
   @param[in] buffer the received data
   @param[in] end    the end pointer
  */
  virtual void decode_action_data(buffer, end);

 private:

  // The uuid for election, can be empty if not defined
  string primary_uuid


Primary_election_action_message - Method logic

  • Primary_election_action_message(enum_action_message_phase, uuid)

  1. Set action type to ACTION_PRIMARY_ELECTION_MESSAGE
  2. Set action phase to the given parameter
  3. Set the primary uuid.

  • Primary_election_action_message(Group_action_message)

  1. Assert action type is equal to ACTION_PRIMARY_ELECTION_MESSAGE
  2. Copy action phase
  3. Decode the uuid from action_data

  • encode_action_data(buffer)

  1. Encode the primary uuid

  • decode_action_data(buffer)

  1. Decode the primary uuid


7.2 New Message: Validation message

These messages let the group verify that there are no slave channels running on non primary members.
The same message can be reused in the future for validations in other processes.

Message type

In gcs_plugin_messages.h add to
enum_cargo_type
the new type
CT_GROUP_VALIDATION_MESSAGE


Group_validation_message - Code Skeleton

//The message for group validations
class Group_validation_message : public Plugin_gcs_message

  // Enum for message payload
  enum_payload_item_type{
   PIT_UNKNOWN= 0,          // Not used
   PIT_VALIDATION_TYPE=1,   // The validation type
   PIT_VALIDATION_CHANNEL=2,  // The member has channel flag
   PIT_MAX
  }

  // Enum for the types of message / actions
  enum_action_message_type{
   ACTION_CHANNEL_VALIDATION_MESSAGE // Channel presence msg
  }

public:

  // Constructor
  Group_validation_message(bool has_channels)

  // Whether the sender of this message has slave channels
  bool has_slave_channels()

protected:

  /*
    The inherited encode method
    @param buffer  [out]  the message encoded
  */
  void encode_payload(buffer);

  /*
    The inherited decode method
    @param[in] buffer the received data
    @param[in] end    the end pointer
  */
  void decode_payload(buffer, end)

 private:

  // Does the member have channels?
  bool has_channels


Group_validation_message - Method logic

  • Group_validation_message(bool has_channels)

  1. Set has_channels

  • encode_payload(buffer)

  1. Encode message type
  2. Encode has_channels

  • decode_payload(buffer)

  1. Decode message type
  2. Decode has_channels



7.3 Primary member message extension

This is an extension of an already existing message class:
CT_SINGLE_PRIMARY_MESSAGE

Primary member message - Code Skeleton

//The single primary message, extended by this worklog
class Single_primary_message : public Plugin_gcs_message

  // Enum for message payload
  enum_payload_item_type{
   PIT_UNKNOWN= 0,          // Not used
   PIT_SINGLE_PRIMARY_MESSAGE_TYPE= 1, // The message type
   + PIT_SINGLE_PRIMARY_SERVER_UUID= 2,  // Uuid to elect
   + PIT_SINGLE_PRIMARY_ELECTION_MODE=3, // The election mode
   PIT_MAX
  }

  // Enum for the types of message / actions
  enum_action_message_type{
   SINGLE_PRIMARY_UNKNOWN
   SINGLE_PRIMARY_NEW_PRIMARY_MESSAGE
   SINGLE_PRIMARY_QUEUE_APPLIED_MESSAGE
   +SINGLE_PRIMARY_NO_RESTRICTED_TRANSACTIONS
   +SINGLE_PRIMARY_PRIMARY_ELECTION
   +SINGLE_PRIMARY_PRIMARY_READY
   +SINGLE_PRIMARY_READ_MODE_SET
   SINGLE_PRIMARY_MESSAGE_TYPE_END
  }

public:

  // Constructor
  Single_primary_message(string primary_to_elect, enum_primary_election_mode mode);

  /*
    Returns the primary to elect for election messages
    @param uuid  [out]  the server uuid
  */
  void get_primary_to_elect(string& uuid)

protected:

  /*
    The inherited encode method
    @param buffer  [out]  the message encoded
  */
  void encode_payload(buffer);

  /*
    The inherited decode method
    @param[in] buffer the received data
    @param[in] end    the end pointer
  */
  void decode_payload(buffer, end)

 private:

  // The uuid for the primary member
  string primary_uuid
  // The election mode
  enum_primary_election_mode election_mode


Primary member message - Method logic

  • Single_primary_message(string primary_to_elect, enum_primary_election_mode)

  1. Set type to SINGLE_PRIMARY_PRIMARY_ELECTION
  2. Set the uuid for the primary to be elected
  3. Set the mode

  • encode_payload(buffer)

  1. Encode message type
  2. [If type == SINGLE_PRIMARY_PRIMARY_ELECTION]
    Encode the primary_uuid parameter
    Encode the election_mode

  • decode_payload(buffer)

  1. Decode and set message type
  2. [If type == SINGLE_PRIMARY_PRIMARY_ELECTION]
    Decode and set primary_uuid
    Decode and set election_mode


Primary member message - Backport considerations

Members on 5.7 that receive this message should not be affected by these additions.
For them there is no real change to the old messages, and the old decode and encode methods still work correctly.
Only members on 8.0+ should receive the new election messages.
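
To illustrate why appended payload items do not disturb a decoder that only reads the message type, here is a minimal sketch. It assumes a simplified item layout (2 byte identifier, 2 byte length, value bytes); the helper names, the numeric values of the message types and the numeric election mode are hypothetical and are not taken from the plugin.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Buffer = std::vector<unsigned char>;

// Payload item identifiers, as named in the skeleton.
enum : uint16_t {
  PIT_SINGLE_PRIMARY_MESSAGE_TYPE = 1,
  PIT_SINGLE_PRIMARY_SERVER_UUID = 2,
  PIT_SINGLE_PRIMARY_ELECTION_MODE = 3
};

// Only the message types this sketch needs; the numeric values are assumed.
enum : unsigned char {
  SINGLE_PRIMARY_QUEUE_APPLIED_MESSAGE = 2,
  SINGLE_PRIMARY_PRIMARY_ELECTION = 4
};

struct Single_primary_message {
  unsigned char type = 0;
  std::string primary_uuid;         // only meaningful for election messages
  unsigned char election_mode = 0;  // hypothetical numeric encoding of the mode
};

static void put_u16(Buffer &out, uint16_t value) {
  out.push_back(static_cast<unsigned char>(value & 0xff));
  out.push_back(static_cast<unsigned char>(value >> 8));
}

static uint16_t get_u16(const unsigned char *p) {
  return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

// Appends one item as: identifier (2 bytes), length (2 bytes), value bytes.
static void put_item(Buffer &out, uint16_t item, const std::string &value) {
  assert(value.size() <= 0xffff);
  put_u16(out, item);
  put_u16(out, static_cast<uint16_t>(value.size()));
  out.insert(out.end(), value.begin(), value.end());
}

// The election fields are only encoded for election messages.
Buffer encode_payload(const Single_primary_message &msg) {
  Buffer out;
  put_item(out, PIT_SINGLE_PRIMARY_MESSAGE_TYPE,
           std::string(1, static_cast<char>(msg.type)));
  if (msg.type == SINGLE_PRIMARY_PRIMARY_ELECTION) {
    put_item(out, PIT_SINGLE_PRIMARY_SERVER_UUID, msg.primary_uuid);
    put_item(out, PIT_SINGLE_PRIMARY_ELECTION_MODE,
             std::string(1, static_cast<char>(msg.election_mode)));
  }
  return out;
}

// Decoder that knows every item; unknown identifiers are skipped.
Single_primary_message decode_payload(const Buffer &in) {
  Single_primary_message msg;
  const unsigned char *p = in.data();
  const unsigned char *end = p + in.size();
  while (end - p >= 4) {
    const uint16_t item = get_u16(p);
    const uint16_t length = get_u16(p + 2);
    p += 4;
    if (end - p < length) break;  // truncated item, stop decoding
    if (item == PIT_SINGLE_PRIMARY_MESSAGE_TYPE && length == 1)
      msg.type = p[0];
    else if (item == PIT_SINGLE_PRIMARY_SERVER_UUID)
      msg.primary_uuid.assign(p, p + length);
    else if (item == PIT_SINGLE_PRIMARY_ELECTION_MODE && length == 1)
      msg.election_mode = p[0];
    p += length;
  }
  return msg;
}

// Decoder written before the extension: it reads the first item and stops.
unsigned char decode_type_only(const Buffer &in) {
  assert(in.size() >= 5);
  return in[4];
}
```

With this layout, messages of the old types are encoded exactly as before, and a decoder that stops after the first item never looks at the appended uuid and mode.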



8. UDF functions

One important point in the design is that these actions are invoked through user defined functions.

SELECT group_replication_switch_to_single_primary_mode([server_uuid]);

SELECT group_replication_switch_to_multi_primary_mode();

SELECT group_replication_set_as_primary(server_uuid);

Besides the necessary code support, these functions also need to be created when the plugin is installed.
In previous server versions the user had to execute SQL commands to create the functions, but this is no longer needed on 8.0.2+ versions.
With the UDF install service, these functions can now be created as part of the plugin install.

Code Skeleton - Functions

PLUGIN_EXPORT char*
group_replication_switch_to_single_primary_mode(UDF_INIT*,
                                                UDF_ARGS *args,
                                                char *result,
                                                unsigned long *length,
                                                char*, char*)

PLUGIN_EXPORT my_bool
group_replication_switch_to_single_primary_mode_init(UDF_INIT* initid,
                                                     UDF_ARGS* args,
                                                     char* message)

PLUGIN_EXPORT void
group_replication_switch_to_single_primary_mode_deinit(UDF_INIT*)

PLUGIN_EXPORT char*
group_replication_switch_to_multi_primary_mode(UDF_INIT*,
                                               UDF_ARGS *arg,
                                               char *res,
                                               unsigned long *length,
                                               char*, char*)

PLUGIN_EXPORT my_bool
group_replication_switch_to_multi_primary_mode_init(UDF_INIT* initid,
                                                    UDF_ARGS* args,
                                                    char* message)

PLUGIN_EXPORT void
group_replication_switch_to_multi_primary_mode_deinit(UDF_INIT*)


PLUGIN_EXPORT char*
group_replication_set_as_primary(UDF_INIT*,
                                 UDF_ARGS *arg,
                                 char *res,
                                 unsigned long *length,
                                 char*, char*)

PLUGIN_EXPORT my_bool
group_replication_set_as_primary_init(UDF_INIT* initid,
                                      UDF_ARGS* args,
                                      char* message)

PLUGIN_EXPORT void
group_replication_set_as_primary_deinit(UDF_INIT*)

These must be announced through a settings file

rapid/plugin/group_replication/group_replication.def

Method logic - Functions

  • group_replication_switch_to_single_primary_mode_init

  1. Check that the parameter count is 0 or 1
  2. If given, check the parameter is a valid uuid
  3. Check the uuid belongs to one of the members

  • group_replication_switch_to_single_primary_mode

  1. Lock the plugin auto_lock
  2. [Is plugin running?]
    If not, return
  3. Check that the requested state is not already the current one.
  4. Group_action *action = new Primary_election_action(uuid);
    error= group_action_coordinator.coordinate_action_execution(action);
  5. return to the user.
    Use Group_action::get_error_message if needed.

  • group_replication_switch_to_multi_primary_mode

  1. Lock the plugin auto_lock
  2. [Is plugin running?]
    If not, return
  3. Check that the requested state is not already the current one.
  4. Group_action *action = new Multi_primary_migration_action();
    error= group_action_coordinator.coordinate_action_execution(action);
  5. return to the user.
    Use Group_action::get_error_message if needed.

  • group_replication_set_as_primary_init

  1. Check that the parameter count is 1
  2. Check the parameter is a valid uuid
  3. Check the uuid belongs to one of the members

  • group_replication_set_as_primary

  1. Lock the plugin auto_lock
  2. [Is plugin running?]
    If not, return
  3. Check that the requested state is not already the current one.
  4. Group_action *action = new Primary_election_action(uuid);
    error= group_action_coordinator.coordinate_action_execution(action);
  5. return to the user.
    Use Group_action::get_error_message if needed.
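
The argument checks done by the two _init functions can be sketched as a standalone helper. This is a sketch under assumptions: check_primary_arguments and is_valid_uuid are hypothetical names, the arguments are modeled as plain strings instead of UDF_ARGS, the error texts are invented, and only the canonical 8-4-4-4-12 textual uuid form is accepted.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// True if text has the canonical 8-4-4-4-12 hexadecimal form.
bool is_valid_uuid(const std::string &text) {
  if (text.size() != 36) return false;
  for (std::size_t i = 0; i < text.size(); ++i) {
    const bool dash_expected = (i == 8 || i == 13 || i == 18 || i == 23);
    if (dash_expected) {
      if (text[i] != '-') return false;
    } else if (!std::isxdigit(static_cast<unsigned char>(text[i]))) {
      return false;
    }
  }
  return true;
}

// Returns an empty string when the arguments are acceptable, otherwise an
// error text (the texts here are placeholders, not the plugin messages).
std::string check_primary_arguments(const std::vector<std::string> &args,
                                    std::size_t min_args, std::size_t max_args,
                                    const std::vector<std::string> &member_uuids) {
  assert(min_args <= max_args);
  // 1. Check the parameter count.
  if (args.size() < min_args || args.size() > max_args)
    return "Wrong number of arguments.";
  for (const std::string &uuid : args) {
    // 2. Check the parameter is a valid uuid.
    if (!is_valid_uuid(uuid)) return "The argument is not a valid UUID.";
    // 3. Check the uuid belongs to one of the members.
    bool found = false;
    for (const std::string &member : member_uuids)
      if (member == uuid) found = true;
    if (!found) return "The UUID does not belong to a group member.";
  }
  return "";
}
```

Under these assumptions group_replication_switch_to_single_primary_mode_init would call the helper with a range of 0 to 1 arguments, and group_replication_set_as_primary_init with exactly 1.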

Function installation

To automatically create the functions at plugin install we can use the UDF install service.

my_service<SERVICE_TYPE(udf_registration)> service("udf_registration.mysql_server", r);

service->udf_register("group_replication_switch_to_single_primary_mode",
                      Item_result::STRING_RESULT,
                      (Udf_func_any) group_replication_switch_to_single_primary_mode,
                      group_replication_switch_to_single_primary_mode_init,
                      group_replication_switch_to_single_primary_mode_deinit);

This code is to be located in the plugin install.
Due to the server initialization order we may have to rely on the Delayed initialization thread.
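
Since three functions have to be registered, the install step can loop over a list of names. The sketch below is only an illustration: Udf_registrar is a hypothetical stand-in for the udf_registration service that tracks names only, and undoing earlier registrations when one of them fails is an assumption of this sketch, not something this worklog specifies.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the UDF registration service. The real service
// also takes the function pointers; only the name matters for this sketch.
struct Udf_registrar {
  std::function<bool(const std::string &)> register_udf;    // true on error
  std::function<bool(const std::string &)> unregister_udf;  // true on error
};

// The three functions introduced by this worklog.
const std::vector<std::string> &group_replication_udfs() {
  static const std::vector<std::string> names = {
      "group_replication_switch_to_single_primary_mode",
      "group_replication_switch_to_multi_primary_mode",
      "group_replication_set_as_primary"};
  return names;
}

// Registers every function. On the first failure, the functions registered so
// far are unregistered in reverse order. Returns true on error.
bool install_udfs(Udf_registrar &registrar) {
  assert(registrar.register_udf && registrar.unregister_udf);
  std::vector<std::string> done;
  for (const std::string &name : group_replication_udfs()) {
    if (registrar.register_udf(name)) {
      while (!done.empty()) {
        registrar.unregister_udf(done.back());
        done.pop_back();
      }
      return true;
    }
    done.push_back(name);
  }
  return false;
}
```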


9. Applier Module Action Packet extension - Queue Checkpoint Packet

This section describes a small addition to the applier module.
The idea is to have a packet that you can use to wait until it is processed, i.e., until the current queue is consumed.

Code Skeleton

enum enum_packet_action
{
  TERMINATION_PACKET=0,  //Packet for a termination action
  SUSPENSION_PACKET,     //Packet to signal something to suspend
  CHECKPOINT_PACKET,     //Packet to wait for queue consumption
  ACTION_NUMBER= 3       //The number of actions
};

/**
  @class Queue_checkpoint_packet
  A packet to wait for queue consumption 
*/
class Queue_checkpoint_packet: public Action_Packet
{
public:

  /**
    Create a new checkpoint packet.
    The packet action is always CHECKPOINT_PACKET.
  */
  Queue_checkpoint_packet()
    :Action_Packet(CHECKPOINT_PACKET), packet_consumed(false)
  {
    // init lock
    // init condition
  }

  ~Queue_checkpoint_packet() {}

  void wait_on_event_consumption();

  void event_is_consumed();

private: 
  bool packet_consumed; 
  mysql_mutex_t lock;
  mysql_cond_t  cond;
};

Method logic

  • wait_on_event_consumption()

  1. lock
  2. while the packet is not consumed, wait on the condition
  3. unlock

  • event_is_consumed()

  1. lock
  2. set the flag to true
  3. signal the condition
  4. unlock
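
The wait and notify handshake of the two methods can be sketched with the standard library. This is an illustration under assumptions: std::mutex and std::condition_variable replace mysql_mutex_t and mysql_cond_t, the Action_Packet base class is left out, and is_consumed() is a hypothetical helper added only so the state can be inspected.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

class Queue_checkpoint_packet {
 public:
  // Blocks until event_is_consumed() has been called. The predicate form of
  // wait() re-checks the flag, so spurious wakeups and a notification that
  // arrives before the wait starts are both handled.
  void wait_on_event_consumption() {
    std::unique_lock<std::mutex> guard(lock_);
    cond_.wait(guard, [this] { return packet_consumed_; });
  }

  // Called by the applier once the packet is reached in the queue.
  void event_is_consumed() {
    {
      std::lock_guard<std::mutex> guard(lock_);
      packet_consumed_ = true;
    }
    cond_.notify_all();  // wake up every waiter
  }

  // Hypothetical helper, not part of the skeleton.
  bool is_consumed() {
    std::lock_guard<std::mutex> guard(lock_);
    return packet_consumed_;
  }

 private:
  bool packet_consumed_ = false;
  std::mutex lock_;
  std::condition_variable cond_;
};
```

A caller would enqueue the packet and then call wait_on_event_consumption(), while the applier thread calls event_is_consumed() when it dequeues the packet. The flag must be set under the lock and the condition must be signaled, otherwise the waiter could sleep forever.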

Related changes

  • Applier_module::apply_action_packet(Action_packet *action_packet)

In this method, add a branch that does:

if (action == CHECKPOINT_PACKET)
{
  cast the packet to Queue_checkpoint_packet
  invoke event_is_consumed()
  return false
}
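
A compact sketch of that branch follows. It makes several assumptions: the classes are reduced to the minimum needed for the dispatch, synchronization inside event_is_consumed() is omitted, the suspension handling is left out, and a true return value is taken to mean that the applier loop should stop.

```cpp
#include <cassert>

enum enum_packet_action {
  TERMINATION_PACKET = 0,  // Packet for a termination action
  SUSPENSION_PACKET,       // Packet to signal something to suspend
  CHECKPOINT_PACKET,       // Packet to wait for queue consumption
  ACTION_NUMBER = 3        // The number of actions
};

class Action_packet {
 public:
  explicit Action_packet(enum_packet_action action) : packet_action(action) {}
  virtual ~Action_packet() = default;
  enum_packet_action packet_action;
};

class Queue_checkpoint_packet : public Action_packet {
 public:
  Queue_checkpoint_packet() : Action_packet(CHECKPOINT_PACKET) {}
  void event_is_consumed() { consumed = true; }  // locking omitted in this sketch
  bool consumed = false;
};

// Returns true when the applier loop should stop, false to keep going.
bool apply_action_packet(Action_packet *action_packet) {
  assert(action_packet != nullptr);
  switch (action_packet->packet_action) {
    case TERMINATION_PACKET:
      return true;
    case CHECKPOINT_PACKET:
      // The action value guarantees the dynamic type, so a static cast is safe.
      static_cast<Queue_checkpoint_packet *>(action_packet)->event_is_consumed();
      return false;
    default:
      return false;  // suspension handling is not part of this sketch
  }
}
```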


10. File Structure

Due to the number of files in the plugin folder, this WL proposes a more structured approach to the code.

All folders below, unless specified otherwise, are relative to the base: rapid/plugin/group_replication
The + means the addition of a new file.
The m means the move of an existing file.

Note that these changes also affect the structure inside the plugin CMakeLists.txt, like

SET(GROUP_REPLICATION_SOURCES
  src/*.cc
  src/XXX/*.cc
  src/YYY/*.cc

Coordinator and actions

These classes, as a new concept in Group Replication, are located in a new folder: group_actions

So we have

+ src/group_actions/group_action_coordinator.cc
+ src/group_actions/group_action.cc
+ src/group_actions/multi_primary_migration_action.cc
+ src/group_actions/primary_election_action.cc

same for ".h" on include/group_actions/

Group action - notifications

Used solely for group actions for now, we propose to place the notifications in: group_actions/notifications

+ src/group_actions/notifications/action_notification.cc
+ src/group_actions/notifications/dead_member_notification.cc
+ src/group_actions/notifications/channel_validation_notification.cc

same for ".h" on include/group_actions/notifications/

Observers

We already had some observers in the plugin for replication channels.
With this worklog we add two more, to be located under: plugin_observers

+ src/plugin_observers/group_event_listener.cc
+ src/plugin_observers/group_transaction_listener.cc
m src/plugin_observers/channel_observation_manager.cc

same for ".h" on include/plugin_observers/

Handlers

All classes that are used in the plugin to execute a contained action should be isolated in the folder plugin_handlers

So we have

+ src/plugin_handlers/primary_election/primary_election_validation_handler.cc
+ src/plugin_handlers/primary_election/primary_election_invocation_handler.cc
+ src/plugin_handlers/primary_election/primary_election_primary_process.cc
+ src/plugin_handlers/primary_election/primary_election_secondary_process.cc
+ src/plugin_handlers/server_transaction_checks_handler.cc
+ src/plugin_handlers/persistent_variables_handler.cc
+ src/plugin_handlers/stage_monitor_handler.cc
m src/plugin_handlers/read_mode_handler.cc

same for ".h" on include/plugin_handlers/

Messages

With the addition of new messages to the plugin, it is time to also have a dedicated folder: plugin_messages

+ src/plugin_messages/group_action_message.cc
+ src/plugin_messages/primary_election_action_message.cc
+ src/plugin_messages/group_validation_message.cc
m src/plugin_messages/single_primary_message.cc
m src/plugin_messages/recovery_message.cc

same for ".h" on include/plugin_messages/

UDF functions

For UDF functions we are adding

+ src/plugin_udf_functions.cc
+ include/plugin_udf_functions.h

And also the definition file:

+ /group_replication.def

Services

For the server side implementation we need to create the new files:

+ include/mysql/components/services/transactional_querying.h
+ sql/server_component/dynamic_transactional_querying_impl.cc
+ sql/server_component/dynamic_transactional_querying_impl.h
+ sql/server_component/server_abort.cc
+ sql/server_component/server_abort.h