WL#10378: Group Replication: group single/multi primary mode change and primary election

Affects: Server-8.0   —   Status: Complete   —   Priority: Medium

## Executive Summary

This worklog implements a framework for group-wide configuration
changes. Once it is done, the user will be able to change
single_primary_mode without having to stop group replication, and
will also be able to trigger the election of a specific member as the
new primary.

## Background

Group Replication can be configured to run in multi-primary or
single-primary mode.
Both modes have their use cases, but situations may occur where the
user wants to go from one to the other with no downtime. Currently
such a change requires a rolling shutdown of the group members.

This worklog aims to make the change from multi-primary to single
primary, and vice versa, possible with a simple invocation of a
function. The group should coordinate to elect a primary, enable or
disable the read only modes on the correct members and execute any
other step that is deemed necessary.

This worklog should also provide the end user with the much requested
option to force the election of a new primary member of their
choice. Until now the primary would be the first member of the group
or, when it died or stopped, the member with the greatest member
weight, with the lowest UUID breaking ties.

This last feature is included in this worklog as it is a derivation of the
coordination process that must be in place for the above live changes
from multi-primary to single-primary mode and vice-versa.

## User Stories

- As a MySQL DBA I want to change the primary from member A to member
  B without having to remove member A from the group, so I can demote
  member A to secondary.

- As a MySQL DBA I want to change from single primary mode to multi
  primary mode without stopping group replication, so I can from now
  on write to multiple servers.

- As a MySQL DBA I want to change from multi primary mode to single
  primary, so I adjust the deployment mode now that I have figured that
  multi primary is not really fitting my use case.

Functional requirements 
-----------------------

* FR1: To execute a coordinated group configuration change the member
  must be on ONLINE state and belong to a reachable group majority.
  
* FR2: To execute a coordinated group configuration change, the user
  must have GROUP_REPLICATION_ADMIN privileges.

* FR3: If the coordinated group configuration change is invoked with a
  server UUID, that value shall be valid and must belong to a member
  of the group. An error shall be returned otherwise.

* FR4: When the group is in **multi-primary** mode and the user causes a
  change to **single-primary** mode the group must:
  - FR 4.1: Elect a primary. If one is selected by the user, that member must be chosen.
  - FR 4.2: The primary shall be writable after processing the local backlog
  - FR 4.3: Secondaries shall enable the server super_read_only mode.
  - FR 4.4: Update everywhere checks is set to False but only after all
    transactions from the old primary are applied.

* FR5: When the group is in **single-primary** mode and the user causes a
  change to **multi-primary** mode the group must:
  - FR 5.1: Update everywhere checks must be set to True on all members.
  - FR 5.2: All members must be writable, so read_only mode should be False 
  - FR 5.3: When all members are writable, any transactional conflict must abort

* FR6: When a primary member is proposed:
  - FR 6.1: A new election shall happen in all members appointing the
    proposed member as the new primary
  - FR 6.2: Until all updates from the old primary previous to the
    election are applied, the new primary must stay in read mode.
  - FR 6.3: While updates from the old and new primaries are in the
    group, any transactional conflict between them must abort.

* FR7: When changing to multi primary mode the auto increment values
  of the server shall change to the plugin automatic values according
  to the group_replication_auto_increment_increment variable if no
  user set value is present.

* FR8: When changing to single primary mode the auto increment values
  of the server shall return to the base values if no user set value
  is present.

* FR9: No members can join the group while a coordinated group
  configuration change is occurring

* FR10: No coordinated group configuration change can happen if one of
  the members is in recovery mode.

* FR11: No more than one coordinated group configuration change can
  happen at the same time.

* FR12: No coordinated group configuration changes are allowed if the
  group contains a member of a previous version that does not support
  it.

* FR13: When electing a primary server, P, if any other member
  than P contains running slave channels, the configuration change shall
  abort. 

* FR14: When changing to single primary mode, if more than one member
  contains running slave channels, the configuration change shall
  abort. 

* FR15: When changing to single primary mode with no appointed
  primary, if a solo member exists with running slave channels, that
  member shall be the elected primary.

* FR16: When a coordinated group configuration change involving
  primary election is running, no slave channels can be started in the
  group members.

* FR17: Any change to multi-primary when already in multi-primary is a
  no-op. 
  
* FR18: Any change to single-primary when already in single-primary is
  a no-op.

* FR19: An attempt to elect a primary member when in multi primary is
  not a valid operation.   
  An error saying to use the primary switch command is issued. 

* FR20: An attempt to elect a member as primary that is already the
  group primary member is a no-op.

* FR21: Coordinated group configuration changes can be invoked in any
  member despite its primary or secondary role.

* FR22: All changes to the primary mode shall be recorded with SET
  PERSIST meaning they will have effect even after a member restart.

* FR23: When changing to single primary mode with no appointed
  primary, and no restrictions with slave channels exist, the new
  primary member shall be elected using weights or lexicographic order
  when all weights are equal.

* FR24: When a coordinated group configuration change is accepted,
  even if the invoking member leaves or fails under a majority, the
  action will be executed in all online members.

* FR25: Primary elections or change to multi-primary will be delayed
  until all transactions forbidden by enforce_update_everywhere_checks
  terminate.

* FR26: When switching to a primary server or changing mode to single
  primary with an appointed primary, P, if P leaves or fails under a
  majority, before the election starts, the configuration change must
  abort.

* FR27: When changing mode to single primary with an appointed
  primary, P, if P leaves or fails under a majority when the primary
  election began but is not yet over, the change will not abort; it
  adapts to the newly elected primary and throws a warning.

* FR28: When switching to a primary server, P, if P leaves or fails under
  a majority, when the primary election began but is not yet over, the
  configuration change will abort and the old primary will be elected
  if available. If not, another member will be elected.

* FR29: When switching to a primary server or changing mode to single
  primary with an appointed primary, P, if P leaves or fails under
  a majority, after the election finalizes, the change terminates and the
  group elects a new primary. A warning is thrown to the user.

* FR30: When electing a primary server, P, if any server S leaves or
  fails under a majority, the procedure shall not be affected and will
  resume.

* FR31: Any member exit or failure under a majority shall not affect
  the process of changing to multi-primary mode.

* FR32: After a coordinated group configuration change returns
  successfully to the user in the invoking member, its effects should
  be visible in all members.

* FR33: If the user kills the query thread then the action and query
  threads shall be terminated.

* FR34: If the group change coordination thread is killed but the
  distributed execution has already gone beyond a point where all
  servers agreed (cannot be canceled) then the action will complete.   
  A warning shall be returned by the executing query stating the kill
  had no effect.
  
* FR35: If the group change coordination thread is killed but the
  configuration process still has major tasks to complete, the member
  shall leave the group and go into ERROR mode or abort.

* FR36: When the plugin is stopped or leaves in error, while changing
  from single primary mode to multi primary mode, if the member did
  not set the single primary mode flag to false, then update
  everywhere checks shall remain false. 

* FR37: When the plugin is stopped or leaves in error, while changing
  from single primary mode to multi primary mode, if the member did
  already set the single primary mode flag to false, then update
  everywhere checks shall be true after stop.

* FR38: When the plugin is stopped or leaves in error, while changing
  from multi primary mode to single primary mode, if the member did
  not set the single primary mode flag to true, then update
  everywhere checks shall remain true. 

* FR39: When the plugin is stopped or leaves in error, while changing
  from multi primary mode to single primary mode, if the member did
  already set the single primary mode flag to true, then update
  everywhere checks shall be false after stop.

* FR40: When the plugin is stopped or leaves in error, plugin
  configurations when the configuration change terminates must be
  valid, even if not persisted with SET PERSIST.

* FR41: All coordinated group configuration changes shall allow the
  DBA to check its progress.

* FR42: Functions to execute coordinated group configuration changes
  are only present when the plugin is installed.  

* FR43: Any local failure in a coordinated group configuration change
  that prevents its progress shall make the server leave the group as
  its configuration may have deviated from the group.

* FR44: Errors in the election process that prevent its progress shall
  make the server leave the group or abort as its configuration may
  have deviated from the group.

* FR45: Any failure to enable the read mode in the server for data
  protection shall result in a server abort.

* FR46: Outside the scope of coordinated group configurations changes,
  if a primary member fails, the new primary will not be writable until it
  executes all the transactions from the old primary.

* FR47: Member weights for primary election cannot be changed when a
  coordinated group configuration change is occurring.
  
* FR48: When a primary election is running, no coordinated group
  configuration change can be executed in the group.

* FR49: The coordinated group configuration changes proposed on this
  worklog cannot be executed when there is an active table lock in the
  session.
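
The four requirements about the appointed primary P leaving or failing (FR26 to FR29) form a small decision table. The sketch below only restates that table; the enum names, the `Outcome` tuple and the function name are hypothetical and not part of the plugin.

```python
from enum import Enum
from typing import NamedTuple


class Operation(Enum):
    SET_AS_PRIMARY = "switch to a primary server"
    SWITCH_TO_SINGLE_PRIMARY = "change to single primary mode, primary appointed"


class Phase(Enum):
    BEFORE_ELECTION = "before the election starts"
    DURING_ELECTION = "election began but is not yet over"
    AFTER_ELECTION = "after the election finalizes"


class Outcome(NamedTuple):
    aborts: bool        # does the configuration change abort?
    warning: bool       # is a warning thrown to the user?
    next_primary: str   # who ends up as primary, in words


def outcome_when_p_leaves(operation: Operation, phase: Phase) -> Outcome:
    """Map (operation, phase) to the behaviour required by FR26 to FR29."""
    if phase is Phase.BEFORE_ELECTION:
        # FR26: same rule for both operations.
        return Outcome(True, False, "no election takes place")
    if phase is Phase.DURING_ELECTION:
        if operation is Operation.SWITCH_TO_SINGLE_PRIMARY:
            # FR27: keep going with whoever gets elected instead of P.
            return Outcome(False, True, "newly elected member")
        # FR28: fall back to the old primary when it is still around.
        return Outcome(True, False, "old primary if available, else another member")
    # FR29: the change itself is over; the group elects a new primary.
    return Outcome(False, True, "newly elected member")
```

The only case where the two operations differ is a failure during the election: a mode change carries on with a different primary, while a primary switch aborts.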

Non-functional requirements:

* NFR1: This WL must have no impact on transaction execution
  performance when no coordinated group configuration changes are being
  executed.

* NFR2: This WL must only have a minor overhead during transaction
  commit when executing coordinated group configuration changes that
  depend on transaction monitoring.

1. Some definitions and considerations
======================================

* Single primary mode: When only one member in the group accepts
  writes and all other members are in read mode.    
  As the primary is the source of truth in the group, certification
  information is updated but not used for commit decisions.     
  Restrictions around foreign keys and other multi-primary limitations
  described below do not apply to single primary mode.

* Primary: The writable member in the single primary mode. If it dies
  or exits the group a new primary is elected.

* Secondary(ies): The non writable members in the group. These members
  receive transactions executed in the primary from the group. 

* Multi primary mode: Also called multi-primary in the text, this is
  when all members are writable.     
  All transactions are certified and can roll back if they are
  concurrent and update the same data as another transaction committed
  in the group.     
  Multi primary mode is subject to some restrictions described
  below.

* Update everywhere checks: Controlled by a plugin variable:
  **enforce_update_everywhere_checks**.     
  With this variable, group replication can prevent the execution of
  transactions that cause updates to cascading foreign keys or use the
  serializable isolation level.

* Election: The process where a member is appointed as the new
  group primary.   
  Note that under this worklog we call it election even if there is no
  algorithmic selection of a member and the member is directly chosen
  externally.     
  In either case the distributed appointment of a member, the changes
  to the read modes and certification related tasks make up what we call an
  election. 

* Certification: Certification is the process where the group decides
  which concurrently executed transactions, at different servers, are
  conflicting.  
  If the output is negative, the transaction will roll back in all members.  
  Throughout the worklog we mention points where we say certification is
  enabled or disabled, so a clarification here:   
  Certification keeps collecting information about transactions whether it is enabled or disabled.  
  Being ON or OFF refers only to whether the certification output is considered
  during each transaction's commit process.

* Coordinated Group Configuration Change: the group of coordinated
  steps needed to execute a change to the group. These are often
  referred to simply as group actions or coordinated group actions
  throughout the worklog.

* The action coordinator: The central block that coordinates the execution
  of actions in the group. It guarantees that only one action can
  execute at a time.

* UDF: These are User Defined Functions, a mechanism that allows us to
  add functionality to the plugin without coding new parser commands.    
  Installed by us at plugin install, they allow the user to invoke a
  new action coded by us in the plugin.

* Group Replication and slave channels:    
  As described, when in single primary mode it is assumed that only one member in
  the group is the source of all updates to the group's data.   
  This has some implications for the election process described in this
  worklog.  
  If one member has an active slave channel receiving data
  from an external source, this member must be the primary in the
  group, and no primary switches are allowed.    
  Such switches would mean that two different sources of updates would
  now exist in the group. 
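
The distinction made in the Certification definition above, between collecting information and using the certification output, can be shown with a toy model. This is only a sketch under simplifying assumptions: integer versions stand in for the plugin's real bookkeeping, and the class, attribute and parameter names are hypothetical.

```python
class ToyCertifier:
    """Toy model: tracks, per row key, the version of the last committed writer."""

    def __init__(self):
        self.conflict_detection_enabled = True
        self.last_writer = {}  # row key -> version that last committed a write

    def certify(self, write_set, snapshot_version, commit_version):
        """Return True if the transaction may commit, False if it must roll back."""
        # A conflict exists when some row was written after this transaction's snapshot.
        conflict = any(
            self.last_writer.get(key, 0) > snapshot_version for key in write_set
        )
        if conflict and self.conflict_detection_enabled:
            return False  # negative output is only honoured when enabled
        # Information is collected whether detection is enabled or not.
        for key in write_set:
            self.last_writer[key] = commit_version
        return True
```

With detection disabled, a transaction that would conflict still commits, but its write set is recorded, so the information is ready the moment detection is enabled again.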

>**Note**: All plugin variables in this worklog are often referred to in
>  the text without the prefix group_replication_ to decrease
>  verbosity.         
>  Example: enforce_update_everywhere_checks

</br>

2. Coordinated Group Configuration Changes - the basics
=======================================================

Changes such as the ones proposed here, where one dynamically alters
the single/multi primary mode, are operations that require the
coordination of all group members in the execution of a set of steps
to achieve the desired result.

So what this worklog intends to implement is a configuration module
for task coordination, together with a set of well defined operations that
are used to achieve the wanted configuration changes.

The idea is that this coordinator and these operations can be reused in the
implementation of new requests in the long run. 

</br>

Coordinated Group Configuration Changes: The Coordinator 
--------------------------------------------------------

The coordinator shall work in 3 phases:

1. Coordinate the start of the action

2. Execute the action

3. Return the status of the execution to the user and declare the end
   of the action. 

1) and 3) are coordinator steps, common to all actions.
2) is specific to each action. 

On the first phase, the coordinator shall send a message to the group
stating that action A is to be executed.   
If a concurrent action B is already taking place, then A is aborted.  
The order of execution is established by the total order delivery guarantee of the GCS (Paxos).

This first phase also makes use of this total order delivery
guarantee to run some general validations, for example: there is no member
of an older version, there is no member in recovery, etc. 

Much like the first phase, the third phase also needs to be
coordinated between all members by means of sending a message to the
group.  

Otherwise, since the operations are asynchronous, members that are
still executing action A would refuse a new action B while others that
had finished A already would accept and start B.  

Validations and execution vary with each action though, so each action
has its own implementation.   
We shall call these group actions, described below. 

In summary the coordinator shall coordinate the start and finish of an
action, invoking the corresponding action block and only returning when that action is finished.
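
The start phase can be sketched as follows. This is a minimal model and not the plugin code: the class, its method names and the shape of the `members` dictionary are hypothetical. The point it illustrates is that every member applies the same checks to the same totally ordered start messages, so all members reach the same accept or reject decision without further communication.

```python
class ActionCoordinator:
    """One instance per member; all instances see the same message order."""

    def __init__(self, members):
        # members: name -> {"state": "ONLINE" | "RECOVERING", "supports_actions": bool}
        self.members = members
        self.running_action = None

    def on_start_message(self, action_name):
        """Phase 1: decide whether the announced action may start."""
        if self.running_action is not None:
            return False, f"{self.running_action} is already running"
        for name in sorted(self.members):
            info = self.members[name]
            if not info["supports_actions"]:
                return False, f"{name} runs a version without group actions"
            if info["state"] == "RECOVERING":
                return False, f"{name} is in recovery"
        self.running_action = action_name
        return True, "accepted"

    def on_all_end_messages_received(self):
        """Phase 3: every member reported the end, a new action may start."""
        self.running_action = None
```

Because the running action is only cleared in the coordinated third phase, no member can accept a new action while another member is still executing the previous one.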
    
</br>    
    
Coordinated Group Configuration Changes: Group Actions
------------------------------------------------------

A group action shall then contain the basic method for execution.        
This method shall be implemented for all actions.

**execute_action()**

Group Actions will also contain two methods 

**get_action_message(Message)**

and 

**process_action_message(Message)**

The idea here is that each action will encode its own parameters and
decode them.     
It is up to the coordinator to get this message when the
action is invoked and give it to all members when it is accepted.

Stopping an action is also a key operation for failure and plugin stop
situations. 

**stop_action_execution()**

Also, for debug and identification purposes these classes should expose
their names. 

**get_action_name()**

For now the coordinator will handle 2 actions

*  Multi primary mode migration

*  Single Primary election 

The first shall handle changes from single-primary mode to multi-primary setups.

The second should handle the inverse conversion but also handle the
primary election of a specific member. 
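
Put together, the methods above suggest an interface along these lines. This is a sketch only: the method names come from the text, while the Python signatures, the boolean return of `execute_action`, the JSON encoding and the `PrimaryElectionAction` class are assumptions made for illustration.

```python
import json
from abc import ABC, abstractmethod
from typing import Optional


class GroupAction(ABC):
    @abstractmethod
    def execute_action(self) -> bool: ...

    @abstractmethod
    def get_action_message(self) -> bytes: ...

    @abstractmethod
    def process_action_message(self, message: bytes) -> None: ...

    @abstractmethod
    def stop_action_execution(self) -> None: ...

    @abstractmethod
    def get_action_name(self) -> str: ...


class PrimaryElectionAction(GroupAction):
    def __init__(self, appointed_uuid: Optional[str] = None):
        self.appointed_uuid = appointed_uuid
        self.stopped = False

    def execute_action(self) -> bool:
        # Placeholder: the real steps are the ones listed in section 4.
        return not self.stopped

    def get_action_message(self) -> bytes:
        # Invoking member: encode this action's own parameters.
        return json.dumps({"appointed_uuid": self.appointed_uuid}).encode()

    def process_action_message(self, message: bytes) -> None:
        # Receiving member: decode the parameters into a fresh instance.
        self.appointed_uuid = json.loads(message.decode())["appointed_uuid"]

    def stop_action_execution(self) -> None:
        self.stopped = True

    def get_action_name(self) -> str:
        return "Single Primary election"
```

The coordinator never needs to understand the payload: it asks the invoking member's action for its message and hands the same bytes to the action instance created on every other member.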

</br>

Coordinated Group Configuration Changes: Actions invocation
-----------------------------------------------------------

First, a note: the option **group_replication_single_primary_mode**
is still in effect and the DBA can still configure the member to start in one mode
or the other.    

The changes from one mode to the other in a live group do not depend on variables though, but
on newly introduced user defined functions.   
This way the change is made through a function that denotes an implicit
action and not through a variable change.   

These functions are:

* Changes from multi-primary to single primary

Base command:
            
     SELECT group_replication_switch_to_single_primary_mode();

The above command shall be invoked by the user to change to single
primary mode, with the election controlled by the configured election
weights.   
If the user wants to appoint a primary in the process, they execute:   

     SELECT group_replication_switch_to_single_primary_mode(server_uuid);

Any invocation of these functions in a group already in this mode will
cause no visible changes.

* Changes from single-primary to multi-primary

Base command:

     SELECT group_replication_switch_to_multi_primary_mode();

This function has no parameters.   
Any invocation in a group already in multi-primary mode will cause no
changes. 

* Election of a new primary 

Base command:

     SELECT group_replication_set_as_primary(server_uuid);
    
This function will not switch the group to single primary mode if the
group is running in multi-primary mode. 
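
How the three functions behave depending on the current mode (FR3 and FR17 to FR20) can be summarised as a small decision function. It is a sketch only: the function names are the UDFs above, while `plan_invocation`, its parameters and the order in which the checks run are assumptions.

```python
def plan_invocation(function, mode, members, primary=None, server_uuid=None):
    """Return "run", "no-op" or "error" for a UDF call.

    mode is "single" or "multi"; members is the set of member UUIDs.
    """
    if server_uuid is not None and server_uuid not in members:
        return "error"  # FR3: the UUID must belong to a group member
    if function == "group_replication_switch_to_multi_primary_mode":
        return "no-op" if mode == "multi" else "run"  # FR17
    if function == "group_replication_switch_to_single_primary_mode":
        return "no-op" if mode == "single" else "run"  # FR18
    if function == "group_replication_set_as_primary":
        if server_uuid is None:
            return "error"  # this function always takes a UUID
        if mode == "multi":
            return "error"  # FR19: not valid in multi-primary mode
        return "no-op" if server_uuid == primary else "run"  # FR20
    raise ValueError(f"unknown function: {function}")
```

For example, calling `group_replication_set_as_primary` with the UUID of the current primary plans a no-op, while the same call in a multi-primary group plans an error.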

</br>

Configuration changes: Algorithm components 
-------------------------------------------

To switch from the single primary to multi-primary mode and vice-versa
the following steps/code units are necessary.

A. primary election: invoke primary election in a member

B. Disable/Enable certification

C. primary validation: check if the selected member is valid. This may include checking whether
  - the old primary has running slave channels
  - the user is selecting a member with version N+1 in a group with
    members of version N.

D. Wait for execution of the current set of local transactions

E. Set/Get plugin vars. This includes:
  - single_primary_mode
  - enforce_update_everywhere_checks

F. Wait for the execution of current relay log transactions

G. Message sending / reception

H. Enable/Disable the super read only mode


From this list:

* A) needs to be refactored in terms of code and message flow for
safety reasons. 

* B) needs minor refactors in order to be reusable 

* C) and D) are new utilities that we need to build from scratch 

* E) requires a new code module as we want to use SET PERSIST for these
  variables.
  
* F) We enhance this code with the ability to wait for the consumption
  of the group replication applier module queue before waiting for the
  execution of the transactions.    

* The rest can be used out of the box or by using current plugin
  methods. We do add the option to kill read mode queries in some
  situations though.

</br>

Configuration changes: How it works - a summary 
-----------------------------------------------

To wrap up this section, here is a summary of how it all comes together.

1. The user triggers an action in the plugin (Using UDFs in this
   worklog)

2. The plugin parses the parameters and creates the corresponding
   Group Action instance.
   
3. The group action is submitted to the coordinator.

4. The coordinator gets the action message from the group action class
   and sends it to all members
   
5. If accepted, all members (except the invoking one) instantiate the same
   group action class.   
   The action message is given to the group action object for parsing.
   
6. All members execute the action

7. All members send a termination message when over.

8. All members declare the action as finished when everyone terminates.   
   The invoking member returns the result to the client.   
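
Steps 4 to 8 can be simulated with a single ordered queue standing in for the group communication layer. This is a toy model under strong assumptions: execution is instantaneous, no member fails, and the `Member` class and `run_action` function are hypothetical names.

```python
from collections import deque


class Member:
    def __init__(self, name, group_size):
        self.name = name
        self.group_size = group_size
        self.current_action = None
        self.end_messages = set()
        self.finished = []

    def deliver(self, message, bus):
        kind, sender, action = message
        if kind == "start":
            # Steps 5 and 6: instantiate and execute the action.
            self.current_action = action
            # Step 7: announce termination to the group.
            bus.append(("end", self.name, action))
        elif kind == "end":
            self.end_messages.add(sender)
            # Step 8: finished only when every member has terminated.
            if len(self.end_messages) == self.group_size:
                self.finished.append(action)
                self.current_action = None
                self.end_messages.clear()


def run_action(members, invoker, action):
    bus = deque([("start", invoker, action)])  # step 4
    while bus:
        message = bus.popleft()
        for member in members:  # same delivery order on every member
            member.deliver(message, bus)
```

Running `run_action` over three members leaves each of them with the action in its `finished` list, and none of them declares it finished before the third end message is delivered.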

</br>

Configuration changes: How it works - message Diagram 
-----------------------------------------------------

    +-------------------+                  +-------------------------+                  +--------------------+
    | ..member 1 (m1).. |                  | .....member 2 (m2)..... |                  | ...member3 (m3)... |
    |                   |                  |                         |                  |                    |
    |                   |                  | UDF function execution  |                  |                    |
    |                   |   Group action   |     new Group_Action    |   Group action   |                    |
    |                   |  start message   |           .             |  start message   |                    |
    | new Group_action  | <--------------- |    send start message   | ---------------> |  new Group_action  |
    | execution         |             \--> |       execution         |                  |     execution      |
    |        +          |                  |           *             |                  |          +         |
    |        +          |                  |           +             |                  |          +         |
    |        +          |                  |           +             |                  |          +         |
    |        +          |  Group action    |           +             |   Group action   |          +         |
    | send end message  | end message (m1) |           +             | end message (m1) |          +         |
    |        .          | ---------------> |           +             | ---------------> |          +         |
    |        .          | <--/             |           +             |                  |          +         |
    |        .          |                  |           +             |                  |          +         |
    |        .          |                  |           +             |                  |          +         |
    |        .          |   Group action   |           +             |   Group action   |          +         |
    |        .          | end message (m3) |           +             |  end message(m3) |          +         |
    |        .          | <--------------  |           +             | <--------------- |  send end message  |
    |        .          |                  |           +             |           \----> |          .         |
    |        .          |                  |           +             |                  |          .         |
    |        .          |   Group action   |           +             |   Group action   |          .         |
    |        .          | end message(m2)  |           +             |  end message(m2) |          .         |
    |   declare action  | <--------------  |    send end message     | -------------->  |   declare action   |
    |     finished      |             \--> | declare action finished |                  |     finished       |
    |                   |                  |      UDF returns        |                  |                    |
    |                   |                  |                         |                  |                    |

</br>



3. From single to multi primary 
===============================

The first on the list is the change from single primary mode to
multi-primary.    
We design this one first as the inverse change is more complex.

In this change, all members become writable, but there are also
restrictions on the allowed transactions in the group that must be
enforced.

The HLD for this operation is then:

1. A message is sent to all members, including the invoking member,
   starting the configuration change.

2. All members set enforce_update_everywhere_checks to true.     

3. The primary waits for all transactions currently running to be
   processed by GR.     
   These transactions can have updates to tables with cascading FK for
   example, something that can cause issues in a multi-primary
   environment.
   
4. A message is sent to all members meaning: "I executed all running
   transactions, from now on, all transactions are safe."

5. When members receive this message they queue a packet in the
   plugin pipeline that will activate certification.

6. In the secondaries we extract the current GTIDs queued in the
   applier relay log and wait for their application.

7. Every member can set the single_primary_mode to false.   
   Members invoke a SET PERSIST instruction to make the option
   persistent.   
   The enforce_update_everywhere_checks is also made persistent here.

8. All members change the auto increment settings to the automatic
   values to avoid transaction collision.  
   Previous values are cached.

9. All secondaries can disable the read only mode when they complete
   step 6.

10. All members send a message when the action terminates.   
    When N messages are received, the action terminates.
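
Step 8, together with FR7 and FR8, amounts to caching the current auto increment values and restoring them later. The sketch below assumes that "no user set value" means both settings still have the server default of 1 and that the automatic offset is the server id; both are assumptions, as are the class and method names.

```python
class AutoIncrementSettings:
    DEFAULT = (1, 1)  # assumed server defaults for (increment, offset)

    def __init__(self, increment=1, offset=1):
        self.increment = increment
        self.offset = offset
        self._cached = None

    def apply_multi_primary_values(self, group_increment, server_id):
        """Switch to the plugin automatic values unless the user set their own."""
        if (self.increment, self.offset) != self.DEFAULT:
            return False  # user-set values are left untouched
        self._cached = (self.increment, self.offset)
        self.increment, self.offset = group_increment, server_id
        return True

    def restore_base_values(self):
        """Return to the cached values when going back to single primary mode."""
        if self._cached is None:
            return False  # nothing was changed by the plugin
        self.increment, self.offset = self._cached
        self._cached = None
        return True
```

A member that kept the defaults ends up with distinct offsets per server in multi-primary mode and gets its original values back on the inverse change; a member with user-set values is never modified.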

</br>


4. From multi-primary to primary / primary election
===================================================

In this change, there is an election of a new primary, either selected
by the internal election algorithm or appointed by the user.   
So, whether the user switches the group to single primary mode or
appoints a new primary, the same set of tasks will be executed in the
group.

In this change, only one member becomes writable and the transaction
limitations that are enforced on multi-primary are no longer needed.     

The HLD for this operation is then:

1. A message is sent to all members, including the invoking member,
   starting the configuration change.
   
2. A validation phase is executed:    
   The candidate must be of the lowest version present in the
   group, otherwise error out.   
   The same applies to invalid UUIDs passed as an argument and to an appointed member that is
   no longer in the group.     
   Everyone sends a message stating the existence of slave channels.   
   If more than one member has slave channels: error out    
   If slave channels exist in a member that is not the selected
   primary: error out   
   If no primary is appointed and a sole member exists with slave channels,
   force that member to be the new primary.

3. [Extra step] If a primary already exists:      
   All members set enforce_update_everywhere_checks to true.
   The primary waits for all transactions currently running to be
   processed by GR.     
   This means the old primary, if present, is the one that sends the
   primary election request message.

4. Run primary election on all members, using either the present
   election algorithm or choosing the user appointed member.   
   Member roles change as a result of the selection process. 
   Under the new election algorithm (section 5) the newly appointed
   primary will wait for messages from the old primary (if existent)
   up to this point.
   
5. The new primary will send a message to all members that election
   can continue and members also update the read mode status at this point.   
   Note that setting read mode to true will wait for executing
   transactions meaning this must be done in a spawned thread.

6. Everyone queues a message in the applier pipeline to re-enable the
   certifier (when migrating from multi-primary it is already enabled). 

7. The new primary waits for a message from all members stating when
   they are on read mode.  The primary also states it set its read
   mode to false.

8. When all members receive N messages they can set *single_primary_mode* to true.
   SET PERSIST is used here to make the mode persistent.   
   Secondaries can set enforce_update_everywhere_checks to false.    
   This step can be skipped if already on single primary mode.

9. When the primary receives N messages, it moves to step 10.

10. The primary shall wait for the whole applier relay log to be
    consumed.       
    It sends a packet informing about this change.    
    Only then it can set enforce_update_everywhere_checks to
    false to avoid concurrency between local and remote transactions.

11. The primary should return the auto increment values to the user
    cached values if changing the mode.     
    These were stored before the multi primary values were set to avoid
    collisions. 

12. All members disable the certifier when they receive the packet
    from the primary.
    Secondaries also change the auto increment settings for easy
    future primary failovers.
   
13. All members send a message when the action terminates. When N
    messages are received, the action terminates.
   
Note: Steps like 6), the first part of step 10) and 12) are already part
of the current primary election process. They are only placed here for
clarity.
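
The slave channel rules of the validation phase (step 2, FR13 to FR15) can be sketched as a single function. This is illustrative only: the function name and parameters are hypothetical, and the version check from the same step is left out.

```python
def resolve_primary_candidate(members_with_channels, group_members, appointed=None):
    """Return the UUID that must become primary, or None for a regular election.

    Raises ValueError when the configuration change has to error out.
    """
    if appointed is not None and appointed not in group_members:
        raise ValueError("appointed member is not in the group")
    with_channels = set(members_with_channels)
    if len(with_channels) > 1:
        raise ValueError("more than one member has running slave channels")
    if appointed is not None:
        if with_channels and with_channels != {appointed}:
            raise ValueError("slave channels run on a member that is not the selected primary")
        return appointed
    if with_channels:
        # No primary appointed: the sole member with channels is forced.
        return next(iter(with_channels))
    return None
```

Since every member receives the same set of slave channel messages, each one evaluates this rule locally and reaches the same result.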
   
</br>  
   

5. Changes to Primary election
==============================

In order to solve an old safety issue surrounding primary election and
the algorithms presented here, we propose a change to the election
mechanism.

Delving into it, the base of the issue is that single primary mode
allows the execution of transactions that could lead to data
divergence in multi primary setups.   
One example is transactions that have foreign keys and
cascading side effects that could lead to different execution results in
different members.  
In theory such transactions are safe because the primary is the only
source of truth in the group at all times.   

Until now, when changing from one primary to another, the new primary
would accept new transactions the moment it was elected.    
This meant that, for a window of time, such transactions
coming from both the old and the new primary could be executed
concurrently.   
This breaks the assumption that a sole source of truth
exists at each moment in time.    

Another point of concern is that we must wait for old primaries to be in
read mode (not applicable to crash situations).   
This can take time, and certification must be ON
during this period.

For these reasons this algorithm will now change and elections will have 5
stages whose invocation depends on the context of the election.

1. When the old primary dies or there is a change of the primary
   member, every member does an election and chooses the new primary.  
   The algorithm, or the appointed server parameter, makes the election
   have the same outcome on all members.
   
2. The new appointed primary waits for its relay log queue to be
   consumed totally or in part.   
   If the old primary failed we wait for the relay log to be fully
   consumed as no more messages will arrive.   
   If this is a switch from the old to the new primary then we should
   wait for the transactions of all messages up to this point.    
   This prevents the local vs remote conflicts that would lead to data
   divergence.   
   
3. When these transactions are consumed the elected member will send a
   message to all members, and when it is received they all change their
   read modes according to their roles.    
   Enabling the read mode on members must be done in a spawned thread,
   as it would otherwise deadlock with ongoing transaction messages in
   the GCS layer.
   
4. When receiving this message the members might or might not enable
   certification.    
   
5. If one or more old primaries are still present, then the new primary
   must wait for all members to send a message stating they are on read
   mode.   
   When it has received these messages it will wait for its current relay
   log backlog to be executed, instructing the members to disable
   certification afterwards.
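
Step 1 only works if every member reaches the same result from the same membership. The Python sketch below shows one way such a deterministic choice could look. The ordering rule (highest weight first, ties broken by lowest UUID) is an assumption derived from the Background description, and all names are hypothetical rather than taken from the plugin code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Member:
    uuid: str
    weight: int


def elect_primary(members: Iterable[Member],
                  appointed_uuid: Optional[str] = None) -> Member:
    candidates = list(members)
    if not candidates:
        raise ValueError("cannot elect a primary in an empty group")
    if appointed_uuid is not None:
        # An appointed member overrides the automatic choice.
        for member in candidates:
            if member.uuid == appointed_uuid:
                return member
        raise ValueError("appointed member is not in the group")
    # Assumed rule: highest weight wins, ties go to the lowest UUID.
    # The result does not depend on the order in which members are listed.
    return min(candidates, key=lambda m: (-m.weight, m.uuid))


group = [Member("uuid-c", 80), Member("uuid-a", 50), Member("uuid-b", 80)]
print(elect_primary(group).uuid)
print(elect_primary(group, appointed_uuid="uuid-a").uuid)
```

Because the key depends only on member attributes, two members that see the same view pick the same primary without exchanging extra messages.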

Note that this is a choice of safety over availability in
failure scenarios, so it may result in write downtime for the end user.    
On the other hand, during live primary switches we have the option
to restrict and monitor user transactions to preserve availability.

In terms of version coexistence, if a member running version 5.X, or
any version prior to the release of this WL, is present in the group,
the old primary election algorithm will be the one executed.

</br>

Primary election: Brief look into the primary election scenarios. 
-----------------------------------------------------------------

### **If the old primary dies**:

On failure cases the old primary dies and when the new one is elected
there could still be some old transactions being applied on this member.    
For this reason we need to wait for the execution of the transactions
from the old primary before declaring the new primary writable. 

So in this situation steps 1, 2 and 3 are executed to ensure safety. 

One particularity of this case is that there is only one source of
truth at a time, i.e., we ensure a member does not accept writes when
applying updates from another member.   
In practice this means there is no wait for the old primary to be on
read mode and no certification activation is needed.    


    | ....member 1 (m1).... |             | .....member 2 (m2)..... |                          | .....member3 (m3)..... |
    |     (Old Primary)     |             |       (Secondary)       |                          |     (New Primary)      |
    |  (read mode is OFF)   |             |     (read mode is ON)   |                          |   (read mode is ON)    |
    |                       |             |                         |                          |                        |
    |       Failure         | View change |                         |       View change        |                        |
    |                       | ########### |       View change       | ######################## |      View change       |
    |                       |             |     elect a primary     |                          |    elect a primary     |
    |                       |             |       m3 elected        |                          |      m3 elected        |
    |                       |             |                         |                          |   wait for queue = 0   |
    |                       |             |                         |                          |  wait for transaction  |
    |                       |             |                         |                          |       execution        |
    |                       |             |                         |                          |          +             |
    |                       |             |                         | Primary election message |          +             |
    |                       |             |                         | primary is ready         |          +             |
    |                       |             |                         | <----------------------- |    backlog executed    |
    |                       |             |     Read mode = ON      |                  \       |          .             |
    |                       |             |                         |                   \----> |    Read mode = OFF     |
    |                       |             |                         |                          |                        |
    |                       |             |                         |                          |                        |


### **If we switch from one primary to another**:

This scenario is the most complex one, since we want to preserve safety but
also availability, something we cannot do in the above case.   

Since we are focusing on the primary election part, let's recall that
under the full primary change algorithm there is a first phase
where the old primary enables *enforce_update_everywhere_checks*.    
So when election is invoked step 2 is executed to ensure all
transactions executed before changing this variable are processed in
the new primary. 

Note that the old primary is still accepting requests until step 3 is
invoked.   
Hence, when the read mode is set, it must be done outside the GCS
framework or else it could deadlock against running transactions
waiting for certification messages.    

It is also for this reason that, when the switch happens, there are
updates being executed from both the new and the old primary, so we
need to execute step 4.   

So, now the new primary will wait for all members to be in read mode.   
When that happens it will wait on its backlog and, when finished, instruct all members
that certification can be turned off.    


    | ....member 1 (m1).... | ........................ | .....member 2 (m2)..... | ........................ | .....member3 (m3)..... |
    |     (Old Primary)     |                          |       (Secondary)       |                          |     (New Primary)      |
    |  (read mode is OFF)   |                          |     (read mode is ON)   |                          |   (read mode is ON)    |
    |                       |                          |                         |                          |                        |
    |   Action Invocation   |                          |                         |                          |                        |
    |      validations      |                          |                         |                          |                        |
    | update checks = true  |                          |                         |                          |                        |
    |   wait for ongoing    |                          |                         |                          |                        |
    |    transactions       |                          |                         |                          |                        |
    |         +             |                          |                         |                          |                        |
    |         +             |                          |                         |                          |                        |
    |   primary election    |                          |                         |                          |                        |
    |                       |                          |                         |                          |                        |
    /////////////////////////////////////////////////////// Primary Election /////////////////////////////////////////////////////////
    |                       |                          |                         |                          |                        |
    | Invoke an election    | Primary election message |                         | Primary election message |                        |
    |    Send message       |  elect a new member (m3) |                         |  elect a new member (m3) |                        |
    |                       | -----------------------> |                         | -----------------------> |                        |
    |   elect a primary     | <----/                   |     elect a primary     |                          |     elect a primary    |
    |     m3 elected        |                          |       m3 elected        |                          |       m3 elected       |
    |                       |                          |                         |                          |  [wait for queue = 0]  |
    |                       |                          |                         |                          | [wait for transaction  |
    |                       |                          |                         |                          |      execution]        |
    |                       |                          |                         |                          |          +             |
    |                       |                          |                         |                          |          +             |
    |                       | Primary election message |                         | Primary election message |          +             |
    |                       |     primary is ready     |                         |     primary is ready     |          +             |
    |                       | <----------------------- |                         | <----------------------- |   backlog executed     |
    |  enable certification |                          |   enable certification  |                  \       |                        |
    |  [Set Read mode = ON] |                          |   [Set Read mode = ON]  |                   \----> |  enable certification  |
    |          +            |                          |            +            |                          | [Set Read mode = OFF]  |
    |          +            | Primary election message |    Read mode = true     | Primary election message |                        |
    |          +            | member in read mode (m2) |                         | member in read mode (m2) |                        |
    |          +            | <----------------------- |                         | -----------------------> |                        |
    |          +            |                          |                         | <----/                   |                        |
    |          +            |                          |                         |                          |                        |
    |          +            |                          |                         |                          |                        |
    |          +            |                          |                         |                          |                        |
    |   Read mode = ON      | Primary election message |                         | Primary election message |                        |
    |                       | member in read mode (m1) |                         | member in read mode (m1) |                        |
    |                       | -----------------------> |                         | -----------------------> |   [Wait for backlog]   |
    |                       | <----/                   |                         |                          |           +            |
    |                       |                          |                         |                          |           +            |
    |                       | Single primary message   |                         | Single primary message   |           +            |
    | disable certification | <----------------------- | disable certification   | <----------------------- |    backlog executed    |
    |                       |                          |                         |                   \----> |  disable certification |
    |                       |                          |                         |                          |                        |
    |                       |                          |                         |                          |                        |
        
Operations surrounded by "[]" are executed in a spawned thread
and not on the GCS stack.

### **If we switch from multi primary to a single primary**: 

When electing a primary coming from a multi primary group one thing to keep
in mind is that *enforce_update_everywhere_checks* was already true
before the election.   
In practice this means that there is no need to wait for the execution
of transactions from the old primary.  
So the above described step 2 is skipped here.    
Apart from that, the algorithm remains the same.   


    | ....member 1 (m1).... | ........................ | .....member 2 (m2)..... | ........................ | .....member3 (m3)..... |
    |   (Multi Primary)     |                          |     (Multi Primary)     |                          |   (Appointed primary)  |
    |  (read mode is ON)    |                          |    (read mode is ON)    |                          |    (read mode is ON)   |
    |                       |                          |                         |                          |                        |
    |   Action Invocation   |                          |                         |                          |                        |
    |      validations      |                          |                         |                          |                        |
    |   primary election    |                          |                         |                          |                        |
    |                       |                          |                         |                          |                        |
    /////////////////////////////////////////////////////// Primary Election /////////////////////////////////////////////////////////
    |                       |                          |                         |                          |                        |
    | Invoke an election    | Primary election message |                         | Primary election message |                        |
    |    Send message       |  elect a new member (m3) |                         |  elect a new member (m3) |                        |
    |                       | -----------------------> |                         | -----------------------> |                        |
    |   elect a primary     | <----/                   |     elect a primary     |                          |     elect a primary    |
    |     m3 elected        |                          |       m3 elected        |                          |       m3 elected       |
    |                       |                          |                         |                          |         /              |
    |                       | Primary election message |                         | Primary election message |        /               |
    |                       |     primary is ready     |                         |     primary is ready     |       /                |
    |                       | <----------------------- |                         | <----------------------- | ------                 |
    |  enable certification |                          |   enable certification  |                  \       |                        |
    |  [Set Read mode = ON] |                          |   [Set Read mode = ON]  |                   \----> |  enable certification  |
    |          +            |                          |            +            |                          | [Set Read mode = OFF]  |
    |          +            | Primary election message |    Read mode = true     | Primary election message |                        |
    |          +            | member in read mode (m2) |                         | member in read mode (m2) |                        |
    |          +            | <----------------------- |                         | -----------------------> |                        |
    |          +            |                          |                         | <----/                   |                        |
    |          +            |                          |                         |                          |                        |
    |          +            |                          |                         |                          |                        |
    |          +            |                          |                         |                          |                        |
    |   Read mode = ON      | Primary election message |                         | Primary election message |                        |
    |                       | member in read mode (m1) |                         | member in read mode (m1) |                        |
    |                       | -----------------------> |                         | -----------------------> |   [Wait for backlog]   |
    |                       | <----/                   |                         |                          |           +            |
    |                       |                          |                         |                          |           +            |
    |                       | Single primary message   |                         | Single primary message   |           +            |
    | disable certification | <----------------------- | disable certification   | <----------------------- |    backlog executed    |
    |                       |                          |                         |                   \----> |  disable certification |
    |                       |                          |                         |                          |                        |
    |                       |                          |                         |                          |                        |
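
The three scenarios differ only in which of the five election steps from the start of this section are executed. The Python sketch below summarizes that mapping as described in the text; the enum and function names are hypothetical and exist only for illustration.

```python
from enum import Enum
from typing import List


class ElectionContext(Enum):
    OLD_PRIMARY_FAILED = "old primary died"
    PRIMARY_SWITCH = "switch from one primary to another"
    MULTI_TO_SINGLE = "switch from multi primary to single primary"


# Step numbers refer to the five election stages listed in this section.
_STEPS = {
    # No wait for read mode and no certification activation is needed.
    ElectionContext.OLD_PRIMARY_FAILED: [1, 2, 3],
    # Both primaries may execute updates, so every step is required.
    ElectionContext.PRIMARY_SWITCH: [1, 2, 3, 4, 5],
    # enforce_update_everywhere_checks was already true, so step 2 is skipped.
    ElectionContext.MULTI_TO_SINGLE: [1, 3, 4, 5],
}


def election_steps(context: ElectionContext) -> List[int]:
    return list(_STEPS[context])


for context in ElectionContext:
    print(context.value, "->", election_steps(context))
```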


</br>  
   


6. Facing member failures or stops
==================================

All is well if no problems occur while the process is running.   
One issue that might happen is that some member leaves during the
process.  
This can be an intentional leave, as when the DBA stops the member, or a
server/machine/network failure that makes the group expel the member.   
In this section we handle exits while the group keeps a majority. For
partitions, check section 7.

</br>

### How is it handled: Failures at the coordination level

* **The invoking member fails**:
The way this WL is structured, the invoking member only plays a
key role on the start and end part of the action when the result is
returned to the user.

In practice what this means is that once an action is accepted by the
group any failure on the invoking member will not stop the action
progress on the group.   
The group action will continue its work on other members until all
declare its end. 

If the coordinator dies before sending the action then nothing
happens.

* **Any member fails**:

When a member leaves or fails, all the running action coordinators in
the other members need to wait for one less member to declare the action as terminated.
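
The "wait for N messages" pattern and its adjustment on member exit can be sketched as a set of pending members. This is an illustrative Python sketch with hypothetical names, not the coordinator's actual implementation.

```python
from typing import Iterable


class CompletionTracker:
    """Tracks which members still have to declare the action as terminated."""

    def __init__(self, member_ids: Iterable[str]) -> None:
        self._pending = set(member_ids)

    def on_end_message(self, member_id: str) -> bool:
        # A member declared its local part of the action finished.
        self._pending.discard(member_id)
        return self.is_complete()

    def on_member_leave(self, member_id: str) -> bool:
        # A member that left can no longer report, so stop waiting for it.
        self._pending.discard(member_id)
        return self.is_complete()

    def is_complete(self) -> bool:
        return not self._pending


tracker = CompletionTracker(["m1", "m2", "m3"])
tracker.on_end_message("m1")
tracker.on_end_message("m2")
print(tracker.on_member_leave("m3"))
```

Tracking member identifiers rather than a bare counter keeps the result correct when a member both reports and later leaves.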


</br>

### How it is handled: Single primary -> Multi-primary

* **If primary fails**:    
We have to break the wait on the secondaries if they are waiting.     
Note that this means no more transactions will come from the old
primary so all transactions from this point on are safe.    

There is however a question here of what to do with concurrent primary
elections.   
We have 2 options:   

- **A** We elect a new primary, causing a secondary to be writable and
  activating certification before applying all pending transactions
  from the old primary.   
  The upside here is that the group write downtime is smaller, the
  downside is that there is a window for transaction divergence.

- **B** We don't elect a new primary and the process will wait for the
  secondaries to be up to date.    
  This option, while safer, may mean the group won't be writable for a
  period of time. 

> For now we go with **B** for safety, as explained in section 5.    
   
* **If secondary fails**:    
If a secondary member fails the algorithm will not be affected and no
action is needed in the remaining members.


</br>

### How it is handled: Multi-primary -> Single primary / Primary election

* **If the old primary fails:** the process must break any waits for
  the old primary.    
  This means that if we are waiting for the old primary to be safe we
  can invoke a new primary election at this point.    
  If we are still validating the parameters and the action execution
  then it must select a new member to invoke the primary election.
  On the other hand, if the process is already waiting for this member
  to be in read mode, then we can skip waiting for it. 

* **If the new primary fails**: the process must abort if the member
  is not yet elected.   
  If the primary was elected already, then this failure does not
  affect the group action and is handled as a traditional primary
  failure.      
  If the election is still ongoing, it depends on the action. Primary
  member changes will abort and try to elect the old primary again.   
  If the election is still ongoing and we are changing to single
  primary mode, the action will output a warning but won't fail.

* **If secondary fails**: the waiting members should be adjusted if waiting
  for messages.
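
The reaction to a failure of the new primary depends on both the action type and how far the election has progressed. The Python sketch below restates the rules from the bullet above as a small decision function; the enum names and the function are hypothetical.

```python
from enum import Enum


class Action(Enum):
    PRIMARY_CHANGE = "change of primary member"
    SWITCH_TO_SINGLE_PRIMARY = "change to single primary mode"


class ElectionState(Enum):
    NOT_STARTED = "member not yet elected"
    ONGOING = "election still ongoing"
    ELECTED = "primary already elected"


class Reaction(Enum):
    ABORT_ACTION = "abort the action"
    ABORT_AND_REELECT_OLD_PRIMARY = "abort and try to elect the old primary again"
    WARN_AND_CONTINUE = "output a warning, the action does not fail"
    REGULAR_PRIMARY_FAILURE = "handle as a traditional primary failure"


def on_new_primary_failure(action: Action, state: ElectionState) -> Reaction:
    if state is ElectionState.ELECTED:
        # The group action is not affected any more.
        return Reaction.REGULAR_PRIMARY_FAILURE
    if state is ElectionState.NOT_STARTED:
        return Reaction.ABORT_ACTION
    # Election ongoing: the outcome depends on the action.
    if action is Action.PRIMARY_CHANGE:
        return Reaction.ABORT_AND_REELECT_OLD_PRIMARY
    return Reaction.WARN_AND_CONTINUE


print(on_new_primary_failure(Action.PRIMARY_CHANGE, ElectionState.ONGOING).value)
```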



</br>

### How we register leaves and messages 

For handling events such as the member exits and primary elections mentioned
above, two options existed:  

1. Methods placed at each point that call the coordinator
   class, which then forwards the event to each executing group action
   
2. Observers at each point that can be used by actions if needed. 

We went with 2. While it is a bit more complex than 1, it allows for more
versatility.   

Another reason we went for 2 is that we used the same pattern for
messages, as we needed to add new behaviors to old messages.

This way new plugin components can emerge reading events from the
group and messages without changes to the gcs_event_handler code.
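
As an illustration of option 2, the Python sketch below shows an observer registry where an action subscribes only to the events it cares about. All class and method names are hypothetical; the real plugin interfaces are described in the low level design.

```python
from typing import Iterable, List


class GroupEventObserver:
    """Hypothetical base class: actions override only the hooks they need."""

    def on_member_leave(self, leaving: List[str]) -> None:
        pass

    def on_primary_election(self, primary_uuid: str) -> None:
        pass


class GroupEventNotifier:
    def __init__(self) -> None:
        self._observers: List[GroupEventObserver] = []

    def register(self, observer: GroupEventObserver) -> None:
        self._observers.append(observer)

    def unregister(self, observer: GroupEventObserver) -> None:
        self._observers.remove(observer)

    def notify_member_leave(self, leaving: List[str]) -> None:
        # Iterate over a copy so observers may unregister while notified.
        for observer in list(self._observers):
            observer.on_member_leave(leaving)

    def notify_primary_election(self, primary_uuid: str) -> None:
        for observer in list(self._observers):
            observer.on_primary_election(primary_uuid)


class WaitingAction(GroupEventObserver):
    """An action that stops waiting for members that left the group."""

    def __init__(self, members: Iterable[str]) -> None:
        self.pending = set(members)

    def on_member_leave(self, leaving: List[str]) -> None:
        self.pending -= set(leaving)


notifier = GroupEventNotifier()
action = WaitingAction(["m1", "m2", "m3"])
notifier.register(action)
notifier.notify_member_leave(["m2"])
print(sorted(action.pending))
```

The event source only knows the observer interface, so new consumers can be added without touching the code that raises the events.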

</br>  

7. Other scenarios - partitions and joins
=========================================

Partitions 
----------

Partitions are an orthogonal issue in this WL.   
Being a distributed algorithm, it is normal that actions will block on a
minority partition, as any other transaction message will.   
The question remains of how the DBA can handle it, and how the system
will react.

Let's reason about the two different types of partitions:

**Asymmetric partitions**

In this case the group still has a majority and some members will be
expelled.   
To the majority, the way it handles these exits falls under section 6. 

On the minority, in the absence of a network connection, the executing
group actions will block.   
On these members the DBA may stop group replication, and
the group action process will then terminate on that member.

If a value for *group_replication_unreachable_majority_timeout* is
defined then the members in a minority partition will eventually error
out.    
The members can also eventually be expelled by the group and error out
when they are again able to contact it.   

When this happens the action is stopped by the plugin error handling
code.

**Symmetric partitions**

On symmetric partitions or multi group partitions where no sub-group
holds a majority the DBA options are:

1. The DBA manages to restore the network, and if so the process
   should continue normally.
2. The DBA forces a new group membership. You can check:   
   https://dev.mysql.com/doc/refman/8.0/en/group-replication-network-partitioning.html

If the DBA goes with option 2 then a new majority will be formed where
the group action will unblock.   
Again, how this new majority handles the leaving members is described
on section 6.    

Example:   
Suppose the old primary is among the unreachable members and the action
was waiting for a message from it.   
When a majority is formed and the old primary is expelled, the wait
will unblock.

On the expelled members, the view will mark them as leaving members, making them terminate their actions locally.
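
The difference between the two partition types comes down to whether any sub-group can still see more than half of the previous membership. A small Python sketch of that check follows; the function names are hypothetical and the strict-majority rule is stated here as an assumption about how the group decides.

```python
from typing import Sequence


def has_majority(reachable: int, group_size: int) -> bool:
    # Strict majority: more than half of the previous membership.
    return 2 * reachable > group_size


def some_partition_has_majority(partition_sizes: Sequence[int]) -> bool:
    group_size = sum(partition_sizes)
    return any(has_majority(size, group_size) for size in partition_sizes)


print(some_partition_has_majority([3, 2]))  # one side keeps a majority
print(some_partition_has_majority([2, 2]))  # no sub-group holds a majority
```

In the first case the majority side expels the others and continues as described in section 6. In the second case every action blocks until the DBA intervenes.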

</br>

A new member joins
------------------

The way this WL was designed, no concurrent joins are allowed during a
group action as it would make them even more complex. 

The frontier is the reception of the action starting message.    
At this point if there are members in recovery the action will abort.   
If there are no joiners, then after this point all joins will fail,
meaning the joiner will leave the group when it sees a running
action. 
  
Note that joins, recovery state changes, and action starts all rely on
GCS events and are for that reason not concurrent.
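
The frontier rule can be pictured as a small gate that is consulted both when an action starts and when a join is attempted. The Python sketch below is illustrative only, with hypothetical names, and assumes both checks run serially as GCS events do.

```python
class JoinGate:
    """Hypothetical gate deciding whether actions may start and members may join."""

    def __init__(self) -> None:
        self.action_running = False

    def on_action_start(self, members_in_recovery: int) -> bool:
        # At the frontier, any member still in recovery aborts the action.
        if members_in_recovery > 0:
            return False
        self.action_running = True
        return True

    def on_action_end(self) -> None:
        self.action_running = False

    def may_join(self) -> bool:
        # A joiner that sees a running action leaves the group.
        return not self.action_running


gate = JoinGate()
print(gate.on_action_start(members_in_recovery=1))  # action aborts
print(gate.on_action_start(members_in_recovery=0))  # action starts
print(gate.may_join())                              # joins now fail
```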

</br>  
   


8. Facing process failures
==========================

When an unexpected failure happens while executing a step, the process
must also behave accordingly.    
Many of the execution steps are simple though, like setting a plugin variable, so
they should not fail.    
Others are critical and threaten the consistency of the group,
like failing to disallow writes on a secondary member.


So the strategy may vary and the process should accept that:

* **Message sending failures:** Message retry is already handled
  inside GCS.   
  This means that sending errors are serious and not easy to handle.   
  When a group action fails to send messages, how can it tell the group to abort?    
  If the action is already running, the only option is to leave the
  group (and enable read mode). 

* **Enabling the read mode:** all failures enabling the read mode will
  leave the member in an undesired state, so the server process shall
  abort. For this, a service shall be implemented, as described in the
  low level design.

* **Disabling the read mode:** Failures when disabling the read mode can
  be handled by the DBA, so a logged error should be sufficient.    
  
* **Failures in assessing the number of transactions running:** When an
  operation needs to know the number of transactions running to ensure
  safety, any failure should make the process abort.
  
* **Failures on SET PERSIST or other critical failures:** Whenever
  there is a failure that prevents the algorithm from progressing, the
  member should exclude itself from the group (and enable read mode).   
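
The strategies above amount to a lookup from failure type to reaction. The Python sketch below captures that table; the enum names are hypothetical, and reading "make the process abort" as aborting the group action (rather than the server) is an assumption.

```python
from enum import Enum


class Failure(Enum):
    MESSAGE_SEND = "message sending failure"
    ENABLE_READ_MODE = "failure enabling the read mode"
    DISABLE_READ_MODE = "failure disabling the read mode"
    COUNT_RUNNING_TRANSACTIONS = "failure assessing running transactions"
    SET_PERSIST_OR_OTHER_CRITICAL = "SET PERSIST or other critical failure"


class Reaction(Enum):
    LEAVE_GROUP_AND_ENABLE_READ_MODE = "leave the group and enable read mode"
    ABORT_SERVER_PROCESS = "abort the server process"
    LOG_ERROR = "log an error for the DBA"
    ABORT_ACTION = "abort the action"  # assumption, see the text above


FAILURE_POLICY = {
    Failure.MESSAGE_SEND: Reaction.LEAVE_GROUP_AND_ENABLE_READ_MODE,
    Failure.ENABLE_READ_MODE: Reaction.ABORT_SERVER_PROCESS,
    Failure.DISABLE_READ_MODE: Reaction.LOG_ERROR,
    Failure.COUNT_RUNNING_TRANSACTIONS: Reaction.ABORT_ACTION,
    Failure.SET_PERSIST_OR_OTHER_CRITICAL: Reaction.LEAVE_GROUP_AND_ENABLE_READ_MODE,
}


def react_to(failure: Failure) -> Reaction:
    return FAILURE_POLICY[failure]


for failure in Failure:
    print(failure.value, "->", react_to(failure).value)
```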
  
</br>  


9. Monitoring and error reporting
=================================

One point not mentioned until now is how the DBA can check the
progress and status of coordinated configuration changes. 

First of all, we must answer the question: **What does the user want to know?**

A. What is the action running.
B. What is the progress on that action. 
C. Is there an error, and if so, what was it. 

Let's start with A. and B.   
To avoid overloading our current performance schema tables with fields that
have an unknown lifetime and evolution, we went for an alternative. 

Thus, monitoring will be based on the current stage event table:

    performance_schema.events_stages_current

This works through new monitoring instruments, which are listed under

    performance_schema.setup_instruments

</br>

Stages 
------

The planned stages correspond to the steps where the algorithm
will probably wait and where we can give some progress information.   
Hence we skip here singular steps like action acceptance or primary
election invocation.    

**Multi primary switch stages**

> stage/group_rpl/Multi-primary Switch: waiting for pending transactions to finish.

The old primary updated the *enforce_update_everywhere_checks* variable and
collected the set of currently ongoing transactions.   
In this stage we show the progress of how many transactions are left
for execution. 

> stage/group_rpl/Multi-primary Switch: waiting on another member step completion

While the above step runs, the old secondaries are in wait state.   
This stage shows we are waiting on a message from the old
primary member.

> stage/group_rpl/Multi-primary Switch: applying buffered transactions.

The old primary executed all the above transactions.   
The secondaries must also wait for them to be executed locally.   
This stage reports when the process is over.

> stage/group_rpl/Multi-primary Switch: waiting for operation to complete on all members.

Due to the asynchronous nature of the algorithm, completion time in
different members can differ.   
This stage reports how many members have finished vs the ones that are
still missing.  

**Single primary switch stages**

>  stage/group_rpl/Single-primary Switch: checking group pre-conditions.

Check if the group has running channel slaves or members of an invalid
version.     
This stage reports the completion of the several verification steps. 

> stage/group_rpl/Single-primary Switch: executing Primary election

Primary election is invoked and runs in all members. This stage means
the algorithm is waiting on the election.

> stage/group_rpl/Single-primary Switch: waiting for operation to complete on all members.

Due to the asynchronous nature of the algorithm, completion time in
different members can differ.  
This stage reports how many members have finished vs the ones that are
still missing.  

**Primary switch stages**

> stage/group_rpl/Primary switch: checking current primary pre-conditions.

Check if the old primary has running channel slaves.
This stage reports the completion of the several verification steps. 

> stage/group_rpl/Primary Switch: waiting for pending transactions to finish.

The old primary updated the *enforce_update_everywhere_checks* variable and
collected the set of currently ongoing transactions.   
In this stage we show the progress of how many transactions are left
for execution. 

> stage/group_rpl/Primary Switch: waiting on another member step completion


While the above step runs, the old secondaries are in wait state.   
This stage shows we are waiting for the execution of the above
transactions, which will lead to the primary election phase.

> stage/group_rpl/Primary Switch: executing Primary election

Primary election is invoked and runs in all members. This stage means
the algorithm is waiting on the election outcome.

> stage/group_rpl/Primary Switch: waiting for operation to complete on all members.

Due to the asynchronous nature of the algorithm, completion time in
different members can differ.  
This stage reports how many members have finished vs the ones that are
still missing.  

**Primary election stages**

> stage/group_rpl/Primary Election: applying buffered transactions.

The old primary executed all the above transactions.   
The secondaries must also wait for them to be executed locally.   
This stage reports the progress of that execution.

> stage/group_rpl/Primary Election: waiting on current primary transaction execution

The secondaries are waiting for the primary to finish the above stage.

> stage/group_rpl/Primary Election: waiting for members to turn on super_read_only

When primary election is invoked all members shall wait for the other
members to be in read mode.
This stage reports how many servers are not yet in read mode. 

> stage/group_rpl/Primary Election: stabilizing transactions from former primaries. 

Once all servers are in read mode the new primary shall consume its
backlog and then declare that certification can be disabled on all
members.   
This stage shows how many transactions are left to apply.


</br>

How to use it 
-------------

So when an action is running, the DBA can check the stages table with
something like:

     SELECT event_name, source, work_completed, work_estimated FROM performance_schema.events_stages_current WHERE event_name LIKE "%stage/group_rpl%";
     EVENT_NAME                                                                         SOURCE              WORK_COMPLETED  WORK_ESTIMATED
     stage/group_rpl/Multi-primary Switch: waiting for pending transactions to finish.  stage_monitor.h:73		3               10	

When no action is running the query returns empty.   
So this solves A and B. 

This information can be checked on every member running an action;
the progress reported is the local one.   

For C, we rely on the server error primitives of the UDF mechanism.    
If a function is executed with invalid parameters, the UDF API
returns an error with our custom error message, like: 

     ERROR 1123 (HY000): Can't initialize function 'group_replication_set_as_primary'; Member is in multi-primary mode.

If the function is validated and there is an error during execution
then we return a custom error ourselves.    
For that we will add the error: 
       
     ER_GRP_RPL_UDF_ERROR

As there is currently no number associated with this error, we use NNNN as a placeholder:

     ERROR NNNN (HY000): The function 'group_replication_set_as_primary' failed. There is a member joining the group. 

</br>  
   

10. User interface - a summary
==============================

To DBAs in general, here is the summary part.

* **Scenario A:**

If you are in multi-primary mode and want to change to single-primary mode,
just execute

     SELECT group_replication_switch_to_single_primary_mode();

If you have a primary in mind, you can pass its UUID:

     SELECT group_replication_switch_to_single_primary_mode(primary_uuid);
     Mode switched to single-primary successfully 

While the action runs, you can check its progress with 

     SELECT event_name, work_completed, work_estimated FROM performance_schema.events_stages_current WHERE event_name LIKE "%stage/group_rpl%";
     EVENT_NAME                                                           WORK_COMPLETED  WORK_ESTIMATED
     stage/group_rpl/Single-primary Switch: checking group pre-conditions     	0               1	

</br>

* **Scenario B:**

If you are in single-primary mode and want to change to multi-primary mode,
just execute

     SELECT group_replication_switch_to_multi_primary_mode();
     Mode switched to multi-primary successfully 

While the action runs, you can check its progress with 

     SELECT event_name, work_completed, work_estimated FROM performance_schema.events_stages_current WHERE event_name LIKE "%stage/group_rpl%";
     EVENT_NAME                                                                        WORK_COMPLETED  WORK_ESTIMATED
     stage/group_rpl/Multi-primary Switch: waiting for pending transactions to finish.   	2               10	

</br>

* **Scenario C:**

If you are in single-primary mode and want to change the primary,
just execute

     SELECT group_replication_set_as_primary(server_uuid);
     Primary server switched to: UUID

While the action runs, you can check its progress with 

     SELECT event_name, work_completed, work_estimated FROM performance_schema.events_stages_current WHERE event_name LIKE "%stage/group_rpl%";
     EVENT_NAME                                                                  WORK_COMPLETED  WORK_ESTIMATED
     stage/group_rpl/Primary Switch: waiting on another member step completion        0               1	

Note that when the primary election algorithm kicks in, you can also monitor
that in another stage:

     SELECT event_name, work_completed, work_estimated FROM performance_schema.events_stages_current WHERE event_name LIKE "%stage/group_rpl%";
     EVENT_NAME                                                                         WORK_COMPLETED  WORK_ESTIMATED
     stage/group_rpl/Primary Election: waiting for members to turn on super_read_only        3               6

</br>
  
* **Error cases 1: State changes under the same mode**   
   
You are in single-primary / multi primary and you execute a migration
to the mode the system is already in.

     SELECT group_replication_switch_to_multi_primary_mode();
     The system is already on multi-primary mode

The function just returns a string stating that. No error.

</br>

* **Error cases 2: Primary switches in multi-primary mode**   

You are in multi primary and you execute a primary election

     SELECT group_replication_set_as_primary(server_uuid);
     ERROR HY000: Can't initialize function 'group_replication_set_as_primary'; In multi-primary mode. Use group_replication_switch_to_single_primary_mode.

</br>

* **Error cases 3: Generic Validation errors**      

You execute a function with an improper argument, with no argument
where one is required, or with some other invalid input. Some of the
errors are:

    SELECT group_replication_switch_to_single_primary_mode(____);
    ERROR HY000: Can't initialize function 'group_replication_switch_to_single_primary_mode'; Wrong arguments: This function either takes no arguments or a single server uuid
    ERROR HY000: Can't initialize function 'group_replication_switch_to_single_primary_mode'; Wrong arguments: The server uuid is not valid.
    ERROR HY000: Can't initialize function 'group_replication_switch_to_single_primary_mode'; The requested uuid is not a member of the group.
    
    
    SELECT group_replication_set_as_primary(____);
    ERROR HY000: Can't initialize function 'group_replication_set_as_primary'; Wrong arguments: You need to specify a server uuid.
    ERROR HY000: Can't initialize function 'group_replication_set_as_primary'; Wrong arguments: The server uuid is not valid.
    ERROR HY000: Can't initialize function 'group_replication_set_as_primary'; The requested uuid is not a member of the group.
    
    SELECT group_replication_switch_to_multi_primary_mode(____);
    ERROR HY000: Can't initialize function 'group_replication_switch_to_multi_primary_mode'; Wrong arguments: This function takes no arguments.

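The argument checks behind the `group_replication_set_as_primary` errors above could be sketched as follows. This is only an illustration, not the plugin's code: the helper name `validate_primary_uuid_arg` and the plain `std::set` of member UUIDs are assumptions, and only the message texts are taken from the examples above. An empty return value means the arguments passed validation.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns the validation error text, or an empty
// string when the arguments are acceptable.
std::string validate_primary_uuid_arg(
    const std::vector<std::string> &args,
    const std::set<std::string> &group_member_uuids) {
  if (args.size() != 1)
    return "Wrong arguments: You need to specify a server uuid.";

  // Textual UUID form: 8-4-4-4-12 hexadecimal digits.
  static const std::regex uuid_format(
      "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
      "[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");
  if (!std::regex_match(args[0], uuid_format))
    return "Wrong arguments: The server uuid is not valid.";

  if (group_member_uuids.count(args[0]) == 0)
    return "The requested uuid is not a member of the group.";

  return std::string();
}
```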
</br>

* **Error cases 4: The DBA doesn't have privileges**   

You try to execute an action with no privileges

     SELECT group_replication_set_as_primary(server_uuid);
     ERROR 1123 (HY000): Can't initialize function 'group_replication_set_as_primary';  User 'group_rpl_user'@'%'. needs SUPER or GROUP_REPLICATION_ADMIN privileges.

</br>

* **Error cases 5: Member is not in a valid state**   

The member is in error state or unreachable.

     SELECT group_replication_set_as_primary(server_uuid);
     ERROR 1123 (HY000): Can't initialize function 'group_replication_set_as_primary'; The member needs to be ONLINE and in a reachable partition.

</br>

* **Error cases 6: An action is already running**   

There is already a group action running.   
This is a runtime error, not validation error:

     SELECT group_replication_set_as_primary(server_uuid);
     ERROR NNNN (HY000): The function 'group_replication_set_as_primary' failed; There is already a configuration action being executed. Wait for it to finish.

</br>

* **Error cases 7: A member is joining**   

There is a member joining.   
This is a runtime error, not validation error:

     SELECT group_replication_set_as_primary(server_uuid);
     ERROR NNNN (HY000): The function 'group_replication_set_as_primary' failed; A member is joining the group, wait for it to be ONLINE.

</br>

* **Error cases 8: A member of a lower version is present**   

A member that has a lower version and cannot execute these actions is
present in the group.   
This is a runtime error, not validation error:

     SELECT group_replication_set_as_primary(server_uuid);
     ERROR NNNN (HY000): The function 'group_replication_set_as_primary' failed; The group has a member with a version that does not support group coordinated operations.

</br>

* **Error cases 9: The primary fails before election**   

We are electing a primary or changing to single primary mode with an appointed primary and the appointed member fails before the election starts.   
This is a runtime error, not validation error:

     SELECT group_replication_set_as_primary(server_uuid);
     ERROR NNNN (HY000): The function 'group_replication_set_as_primary' failed; The appointed primary for election left the group, this operation will be aborted. No primary election was invoked under this operation.
</br>

* **Error cases 10: The primary fails during election - Change to Single Primary mode**   

We are changing to single primary mode with an appointed primary and
it fails during election.   
The operation completes but there is a warning.

     SELECT group_replication_switch_to_single_primary_mode(server_uuid);
     Mode switched to single-primary with reported warnings: The appointed primary being elected exited the group. Check the group member list to see who is the primary
     Warnings:
     Warning	NNNN	The appointed primary being elected exited the group. Check the group member list to see who is the primary. There were warnings detected also on other members, check their logs.
</br>

* **Error cases 11: The primary fails during election - Change of Primary Member**   

We are changing the primary member and the appointed primary
fails during election.   
This is a runtime error, not validation error:

     SELECT group_replication_set_as_primary(server_uuid);
     ERROR NNNN (HY000): The function 'group_replication_set_as_primary' failed; Primary assigned for election left the group, this operation will be aborted and if present the old primary member will be re-elected. Check the group member list to see who is the primary.
</br>


* **Error cases 12: The primary fails after election**  
 
We are electing a primary or changing to single primary mode with an
appointed primary and it fails when the member was already elected.  
Only a warning is thrown.

     SELECT group_replication_switch_to_single_primary_mode("MEMBER1_UUID");
     Mode switched to single-primary with reported warnings: The appointed primary left the group as the operation is terminating. Check the group member list to see who is the primary
     Warnings:
     Warning NNNN	The appointed primary left the group as the operation is terminating. Check the group member list to see who is the primary
</br>

* **Error cases 13: A slave channel prevents the operation**  

You execute a function but a running slave channel prevents it.
     
     SELECT group_replication_switch_to_single_primary_mode("MEMBER2_UUID");
     ERROR HY000: The function 'group_replication_switch_to_single_primary_mode' failed. The requested primary is not valid as a slave channel is running on member MEMBER1_UUID

     SELECT group_replication_switch_to_single_primary_mode();
     ERROR HY000: The function 'group_replication_switch_to_single_primary_mode' failed. There is more than a member in the group with running slave channels so no primary can be elected.

     SELECT group_replication_set_as_primary("MEMBER2_UUID");
     ERROR HY000: The function 'group_replication_set_as_primary' failed. There is a slave channel running in the group's current primary member.

</br>

* **Error cases 14: There is a member in group from an older version**  

When you execute a function but there is a member that does not have this feature.
     
     SELECT group_replication_switch_to_single_primary_mode();
     ERROR HY000: The function 'group_replication_switch_to_single_primary_mode' failed. The group has a member with a version that does not support group coordinated operations.

</br>

* **Error cases 15: When you kill a coordinated change**  

You execute a coordinated change and then kill it.   
Note that, depending on progress, the messages can be different.

     SELECT group_replication_switch_to_single_primary_mode();
     ERROR HY000: The function 'group_replication_switch_to_single_primary_mode' failed. This operation was locally killed and for that reason terminated. The member will now leave the group.
     
     SELECT group_replication_switch_to_single_primary_mode();
     ERROR HY000: The function 'group_replication_switch_to_single_primary_mode' failed. This operation was locally killed and for that reason terminated. However the member is already configured to run in single primary mode, but the configuration was not persisted. The member will now leave the group.


* **Error cases 16: Critical failures**  

When you execute a function and some critical error occurs
     
     SELECT group_replication_set_as_primary(server_uuid);
     ERROR HY000: The function 'group_replication_set_as_primary' failed. A critical error occurred during the local execution of this action. The member will now leave the group.

</br>

* **Error cases 17: Other failures**   

We are electing a primary or running a migration and something else fails

     SELECT group_replication_set_as_primary(server_uuid);
     ERROR NNNN (HY000): The function 'group_replication_set_as_primary' failed; Error message here


</br>

11. Other points 
================

Upgrade/Downgrade
-----------------

This WL does not depend on any upgrade, nor does it bring restrictions
to downgrades.   

There are however some behaviors associated with
upgrade/downgrade processes where the group is formed by mixed
versions.

*A mix of 5.7 and 8.0 members*

In such a group:

If an action is invoked on 8.0 members the action shall error out when
it checks that 5.7 version members are present. 

About 5.7 members there is no issue as actions cannot be invoked from
such members.   
One point about 5.7 members is that the old election algorithm shall
be used whenever a member of this version is present. 

*A mix of 8.0 and 9.0 members*

This applies to any version difference between members of version 8.0
and above.

In a mixed group where actions are supported by all members, the
primary election shall only allow the selection of a primary that runs
the lowest version in the group.    
Selecting a member with a higher version as primary could lead to the
dissemination of unknown messages to the lower version members. 

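The lowest-version rule above can be illustrated with a short sketch. The `Member_versions` map, the integer version encoding and the function name `is_valid_primary_candidate` are assumptions made for this example; the worklog does not prescribe how the check is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical view of the group: member uuid -> version encoded as an
// integer (for example 80013 for 8.0.13).
using Member_versions = std::map<std::string, unsigned long>;

// Returns true when the candidate belongs to the group and runs the lowest
// version present in it, the only case where it may be appointed.
bool is_valid_primary_candidate(const Member_versions &members,
                                const std::string &candidate_uuid) {
  auto candidate = members.find(candidate_uuid);
  if (candidate == members.end()) return false;

  auto lowest = std::min_element(
      members.begin(), members.end(),
      [](const Member_versions::value_type &a,
         const Member_versions::value_type &b) { return a.second < b.second; });
  return candidate->second == lowest->second;
}
```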
</br>

Security
--------

Users must have GROUP_REPLICATION_ADMIN privileges as for START and
STOP commands. 

Users that have this privilege can also stop group replication in all
members interrupting service so no new angle of attack is introduced. 

In terms of operation abuse, no two commands can run at the same time
so malicious DBAs could only cause sequential elections or changes of
mode.   
These can cause some delay in the cluster, but overall there is no
effect in terms of availability.   

</br>

Query life cycle and killing behaviors
--------------------------------------

The way query execution is envisioned in this WL is that a DBA
executes a query in the form of a UDF, and that query blocks until the
instantiated group action is concluded.   
The query process remains waiting for a result while a newly spawned
thread executes the action process.

The question here is how this relates to query termination by the
DBA.    

In terms of kill semantics, we should start with the basics of how query
kills work in MySQL.   
The basic idea is that the process in execution checks at different
stages whether its thread was given a kill signal.

The same principle applies here: actions should regularly check whether their
thread was killed and, if so, abort.    
The notion of not committing or rolling back locally does not apply here
though.    
This has 2 major consequences:

1. Killing an accepted group coordinated change means this member is
   diverging in configuration from the group.   
   Hence, the member shall leave the group and move to error mode.

2. There is a point in time where the kill signal may mean nothing,
   as only trivial operations remain in the process.   
   So, the DBA may kill a group action only to find out that it
   completed successfully.
   
   In practical terms this can be tracked through stages.   
   When the final stage kicks in, it means the action is now completed locally and is only
   waiting for other members to finish. 
   This means that any kill request after this will not cancel the
   operation.

Another point here is that the DBA may kill the stuck query when it is
in fact the underlying action execution thread that is taking too much
time.     
This design also takes that into account, guaranteeing that upon
detecting it was killed the query process shall kill the local action
process.     

</br>

Deployment / Install
--------------------

This worklog brings no changes in terms of install and deploy.   
Just some notes:

1. Performance schema needs to be enabled for monitoring.   
Also, some of the above setup instruments may have to be enabled if
needed.

2. UDF functions are auto installed so no user action is needed here. 

3. The use of SET PERSIST means some user defined settings in the
configuration files may be ignored as newer settings take precedence
over them. 

</br>  

12. Points not considered in the Worklog
========================================

**Cancellation**:     
We do not handle cancellation of requests in this worklog due to its
complexity.

1. Coordinator
==============
    
### Coordinator - Code Skeleton 

    //The coordinator class where actions are submitted
    class Group_action_coordinator
 
    public:
      //Proposes a group action
      int coordinate_action_execution(Group_action* action);
 
      /*
        Asks the coordinator to stop any ongoing action
        @param coordinator_stop is the coordinator terminating
      */ 
      int stop_coordinator_process(bool coordinator_stop);
 
      //Handle incoming  action message (start or stop)
      int handle_action_message(Group_action_message *msg);
 
      //Queue notification (primary change, message received,.. )
      int queue_notification(Action_notification notification);
 
      //Returns if there is a group action running
      bool is_group_action_running();
 
      /*
        Adapts the coordinator
        @param number_members   the current number of members
        @param is_leaving       is this member leaving?
      */ 
      void handle_leaving_members(int number_members, bool is_leaving);

    private:
 
      //Handle incoming start action message
      int handle_action_start_message(Group_action_message *msg);
 
      //Handle incoming stop action message
      int handle_action_stop_message(Group_action_message* msg);
 
      //Declare this action as terminated to other members
      // @param message_type for the sent message
      int signal_action_terminated(enum_action_message_type);
 
      // Leave the group and change state to error 
      int leave_on_action_error(); 
     
      //Handle the termination of current action
      void terminate_action();
 
      //The id of the action that is currently running
      enum_group_action_type current_action_id;
 
      //The action object that is currently executing
      Group_action executing_action;
 
      //The queue of notifications delivered to the running action
      Queue<Action_notifications> notifications;
 
      //The uuids of the members known for the current action
      list<uuid> known_messages_uuids;
 
      //The lock to coordinate start and stop requests
      lock coordinator_process_lock;
 
      //The flag to avoid concurrent action start requests
      bool action_ongoing;
 
      //The flag to avoid action starts post stop
      bool coordinator_terminating;
      
      //Is the action terminating
      bool action_terminating;
      
      //The handler where actions can report progress through stages
      Plugin_stage_monitor_handler* monitoring_stage_handler;



</br>

### Coordinator - Method logic 
    
* **General idea**

->user action

    Group_action *action_X = new Custom_action(parameters_from_user);
    error= group_action_coordinator.coordinate_action_execution(action_X);
    return error;

- **coordinate_action_execution(Group_action action)**

</p>

1. Lock coordinator_process_lock
2. [action_ongoing == true || coordinator_terminating == true]    
    Then abort (fail early)
3. Set action_ongoing to true   
4. Get action message with **Group_action::get_action_message**
5. Send the message to all members   
   If it fails to send, return error to the user
6. Unlock coordinator_process_lock
7. Create a Plugin_stage_monitor_handler instance and set a stage.
8. Wait for response from action execution (execution is on another thread).
9. Set action_ongoing to false.
10. [If return value is either GROUP_ACTION_KILLED or GROUP_ACTION_ERROR]    
    Execute **leave_on_action_error()**
11. End the stage on the Plugin_stage_monitor_handler instance.
12. Check the response and return either success or error to the client

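Steps 1 to 3 and step 9 above can be sketched as a small guard around the two flags. This is a hedged illustration: the class `Coordinator_guard_sketch` and its method names are invented for the example, and taking the lock in `end_action()` is an assumption, since the steps do not say whether step 9 runs under `coordinator_process_lock`.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical reduction of the coordinator to the lock and the two flags
// used to fail early.
class Coordinator_guard_sketch {
 public:
  // Steps 1-3: returns true if the caller may go on and send the action.
  bool try_begin_action() {
    std::lock_guard<std::mutex> guard(coordinator_process_lock);
    if (action_ongoing || coordinator_terminating) return false;  // fail early
    action_ongoing = true;
    return true;
  }

  // Step 9: the action returned, so another one may now be proposed.
  void end_action() {
    std::lock_guard<std::mutex> guard(coordinator_process_lock);
    action_ongoing = false;
  }

  // What stop_coordinator_process(true) does to the flag: refuse later starts.
  void mark_terminating() {
    std::lock_guard<std::mutex> guard(coordinator_process_lock);
    coordinator_terminating = true;
  }

 private:
  std::mutex coordinator_process_lock;
  bool action_ongoing = false;
  bool coordinator_terminating = false;
};
```

A second `try_begin_action()` call fails while the first action is still ongoing, and every call fails once `mark_terminating()` has run.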

</p>

See the **Killing queries** section below for the code path taken when the query is killed.

- **stop_coordinator_process(bool coordinator_stop)**
 
    [Is an action running]
 
</p>
 
1. Lock coordinator_process_lock
2. Set coordinator_terminating= coordinator_stop
3. Invoke **Group_action::stop_action_execution(false)**.   
4. Wait for the thread executing the action to finish. 
5. Invoke **Group_action_coordinator::terminate_action()**
6. Unlock coordinator_process_lock
  
</p>


- **handle_action_message(Group_action_message *msg)**

</p>

1. [coordinator_terminating == true]    
   Return
2. [If it is a start message]     
   Invoke: **Group_action_coordinator::handle_action_start_message**   
   If there is an error on handling, return    
   If local, awake the action coordination so it aborts.    
3. [If it is a stop message]     
   Invoke: **Group_action_coordinator::handle_action_stop_message**
4. [If it is an abort message]     
   Invoke **Group_action_coordinator::stop_coordinator_process(false)**
 
</p>


* **is_group_action_running()**

</p>

1. Return true if there is a running action.    
   The current idea is to test if there is a defined action id.   
   Using action_ongoing can also be an option but this is set before the
   action is accepted.   
   So using action_ongoing can cause unnecessary join failures from
   new members. 

</p>

* **handle_leaving_members(int number_members, bool is_leaving)**
   
</p>

1. [is_leaving == true]  
   Invoke   
   **Group_action_coordinator::stop_coordinator_process(true)**
   Return 
2. Update *known_messages_uuids*.
3. [Are the termination messages == known_messages_uuids]    
   Invoke **Group_action_coordinator::terminate_action()**;

</p>

- **handle_action_start_message(Group_action_message *msg)**

</p>

1. [If an action is already/still running]    
   Then abort    
   If local, awake the action coordination so it aborts.    
   [No action running]    
   Go to 2)
2. [Are there any members of the group in recovery]    
   Then abort    
   If local, awake the action coordination so it aborts.    
   [No members in recovery]    
   Set *known_messages_uuids* (this is a logically consistent moment)   
   Go to 3)
3. Get the local action coordinator.    
   Set the current action id.     
   [If it is the sender]     
   Get the Group_action object for this message    
   [If remote]   
   Instantiate a new Group_action object.  
   Set *action_terminating* to false;
4. Give the message to the Group_action object for processing.   
   **Group_action::process_action_message(Group_action_message msg)**
5. Instantiate a new Plugin_stage_monitor_handler and set *monitoring_stage_handler*
6. Invoke the execution method of the Group_action class in a spawned new thread    
   **Group_action::execute_action**
7. Return 

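The acceptance rule in steps 1 to 3 above can be sketched as follows. Because every member sees the same totally ordered stream of start messages, applying this rule locally yields the same decision everywhere. The struct `Start_handler_sketch` and its fields are hypothetical names, and the recovery check is reduced to a boolean supplied by the caller.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Start_result {
  ACCEPTED,
  REJECTED_ACTION_RUNNING,
  REJECTED_MEMBER_IN_RECOVERY
};

// Hypothetical per-member state for handling start messages.
struct Start_handler_sketch {
  bool action_running = false;
  std::string running_action_origin;       // uuid of the accepted sender
  std::vector<std::string> known_members;  // fixed when the action is accepted

  Start_result handle_start(const std::string &sender_uuid,
                            const std::vector<std::string> &group_members,
                            bool any_member_in_recovery) {
    // Step 1: only one action at a time.
    if (action_running) return Start_result::REJECTED_ACTION_RUNNING;
    // Step 2: no action while a member is still in recovery.
    if (any_member_in_recovery)
      return Start_result::REJECTED_MEMBER_IN_RECOVERY;
    // Logically consistent moment: remember who takes part in the action.
    known_members = group_members;
    // Step 3: register the action as the current one.
    running_action_origin = sender_uuid;
    action_running = true;
    return Start_result::ACCEPTED;
  }
};
```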
</p>

- **handle_action_start_message - Execution thread**

</p>

1. Execute **Group_action::execute_action()** 
2. If the method returns a *GROUP_ACTION_RESTART* signal, re-execute the action.
3. When the thread job finishes, execute
   **Group_action_coordinator::signal_action_terminated()**

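The execution thread logic above amounts to a loop that re-runs the action while it asks for a restart and then signals termination once. In this sketch the two callbacks stand in for **Group_action::execute_action()** and **Group_action_coordinator::signal_action_terminated()**; the function name `run_action_thread` and the reduced enum are assumptions made for the example.

```cpp
#include <cassert>
#include <functional>

// Result values named after enum_action_execution_result in this document;
// only the ones needed here are listed.
enum Execution_result_sketch {
  GROUP_ACTION_TERMINATED,
  GROUP_ACTION_ERROR,
  GROUP_ACTION_RESTART
};

// Hypothetical body of the execution thread.
Execution_result_sketch run_action_thread(
    const std::function<Execution_result_sketch()> &execute_action,
    const std::function<void(Execution_result_sketch)>
        &signal_action_terminated) {
  Execution_result_sketch result;
  do {
    result = execute_action();  // steps 1 and 2: execute, repeat on restart
  } while (result == GROUP_ACTION_RESTART);
  signal_action_terminated(result);  // step 3: tell the other members
  return result;
}
```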
</p>

* **handle_action_stop_message(Group_action_message *msg)**
   
</p>

1. Update the completed work on  *monitoring_stage_handler* 
2. [are the termination messages uuids == known_messages_uuids]     
   Then declare the action as terminated, go to 3)    
3. Update known_messages_uuids to remove the received member uuid.
4. Use the end stage method on *monitoring_stage_handler*
5. Invoke **Group_action_coordinator::terminate_action()**

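One possible way to keep the bookkeeping used by the stop message handling and by **handle_leaving_members** is sketched below. It is an assumption of this example that finished members are kept in a separate set rather than removed from *known_messages_uuids*; the class and method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>

// Hypothetical bookkeeping for one running action: which members were known
// when it started and which of them already sent a termination message.
class Termination_tracker_sketch {
 public:
  explicit Termination_tracker_sketch(std::set<std::string> members)
      : known_members(std::move(members)) {}

  // A stop message arrived; returns true once every known member reported.
  bool on_stop_message(const std::string &uuid) {
    if (known_members.count(uuid)) finished_members.insert(uuid);
    return all_finished();
  }

  // A member left the group; it can no longer report, so stop waiting on it.
  bool on_member_left(const std::string &uuid) {
    known_members.erase(uuid);
    finished_members.erase(uuid);
    return all_finished();
  }

  // Values for the stage: work_completed and work_estimated.
  std::size_t work_completed() const { return finished_members.size(); }
  std::size_t work_estimated() const { return known_members.size(); }

 private:
  bool all_finished() const { return finished_members == known_members; }

  std::set<std::string> known_members;
  std::set<std::string> finished_members;
};
```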
</p>


* **signal_action_terminated(enum_action_message_type)**

</p>

1. Set *action_terminating* to true.
2. Use **get_termination_key()** from the Group_Action class.    
   Set the stage on the *monitoring_stage_handler*   
   Set the estimated work to the number of known members, and the
   completed to the number of received messages.  
3. Instantiate message of type Group_action_message.    
   Use the given message type  and use ACTION_END_PHASE.
4. Send the message.

</p>

* **terminate_action()**
   
</p>

1. Delete any notification not used by the current action.   
2. Awake coordinate_action_execution method.
3. Unset the current_action_id. 

</p>

* **leave_on_action_error()**

</p>

1. Change member state to error 
2. Leave the group
3. Cancel pending transactions
4. Set read mode to true.

</p>



</br>

### Coordinator - Code related changes

* **Plugin_gcs_events_handler::check_group_compatibility()**

As it is stipulated in the requirements, new members cannot join
during a group action. 

To accomplish this we need to change this method and add a check for:  
Group_action_coordinator::is_group_action_running()

* **Plugin_gcs_events_handler::handle_joining_members()**

As a continuation of the above requirement, but with the intent of
improving the user experience, we propose to add the same check here.

The idea is that the joining member will error out, but the other
members should also print the reason why the member was expelled. 

So, on the code branch 

    else if (number_of_joining_members > 0 ||
            (number_of_joining_members == 0 && number_of_leaving_members == 0))
    {

We shall add a check and a print to the error log if the member joined
while an action was ongoing. 

* **Plugin_gcs_events_handler::handle_leaving_members()**

Similar to the recovery call for update, we also invoke here:   
Group_action_coordinator::handle_leaving_members(int number_members, bool is_leaving)



</br>

### Coordinator - Concurrency notes

* **Start vs Start scenarios**

The coordinator prevents several start messages from being sent at
the same time.   
Due to the *coordinator_process_lock*, you can only send another action once the running one returns.   
As for concurrency between requests from other members, we rely on the
sequential nature of GCS.    
The first received start action message is the one executed; the others fail.   

* **Start vs Stop scenarios** 

Stop happens on plugin.cc method terminate_plugin_modules().   
This means that it happens when the member left the group so in theory
there are no more messages starting actions.   

There are however possible requests for actions in parallel.   
Due to the coordinator locks, either:    
The stop goes first and sets coordinator_terminating, so all requests
will fail.   
The start goes first, but stop ends it.


</br>

### Coordinator - Life cycle 

* **Initialization**

This class is initialized on plugin.cc on plugin_group_replication_start.   
Since it does not depend on any server service it does not rely on
the delayed thread class. 

* **Termination**

This class is terminated and deleted on terminate_plugin_modules().   
The method **Group_action_coordinator::stop_coordinator_process(true)** is invoked.

* **Invocation**

See section 8, UDF functions


</br>

### Coordinator - Killing queries 

As described on the functional requirements and high level design this
WL shall be implemented with a responsive behavior to DBA kill
requests. 

One key point is that query kill signals shall also kill the underlying
action process. 

So on **coordinate_action_execution(Group_action action)** it is
assumed that step 8 is not a hard wait but a timed one that
periodically checks if the request was killed. 

If killed, the process shall:

1. Invoke **Group_action::stop_action_execution(true)**.   
2. Wait for the thread executing the action to finish. 
3. Invoke **Group_action_coordinator::terminate_action()**
4. [Group Action result is GROUP_ACTION_NOT_KILLED]   
   Send a warning stating the action finished in spite of the kill 
5. [Group Action result is GROUP_ACTION_KILLED]   
   Send an error stating that the action was killed and the member
   will leave the group. 

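The timed wait described above can be sketched with a condition variable that wakes up periodically to look at a kill flag. The struct `Action_wait_state`, the function `wait_for_action` and the polling interval are assumptions of this example and not part of the design; on a `QUERY_KILLED` outcome the caller would go on with steps 1 to 5 above.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical state shared between the blocked query (the UDF call) and
// the thread that executes the action.
struct Action_wait_state {
  std::mutex lock;
  std::condition_variable cond;
  bool action_finished = false;           // set by the execution thread
  std::atomic<bool> query_killed{false};  // set when the DBA kills the query
};

enum class Wait_outcome { ACTION_FINISHED, QUERY_KILLED };

// Timed wait: wake up every poll_interval to check the kill flag instead
// of blocking until the action ends.
Wait_outcome wait_for_action(Action_wait_state &state,
                             std::chrono::milliseconds poll_interval) {
  std::unique_lock<std::mutex> guard(state.lock);
  while (!state.action_finished) {
    if (state.query_killed.load()) return Wait_outcome::QUERY_KILLED;
    state.cond.wait_for(guard, poll_interval);
  }
  return Wait_outcome::ACTION_FINISHED;
}
```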
</br>

2. Group Actions : Parent class 
===============================

The parent class for all actions
    
### Group Action - Code Skeleton 
    
    //The base class that each action implements
    class Group_action
    
      // Enum for existent group action classes
      enum enum_group_action_type {
        GROUP_ACTION_MULTI_PRIMARY,    //change to multi primary
        GROUP_ACTION_PRIMARY_ELECTION, //primary election
        NO_GROUP_ACTION
      };
    
      // Enum for the end results of an action execution
      enum enum_action_execution_result {
        GROUP_ACTION_TERMINATED, // Terminated with success
        GROUP_ACTION_ERROR,      // Error on execution
        GROUP_ACTION_RESTART,    // Due to an error the action shall be restarted
        GROUP_ACTION_ABORTED,    // Was aborted due to some internal check
        GROUP_ACTION_KILLED,     // Action was killed
        GROUP_ACTION_NOT_KILLED  // Action was killed but finished
      };
    
      //Constructor giving the class access to notifications
      Group_action(Queue<Action_notifications> notifications);
    
      /*
        Get the message with the parameters for this action
        @param message  [out] the message to start the action
      */
      virtual void get_action_message(Group_action_message** message)=0;
    
      /*
        Process the message with the parameters for this action
        @param message  [in]  the message to start the action
      */
      virtual int process_action_message(Group_action_message& message)=0;
    
      /*
        Execute the action
        @param invoking_member is the member that invoked it
        @param stage_handler the stage handler to report progress
        
        @returns the execution result 
      */
      virtual enum_action_execution_result 
          execute_action(bool invoking_member,
                         Plugin_stage_monitor_handler stage_handler)=0;
    
      /*
        Get the error message in case of error
        @param [out] error_msg
        
        @returns the execution result 
      */
      virtual enum_action_execution_result get_error_message(string& error_msg)=0;
    
      /*
        Terminate the execution process
        @param killed are we killing the action. 
      */
      virtual int stop_action_execution(bool killed)=0;
    
      //Returns the action identifier
      virtual int get_action_id()=0;
    
      // Returns the action name (for debug)
      virtual const char* get_action_name()=0;

      //Allow each class to have its own end stage key/message
      virtual PSI_stage_key get_termination_key()=0;



</br>

### Group Action - Method logic and ideas

* **get_action_message(Group_action_message** message)**

[Method extended by child classes]   

This method should return the class that contains the parameters for
execution.   
The idea here is that each class can define what parameters it has
and how to encode them.   
    
* **process_action_message(Group_action_message& message)**    
    
[Method extended by child classes]   

Each action class reacts in its own way to its message/parameters.   
There is however another side to this method.    
This method is executed upon message reception, which means it is
processed at the same logical moment in all members.   
You can use this method to check, for example, something about the
current group view. 
    
* **execute_action(bool invoking_member, Plugin_stage_monitor_handler stage_handler)**    

[Method extended by child classes]   

Each action executes its logic in this method.    
This method is executed in a spawned thread.   
This method can repeat itself if you return GROUP_ACTION_RESTART.

* **stop_action_execution(bool killed)**

[Method extended by child classes]   

This method should simply unblock any wait and make the execution
method return faster.

* **get_error_message(string& error_msg)**

[Method extended by child classes]   

This method is omitted in the below classes but basically we assume
error messages are stored and can be retrieved. 
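
The restart behaviour of **execute_action** can be pictured with a small
driver loop. This is only a sketch under assumptions: `Sketch_result`,
`run_action` and the `max_restarts` bound are hypothetical names made up
for the illustration, and the worklog does not define any restart limit.

```cpp
#include <cassert>
#include <functional>

// Hypothetical, reduced result type: only what this sketch needs.
enum class Sketch_result { DONE, RESTART, ABORTED };

// Runs `execute` again for as long as it asks for a restart.
// `max_restarts` is an assumption of this sketch (a safety bound);
// the worklog does not define any limit. If the bound is hit, the
// last RESTART result is handed back to the caller.
inline Sketch_result run_action(const std::function<Sketch_result()> &execute,
                                int max_restarts, int *executions) {
  Sketch_result result = Sketch_result::RESTART;
  *executions = 0;
  while (result == Sketch_result::RESTART && *executions <= max_restarts) {
    result = execute();
    ++*executions;
  }
  return result;
}
```

A caller would pass the action's execution method as `execute` and inspect
`executions` to see how many times the logic ran.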


</br>

### Group Action - Killing queries 

Without going into the details of the child classes, let's outline
here the basics of killing a Group Action process. 

The basic idea is that **execute_action** shall contain stages where it
checks if the thread was killed, or if stop_action_execution was
invoked with a true flag.   
If the thread was killed and we detect it at one of these points, a flag
like *action_was_killed* is set.

When the action terminates, if there was an attempt to kill the query
and it was killed at one of these points, we output
*GROUP_ACTION_KILLED*.   

If there was an attempt to kill it, but it failed, it outputs *GROUP_ACTION_NOT_KILLED*.
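
The distinction between the two kill results can be sketched as follows.
The class and enum names are hypothetical and the real action has many
more states; the sketch only shows how a kill request that is seen at a
checkpoint differs from one that arrives too late.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, reduced result type for the sketch.
enum class Kill_sketch_result { FINISHED, KILLED, NOT_KILLED };

class Killable_action_sketch {
 public:
  // Invoked by another thread in the real design; only records the request.
  void stop_action_execution(bool killed) {
    if (killed) kill_requested_.store(true);
  }

  // A checkpoint inside execute_action. Returns true when execution must
  // stop here because a kill request was seen.
  bool checkpoint() {
    if (kill_requested_.load()) {
      action_was_killed_ = true;
      return true;
    }
    return false;
  }

  // Result to report once the action has terminated.
  Kill_sketch_result final_result() const {
    if (action_was_killed_) return Kill_sketch_result::KILLED;
    if (kill_requested_.load()) return Kill_sketch_result::NOT_KILLED;
    return Kill_sketch_result::FINISHED;
  }

 private:
  std::atomic<bool> kill_requested_{false};  // set by stop_action_execution
  bool action_was_killed_{false};  // only touched by the action thread
};
```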

</br> 

2.1 Multi Primary migrations
============================

The action block to do a migration from single primary setups to multi-primary setups. 
    
### Multi-primary migration - Code Skeleton 

    //Class for multi primary migrations
    class Multi_primary_migration_action : public Group_action

      virtual void get_action_message(Group_action_message** msg)

      virtual int process_action_message(Group_action_message& msg)

      virtual enum_action_execution_result 
          execute_action(bool invoking_member,
                         Plugin_stage_monitor_handler stage_handler)

      virtual int stop_action_execution(bool killed)

      virtual int get_action_id()

      virtual int get_action_name()

      // Listener:  React to view changes
      after_view_change(joining, leaving, group, *skip_election)

      // Listener: React to messages
      before_message_handling(message, *skip_message)

    private:

      // The current primary member
      string primary_member

      // If the action was aborted
      bool action_aborted

</br>

### Multi primary migration - Method logic

* **get_action_message(Group_action_message** msg)**

</p>

1. Instantiate message of type Group_action_message.    
   Use ACTION_MULTI_PRIMARY_MESSAGE and use ACTION_START_PHASE.    
   No need for a custom message as this action has no parameters.   

</p>

* **process_action_message(Group_action_message& msg)**

</p>

1. Get what is the current primary and set *primary_member*.   
2. Register listener on Group_events_observation_manager

</p>

* **execute_action(invoking_member, monitoring_stage_handler)**

</p>

1. Set *enforce_update_everywhere_checks* to true
2. [If primary member]    
   Use a *Server_query_execution_handler* instance.    
   Extract a list of the current executing server transactions.     
   When all transactions are executed we can proceed to 3.
3. [If primary member]   
   Send a message to all members stating that all transactions are now safe.     
   Use a Single_primary_message with type SINGLE_PRIMARY_NO_RESTRICTED_TRANSACTIONS.  
   If there is an error when checking transactions execution send an Action Message:   
   Use phase ACTION_ABORT_PHASE and return. 
4. When all members receive the above said message the method
   before_message_handling is executed.      
   See **Multi_primary_migration_action::before_message_handling** below.    
   Here the process polls for the notification to move to step 5.
5. When awakened by the Queue_checkpoint_packet:    
   [If secondary member]   
   Use the channel_get_retrieved_gtid_set method from the channel
   interface to get the current applier retrieved set.    
   Loop until the server GTID executed contains all the retrieved
   transactions.    
6. Set the plugin.cc var single_primary_mode to false.   
   Use **Persistent_variables_handler::set_persistent_variable()**
7. [If secondary member]    
   Disable read mode.  
   Use methods on read_mode_handler.h
8. Unregister listener on Group_events_observation_manager
9. return

</p>

* **stop_action_execution(bool killed)**

> For simplicity we omit in the execution methods the termination
> checks.   
> It is assumed that a regular check for the action_aborted flag is
> made.    
> Same thing for checks on notifications of type
> TERMINATE_EXECUTION_NOTIFICATION.   

</p>

1. Set action_aborted to true. 
2. Invoke Server_query_execution_handler::abort_waiting_process()
3. Queue a TERMINATE_EXECUTION_NOTIFICATION notification to unblock
   any waits
   
</p>

* **after_view_change(joining, leaving, group, *skip_election)**

</p>

1. [If secondary and the old primary died]    
   Queue a notification DEAD_PRIMARY_NOTIFICATION that will unblock the wait for the message from
   the primary.    
2. Execute the method Applier_module::queue_certification_enabling_packet(true).   
   Queue a Queue_checkpoint_packet that will awake the main process when the queue is empty.
3. Set the out parameter *skip_election* to true.   

</p>

* **before_message_handling(message, *skip_message)**

</p>

1. [If message type = SINGLE_PRIMARY_NO_RESTRICTED_TRANSACTIONS]    
   Queue a notification TRANSACTIONS_SAFE_NOTIFICATION in the notification queue.    
   Execute the method Applier_module::queue_certification_enabling_packet(true).   
   Check the section below for notes on this method.    
   Queue a Queue_checkpoint_packet that will awake the main
   process when the queue is processed up to this point.     
 
</p>
    
</br>
    
### Multi primary migration - Code related changes 

To enable the certification in the applier we need some tweaks to the
Applier_module class.  
The main issue is that we must prevent the execution of the
**check_single_primary_queue_status()** method.   
This method, used for old elections, will turn off certification after
the "new primary" SQL thread is idle.

Even on primary elections, we want to have finer control over the
moment when the primary declares that certification is no longer
needed. 


    class Applier_module
    
      /*
        Queues a Single_primary_action_packet in the applier queue
        @param multi_primary_context is there more than a primary
      */
      + int queue_certification_enabling_packet(bool multi_primary_context)

      /*
        Signals that the multi primary period is over
      */
      + void end_multi_primary_period()

     private:
    
      // Is the member in a situation where more than one member does updates
      + bool multi_primary_context

* **queue_certification_enabling_packet(bool multi_primary_context)**

</p>

1. Create Single_primary_action_packet with NEW_PRIMARY
2. Set *multi_primary_context* to the passed parameter

</p>

* **end_multi_primary_period()**

</p>

1. Set *multi_primary_context* to false

</p>

* **check_single_primary_queue_status()**

Add a new check here for *multi_primary_context*
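
The interplay of the three methods can be sketched as below. This is a
simplified model and not the plugin code: `Applier_context_sketch` is a
hypothetical class, queuing the packet is reduced to setting a flag, and
the `applier_queue_idle` parameter stands in for the real idle detection.

```cpp
#include <cassert>

class Applier_context_sketch {
 public:
  // Stands in for queuing a Single_primary_action_packet: the sketch just
  // turns certification on and remembers the context flag.
  void queue_certification_enabling_packet(bool multi_primary_context) {
    certification_enabled_ = true;
    multi_primary_context_ = multi_primary_context;
  }

  void end_multi_primary_period() { multi_primary_context_ = false; }

  // Returns true when this call switched certification off.
  bool check_single_primary_queue_status(bool applier_queue_idle) {
    if (multi_primary_context_) return false;  // the new check
    if (certification_enabled_ && applier_queue_idle) {
      certification_enabled_ = false;
      return true;
    }
    return false;
  }

  bool certification_enabled() const { return certification_enabled_; }

 private:
  bool certification_enabled_{false};
  bool multi_primary_context_{false};
};
```

While the context flag is set, an idle applier queue no longer disables
certification; only after **end_multi_primary_period()** does the old
behaviour apply again.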

</br>

### Multi primary migration - Concurrency notes

* Multi primary changes and primary elections 

As stated in the HLD, primary elections triggered by member failures
are disabled during multi primary migrations.    
Any election during the mode change will be skipped, so there could be
some write downtime. 

</br>


### Multi primary migration - Monitoring notes

Here we describe when process stages change and how we do the
monitoring of progress.

Here, we will use the steps from **Multi primary migration - Method
logic**, in particular for the **execute_action(invoking_member, monitoring_stage_handler)**

* Step 2 on the primary 

The stage is set to 

    Multi-primary Switch: waiting for buffered transactions to finish.
    
The progress is set in the handler, passing into it the
Plugin_stage_monitor_handler object.

* Step 2 on the secondary members

The stage is set to 
    
    Multi-primary Switch: waiting on another member step completion
    
Completed work is set to 0, estimated work is set to 1.

* Step 5 on secondaries 

The stage is set to 
    
    Multi-primary Switch: applying buffered transactions.

The progress is tracked as the initial difference between the retrieved
and executed sets and how that difference shrinks over time. 
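
A minimal sketch of this progress computation, assuming GTID sets are
modelled as plain sets of transaction numbers (a strong simplification of
real GTID sets). `Txn_set_sketch`, `count_missing` and `apply_progress`
are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Simplified stand-in for a GTID set: just transaction numbers.
using Txn_set_sketch = std::set<unsigned long long>;

// Number of retrieved transactions that were not executed yet.
inline std::size_t count_missing(const Txn_set_sketch &retrieved,
                                 const Txn_set_sketch &executed) {
  std::size_t missing = 0;
  for (unsigned long long txn : retrieved) {
    if (executed.count(txn) == 0) ++missing;
  }
  return missing;
}

struct Progress_sketch {
  std::size_t completed;  // work done so far
  std::size_t estimated;  // total work, fixed when the stage starts
};

// `initial_missing` is the difference measured when the stage was entered.
inline Progress_sketch apply_progress(std::size_t initial_missing,
                                      const Txn_set_sketch &retrieved,
                                      const Txn_set_sketch &executed) {
  const std::size_t missing = count_missing(retrieved, executed);
  const std::size_t done =
      missing >= initial_missing ? 0 : initial_missing - missing;
  return Progress_sketch{done, initial_missing};
}
```

The stage is complete when `completed` equals `estimated`, that is, when
the executed set contains every retrieved transaction.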

</br>

2.2 Single primary election 
===========================
The action block to elect a primary chosen by the user or in a
migration from multi-primary.  
    
### Single primary election - Code Skeleton 

    //Class for primary election / migration
    class Primary_election_action : public Group_action
    
      // Enum for the phases of the election action
      enum_primary_election_state{
       PRIMARY_VALIDATION_PHASE   // Check if primary is valid
       PRIMARY_SAFETY_CHECK_PHASE // Make the change safe
       PRIMARY_ELECTION_PHASE     // Invoke primary election
      }
    
      virtual void get_action_message(Group_action_message** messg)
    
      virtual int process_action_message(Group_action_message& messg)
    
      virtual enum_action_execution_result 
          execute_action(bool invoking_member,
                         Plugin_stage_monitor_handler stage_handler)
    
      virtual int stop_action_execution(bool killed)
    
      virtual int get_action_id()
    
      virtual int get_action_name()
    
      // Listener:  React to view changes
      after_view_change(joining, leaving, group, *skip_election)
    
      // Listener: React to messages
      before_message_handling(message, *skip_message)
    
      // Listener: After primary election
      after_primary_election(primary_uuid)
    
    private:
    
      // Changes the phase where the action is currently
      void change_action_phase(enum_primary_election_state s)
      // The current phase
      enum_primary_election_state current_action_phase
      // Lock for the phase change
      lock phase_lock
    
      // The member that invokes primary election
      string invoking_member_uuid
    
      // The selected primary uuid to change to
      string selected_primary_uuid
    
      // If the action was aborted
      bool action_aborted

</br>

### Single primary election - Method logic

* **get_action_message(Group_action_message** msg)**

</p>

1. Instantiate message of type Primary_election_action_message.    
   Use ACTION_START_PHASE and the chosen primary uuid. 

</p>

* **process_action_message(Group_action_message& msg)**

</p>

1. Cast the Group_action_message object to Primary_election_action_message.
2. If a uuid is selected, set *selected_primary_uuid*
3. [If primary candidate is defined]    
   validate_primary_uuid(primary_uuid, error_message)
4. validate_primary_version(error_message).    
   Steps 3 and 4 are executed here to get a consistent view of the group.
5. Get the current primary and set *invoking_member_uuid* to it.   
   If no previous primary exists, define the invoking member as being the
   member that invoked the action change.
6. Register listener on Group_events_observation_manager
7. Set current_action_phase to PRIMARY_VALIDATION_PHASE

</p>  

* **execute_action(invoking_member, monitoring_stage_handler)**

</p>

1. Invoke Primary_election_validation_handler::**validate_election(uuid, valid_uuid, String& error_msg)**    
   [result = INVALID_PRIMARY || CURRENT_PRIMARY ]  
   return GROUP_ACTION_ABORTED.    
   [result = GROUP_SOLO_PRIMARY]   
   set *selected_primary_uuid* to *valid_uuid* and go to 2    
   [result = VALID_PRIMARY]  
   go to 2
2. Set current_action_phase to PRIMARY_SAFETY_CHECK_PHASE
3. [If previous primary member exists]    
   Set enforce_update_everywhere_checks to true on all members.   
   [If (old) primary member]    
   Use the *Server_query_execution_handler*.     
   When all transactions are executed we can proceed to 4.                
   If there is an error when checking transactions execution send an Action Message:   
   Use phase ACTION_ABORT_PHASE and return. 
4. The *invoking_member_uuid* invokes:   
   Primary_election_handler::request_group_primary_election(primary_uuid,mode)        
   [If coming from multi primary]   
   mode=SAFE_OLD_PRIMARY   
   [If primary switch]   
   mode=UNSAFE_OLD_PRIMARY   
   This sends a message and handles the election on all members.      
   Primary election handles the certification enabling.       
   It also handles the read mode settings.      
5. Wait for PRIMARY_ELECTED_NOTIFICATION notification in the queue.   
   Pop the notification from the queue.    
   Set the plugin.cc var single_primary_mode to true.    
   Use **Persistent_variables_handler::set_persistent_variable()**   
   [If secondary member]  
   Set enforce_update_everywhere_checks to false;    
   Use **Persistent_variables_handler::set_persistent_variable()**
6. When the message       
   Single_primary_message::SINGLE_PRIMARY_QUEUE_APPLIED_MESSAGE   
   arrives, queue a notification.
7. [If primary member]    
   In the action process wait for the above notification.  
   Set enforce_update_everywhere_checks to false;    
   Use **Persistent_variables_handler::set_persistent_variable()**
8. return. 

</p>
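
Step 1 of the list above maps the validation result to a decision. A
minimal sketch of that mapping, assuming a hypothetical helper
`apply_validation_result` and a reduced copy of the result enum; the real
method would return an action execution result instead of a bool.

```cpp
#include <cassert>
#include <string>

// Reduced copy of the validation results, renamed for the sketch.
enum class Validation_sketch {
  VALID_PRIMARY,
  INVALID_PRIMARY,
  CURRENT_PRIMARY,
  GROUP_SOLO_PRIMARY
};

// Returns true when the action may move on to step 2, false when it
// has to abort. On GROUP_SOLO_PRIMARY the only valid member replaces
// the user's choice.
inline bool apply_validation_result(Validation_sketch result,
                                    const std::string &valid_uuid,
                                    std::string *selected_primary_uuid) {
  switch (result) {
    case Validation_sketch::INVALID_PRIMARY:
    case Validation_sketch::CURRENT_PRIMARY:
      return false;
    case Validation_sketch::GROUP_SOLO_PRIMARY:
      *selected_primary_uuid = valid_uuid;
      return true;
    case Validation_sketch::VALID_PRIMARY:
      return true;
  }
  return false;  // not reached for valid enum values
}
```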

* **stop_action_execution(bool killed)**

> For simplicity we omit in the execution methods the termination
> checks.   
> It is assumed that a regular check for the action_aborted flag is
> made.    
> Same thing for checks on notifications of type
> TERMINATE_EXECUTION_NOTIFICATION.   

</p>

1. Set action_aborted to true. 
2. Invoke Server_query_execution_handler::abort_waiting_process()
3. Queue a notification of type TERMINATE_EXECUTION_NOTIFICATION to unblock
   any waits
   
</p>


*  **change_action_phase(enum_primary_election_state phase_var)**  

</p>

1. Lock phase_lock
2. Change *current_action_phase* to phase_var
3. Unlock phase_lock

</p>

* **after_view_change(joining, leaving, group, *skip_election)**

</p>

1. Lock phase_lock
2. [If the old primary died]   
   Set the out parameter *skip_election* to true   
   [Is current_action_phase == PRIMARY_VALIDATION_PHASE]    
      Change the invoking member from the old primary to the member that invoked the action.   
      If no invoking member exists, select the lowest uuid member.   
   [Is current_action_phase == PRIMARY_SAFETY_CHECK_PHASE]   
      Set current_action_phase to PRIMARY_ELECTION_PHASE.   
      Invoke:   
      **Primary_election_handler::execute_primary_election(primary_uuid, DEAD_OLD_PRIMARY)**  
3. [If the *selected_primary_uuid* died]   
   [Is current_action_phase == PRIMARY_VALIDATION_PHASE || PRIMARY_SAFETY_CHECK_PHASE]    
      Abort the action. Return GROUP_ACTION_ERROR    
   [Is current_action_phase == PRIMARY_ELECTION_PHASE]   
    Do nothing.  
    The algorithm will remain waiting for a primary to be elected.      
    Secondaries only have to disable transaction checks.
4. Unlock phase_lock

</p>
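
The role of phase_lock in **change_action_phase** and
**after_view_change** can be sketched like this. All names ending in
`_sketch` are hypothetical. The reaction for a dead old primary while
already in the election phase is not spelled out in the steps above, so
the sketch assumes nothing needs to be done in that case.

```cpp
#include <cassert>
#include <mutex>

enum class Election_phase_sketch { VALIDATION, SAFETY_CHECK, ELECTION };
enum class View_reaction_sketch { REASSIGN_INVOKER, START_ELECTION, NOTHING };

class Phase_guard_sketch {
 public:
  // Phase changes and reads always happen under the same lock.
  void change_action_phase(Election_phase_sketch phase) {
    std::lock_guard<std::mutex> guard(phase_lock_);
    current_action_phase_ = phase;
  }

  Election_phase_sketch current_action_phase() const {
    std::lock_guard<std::mutex> guard(phase_lock_);
    return current_action_phase_;
  }

  // Decides, with the lock held, how to react when the old primary leaves.
  View_reaction_sketch on_old_primary_left() {
    std::lock_guard<std::mutex> guard(phase_lock_);
    switch (current_action_phase_) {
      case Election_phase_sketch::VALIDATION:
        return View_reaction_sketch::REASSIGN_INVOKER;
      case Election_phase_sketch::SAFETY_CHECK:
        current_action_phase_ = Election_phase_sketch::ELECTION;
        return View_reaction_sketch::START_ELECTION;
      case Election_phase_sketch::ELECTION:
        break;  // assumption of this sketch: nothing to do
    }
    return View_reaction_sketch::NOTHING;
  }

 private:
  mutable std::mutex phase_lock_;
  Election_phase_sketch current_action_phase_{
      Election_phase_sketch::VALIDATION};
};
```

Because the decision and the phase change happen under one lock, a view
change cannot observe a phase that is in the middle of being updated.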

* **after_primary_election(string uuid, int error)**

</p>

1. [error !=0]    
   Queue a notification TERMINATE_EXECUTION_NOTIFICATION   
   return;
2. Queue a notification PRIMARY_ELECTED_NOTIFICATION


</p>

* **before_message_handling(message, *skip_message)**

</p>

1. [If message type = Single_primary_message::SINGLE_PRIMARY_QUEUE_APPLIED_MESSAGE]    
   Queue a notification PRIMARY_QUEUE_APPLIED_NOTIFICATION
2. [If message type = Single_primary_message::SINGLE_PRIMARY_ELECTION]    
   Set current_action_phase to PRIMARY_ELECTION_PHASE 

</p>

</br>

### Single primary election - Concurrency notes

* **Single primary changes and primary elections** 

The main issue here is when members leave, in particular the old or
the new primary.    
The reaction to these events depends, however, on the current state of
the action. That is why the phase lock is important. 

A question can also arise: what if the new primary fails when one member
is on phase 1 and another on phase 2?    
That is why the post primary election phase is triggered using a GCS
event.   
For the same reason the single primary mode is set to true at this point.   
This guarantees all the members make the same decision.   
  
When the new primary dies before the election then the process aborts
in all members with no coordination needed.  
  
The primary election is a black box to the action algorithm.  
If a new primary dies, the election algorithm will elect a new
member.    
The action algorithm only waits for a valid primary and for the group to
be ready, and reacts to that.   
  
When the new primary dies after election, this process is already over
from its own point of view.   
Secondaries simply set update checks to false and the action
terminates.    
The election will be handled by the group.
  
A note here also about how we skip elections in some cases.    
When the old primary dies this means there is no safety wait for
currently executing transactions.

</br>

### Single primary election - Monitoring notes

Here we describe when process stages change and how we do the
monitoring of progress.

Here, we will use the steps from **Single primary election - Method
logic**, in particular for the **execute_action(invoking_member, monitoring_stage_handler)**

Some of the stages depend on the context of the operation.
Simple primary changes use stages that begin with 
     
     Primary switch:

Changes from multi primary to single primary use stages that begin with: 

    Single-primary switch:
 
* Step 1 on all members 

The stage is set to 
    
    Single-primary switch: checking group pre-conditions.

or
  
    Primary switch: checking current primary pre-conditions.

Work completed/estimated can be the number of messages to receive.
For simplicity we can, however, consider this a single step.

* Step 3 on old primary 

Set the stage to 

    Primary Switch: waiting for buffered transactions to finish.

The progress is set in the transaction handler, passing into it the
Plugin_stage_monitor_handler object.

* Step 3 on old secondaries 

Set the stage to 

      Primary Switch: waiting on another member step completion

They will remain in this state until primary election is invoked. 

* Between steps 4 and 5 on all members.

When the leader election message comes, change the stage to 

    Primary Switch: executing primary election

or 

    Single-primary Switch: executing primary election

These are single step stages, all progress on primary election is
reported on its own process. 

Also, from this point on, no more stages are invoked under the
context of these actions. 
    
</br>

3. Notifications
================

The basic idea of notifications is to warn an action when a message
is received or something happens in the plugin.   
Notifications are important in cases where a group action wants to know
about something that might happen before or after a point P.
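
The "before or after a point P" property is what a queue gives us: a
notification pushed before the action starts waiting is not lost. The
following is a minimal sketch of such a queue, with hypothetical names
and a reduced set of notification types. It is not the plugin's queue
class.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

// Reduced set of notification types, renamed for the sketch.
enum class Notification_type_sketch {
  PRIMARY_ELECTED,
  TRANSACTIONS_SAFE,
  TERMINATE_EXECUTION
};

class Notification_queue_sketch {
 public:
  // Called by listeners; wakes up a waiting action thread, if any.
  void push(Notification_type_sketch notification) {
    {
      std::lock_guard<std::mutex> guard(lock_);
      queue_.push_back(notification);
    }
    not_empty_.notify_one();
  }

  // Blocks until a notification is available, then removes and returns it.
  Notification_type_sketch pop() {
    std::unique_lock<std::mutex> guard(lock_);
    not_empty_.wait(guard, [this] { return !queue_.empty(); });
    Notification_type_sketch notification = queue_.front();
    queue_.pop_front();
    return notification;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> guard(lock_);
    return queue_.size();
  }

 private:
  mutable std::mutex lock_;
  std::condition_variable not_empty_;
  std::deque<Notification_type_sketch> queue_;
};
```

Queuing a TERMINATE_EXECUTION notification is then enough to unblock an
action that is waiting in `pop()`.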


### Currently used notifications

    // Enum for the notification types
    enum_action_notification_types{
     DEAD_PRIMARY_NOTIFICATION          // The current primary is dead
     DEAD_MEMBER_NOTIFICATION           // One of the members is dead
     CHANNEL_VALIDATION_NOTIFICATION    // member has channels?
     PRIMARY_ELECTED_NOTIFICATION       // a new primary was elected
     PRIMARY_QUEUE_APPLIED_NOTIFICATION // queue consumed on primary
     TRANSACTIONS_SAFE_NOTIFICATION     // all transactions are now safe
     TERMINATE_EXECUTION_NOTIFICATION   // terminate current process
    }

### Code Skeleton - Base class

    // Notification events to alert actions of some event
    class Action_notification
    
    public:
    
      enum_action_notification_types action_type

### Dead_member_notification 
    
    // Notification about a member that left or died
    class Dead_member_notification : public Action_notification
    
    private:
    
      //The dead member uuid
      string uuid


### Channel_validation_notification

    // Notifications with information about channels on members
    class Channel_validation_notification : public Action_notification
    
    private:
    
      bool has_slave_channels
      string uuid

</br>

4. Observers/Listeners - Group Events 
=====================================

In order to have a notification mechanism that is more extensible to
future actions we chose an observer/listener pattern.    
Besides being possibly useful outside this worklog, this pattern also
allows the user to add behaviors to old messages without changing the
existing code.    

### The Listeners:

    // Listener class for events like view changes
    class Group_event_listener
     
      /*
        Executed before view install
        @param joining            members joining the group
        @param leaving            members leaving the group
        @param group              members in the group
        @param skip_election[out] skip primary election on view
      */
      int after_view_change(joining, leaving, group, *skip_election)
     
      /*
        Executed after primary election
        @param primary  the elected primary
        @param error    if there was an error on the process
      */
      int after_primary_election(primary_uuid, int error)
     
      /*
        Executed before the message is processed
        @param message             The GCS message
        @param skip_message[out]   skip message handling if true
      */
      int before_message_handling(message, *skip_message);
    
### The Manager:

    // The class that registers and alerts listeners
    class Group_events_observation_manager
    
     /*
       The method to register new observers
       @param observer   An observer class to register
     */
     void register_channel_observer(Group_event_listener* observer)
    
     /*
       The method to unregister observers
       @param obsvr      An observer class to unregister
     */
     void unregister_channel_observer(Group_event_listener* obsvr)
    
     /*
       Executed before view install
       @param joining            members joining the group
       @param leaving            members leaving the group
       @param group              members in the group
       @param skip_election[out] skip primary election on view
     */
     int after_view_change(joining, leaving, group, bool *skip_election)
    
     /*
       Executed after primary election
       @param primary  the elected primary
       @param error    if there was an error on the process
     */
     int after_primary_election(primary_uuid, int error=0)
    
     /*
       Executed before the message is processed
       @param message             The GCS message
       @param skip_message[out]   skip message handling if true
     */
     int before_message_handling(message, bool *skip_message);
    
    private:
      list<Group_event_listener*> group_events_listeners;
      readwritelock channel_list_lock;

### The Manager: Method logic: 

* **register_channel_observer(Group_event_listener* observer)** 

</p>

1. Lock channel_list_lock for writing
2. Add listener to *group_events_listeners*
3. Unlock
 
</p>

* **unregister_channel_observer(Group_event_listener* obsvr)**

</p>

1. Lock channel_list_lock for writing
2. Remove listener from *group_events_listeners*
3. Unlock
 
</p>

* **after_view_change(joining, leaving, group, *skip_election)**

</p>

1. Lock channel_list_lock for reading
2. For each listener in *group_events_listeners*:   
   execute **after_view_change(joining, leaving, group, *skip_election)**    
   set skip_election to true if any listener requested it
3. Unlock
4. Return the sum of the error values from the invocations
 
 
</p>

* **after_primary_election(primary_uuid, error)**

</p>

1. Lock channel_list_lock for reading
2. For each listener in *group_events_listeners*:   
   execute **after_primary_election(primary_uuid, error)**   
3. Unlock
4. Return the sum of the error values from the invocations
 
</p>

* **before_message_handling(message, *skip_message)**

</p>

1. Lock channel_list_lock for reading
2. For each listener in *group_events_listeners*:   
   execute **before_message_handling(message, *skip_message)**    
   set skip_message to true if any listener requested it
3. Unlock
4. Return the sum of the error values from the invocations
 
</p>
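
The manager logic above can be sketched in C++17 with a
`std::shared_mutex` playing the role of the read/write lock. This is an
illustration under assumptions: the listener interface is reduced to one
callback without the member lists, and `Listener_sketch`,
`Fixed_listener_sketch` and `Events_manager_sketch` are hypothetical
names.

```cpp
#include <cassert>
#include <list>
#include <mutex>
#include <shared_mutex>

// Simplified listener: the joining/leaving/group lists are left out.
class Listener_sketch {
 public:
  virtual ~Listener_sketch() = default;
  virtual int after_view_change(bool *skip_election) = 0;
};

// Helper listener that always answers with fixed values.
class Fixed_listener_sketch : public Listener_sketch {
 public:
  Fixed_listener_sketch(int error, bool skip) : error_(error), skip_(skip) {}
  int after_view_change(bool *skip_election) override {
    *skip_election = skip_;
    return error_;
  }

 private:
  int error_;
  bool skip_;
};

class Events_manager_sketch {
 public:
  void register_observer(Listener_sketch *observer) {
    std::unique_lock<std::shared_mutex> guard(list_lock_);  // write lock
    listeners_.push_back(observer);
  }

  void unregister_observer(Listener_sketch *observer) {
    std::unique_lock<std::shared_mutex> guard(list_lock_);  // write lock
    listeners_.remove(observer);
  }

  // Skips the election if any listener asks for it; errors are summed.
  int after_view_change(bool *skip_election) {
    std::shared_lock<std::shared_mutex> guard(list_lock_);  // read lock
    int error = 0;
    bool skip = false;
    for (Listener_sketch *listener : listeners_) {
      bool listener_skip = false;
      error += listener->after_view_change(&listener_skip);
      skip = skip || listener_skip;
    }
    *skip_election = skip;
    return error;
  }

 private:
  std::list<Listener_sketch *> listeners_;
  std::shared_mutex list_lock_;
};
```

Giving each listener its own local flag keeps one listener from
accidentally clearing a skip request made by another.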
    
### The Manager: Initialization

The manager class is initialized on plugin.cc after 

<code>
  channel_observation_manager= new Channel_observation_manager(plugin_info);
</code>

### The Manager: Invocation 

* **after_view_change(joining, leaving, group, *skip_election)**

Before the following snippet on    
Plugin_gcs_events_handler::on_view_changed

    // Handle primary election if needed 
    this->handle_leader_election_if_needed();

This snippet will only be executed if skip_election is false

* **after_primary_election(primary_uuid, error)**

This method should be executed in the context of the primary election
process.   
Check sections 6.2.1 and 6.2.2 for details on the invocation. 

This ensures the method is executed on automatic and invoked primary
elections.

* **before_message_handling(message, *skip_message)**

Executed on     
Plugin_gcs_events_handler::on_message_received(const Gcs_message& message)    
before:    

    switch (message_type)
    {

The message is discarded if skip_message is true.    
For performance reasons we may skip transactional messages here.

</br>

5. Observers/Listeners - Transactions
=====================================

Same idea as section 4, but now for transactions.
    
### The Listeners:

    // Listener for transaction life cycle events
    class Group_transaction_listener
    
      // Enum for transaction origins
      enum_group_transaction_origin{
       GROUP_APPLIER_TRANSACTION  // Group applier transaction
       GROUP_RECOVERY_TRANSACTION // Distributed recovery transaction
       GROUP_LOCAL_TRANSACTION    // Local transaction
      }
    
      /*
        Executed before commit
        @param thread id          the transaction thread id
        @param enum_group_transaction_origin who applied it
      */  
      int before_commit(thread_id, enum_group_transaction_origin)
    
      /*
        Executed before rollback
        @param thread id          the transaction thread id
        @param enum_group_transaction_origin who applied it
      */
      int before_rollback(thread_id, enum_group_transaction_origin)
    
      /*
        Executed after commit
        @param thread id          the transaction thread id
        @param enum_group_transaction_origin who applied it
      */ 
      int after_commit(thread_id, enum_group_transaction_origin)
    
      /*
        Executed after rollback
        @param thread id          the transaction thread id
        @param enum_group_transaction_origin who applied it
      */
      int after_rollback(thread_id, enum_group_transaction_origin)

### The Manager:

    // The class that registers and alerts listeners
    class Group_transaction_observation_manager
    
      /*
        The method to register new observers
        @param obsvr      An observer class to register
      */
      void register_transaction_observer(Group_transaction_listener *obsvr)
    
      /*
        The method to unregister observers
        @param obsvr      An observer class to unregister
      */
      void unregister_transaction_observer(Group_transaction_listener *obsvr)
    
      /*
        Executed before commit
        @param thread id          the transaction thread id
        @param enum_group_transaction_origin who applied it
      */
      int before_commit(thread_id, enum_group_transaction_origin)
    
      /*
        Executed before rollback
        @param thread id          the transaction thread id
        @param enum_group_transaction_origin who applied it
      */
      int before_rollback(thread_id, enum_group_transaction_origin)
    
      /*
        Executed after commit
        @param thread id          the transaction thread id
        @param enum_group_transaction_origin who applied it
      */
      int after_commit(thread_id, enum_group_transaction_origin)
    
      /*
        Executed after rollback
        @param thread id          the transaction thread id
        @param enum_group_transaction_origin who applied it
      */
      int after_rollback(thread_id, enum_group_transaction_origin)
    
      // Are there any observers present
      bool is_any_observer_present()
    
    private:
    
      //List of observers
      list<Group_transaction_listener*> group_transaction_listeners;
    
      //The lock to protect the list
      readwritelock channel_list_lock;
    
      //Flag that indicates that there are observers (for performance)
      bool registered_observers;



### The Manager: Method logic: 

* **register_transaction_observer(Group_transaction_listener* obsvr)** 

</p>

1. Lock channel_list_lock for writing
2. Add listener to *group_transaction_listeners*
3. registered_observers = true
4. Unlock
 
</p>

* **unregister_transaction_observer(Group_transaction_listener* obsvr)**

</p>

1. Lock channel_list_lock for writing
2. Remove listener from *group_transaction_listeners*
3. registered_observers= (*group_transaction_listeners* is not empty)
4. Unlock
 
</p>

* **before/after_commit(thread_id, enum_group_transaction_origin)**

</p>

1. Lock channel_list_lock for reading
2. For each listener in *group_transaction_listeners*:   
   execute **before/after_commit(thread_id, enum_group_transaction_origin)**
3. Unlock
 
</p>

* **before/after_rollback(thread_id, enum_group_transaction_origin)**

</p>

1. Lock channel_list_lock for reading
2. For each listener in *group_transaction_listeners*:   
   execute **before/after_rollback(thread_id, enum_group_transaction_origin)**
3. Unlock
 
</p>

* **is_any_observer_present()**

</p>

1. Return registered_observers;
 
</p>

### The Manager: Initialization

The manager class is initialized on plugin.cc after 

<code>
  channel_observation_manager= new Channel_observation_manager(plugin_info);
</code>


### The Manager: Invocation 

* **before_commit(thread_id, enum_group_transaction_origin)**

Executed in the group_replication_trans_before_commit.    
Since we don't have a use for this method now, we may skip its
implementation for now.

* **before_rollback(thread_id, enum_group_transaction_origin)**

Executed in the group_replication_trans_before_rollback.    
Since we don't have a use for this method now, we may skip its
implementation for now.

* **after_rollback(thread_id, enum_group_transaction_origin)**

</p>

1. shared_plugin_stop_lock->grab_read_lock();
2. [Group_transaction_observation_manager::is_any_observer_present() == false]   
   return 
3. [channel_interface.is_own_event_applier(param->thread_id,"group_replication_applier")]  
   type= GROUP_APPLIER_TRANSACTION
4. [channel_interface.is_own_event_applier(param->thread_id,"group_replication_recovery")]   
   type= GROUP_RECOVERY_TRANSACTION
5. [else]    
   type = GROUP_LOCAL_TRANSACTION
6. Group_transaction_observation_manager::after_rollback(thread_id, type); 

</p>

* **after_commit(thread_id, enum_group_transaction_origin)**

</p>

1. shared_plugin_stop_lock->grab_read_lock();
2. [Group_transaction_observation_manager::is_any_observer_present() == false]   
   return 
3. [channel_interface.is_own_event_applier(param->thread_id,"group_replication_applier")]  
   type= GROUP_APPLIER_TRANSACTION
4. [channel_interface.is_own_event_applier(param->thread_id,"group_replication_recovery")]   
   type= GROUP_RECOVERY_TRANSACTION
5. [else]   
   type = GROUP_LOCAL_TRANSACTION
6. Group_transaction_observation_manager::after_commit(thread_id, type); 

</p>
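
Steps 3 to 5 of both hooks reduce to a small classification function. The following is a minimal sketch, not the plugin code: `Applier_check` is a hypothetical callback standing in for `channel_interface.is_own_event_applier`, and the enum only mirrors the origin names used in the steps above.

```cpp
#include <functional>
#include <string>

// Mirrors the transaction origin names used in the steps above.
enum enum_group_transaction_origin {
  GROUP_APPLIER_TRANSACTION,
  GROUP_RECOVERY_TRANSACTION,
  GROUP_LOCAL_TRANSACTION
};

// Hypothetical stand-in for channel_interface.is_own_event_applier():
// returns true when thread_id is an applier thread of the named channel.
using Applier_check =
    std::function<bool(unsigned long thread_id, const std::string &channel)>;

// Classify a transaction by the thread that executed it.
inline enum_group_transaction_origin classify_transaction_origin(
    unsigned long thread_id, const Applier_check &is_own_event_applier) {
  if (is_own_event_applier(thread_id, "group_replication_applier"))
    return GROUP_APPLIER_TRANSACTION;
  if (is_own_event_applier(thread_id, "group_replication_recovery"))
    return GROUP_RECOVERY_TRANSACTION;
  // Anything that is not an applier or recovery thread is a local transaction.
  return GROUP_LOCAL_TRANSACTION;
}
```

The resulting type is what would be passed to the manager's after_commit / after_rollback call in step 6.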

</br>
    
6.1 Utility class: Validate Primary Member
==========================================

This class contains the logic to check if a chosen member is valid to
be a new primary.    
If no member is selected, it validates that the group can change into
a primary mode setup.

### Code Skeleton 

    //The class that validates a member or the group for a primary election
    class Primary_election_validation_handler
     : public Group_event_listener
    
     // Enum for the end results of validation
     enum enum_primary_validation_result {
      VALID_PRIMARY,      // Primary / Group is valid
      INVALID_PRIMARY,    // Primary is invalid
      CURRENT_PRIMARY,    // Primary is the current one
      GROUP_SOLO_PRIMARY  // Only one member can become primary
     };
    
    public:
    
      //Constructor
      Primary_election_validation_handler(Queue<notifications>)
    
      /*
       * Validate group for election
       * @param uuid[in]   member to validate
       * @param valid_uuid[out] only member valid for election
       * @param error_msg[out] error message
       * @returns VALID_PRIMARY if valid
       * @returns INVALID_PRIMARY if not valid
       * @returns CURRENT_PRIMARY if it is already the primary
       * @returns GROUP_SOLO_PRIMARY only one member is valid
      */
      int validate_election(uuid, valid_uuid, String& error_msg)
    
      /* 
        Check that the UUID is valid and present in the group
        @param uuid[in]   member to validate
      */
      int validate_primary_uuid(uuid, error_message)
    
      /*
        Check that the group members have valid versions
        @param uuid[in]   member to validate
      */
      int validate_primary_version(error_message)
    
    private:
    
      //Check that the old primary doesn't have channels
      int validate_old_primary_channels(error_message)
    
      //Check which members have slave channels
      int validate_group_slave_channels(uuid, error_message)
    
      //Listener: React to view changes
      after_view_change(joining, leaving, group, *skip_election)
    
      //Listener: React to messages
      before_message_handling(message, *skip_message)
    
      //Number of known members uuids
      int known_messages_uuids;
    
      Queue<Action_notifications> notifications;

### Method logic

* **Primary_election_validation_handler(Queue notifications)**

</p>

1. Register listener on Group_events_observation_manager
2. Set the local notifications queue to the parameter.

</p>

* **validate_primary_uuid(primary_uuid, error_message)**

</p>

1. Check the uuid is valid
2. Check if the uuid is equal to the current primary    
   If so, set error message, return CURRENT_PRIMARY
3. Check the group member manager and check the uuid exists    
   If not, set error message, return INVALID_PRIMARY
 
</p>

* **validate_primary_version(error_message)**

</p>

1. Loop over the group members: is there a member with a version lower than 8.0?    
   If so, abort. Set error message, return INVALID_PRIMARY
2. Are there members in the group with a lower major version than the appointed primary?    
   If so, abort. Set error message, return INVALID_PRIMARY

</p>
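
The two version checks can be sketched as below. This is only an illustration under stated assumptions: `Member_version` is a hypothetical, simplified version type, the appointed primary's version is passed in explicitly, and the error texts are placeholders.

```cpp
#include <string>
#include <vector>

enum enum_primary_validation_result {
  VALID_PRIMARY,
  INVALID_PRIMARY,
  CURRENT_PRIMARY,
  GROUP_SOLO_PRIMARY
};

// Hypothetical, simplified member version (8.0 is {8, 0}).
struct Member_version {
  unsigned int major_version;
  unsigned int minor_version;
};

inline enum_primary_validation_result validate_primary_version(
    const std::vector<Member_version> &group_versions,
    const Member_version &appointed_primary, std::string &error_message) {
  for (const Member_version &member : group_versions) {
    // Step 1: no member may run a version lower than 8.0.
    if (member.major_version < 8) {
      error_message = "A member runs a version lower than 8.0.";
      return INVALID_PRIMARY;
    }
    // Step 2: no member may have a lower major version than the primary.
    if (member.major_version < appointed_primary.major_version) {
      error_message =
          "A member has a lower major version than the appointed primary.";
      return INVALID_PRIMARY;
    }
  }
  error_message.clear();
  return VALID_PRIMARY;
}
```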

* **int validate_old_primary_channels(error_message)**

>This method assumes it is called in all members at the same logical time.     
>A version of this method without this assumption would assume a
> query message and a response message.

</p>

1. [If primary]    
   Use the method **is_any_slave_channel_running**.    
   Create a Group_validation_message with the response and send it.  
2. Poll the notification list for messages     
   If it is a notification about a dead primary (DEAD_PRIMARY_NOTIFICATION) return VALID_PRIMARY
3. See if the old primary has running channels.    
   If so, set error message, return INVALID_PRIMARY.

</p>

* **validate_group_slave_channels(valid_uuid, error_message)**

> This method assumes it is called in all members at the same logical time.     
> A version of this method without this assumption would assume a
> query message and a response message.

</p>

1. Use the method **is_any_slave_channel_running**.    
   Create a Group_validation_message with the response and send it.
2. Poll the notification list for messages    
   If it is a notification about a dead member (DEAD_MEMBER_NOTIFICATION), skip it.   
   Do it until we have *known_messages_uuids* messages
3. Count the number of members with slave channels.     
   [If 0] the group is valid, return VALID_PRIMARY       
   [If 1] there is only one option, so set the valid_uuid param and return GROUP_SOLO_PRIMARY       
   [If >1] the group cannot be run in primary mode, return INVALID_PRIMARY     
   Set error message accordingly 

</p>
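
Step 3 is a simple count over the collected responses. A minimal sketch, assuming the responses have already been gathered into a map from member uuid to "has running slave channels" (the map, the function name and the error text are illustrative, not part of the design):

```cpp
#include <map>
#include <string>

enum enum_primary_validation_result {
  VALID_PRIMARY,
  INVALID_PRIMARY,
  CURRENT_PRIMARY,
  GROUP_SOLO_PRIMARY
};

// channel_reports: member uuid -> whether that member reported running
// slave channels in its Group_validation_message.
inline enum_primary_validation_result evaluate_slave_channel_reports(
    const std::map<std::string, bool> &channel_reports,
    std::string &valid_uuid, std::string &error_message) {
  int members_with_channels = 0;
  std::string last_uuid_with_channels;
  for (const auto &report : channel_reports) {
    if (report.second) {
      ++members_with_channels;
      last_uuid_with_channels = report.first;
    }
  }
  if (members_with_channels == 0) return VALID_PRIMARY;
  if (members_with_channels == 1) {
    // Only this member can become the primary.
    valid_uuid = last_uuid_with_channels;
    return GROUP_SOLO_PRIMARY;
  }
  error_message = "More than one member has running slave channels.";
  return INVALID_PRIMARY;
}
```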

* **int validate_election(uuid, valid_uuid, String& error_msg)**
    
</p>

1. [If in single primary mode]     
   [If there is a primary member]    
     return validate_old_primary_channels()    
   [Else]    
     return VALID_PRIMARY    
2. [If in multi primary mode]    
   result= validate_group_slave_channels(valid_uuid, error_message)    
   [If result=GROUP_SOLO_PRIMARY && uuid is defined && uuid != valid_uuid]    
     return INVALID_PRIMARY    
   [If result=GROUP_SOLO_PRIMARY && uuid is defined && uuid == valid_uuid]    
     return VALID_PRIMARY    
   [If result=GROUP_SOLO_PRIMARY && uuid is not defined]    
     return GROUP_SOLO_PRIMARY    
     (valid_uuid was already set in the method invocation)     
   [Else]    
     return result

</p>
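
The multi primary branch of this method only post-processes the result of the channel validation. A hedged sketch of that decision, where an empty `uuid` string stands for "uuid is not defined" (an assumption of this example) and the function name is illustrative:

```cpp
#include <string>

enum enum_primary_validation_result {
  VALID_PRIMARY,
  INVALID_PRIMARY,
  CURRENT_PRIMARY,
  GROUP_SOLO_PRIMARY
};

// result:     outcome of the slave channel validation
// uuid:       member requested by the user, empty if none was given
// valid_uuid: the only electable member when result is GROUP_SOLO_PRIMARY
inline enum_primary_validation_result resolve_multi_primary_result(
    enum_primary_validation_result result, const std::string &uuid,
    const std::string &valid_uuid) {
  if (result != GROUP_SOLO_PRIMARY) return result;
  // No member requested: report that only one member is electable.
  if (uuid.empty()) return GROUP_SOLO_PRIMARY;
  // A member was requested: it must be the only electable one.
  return uuid == valid_uuid ? VALID_PRIMARY : INVALID_PRIMARY;
}
```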


* **before_message_handling(message, *skip_message)**

</p>

1. [If message type = CT_GROUP_VALIDATION_MESSAGE]    
   Update *known_messages_uuids*, as this is a logically consistent moment  
   Extract the result and uuid from the message    
   Queue a notification CHANNEL_VALIDATION_NOTIFICATION with this info
 
</p>


* **after_view_change(joining, leaving, group, *skip_election)**

</p>

1. Update *known_messages_uuids*, as this is a logically consistent moment
2. [If current primary is dead]    
   Queue a notification into the queue: DEAD_PRIMARY_NOTIFICATION
3. [If a non-primary is dead]    
   Queue a notification into the queue: DEAD_MEMBER_NOTIFICATION   
   Use class Dead_member_notification.

</p>
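
Steps 2 and 3 of the view change listener can be sketched as follows. The `Action_notification` struct and the function name are hypothetical simplifications; the design itself uses dedicated notification classes such as Dead_member_notification.

```cpp
#include <queue>
#include <string>
#include <vector>

enum Notification_type {
  DEAD_PRIMARY_NOTIFICATION,
  DEAD_MEMBER_NOTIFICATION
};

// Hypothetical, simplified notification carrying only a type and a uuid.
struct Action_notification {
  Notification_type type;
  std::string member_uuid;
};

// Queue one notification per leaving member, telling the validation
// methods whether the member that left was the current primary.
inline void queue_view_change_notifications(
    const std::vector<std::string> &leaving,
    const std::string &current_primary_uuid,
    std::queue<Action_notification> &notifications) {
  for (const std::string &uuid : leaving) {
    const Notification_type type = (uuid == current_primary_uuid)
                                       ? DEAD_PRIMARY_NOTIFICATION
                                       : DEAD_MEMBER_NOTIFICATION;
    notifications.push(Action_notification{type, uuid});
  }
}
```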
    
### Other related changes

In plugin.cc we can see the method
**initialize_asynchronous_channels_observer()**.    
This method assumes that this observer only needs to be initialized if
the server is in single primary mode.   

This needs to change: the single primary mode check must move into
the observer methods.


</br>


6.2 Utility class: Invoke Primary Election 
==========================================

This class will be used to invoke a primary election locally or send a
message to do it on all members.
    
### Code Skeleton 

    // The base class to request and execute an election
    class Primary_election_handler
    
      // Enum for election types 
      enum enum_primary_election_mode {
        SAFE_OLD_PRIMARY,   // Migrating from multi primary
        UNSAFE_OLD_PRIMARY, // Changing from one primary to another
        DEAD_OLD_PRIMARY    // Old primary died
      };
    
      // Send a message to all members requesting an election
      int request_group_primary_election(primary_uuid,  enum_primary_election_mode);
    
      // Get the election message and parameters
      int handle_primary_election_message(Primary_message);
    
      // Elect a new primary
      int execute_primary_election(primary_uuid, enum_primary_election_mode);
    
      // End any running election process. 
      int terminate_election_process();
    
    private:
    
      // Set the status and start certification
      int internal_primary_election(primary_uuid, enum_primary_election_mode);

      // Executes the old primary election algorithm. 
      int legacy_primary_election();
      
      /* The handler to handle the election on the primary member  */
      Primary_election_primary_process*   primary_election_handler;
      
      /* The handler to handle the election in the secondary members */
      Primary_election_secondary_process* secondary_election_handler;
      
</br>      
      
### Primary Election: Method logic

* **request_group_primary_election(primary_uuid, enum_primary_election_mode)**

</p>

1. Create a Single_primary_message with the given uuid and mode. 
2. Send message to group
 
</p>

* **handle_primary_election_message(Primary_message)**

</p>

1. Extract parameters if any
2. Invoke the execute_primary_election() method

</p>

* **execute_primary_election(primary_uuid, enum_primary_election_mode)**

</p>

 This method shall be a derivative of the gcs_event_handlers method 

 **Plugin_gcs_events_handler::handle_leader_election_if_needed()** 

 The idea should be to copy the method and invoke it from the handler file.    
 That file is overloaded at the moment, so this is a plus.
 
 Currently the method is structured as

1. Sort members and get the valid version "frontier" iterator
2. Check if an old primary exists, also if the member is leaving
3. Init a sql command interface
4. Select a new primary
5. If the primary changed, update member roles and set read mode.
   Also queue a packet to activate certification.
6. If there is no valid primary, then log a warning and set read only
   mode.

 First, note that due to recent code refactoring, point 3 is no longer
needed.

 Also note that if we have a chosen primary, only point 5 is needed,
 so that point should be moved to another method.

 We then have the 2 following methods:

* **execute_primary_election(primary_uuid, enum_primary_election_mode)** 
 
</p>

1. [no primary uuid is given]   
   Executes points 1 to 4 and 6 if no valid primary is found.    
   [If lowest version > 5.7]   
   Invoke internal_primary_election.  
   [Else]   
   Invoke legacy_primary_election();
2. [primary uuid is given]
   Invoke internal_primary_election   

</p>

* **internal_primary_election(primary_uuid,enum_primary_election_mode mode)** 
 
Here we have the old point 5, but with heavy rework as mentioned in the
HLD. The steps are now reduced to: 

</p>

1. [Primary_election_secondary_process::is_election_process_running()]   
   Invoke **Primary_election_secondary_process::terminate_election_process()**
2. [If member uuid = primary uuid to elect]   
   Invoke **Primary_election_primary_process::launch_primary_election_process()**    
3. [Else]   
   Invoke **Primary_election_secondary_process::launch_primary_election_process()**  

</p>
   
We only check the secondary process before starting as no valid case
exists where a primary process is running and a new election begins.   
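
The dispatch performed by **execute_primary_election** and **internal_primary_election** can be summarized in a small decision function. This is a sketch only: `Election_path`, the boolean flags and the function name are illustrative assumptions, and `elected_uuid` stands for either the uuid given by the user or the one selected by the ordering algorithm.

```cpp
#include <string>

// Which code path a member follows for an election (illustrative names).
enum Election_path { LEGACY_ELECTION, PRIMARY_PROCESS, SECONDARY_PROCESS };

// uuid_was_given:         the election was requested for a specific member
// all_members_above_5_7:  stands for the "[If lowest version > 5.7]" check
// local_uuid:             uuid of the member executing the election
// elected_uuid:           the appointed or automatically selected primary
inline Election_path choose_election_path(bool uuid_was_given,
                                          bool all_members_above_5_7,
                                          const std::string &local_uuid,
                                          const std::string &elected_uuid) {
  // Without a given uuid, old members force the legacy algorithm.
  if (!uuid_was_given && !all_members_above_5_7) return LEGACY_ELECTION;
  // Otherwise the elected member runs the primary process and all
  // other members run the secondary process.
  return local_uuid == elected_uuid ? PRIMARY_PROCESS : SECONDARY_PROCESS;
}
```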
   
* **legacy_primary_election()**   

This method preserves the old version of step 5.    
This method is used when there are members in the group whose version
does not contain this worklog code. 

* **terminate_election_process()**

</p>

1. [Primary_election_secondary_process::is_election_process_running()]    
   Invoke **Primary_election_secondary_process::terminate_election_process()**
2. [Primary_election_primary_process::is_election_process_running()]    
   Invoke **Primary_election_primary_process::terminate_election_process()**

</p>



</br>

### Primary Election: Life cycle 

* **Initialization**
 
This class is initialized in plugin.cc, in **start_group_communication()**.   

* **Termination**

This class is terminated and deleted in terminate_plugin_modules().   
The method **Primary_election_handler::terminate_election_process()** is invoked there.



</br>

### Primary Election: Related code changes (Primary member message) 

* **Plugin_gcs_events_handler::on_message_received()**

Since this is a more generic handler that we want to be stateless, we won't rely on listeners here.   

So in this method we now invoke   
**handle_primary_election_message(Primary_message)**   

* **Plugin_gcs_events_handler::handle_leader_election_if_needed()**

This method will now consist of:

</p>

1. [Member not in primary mode && there is no running election process]   
   Return
2. [Is the primary dead]   
   Invoke **execute_primary_election(NULL, DEAD_OLD_PRIMARY)**

</p>

</br>


6.2.1 Utility class: Invoke Primary Election - The primary sub process 
-----------------------------------------------------------------------

This class will be used to control the election process on the newly
appointed primary member. 
    
### The primary process: Code Skeleton 

    // The class that controls the election from the primary perspective. 
    class Primary_election_primary_process 
      : public Group_event_listener
      /*
        Launch the local process on the primary member for primary election
        
        @param election_mode the context on which election is occurring 

        @returns 0 in case of success, or 1 otherwise
      */
      int launch_primary_election_process(enum_primary_election_mode election_mode);
    
      /*
        Is the election process running? 
        @returns  election_process_running
      */
      bool is_election_process_running()
    
      /*
        Terminate the election process on shutdown
      */
      int terminate_election_process()
    
    private:

      /*
        Internal thread execution method with the election process 
      */
      int primary_election_process_handler();

      //Listener: React to view changes
      after_view_change(joining, leaving, group, *skip_election)
    
      //Listener: React to messages
      before_message_handling(message, *skip_message)
      
      /* Is the election process running */
      bool election_process_running;
      /* Is the process aborted */
      bool election_process_aborted;
      /* Waiting for old primary transaction execution */
      bool waiting_on_old_primary_transactions;
      
      /* The election invocation context */
      enum_primary_election_mode election_mode;
      
      //The members known for the current action
      list<uuid> known_members_uuids;
      
      /* The stage handler for progress reporting*/
      Plugin_stage_monitor_handler* stage_handler;
      
      mysql_mutex_t election_lock;
      mysql_cond_t  election_cond;

</br>

### The primary process: Method logic

**int launch_primary_election_process(enum_primary_election_mode election_mode)** 

</p>

1. Set the *election_mode* field
2. Set the list of known member uuids: *known_members_uuids*    
   Must be done here as this step is executed under the GCS serial process
3. Register the listeners for group events.
4. Instantiate the *stage_handler*
5. Launch a thread that will call primary_election_process_handler();
6. Check that the thread was launched and is running. 

</p>

**int primary_election_process_handler()**

</p>

1. Set *election_process_running* = true
2. [election_mode == SAFE_OLD_PRIMARY]   
   Go to step 5
3. Submit a Queue_checkpoint_packet on the applier module and wait for it to be consumed
4. Use the channel_get_retrieved_gtid_set method from the channel interface to get the current applier retrieved set.    
   Loop until the server GTID executed contains all the retrieved GTIDs. 
5. Send a message stating that the primary is now ready for election    
   Use a Single_primary_message with type SINGLE_PRIMARY_PRIMARY_READY.  
6. [election_mode != DEAD_OLD_PRIMARY]   
   Execute **Applier_module::queue_certification_enabling_packet(true)**.
7. Set the server super read only mode to false.    
   Use disable_server_read_mode.
8. [election_mode == DEAD_OLD_PRIMARY]   
   return 
9. Wait for all members to be in read mode   
   lock election lock  
   while(known_members_uuids is not empty) wait on election condition  
   unlock election lock  
10. Set *waiting_on_old_primary_transactions* to true   
    Execute **Applier_module::end_multi_primary_period()**
11. The certification disabling process follows the old algorithm from this point. 
12. Wait for all transactions of old primary to be executed   
    lock election lock  
    while(waiting_on_old_primary_transactions) wait on election condition  
    unlock election lock  
13. End the stage on Plugin_stage_monitor_handler;
14. Unregister the group event listeners
15. Declare *election_process_running* = false;

</p>

**after_view_change(joining, leaving, group, *skip_election)**

</p>

1. Lock election lock
2. Remove the leaving members from *known_members_uuids*
3. [known_members_uuids is empty]   
   Awake the election_condition
4. Unlock the election lock 

</p>


**before_message_handling(message, *skip_message)**

</p>

1. Lock election lock
2. [If message type = SINGLE_PRIMARY_READ_MODE_SET]    
   Remove the received uuid from the *known_members_uuids* list   
   [known_members_uuids is empty]   
   Execute the observer **after_primary_election(primary_uuid, 0)**   
   Awake the election condition
3. [If message type = SINGLE_PRIMARY_QUEUE_APPLIED_MESSAGE]   
   waiting_on_old_primary_transactions = false    
   Awake the election condition
4. Unlock the election lock 

</p>
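
The interaction between the waiting step of the process thread, the two listeners and the abort flag is a classic mutex plus condition variable pattern. The sketch below uses `std::mutex` and `std::condition_variable` instead of `mysql_mutex_t` / `mysql_cond_t`, and the class and method names are illustrative, so it shows the pattern under those assumptions rather than the plugin code.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <set>
#include <string>
#include <utility>

class Read_mode_tracker {
 public:
  explicit Read_mode_tracker(std::set<std::string> members)
      : known_members_uuids_(std::move(members)) {}

  // Called when a member reports read mode or leaves the group.
  void remove_member(const std::string &uuid) {
    std::lock_guard<std::mutex> guard(election_lock_);
    known_members_uuids_.erase(uuid);
    if (known_members_uuids_.empty()) election_cond_.notify_all();
  }

  // Called on termination: wakes up any waiter.
  void abort() {
    std::lock_guard<std::mutex> guard(election_lock_);
    election_process_aborted_ = true;
    election_cond_.notify_all();
  }

  // Blocks until every member was removed or the process was aborted.
  // Returns true when all members reported, false on abort.
  bool wait_for_members() {
    std::unique_lock<std::mutex> lock(election_lock_);
    election_cond_.wait(lock, [this] {
      return known_members_uuids_.empty() || election_process_aborted_;
    });
    return known_members_uuids_.empty();
  }

  // Number of members that have not reported yet.
  std::size_t pending() {
    std::lock_guard<std::mutex> guard(election_lock_);
    return known_members_uuids_.size();
  }

 private:
  std::mutex election_lock_;
  std::condition_variable election_cond_;
  std::set<std::string> known_members_uuids_;
  bool election_process_aborted_ = false;
};
```

Checking the abort flag inside the wait predicate is what lets the termination method wake up a thread that is blocked in a waiting step.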

**int terminate_election_process()**

It is assumed here that steps 4, 9 and 12 of
**primary_election_process_handler()** have termination flags for election_process_aborted.

</p>

1. Set election_process_aborted to true;
2. Execute **event_is_consumed()** for the Queue_checkpoint_packet
3. Awake the election condition.
4. Wait for *election_process_running* = false

</p>


</br>

### The primary process: Monitoring

Here we describe when process stages change and how we do the
monitoring of progress.

Here, we will use the steps from **primary_election_process_handler()**

* Step 2

The stage is set to 

    Primary Election: applying buffered transactions
    
The progress is set in the handler, passing into it the
Plugin_stage_monitor_handler object.

* Step 8

The stage is set to 
    
    Primary Election: Waiting for members to turn on super_read_only

The estimated work is the size of *known_members_uuids*.   
Progress is reported when the array changes. 

* Step 9

The stage is set to 
    
    Primary Election: Stabilizing transactions from former primaries. 

We can either do:   
1) A sleep loop on the thread process checking the difference between
received and executed GTIDs    
2) Just set estimated work to 1 and set progress when we see a message of
type SINGLE_PRIMARY_QUEUE_APPLIED_MESSAGE.   
We might go for 2 given the time restrictions on implementation. 
    

</br>

### The primary process: Error handling 

It is assumed that when the thread errors out for some reason, the
process will leave the group and the plugin will enable the read mode
on the server.    
The hook **after_primary_election** will be invoked with an error
value to alert possible listeners.   


</br>


6.2.2 Utility class: Invoke Primary Election - The secondary sub process 
-----------------------------------------------------------------------

This class will be used to control the election process on the secondary
members of the election. 
    
### The secondary process: Code Skeleton 

    // The class that controls the election from the secondary perspective. 
    class Primary_election_secondary_process 
      : public Group_event_listener
      
      /*
        Launch the local process on the secondary members for primary election
        
        @param election_mode the context on which election is occurring 

        @returns 0 in case of success, or 1 otherwise
      */
      int launch_primary_election_process(enum_primary_election_mode election_mode);
    
      /*
        Is the election process running? 
        @returns  election_process_running
      */
      bool is_election_process_running()
    
      /*
        Terminate the election process on shutdown
      */
      int terminate_election_process()

    private:

      /*
        Internal thread execution method with the election process 
      */
      int primary_election_process_handler();
      
      //Listener: React to messages
      before_message_handling(message, *skip_message)
    
      //Listener: React to view changes
      after_view_change(joining, leaving, group, *skip_election)
      
      /* The stage handler for progress reporting*/
      Plugin_stage_monitor_handler* stage_handler;
          
      /* Is the election process running */
      bool election_process_running;
      /* Is the process aborted */
      bool election_process_aborted;
      /* Waiting for old primary transaction execution */
      bool waiting_on_old_primary_transactions;
      
      //The members known for the current action
      list<uuid> known_members_uuids;
      
      /* Is the primary ready? */
      bool primary_ready;
      mysql_mutex_t election_lock;
      mysql_cond_t  election_cond;

</p>


</br>

### The secondary process: Method logic

**int launch_primary_election_process(enum_primary_election_mode election_mode)** 

</p>

1. Set the *election_mode* field
2. Set the list of known member uuids: *known_members_uuids*
3. Register the listeners for group events.
4. Instantiate the *stage_handler*
5. Launch a thread that will call primary_election_process_handler();
6. Check that the thread was launched and is running. 

</p>

**int primary_election_process_handler()**

</p>

1. election_process_running = true
2. Wait for primary ready message.    
   lock election lock  
   while(!primary_ready) wait on election condition  
   unlock election lock  
3. [election_mode != DEAD_OLD_PRIMARY]   
   Set *waiting_on_old_primary_transactions* to true   
   Execute Applier_module::queue_certification_enabling_packet(false).
4. Set the server super read only mode to true.    
   On failure (if not aborted) invoke **abort_server_process()**
5. Send message as the member is on read mode   
   Use a Single_primary_message with type SINGLE_PRIMARY_READ_MODE_SET.  
6. The certification disabling process follows the old algorithm from this point.  
7. Wait for all transactions of old primary to be executed   
   lock election lock  
   while(waiting_on_old_primary_transactions) wait on election condition  
   unlock election lock
8. End the stage on Plugin_stage_monitor_handler;
9. Unregister the group event listeners
10. Declare *election_process_running* = false;

</p>

**before_message_handling(message, *skip_message)**

</p>

1. Lock election lock
2. [If message type = SINGLE_PRIMARY_PRIMARY_READY]    
   Set *primary_ready* to true   
   Awake the election condition
3. [If message type = SINGLE_PRIMARY_READ_MODE_SET]    
   Remove the received uuid from the *known_members_uuids* list   
4. [known_members_uuids is empty]   
   Execute the observer **after_primary_election(primary_uuid, 0)**
5. Unlock the election lock 

</p>
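
The message handling on the secondary side is a small state update. The sketch below leaves out the locking and makes two assumptions: the "list is empty" check is only evaluated after a SINGLE_PRIMARY_READ_MODE_SET message (as in the primary process), and a hypothetical flag `election_end_notified` stands in for the call to the **after_primary_election** observer.

```cpp
#include <set>
#include <string>

enum Single_primary_message_type {
  SINGLE_PRIMARY_PRIMARY_READY,
  SINGLE_PRIMARY_READ_MODE_SET,
  SINGLE_PRIMARY_QUEUE_APPLIED_MESSAGE
};

struct Secondary_election_state {
  bool primary_ready = false;
  std::set<std::string> known_members_uuids;
  // Hypothetical flag standing in for after_primary_election(uuid, 0).
  bool election_end_notified = false;
};

// Update the local election state for one received message.
inline void handle_election_message(Secondary_election_state &state,
                                    Single_primary_message_type type,
                                    const std::string &sender_uuid) {
  switch (type) {
    case SINGLE_PRIMARY_PRIMARY_READY:
      // The waiting thread may now proceed to enable read mode.
      state.primary_ready = true;
      break;
    case SINGLE_PRIMARY_READ_MODE_SET:
      state.known_members_uuids.erase(sender_uuid);
      if (state.known_members_uuids.empty())
        state.election_end_notified = true;
      break;
    default:
      // Other message types are not handled in this sketch.
      break;
  }
}
```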

**after_view_change(joining, leaving, group, *skip_election)**

</p>

1. Remove the leaving members from *known_members_uuids*
2. Set election_process_aborted to true; (Accelerate the termination process)
   
</p>

**int terminate_election_process()**

It is assumed here that steps 2 and 7 of
**primary_election_process_handler()** have termination flags for election_process_aborted.

</p>

1. Set election_process_aborted to true;
2. Instantiate a new SQL session and issue a KILL QUERY to the read mode query.
3. Awake the election condition.
4. Wait for *election_process_running* = false

</p>


</br>

### The secondary process: Monitoring 

Here we describe when process stages change and how we do the
monitoring of progress.

Here, we will use the steps from **primary_election_process_handler()**

* Step 1

The stage is set to 

    Primary Election: Waiting on current primary transaction execution
    
Estimated work is 1, and progress is incremented when the message
comes. 

* Step 3

The stage is set to 
    
    Primary Election: Waiting for members to turn on super_read_only

The estimated work is the size of *known_members_uuids*.   
Progress is reported when the array changes. 

* Step 5

The stage is set to 
    
    Primary Election: Stabilizing transactions from former primaries. 

There is no good way to track progress here.   
So we just set estimated work to 1 and progress is set when the
message from the primary comes. 


</br>

### The secondary process: Error handling 

It is assumed that when the thread errors out for some reason, the
process will leave the group and the plugin will enable the read mode
on the server.    
The hook **after_primary_election** will be invoked with an error
value to alert possible listeners.   

We don't include errors from enabling read mode here, as they lead
to a server abort (as pointed out in the HLD). 

</br>

6.3 Utility class: Check Server query execution
===============================================

This class will be used to extract the number of running transactions
in the server. 

### Code Skeleton 

    // Class to query about what transactions are running
    class Server_query_execution_handler :
     public Group_transaction_listener
    
    public:
    
      /*
        Get the list of running transactions from the server
        @param ids[out] an array of thread ids
        @returns 0 in case of success, 1 in case of error
      */
      int get_server_running_transactions(my_thread_id** ids)
    
      /*
        Gets running transactions and waits for its end
        @returns 0 in case of success, 1 in case of error
      */
      int wait_for_current_transaction_load_execution()
    
      // Abort any running waiting process
      void abort_waiting_process();
    
      after_rollback(thread_id, enum_group_transaction_origin)
    
      after_commit(thread_id, enum_group_transaction_origin)
    
    private:
    
      queue<thread_id> thread_ids_finished;
    
      lock query_wait_lock;
    
      bool wait_process_aborted;
    
### Method logic

* **get_server_running_transactions(my_thread_id** ids)**

</p>

1. Get server service for transaction querying 
2. If valid, get all current running transactions.
3. Discard the invoking thread id if in the list
 
</p>

* **wait_for_current_transaction_load_execution(Plugin_stage_monitor_handler stage_handler=NULL)**

</p>

1. Lock query_wait_lock 
2. Register itself on Group_transaction_observation_manager::register_channel_observer
   This allows the code to receive notifications for commits and aborts. 
3. Invoke get_server_running_transactions(list_of_thread_ids)
4. Unlock query_wait_lock
5. while (list_of_thread_ids.size != 0 && !wait_process_aborted)   
   remove all members from list_of_thread_ids that match thread_ids_finished  
   execute get_server_running_transactions(new_list_of_thread_ids)   
   remove any entry that is on list_of_thread_ids and not on new_list_of_thread_ids  
   sleep 1 second

</p>

In terms of monitoring, i.e., if a *Plugin_stage_monitor_handler* is
given:

</p>

1. Set the estimated work to the number of transactions in the list_of_thread_ids
2. Whenever the code loops, set the completed work to the initial total minus the
   remaining transactions.

</p>
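
One iteration of the wait loop and the progress computation can be sketched as below. It assumes the loop should stop as soon as the list is empty or the wait is aborted; the helper names and the use of `std::set` are choices of this example, not part of the design.

```cpp
#include <cstddef>
#include <set>

using Thread_ids = std::set<unsigned long>;

// One loop iteration: drop every pending thread id that either reported
// a commit/rollback or is no longer listed as running by the server.
inline void prune_pending(Thread_ids &pending, const Thread_ids &finished,
                          const Thread_ids &still_running) {
  for (Thread_ids::iterator it = pending.begin(); it != pending.end();) {
    const bool done =
        finished.count(*it) != 0 || still_running.count(*it) == 0;
    if (done)
      it = pending.erase(it);
    else
      ++it;
  }
}

// Loop condition: keep waiting only while work remains and no abort came.
inline bool keep_waiting(const Thread_ids &pending,
                         bool wait_process_aborted) {
  return !pending.empty() && !wait_process_aborted;
}

// Monitoring: completed work is the initial total minus what remains.
inline std::size_t completed_work(std::size_t initial_total,
                                  const Thread_ids &pending) {
  return initial_total - pending.size();
}
```

Transactions that started after the first snapshot never enter the pending set, so they do not prolong the wait.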


* **abort_waiting_process()**

</p>

1. wait_process_aborted = true;

</p>


* **after_rollback(thread_id, enum_group_transaction_origin)**

</p>

1. Lock query_wait_lock
2. Add thread id to  thread_ids_finished
3. Unlock query_wait_lock
 
</p>

* **after_commit(thread_id, enum_group_transaction_origin)**

</p>

1. Lock query_wait_lock
2. Add thread id to thread_ids_finished
3. Unlock query_wait_lock

</p>

### Server Service 

The query for how many transactions are running in the server is made
through a server service. 

    BEGIN_SERVICE_DEFINITION(transactional_querying_service)
                             DECLARE_METHOD(size_t,
                             get_server_transactions,
                             (unsigned long** ids));
    END_SERVICE_DEFINITION(transactional_querying_service)

This service is then added to the server components in 

*components/mysql_server/server_component.cc*      
*components/mysql_server/server_component.h*

The plan for the method is: first create a class 

    class Get_running_transactions : public Do_THD_Impl
    
    public:
    
      /*
        Method executed for each thread
      */
      virtual void operator()(THD *thd)


Then when the service is invoked do

     Get_running_transactions trx_counter;
     Global_THD_manager::get_instance()->do_for_all_thd(&trx_counter);
     trx_counter.get_transaction_ids();

About the operator method, the idea is for each thread check

1. Has the thread a query plan?      
   If it is running, it has one.     
   We can also filter DML queries here, since we don't care for DDL

2. If there is no query plan, then maybe the transaction is in
   between statements.      
   If that is true, then the method       
   -*in_active_multi_stmt_transaction()*     
   will return true.
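
The per-thread check can be illustrated with a functor over a simplified thread description. Everything server-side is replaced by assumptions here: `Fake_thd` is a hypothetical stand-in for THD, its two booleans stand for "has a query plan" and *in_active_multi_stmt_transaction()*, and a plain loop replaces `Global_THD_manager::do_for_all_thd`.

```cpp
#include <vector>

// Hypothetical, simplified stand-in for the server THD.
struct Fake_thd {
  unsigned long thread_id;
  bool has_query_plan;
  bool in_active_multi_stmt_transaction;
};

class Get_running_transactions {
 public:
  // Executed for each thread: record it if it looks like it is inside
  // a transaction, either running a statement or between statements.
  void operator()(const Fake_thd &thd) {
    if (thd.has_query_plan || thd.in_active_multi_stmt_transaction)
      transaction_ids_.push_back(thd.thread_id);
  }

  const std::vector<unsigned long> &get_transaction_ids() const {
    return transaction_ids_;
  }

 private:
  std::vector<unsigned long> transaction_ids_;
};

// Stand-in for iterating all server threads with the functor.
inline std::vector<unsigned long> collect_running_transactions(
    const std::vector<Fake_thd> &threads) {
  Get_running_transactions trx_counter;
  for (const Fake_thd &thd : threads) trx_counter(thd);
  return trx_counter.get_transaction_ids();
}
```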
   
Some considerations about this service.   
Yes, this service may return a transaction that has just finished, or fail to
return a transaction that has just started.    
Let's look at the context where we need it, though.    
There is a set of transactions that may have started and are still
running, and that are now incompatible because of some reason R.    
If we only want to wait for these to end, and this service is executed after R
has changed, then the new transactions that are starting now don't
matter.    
Also, the ones that have already ended only mean less trouble for us.    

</br>

6.4 Utility class: SET PERSIST 
==============================

This class will be used to persist system variables using the session
API to call SET PERSIST commands. 

These commands will make some of the changes made in the plugin
persistent across restarts. 


### Code Skeleton 

    // Class to execute SET PERSIST queries 
    class Persistent_variables_handler
    
    public:
    
      /*
        Persist a system variable with a SET PERSIST query
        @param name the name of the variable
        @param value the value to set in the variable
        @param session_isolation what isolation the server connection must have 

        @note use this method when there is not an open server connection

        @returns 0 in case of success, or the error value from the query
      */
      int set_persistent_variable(string name, string value, enum_plugin_con_isolation session_isolation)
    
      /*
        Persist a system variable with a SET PERSIST query
        @param name the name of the variable
        @param value the value to set in the variable
        @param command_interface the interface to the session API 

        @note use this method when there is already an open server connection 

        @returns 0 in case of success, or the error value from the query
      */
      int set_persistent_variable(string name, string value, Sql_service_command_interface *command_interface)

### Method logic

* **set_persistent_variable(string name, string value, enum_plugin_con_isolation session_isolation)**

</p>

1. Create a Sql_service_command_interface instance. 
2. Invoke **set_persistent_variable(string name, string value, Sql_service_command_interface *command_interface)**
 
</p>

* **set_persistent_variable(string name, string value, Sql_service_command_interface *command_interface)**

</p>

1. Construct the set persist query with the given parameters.
2. Execute the query and extract the return result. Throw an error if
   needed

</p>
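
The relation between the two overloads can be sketched as follows. `Fake_command_interface` is a hypothetical stand-in for Sql_service_command_interface that only records queries, the session isolation parameter is omitted, and the value is assumed to already be a valid SQL literal.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the session API: records executed queries
// and always reports success (0).
struct Fake_command_interface {
  std::vector<std::string> executed_queries;
  int execute(const std::string &query) {
    executed_queries.push_back(query);
    return 0;
  }
};

// Build the query text for a given variable name and value.
inline std::string build_set_persist_query(const std::string &name,
                                           const std::string &value) {
  return "SET PERSIST " + name + " = " + value;
}

// Overload used when there is already an open server connection.
inline int set_persistent_variable(const std::string &name,
                                   const std::string &value,
                                   Fake_command_interface *command_interface) {
  return command_interface->execute(build_set_persist_query(name, value));
}

// Overload used when there is no open connection: create one and delegate.
inline int set_persistent_variable(const std::string &name,
                                   const std::string &value) {
  Fake_command_interface command_interface;
  return set_persistent_variable(name, value, &command_interface);
}
```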


</br>

6.5 Utility class: Abort server mechanism 
=========================================

### Code Skeleton 

No need for a class here, just add a method to *plugin_utils.h/cc*

    int abort_server_process()

### Method logic 

* **abort_server_process()**

</p>

1. Set a registry reference extracted from **mysql_plugin_registry_acquire**
2. Fetch the *server_abort_service* service from the registry
3. Invoke the **abort_server_process** in the service 
 
</p>

### The Service

The idea behind this utility is to use a service that encapsulates the
abort procedure.

So we need a new service:

    BEGIN_SERVICE_DEFINITION(server_abort_service)
        DECLARE_BOOL_METHOD(abort_server_process, const char* message);
    END_SERVICE_DEFINITION(server_abort_service)

This service is then added to the server components.

The implementation of such a method would be similar to the current
implementation of **exec_binlog_error_action_abort**. 

</p>

1. [Is THD present]   
   Try to send an error to the client about the fatal error    
   [else]   
   Print an error to the log.
2. Invoke *abort()* 

</p>
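
A minimal sketch of that flow is shown below. The `Abort_hooks` structure and `abort_server_sketch` are hypothetical names used only for illustration; the hooks stand in for the client error path, the error log and the final *abort()* call, so the flow can be exercised without terminating the process.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <string>

// Hypothetical hooks: in the server these would be the client error
// reporting, the error log and abort() itself.
struct Abort_hooks {
  std::function<void(const std::string &)> send_to_client;
  std::function<void(const std::string &)> write_to_log;
  std::function<void()> terminate = [] { std::abort(); };
};

// Report the fatal error on the best available channel, then terminate.
void abort_server_sketch(const std::string &message, bool has_session,
                         const Abort_hooks &hooks) {
  // Text follows the ER_SERVICE_ABORT format proposed in this section.
  const std::string text = "A component aborted the mysql server: " + message;
  if (has_session) {
    hooks.send_to_client(text);  // [Is THD present]
  } else {
    hooks.write_to_log(text);    // [else]
  }
  hooks.terminate();             // Invoke abort()
}
```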

This also means a new error should be added like 

    ER_SERVICE_ABORT: A component aborted the mysql server: %s

since the basic ER_ABORTING doesn't allow generic messages. 

</br>

6.6 Utility class: Plugin stages for monitoring 
===============================================

An important part of this WL is the monitoring of actions currently
being executed.   
As described in the High Level Design, the idea is to use thread
stages to express the step the group action is currently in and its
progress.  

Let's start with the base class, which takes inspiration from the clone
plugin *clone_monitor.h*.

### Code Skeleton 

    // Class to report plugin stages and their progress
    class Plugin_stage_monitor_handler
    
    public:
    
      /* The class constructor */
      Plugin_stage_monitor_handler();
    
      /* The class destructor */
      ~Plugin_stage_monitor_handler();
      
      /*
        Set that a new stage is now in progress. 
        @param key The PSI key for the stage
        @param file the file for this stage
        @param line the line of the file for this stage
        @param estimated_work what work is estimated for this stage
        @param work_completed what work is already completed for this stage

        @returns 0 in case of success, or 1 otherwise
      */
      int set_stage(PSI_stage_key key, string file, int line,
                    ulonglong estimated_work, ulonglong work_completed)
    
      /*
        Set the currently estimated work for this stage
      */
      int set_estimated_work(ulonglong estimated_work)
    
      /*
        Set the currently completed work for this stage
      */
      int set_completed_work(ulonglong completed_work)

      //get methods
      
      /*
        End the current stage
      */
      int end_stage();

    private:
      SERVICE_TYPE(registry) *registry;
      my_service<SERVICE_TYPE(psi_stage_v1)> stage_service;
      PSI_stage_progress* stage_progress_handler;

### Method logic 

* **Plugin_stage_monitor_handler()**

</p>

1. Set the **registry** field with a reference extracted from **mysql_plugin_registry_acquire**
2. Fetch the *psi_stage_v1* service from the registry and set **stage_service**

</p>

* **~Plugin_stage_monitor_handler()**

</p>

1. Delete **stage_service**
2. Use **mysql_plugin_registry_release** to release the *registry* field

</p>

* **set_stage(PSI_stage_key key, string file, int line, ulonglong estimated_work, ulonglong work_completed)**

</p>

1. Invoke the **start_stage** method in the service with the given key, file and
   line. 
2. Set **stage_progress_handler** with the *PSI_stage_progress* object returned in step 1
3. Set the estimated work and completed work on **stage_progress_handler**

</p>

* **set_estimated_work(ulonglong estimated_work)**

</p>

1. Set the current work being estimated on **stage_progress_handler** 

</p>

* **set_completed_work(ulonglong completed_work)**

</p>

1. Set the current completed work on **stage_progress_handler** 

</p>

* **end_stage**

</p>

1. Just invoke **end_stage** on the service

</p>
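
The handler life cycle described above can be illustrated with a small in-memory sketch. `Stage_progress` and `Stage_monitor_sketch` are hypothetical stand-ins for *PSI_stage_progress* and the handler; no Performance Schema service is involved, and returning 1 when no stage is running is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the PSI_stage_progress object.
struct Stage_progress {
  std::string stage_name;
  unsigned long long estimated_work = 0;
  unsigned long long completed_work = 0;
};

class Stage_monitor_sketch {
 public:
  // Start a stage and record its initial progress. Returns 0 on success.
  int set_stage(const std::string &stage_name,
                unsigned long long estimated_work,
                unsigned long long work_completed) {
    progress_.stage_name = stage_name;
    progress_.estimated_work = estimated_work;
    progress_.completed_work = work_completed;
    running_ = true;
    return 0;
  }

  // Update the estimate; fails (1) when no stage is running.
  int set_estimated_work(unsigned long long estimated_work) {
    if (!running_) return 1;
    progress_.estimated_work = estimated_work;
    return 0;
  }

  // Update the completed work; fails (1) when no stage is running.
  int set_completed_work(unsigned long long completed_work) {
    if (!running_) return 1;
    progress_.completed_work = completed_work;
    return 0;
  }

  // End the current stage; fails (1) when no stage is running.
  int end_stage() {
    if (!running_) return 1;
    running_ = false;
    return 0;
  }

  bool is_running() const { return running_; }
  const Stage_progress &progress() const { return progress_; }

 private:
  Stage_progress progress_;
  bool running_ = false;
};
```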


</br>

### Life-cycle 

Under this worklog, this utility makes sense in the context of a group
action execution.   
Hence, it makes sense that an instance is created every time an action is accepted.    
The service is then only used while the action is running.    
This does not invalidate that other server parts may use this handler
for other purposes with a different life cycle.    
Such an example is the primary election algorithm, which will use stages
even outside its invocation through group actions. 


</br>

### Stage keys 

One of the key parts of this stage instrumentation is the keys. 
They shall be registered in plugin_psi.h/cc in the form 

    PSI_stage_info gr_stage_group_action_running=
     {0, "Executing some group stage", PSI_FLAG_STAGE_PROGRESS};

As described in the HLD the stage keys are:

    Multi-primary Switch: waiting for pending transactions to finish.
    
    Multi-primary Switch: waiting on another member step completion
    
    Multi-primary Switch: applying buffered transactions.
    
    Multi-primary Switch: waiting for operation to complete on all members.
        

    Single-primary Switch: checking group pre-conditions.
    
    Single-primary Switch: executing primary election
    
    Single-primary Switch: waiting for operation to complete on all members.
    
    
    Primary Switch: checking current primary pre-conditions.
    
    Primary Switch: waiting for pending transactions to finish.
    
    Primary Switch: waiting on another member step completion
    
    Primary Switch: executing primary election
    
    Primary Switch: waiting for operation to complete on all members.


    Primary Election: applying buffered transactions.

    Primary Election: Waiting on current primary transaction execution

    Primary Election: Waiting for members to turn on super_read_only
        
    Primary Election: Stabilizing transactions from former primaries. 


</br>

7. Messages
===========

**7.1 New Message: Action message**
------------------------------------

The messages used by actions must be extensible as new actions might
emerge.   

### Message type 

On *gcs_plugin_messages.h* add to     
<code> enum_cargo_type  </code>   
the new type   
CT_GROUP_ACTION_MESSAGE


</br>

### Group_action_message - Code Skeleton

    //The base message for action messages
    class Group_action_message : public Plugin_gcs_message
    
      // Enum for message payload
      enum_payload_item_type{
       PIT_UNKNOWN= 0,      // Not used
       PIT_ACTION_TYPE=1,   // The action type
       PIT_ACTION_PHASE=2,  // The action phase
       PIT_ACTION_DATA=3,   // The action data
       PIT_MAX
      }
    
      // Enum for the types of message / actions
      enum_action_message_type{
       ACTION_MULTI_PRIMARY_MESSAGE,     // Change to multi primary
       ACTION_PRIMARY_ELECTION_MESSAGE   // Elect a primary member
      }
    
      // Enum for the phase of the action in the message
      enum_action_message_phase{
       ACTION_START_PHASE,  // Start a new action
       ACTION_END_PHASE,    // The action was ended
       ACTION_ABORT_PHASE   // The action was aborted
      }
    
    public:
    
      // Constructor
      Group_action_message(enum_action_message_type, enum_action_message_phase)
    
      // Get the action type for this message
      enum_action_message_type get_action_type()
    
      // Get the action phase for this message
      enum_action_message_phase get_action_phase()
    
    protected:
    
      /*
        The inherited encode method
        @param buffer  [out]  the message encoded
      */
      void encode_payload(buffer);
    
      /*
        The inherited decode method
        @param[in] buffer the received data
        @param[in] end    the end pointer
      */
      void decode_payload(buffer, end)
    
      /*
        Encode the data associated to the action if existent
        @param buffer  [out]  the message encoded
      */
      virtual void encode_action_data(buffer);
    
      /*
        Decode the data associated to the action if existent
        @param[in] buffer the received data
        @param[in] end    the end pointer
      */
      virtual void decode_action_data(buffer, end);
    
     private:
    
      // The action type for this message
      enum_action_message_type action_type
    
      // If it is a start or stop message
      enum_action_message_phase action_phase
    
      // The potential payload this action class has
      const uchar * action_data


</br>

### Group_action_message - Method logic

* **Group_action_message(enum_action_message_type, enum_action_message_phase)**

</p>

1. Set action type
2. Set action phase
3. action_data remains empty
 
</p>

* **encode_payload(buffer)**

</p>

1. Encode message type
2. Encode action phase
3. Invoke encode_action_data(buffer);
 
</p>

* **decode_payload(buffer, end)**

</p>

1. Decode and set message type
2. Decode and set action phase
3. Invoke decode_action_data(buffer, end);
 
</p>

* **encode_action_data(buffer)**

</p>

Since this is the default implementation of the method, nothing is
done here. 

</p>

* **decode_action_data(buffer)**

</p>

The default implementation of this method copies the remaining payload
to *action_data*

</p>
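
The encode and decode flow of the base message can be sketched as below. This is a simplified illustration, not the plugin wire format: it assumes a flat layout of one byte for the type, one byte for the phase and the remaining bytes as action data, and the names `Action_message_sketch`, `Action_type` and `Action_phase` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

enum Action_type : std::uint8_t { ACTION_MULTI_PRIMARY = 1, ACTION_PRIMARY_ELECTION = 2 };
enum Action_phase : std::uint8_t { ACTION_START = 1, ACTION_END = 2, ACTION_ABORT = 3 };

class Action_message_sketch {
 public:
  Action_message_sketch(Action_type type, Action_phase phase)
      : type_(type), phase_(phase) {}
  virtual ~Action_message_sketch() = default;

  // Encode type and phase, then let the subclass append its own data.
  void encode_payload(Buffer &out) const {
    out.push_back(type_);
    out.push_back(phase_);
    encode_action_data(out);
  }

  // Decode type and phase, then hand the remaining bytes to the data hook.
  // Returns false (leaving the message untouched) if the buffer is too short.
  bool decode_payload(const Buffer &in) {
    if (in.size() < 2) return false;
    type_ = static_cast<Action_type>(in[0]);
    phase_ = static_cast<Action_phase>(in[1]);
    decode_action_data(in.data() + 2, in.data() + in.size());
    return true;
  }

  Action_type get_action_type() const { return type_; }
  Action_phase get_action_phase() const { return phase_; }
  const Buffer &get_action_data() const { return action_data_; }

 protected:
  // Default implementation: the base message carries no extra data.
  virtual void encode_action_data(Buffer &) const {}

  // Default implementation: keep the remaining payload for later decoding.
  virtual void decode_action_data(const std::uint8_t *begin,
                                  const std::uint8_t *end) {
    action_data_.assign(begin, end);
  }

 private:
  Action_type type_;
  Action_phase phase_;
  Buffer action_data_;
};
```

Keeping the undecoded bytes in the base class is what allows a more specific message to be built later from a generic one, once the action type is known.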



</br>

### Group_action_message - Code related changes

* **Plugin_gcs_events_handler::on_message_received(const Gcs_message& message)**

</p>

1. Add another case for CT_GROUP_ACTION_MESSAGE.
2. Get the Group_action_coordinator instance.
3. Invoke **Group_action_coordinator::handle_action_message()**
 
</p>


</br>

### Primary_election_action_message - Code Skeleton

    //The class for primary election message
    class Primary_election_action_message : public Group_action_message
    
    public:
    
      // Constructor
      Primary_election_action_message(enum_action_message_phase, uuid)
    
      // Constructor
      Primary_election_action_message(Group_action_message)
    
    protected:
    
      /*
       Encode the data associated to the action if existent
        @param buffer  [out]  the message encoded
      */
      virtual void encode_action_data(buffer);
    
      /*
       Decode the data associated to the action if existent
       @param[in] buffer the received data
       @param[in] end    the end pointer
      */
      virtual void decode_action_data(buffer, end);
    
     private:
    
      // The uuid for election, can be empty if not defined
      string primary_uuid



</br>


### Primary_election_action_message - Method logic

* **Primary_election_action_message(enum_action_message_phase, uuid)**

</p>

1. Set action type to ACTION_PRIMARY_ELECTION_MESSAGE
2. Set action phase to the given parameter
3. Set the primary uuid.
 
</p>

* **Primary_election_action_message(Group_action_message)**

</p>

1. Assert action type is equal to ACTION_PRIMARY_ELECTION_MESSAGE
2. Copy action phase
3. Decode the uuid from *action_data*
 
</p>


* **encode_action_data(buffer)**

</p>

1. Encode the primary uuid

</p>

* **decode_action_data(buffer)**

</p>

1. Decode the primary uuid

</p>
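
As an illustration of the uuid encoding and decoding steps, the sketch below uses a one-byte length prefix followed by the uuid characters. The layout and the function names `encode_primary_uuid` and `decode_primary_uuid` are assumptions made for this example; an empty uuid is allowed, as in the skeleton above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Append the uuid as a one-byte length followed by its characters.
// Returns false if the string does not fit in the one-byte length.
bool encode_primary_uuid(const std::string &uuid, Buffer &out) {
  if (uuid.size() > 255) return false;
  out.push_back(static_cast<std::uint8_t>(uuid.size()));
  out.insert(out.end(), uuid.begin(), uuid.end());
  return true;
}

// Read a length-prefixed uuid from the start of the action data.
// Returns false if the data is empty or shorter than the announced length.
bool decode_primary_uuid(const Buffer &in, std::string &uuid) {
  if (in.empty()) return false;
  const std::size_t length = in[0];
  if (in.size() - 1 < length) return false;
  uuid.assign(in.begin() + 1, in.begin() + 1 + length);
  return true;
}
```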

</br>

**7.2 New Message: Validation message**
---------------------------------------

These messages are used to know that there are no slave channels in
non primary members.    
This message can be used in the future for validations on other
processes.

### Message type 

On *gcs_plugin_messages.h* add to     
<code> enum_cargo_type  </code>   
the new type   
CT_GROUP_VALIDATION_MESSAGE


</br>

### Group_validation_message -  Code Skeleton

    //The message for group validations
    class Group_validation_message : public Plugin_gcs_message
    
      // Enum for message payload
      enum_payload_item_type{
       PIT_UNKNOWN= 0,          // Not used
       PIT_VALIDATION_TYPE=1,   // The validation type
       PIT_VALIDATION_CHANNEL=2,  // The member has channel flag
       PIT_MAX
      }
    
      // Enum for the types of message / actions
      enum_action_message_type{
       ACTION_CHANNEL_VALIDATION_MESSAGE // Channel presence msg
      }
    
    public:
    
      // Constructor
      Group_validation_message(bool has_channels)
    
      // Does the sender of this message have slave channels?
      bool has_slave_channels()
    
    protected:
    
      /*
        The inherited encode method
        @param buffer  [out]  the message encoded
      */
      void encode_payload(buffer);
    
      /*
        The inherited decode method
        @param[in] buffer the received data
        @param[in] end    the end pointer
      */
      void decode_payload(buffer, end)
    
     private:
    
      // Does the member have channels?
      bool has_channels



</br>

### Group_validation_message - Method logic

* **Group_validation_message(bool has_channels)**

</p>

1. Set *has_channels*
 
</p>

* **encode_payload(buffer)**

</p>

1. Encode message type
2. Encode has_channels
 
</p>

* **decode_payload(buffer)**

</p>

1. Decode message type
2. Decode has_channels
 
</p>


</br>


</br>

**7.3 Primary member message extension**
----------------------------------------

This is an extension of an already existent message class.    
**CT_SINGLE_PRIMARY_MESSAGE**

### Primary member message - Code Skeleton 

    //The message for single primary events
    class Single_primary_message : public Plugin_gcs_message
    
      // Enum for message payload
      enum_payload_item_type{
       PIT_UNKNOWN= 0,          // Not used
       PIT_SINGLE_PRIMARY_MESSAGE_TYPE= 1, // The message type
       + PIT_SINGLE_PRIMARY_SERVER_UUID= 2,  // Uuid to elect
       + PIT_SINGLE_PRIMARY_ELECTION_MODE=3, // The election mode
       PIT_MAX
      }
    
      // Enum for the types of message / actions
      enum_action_message_type{
       SINGLE_PRIMARY_UNKNOWN
       SINGLE_PRIMARY_NEW_PRIMARY_MESSAGE
       SINGLE_PRIMARY_QUEUE_APPLIED_MESSAGE
       +SINGLE_PRIMARY_NO_RESTRICTED_TRANSACTIONS
       +SINGLE_PRIMARY_PRIMARY_ELECTION
       +SINGLE_PRIMARY_PRIMARY_READY
       +SINGLE_PRIMARY_READ_MODE_SET
       SINGLE_PRIMARY_MESSAGE_TYPE_END
      }
    
    public:
    
      // Constructor
      Single_primary_message(string primary_to_elect, enum_primary_election_mode mode);
    
      /*
        Returns the primary to elect for election messages
        @param uuid  [out]  the server uuid
      */
      void get_primary_to_elect(string& uuid)
    
    protected:
    
      /*
        The inherited encode method
        @param buffer  [out]  the message encoded
      */
      void encode_payload(buffer);
    
      /*
        The inherited decode method
        @param[in] buffer the received data
        @param[in] end    the end pointer
      */
      void decode_payload(buffer, end)
    
     private:
    
      // The uuid for the primary member
      string primary_uuid
      // The election mode
      enum_primary_election_mode election_mode

    

</br>

### Primary member message - Method logic 

* **Single_primary_message(string primary_to_elect, enum_primary_election_mode)**

</p>

1. Set type to SINGLE_PRIMARY_PRIMARY_ELECTION 
2. Set the uuid for the primary to be elected 
3. Set the mode
 
</p>

* **encode_payload(buffer)**

</p>

1. Encode message type
2. [If type == SINGLE_PRIMARY_PRIMARY_ELECTION]    
   Encode the *primary_uuid* parameter    
   Encode the *election_mode*
 
</p>

* **decode_payload(buffer)**

</p>

1. Decode and set message type
2. [If type == SINGLE_PRIMARY_PRIMARY_ELECTION]   
   Decode and set *primary_uuid*    
   Decode and set *election_mode*
 
</p>


</br>

### Primary member message - Backport considerations 

For members in 5.7 that receive this message, there should be no
issues with these additions.    
For these members there is no real change to the old messages, and the
old decode and encode methods still work correctly.    
Only members on 8.0+ should receive the new election messages. 


</br>


</br>

8. UDF functions
================

One important point in the design is that these actions are made
through user defined functions. 

<code>
 SELECT group_replication_switch_to_single_primary_mode([server_uuid]);
</code>

<code>
 SELECT group_replication_switch_to_multi_primary_mode();
</code>

<code>
  SELECT group_replication_set_as_primary(server_uuid);
</code>


Besides the necessary code base support, these also need to be created
alongside the plugin installation.     
In previous server versions this meant the user had to execute SQL
commands to create the functions, but not on the 8.0.2+ versions.    
With the UDF install service, these functions can now be created
alongside the install.


#### Code Skeleton - Functions 

    PLUGIN_EXPORT char*
    group_replication_switch_to_single_primary_mode(UDF_INIT*,
                                                    UDF_ARGS *args,
                                                    char *result,
                                                    unsigned long *length,
                                                    char*, char*)
    
    PLUGIN_EXPORT my_bool
    group_replication_switch_to_single_primary_mode_init(UDF_INIT* initid,
                                                         UDF_ARGS* args,
                                                         char* message)
    
    PLUGIN_EXPORT void
    group_replication_switch_to_single_primary_mode_deinit(UDF_INIT*)
    
    PLUGIN_EXPORT char*
    group_replication_switch_to_multi_primary_mode(UDF_INIT*,
                                                   UDF_ARGS *arg,
                                                   char *res,
                                                   unsigned long *length,
                                                   char*, char*)
    
    PLUGIN_EXPORT my_bool
    group_replication_switch_to_multi_primary_mode_init(UDF_INIT* initid,
                                                        UDF_ARGS* args,
                                                        char* message)
    
    PLUGIN_EXPORT void
    group_replication_switch_to_multi_primary_mode_deinit(UDF_INIT*)


    PLUGIN_EXPORT char*
    group_replication_set_as_primary(UDF_INIT*,
                                     UDF_ARGS *arg,
                                     char *res,
                                     unsigned long *length,
                                     char*, char*)
    
    PLUGIN_EXPORT my_bool
    group_replication_set_as_primary_init(UDF_INIT* initid,
                                          UDF_ARGS* args,
                                          char* message)
    
    PLUGIN_EXPORT void
    group_replication_set_as_primary_deinit(UDF_INIT*)



These must be announced through a definitions file 

<code>
rapid/plugin/group_replication/group_replication.def
</code>


#### Method logic - Functions 

* **group_replication_switch_to_single_primary_mode_init**

</p>

1. Check that the parameter count is 0 or 1
2. If given, check the parameter is a valid uuid
3. Check the uuid belongs to one of the members

</p>

* **group_replication_switch_to_single_primary_mode**

</p>

1. Lock the plugin auto_lock 
2. [Is plugin running?]    
If not, return
3. Check that the group is not already in the requested mode.
4. Group_action action = new Primary_election_action(uuid);  
error= group_action_coordinator.coordinate_action_execution(action);    
5. return to the user.    
Use Group_action::get_error_message if needed. 

</p>

* **group_replication_switch_to_multi_primary_mode**

</p>

1. Lock the plugin auto_lock 
2. [Is plugin running?]    
If not, return
3. Check that the group is not already in the requested mode.
4. Group_action action = new  Multi_primary_migration_action();  
error= group_action_coordinator.coordinate_action_execution(action);    
5. return to the user.    
Use Group_action::get_error_message if needed. 


</p>


* **group_replication_set_as_primary_init**

</p>

1. Check that the parameter count is 1
2. Check the parameter is a valid uuid
3. Check the uuid belongs to one of the members

</p>

* **group_replication_set_as_primary**

</p>

1. Lock the plugin auto_lock 
2. [Is plugin running?]    
If not, return
3. Check that the given member is not already the primary.
4. Group_action action = new Primary_election_action(uuid);  
error= group_action_coordinator.coordinate_action_execution(action);    
5. return to the user.    
Use Group_action::get_error_message if needed. 

</p>
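
The argument checks listed for the *_init* functions can be sketched as follows. The function names and error texts are hypothetical, the member list is passed in as plain strings instead of being read from the group member manager, and the uuid comparison is assumed to be an exact string match.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// True when `text` has the textual UUID layout: 8-4-4-4-12 hex digits.
bool is_valid_uuid(const std::string &text) {
  if (text.size() != 36) return false;
  for (std::size_t i = 0; i < text.size(); ++i) {
    const bool dash_position = (i == 8 || i == 13 || i == 18 || i == 23);
    if (dash_position) {
      if (text[i] != '-') return false;
    } else if (!std::isxdigit(static_cast<unsigned char>(text[i]))) {
      return false;
    }
  }
  return true;
}

// Checks for group_replication_set_as_primary_init.
// Returns an empty string on success, or a message for the user otherwise.
std::string check_set_as_primary_args(
    const std::vector<std::string> &args,
    const std::vector<std::string> &member_uuids) {
  if (args.size() != 1) return "Wrong arguments: one server uuid is required.";
  if (!is_valid_uuid(args[0])) return "Wrong arguments: the server uuid is not valid.";
  for (const std::string &member : member_uuids) {
    if (member == args[0]) return "";
  }
  return "The requested uuid is not a member of the group.";
}
```

For *group_replication_switch_to_single_primary_mode_init* the same checks would apply, except that an empty argument list is also accepted.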

#### Function installation

To automatically create the functions at plugin install we can use
the UDF install service.

    my_service<SERVICE_TYPE(udf_registration)> service("udf_registration.mysql_server", r);
    
    service->udf_register("group_replication_switch_to_single_primary_mode",
                          Item_result::STRING_RESULT,
                          (Udf_func_any) group_replication_switch_to_single_primary_mode,
                          group_replication_switch_to_single_primary_mode_init,
                          group_replication_switch_to_single_primary_mode_deinit);


This code is to be located in the plugin install.   
Due to the server initialization order we may have to rely on the
Delayed initialization thread. 

</br>

9. Applier Module Action Packet extension - Queue Checkpoint Packet
===================================================================

This small section is about a small addition to the applier module.    
The idea is to have a packet that you can use to wait until it is
processed, i.e., until the current queue is consumed.   

### Code Skeleton 

    enum enum_packet_action
    {
      TERMINATION_PACKET=0,  //Packet for a termination action
      SUSPENSION_PACKET,     //Packet to signal something to suspend
      CHECKPOINT_PACKET,     //Packet to wait for queue consumption
      ACTION_NUMBER= 3       //The number of actions
    };
    
    /**
      @class Queue_checkpoint_packet
      A packet to wait for queue consumption 
    */
    class Queue_checkpoint_packet: public Action_Packet
    {
    public:
    
      /**
        Create a new queue checkpoint packet.
      */
      Queue_checkpoint_packet()
        :Action_Packet(CHECKPOINT_PACKET), packet_consumed(false)
      {
        init lock;
        init condition;
      }
    
      ~Queue_checkpoint_packet() {}
      
      void wait_on_event_consumption();
      
      void event_is_consumed();
      
    private: 
      bool packet_consumed; 
      mysql_mutex_t lock;
      mysql_cond_t  cond;
    };

### Method logic 

* **wait_on_event_consumption()**

</p>

1. Lock
2. while the packet is not consumed wait
3. unlock 

</p>


* **event_is_consumed()**

</p>

1. lock
2. set the flag to true
3. signal the condition so that waiting threads wake up
4. unlock 

</p>
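
The wait and notify logic can be sketched with standard C++ primitives instead of *mysql_mutex_t* and *mysql_cond_t*. `Checkpoint_sketch` is a hypothetical name, and the sketch assumes the consumer notifies the condition after setting the flag so that a waiting thread is woken up.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

class Checkpoint_sketch {
 public:
  // Block until the packet has been consumed. The predicate guards against
  // spurious wake-ups and returns at once if the flag is already set.
  void wait_on_event_consumption() {
    std::unique_lock<std::mutex> guard(lock_);
    cond_.wait(guard, [this] { return packet_consumed_; });
  }

  // Mark the packet as consumed and wake up every waiting thread.
  void event_is_consumed() {
    {
      std::lock_guard<std::mutex> guard(lock_);
      packet_consumed_ = true;
    }
    cond_.notify_all();
  }

  bool is_consumed() {
    std::lock_guard<std::mutex> guard(lock_);
    return packet_consumed_;
  }

 private:
  bool packet_consumed_ = false;
  std::mutex lock_;
  std::condition_variable cond_;
};
```

In the applier, the thread that queued the packet would call `wait_on_event_consumption()` while the applier thread calls `event_is_consumed()` once it reaches the packet in the queue.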
    
### Related changes

* **Applier_module::apply_action_packet(Action_packet *action_packet)**

On the method add a branch that does 

    if (action == CHECKPOINT_PACKET)
    {
      cast the packet to Queue_checkpoint_packet
      invoke event_is_consumed()
      return false
    }

</br>

10. File Structure
==================
 
Due to the number of files in the plugin folder, this WL proposes a more
structured approach to the code.

All folders below, unless specified, are relative to the base:
**rapid/plugin/group_replication**    
The **+** means addition of a new file.    
The **m** means the move of an existing file.   

Note that these changes also affect the source list inside the plugin CMakeLists.txt, like

    SET(GROUP_REPLICATION_SOURCES
      src/*.cc
      src/XXX/*.cc
      src/YYY/*.cc
 
 
### Coordinator and actions 

These classes, as a new concept in Group Replication, are located
in a new folder: **group_actions**

So we have 

**\+** src/group_actions/group_action_coordinator.cc        </br>
**\+** src/group_actions/group_action.cc                    </br> 
**\+** src/group_actions/multi_primary_migration_action.cc   </br> 
**\+** src/group_actions/primary_election_action.cc         </br> 

same for ".h" on include/group_actions/

#### Group action - notifications
 
Used solely for group actions for now, we propose to place the
notifications in: **group_actions/notifications** 
 
**\+** src/group_actions/notifications/action_notification.cc        </br>
**\+** src/group_actions/notifications/dead_member_notification.cc   </br> 
**\+** src/group_actions/notifications/channel_validation_notification.cc   </br> 

same for ".h" on include/group_actions/notifications/ 

### Observers 
 
We already had some observers in the plugin for replication channels.   
With this worklog we add two more to be located under: **plugin_observers**

**\+** src/plugin_observers/group_event_listener.cc            </br>
**\+** src/plugin_observers/group_transaction_listener.cc      </br> 
**m** src/plugin_observers/channel_observation_manager.cc     </br> 

same for ".h" on include/plugin_observers/

### Handlers 

All classes that are used in the plugin to execute a contained action
should be isolated into the folder **plugin_handlers**

So we have 

**\+** src/plugin_handlers/primary_election/primary_election_validation_handler.cc </br>
**\+** src/plugin_handlers/primary_election/primary_election_invocation_handler.cc </br> 
**\+** src/plugin_handlers/primary_election/primary_election_primary_process.cc </br> 
**\+** src/plugin_handlers/primary_election/primary_election_secondary_process.cc </br> 
**\+** src/plugin_handlers/server_transaction_checks_handler.cc </br> 
**\+** src/plugin_handlers/persistent_variables_handler.cc </br>
**\+** src/plugin_handlers/stage_monitor_handler.cc   </br> 
**m**  src/plugin_handlers/read_mode_handler.cc                    

same for ".h" on include/plugin_handlers/

### Messages

With the addition of new messages to the plugin, it is time to also
have a dedicated folder: **plugin_messages** 

**\+** src/plugin_messages/group_action_message.cc                </br>
**\+** src/plugin_messages/primary_election_action_message.cc     </br> 
**\+** src/plugin_messages/group_validation_message.cc            </br> 
**m**  src/plugin_messages/single_primary_message.cc              </br> 
**m**  src/plugin_messages/recovery_message.cc                     

same for ".h" on include/plugin_messages/

### UDF functions

For UDF functions we are adding 

**+** src/plugin_udf_functions.cc       </br>
**+** include/plugin_udf_functions.h    

And also the definition file:

**+** /group_replication.def

### Services

For the server side implementation we need to create the new files:

**+** include/mysql/components/services/transactional_querying.h  </br>
**+** sql/server_component/dynamic_transactional_querying_impl.cc </br>
**+** sql/server_component/dynamic_transactional_querying_impl.h  </br>
**+** sql/server_component/server_abort.cc </br>
**+** sql/server_component/server_abort.h