WL#13855: GR: Member actions

Affects: Server-8.0   —   Status: Complete

EXECUTIVE SUMMARY
=================

This worklog shall provide a framework for DBAs/operators to
configure a group to always stay in read-only mode.
The DBA/operator shall be able to trigger actions when a member
role changes to primary.


BACKGROUND
==========

In certain cases the DBA/operator may want all members of a group to
be read only, for example when the group (B) is a replica of another
group (A).


               Group A
        |---|   |---|   |---|
    /-- | P |   | S |   | S |
    |   |---|   |---|   |---|
    |     |       |       |
    |     -----------------
    |
    |
    |                       Group B
    | inbound        |---|   |---|   |---|
    \--------------> | P |   | S |   | S |
      replication    |---|   |---|   |---|
                       |       |       |
                       -----------------


USER STORIES
============

- As a MySQL DBA, I want all members in group B, which is a replica
  of group A and should never be written into until group A is down,
  to be read only so that group B protects itself against
  direct/stray writes, and therefore prevents split brain situations
  between group A and group B under normal operations.


FUNCTIONAL REQUIREMENTS
=======================

FR-01: The member actions shall only be triggered in single-primary
       mode.

FR-02: The member actions configuration can only be changed on:
         a) a server that is part of the group majority in
            single-primary mode and is the primary;
         b) a server that is not part of a group.
       In both cases the server must be writable, that is,
       @@GLOBAL.read_only=OFF [1]

FR-03: When a member role changes to PRIMARY, an action shall be
       triggered on that member.
       This action is assigned to the event AFTER_PRIMARY_ELECTION.

FR-04: The following action types are allowed:
         INTERNAL: actions provided by Group Replication;

FR-05: Each action is assigned a priority value, an integer
       between 1 and 100, that specifies the order in which the
       action will be run: the lower the value, the higher the
       priority.

FR-06: Each action shall specify the type of error handling, either
       IGNORE or CRITICAL.
         IGNORE:   errors will be ignored;
         CRITICAL: member will move into ERROR state and
                   --group_replication_exit_state_action option[2]
                   will be followed.

FR-07: The actions must be configured through the UDFs:
         group_replication_enable_member_action;
         group_replication_disable_member_action;
         group_replication_reset_member_actions.
       These UDFs only exist when the Group Replication plugin is
       installed.
       Enabling an already enabled action or disabling an already
       disabled action is allowed.

FR-08: The UDFs require the SUPER or GROUP_REPLICATION_ADMIN
       privilege.

FR-09: The UDFs can only be executed when:
         a) the server is part of the group majority in
            single-primary mode and is the primary;
         b) the server is not part of a group.
       In both cases the server must be writable.
       Attempting to execute the UDFs on a secondary will throw
       ER_GRP_RPL_UDF_ERROR.

FR-10: A given action can only be used once per event, that is, no
       two actions with the same name can exist on the same event.

FR-11: Internal action names must be prefixed with "mysql_".

FR-12: An action is configured with:
         name:           name
         enabled:        boolean
         type:           INTERNAL
         event:          AFTER_PRIMARY_ELECTION
         priority:       integer between 1 and 100
         error_handling: IGNORE, CRITICAL

FR-13: Only enabled actions shall be triggered.

FR-14: Group Replication provides the following INTERNAL primary
       election actions:
         action: mysql_disable_super_read_only_if_primary
           Which disables @@GLOBAL.super_read_only[3].

FR-15: The actions configuration shall be equal on all group
       members. As such Group Replication will ensure:
         a) The configuration done on the primary is propagated to
            all group members. This propagation is done through
            group messages and not through the binary log.
         b) When a member joins the group, it will override its
            configuration with the one from the group.
         c) When a member joins the group, if all members are from a
            version that does not support member actions, then the
            joining member shall reset its actions configuration
            to the default one (described in FR-18).
         d) When a server bootstraps a group, that server
            configuration becomes the group configuration.
         e) After a group mode change from multi to single-primary,
            the primary shall propagate the actions configuration
            to all group members.

FR-16: An error while storing the configuration during execution of
       the UDFs
         group_replication_enable_member_action,
         group_replication_disable_member_action or
         group_replication_reset_member_actions
       will throw ER_GRP_RPL_UDF_ERROR.

FR-17: An error while receiving or storing the configuration on a
       group member during configuration propagation will move that
       member into ERROR state, and the
       --group_replication_exit_state_action option[2] will be
       followed.

FR-18: The default actions configuration consists of a single
       action:
         name:           mysql_disable_super_read_only_if_primary
         enabled:        1
         type:           INTERNAL
         event:          AFTER_PRIMARY_ELECTION
         priority:       1
         error_handling: IGNORE
       which only takes place on the primary, in order to keep
       the current read only behaviour.
       The DBA can disable this action, meaning that after the
       primary is elected it will remain read only.

FR-19: Group Replication will keep the current behaviour of enabling
       super_read_only on join, before primary elections and errors.
       This behaviour is not configurable.

FR-20: The actions configuration can be queried on the
       `performance_schema.replication_group_member_actions`
       table. This table is only selectable by design.
       This table only exists when the Group Replication
       plugin is installed.

FR-21: The actions configuration version can be queried on the
       `performance_schema.replication_group_configuration_version`
       table. This table is only selectable by design.
       This table only exists when the Group Replication
       plugin is installed.

FR-22: An error while receiving or storing the configuration on a
       member during member join will cause the join to fail.

FR-23: The member actions configuration can only be reset to the
       default configuration (described in FR-18) on a server that
       is not part of a group, using the UDF:
         group_replication_reset_member_actions
       The server must be writable, that is,
       @@GLOBAL.read_only=OFF [1]


NON-FUNCTIONAL REQUIREMENTS
===========================
None


[1] https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_read_only
[2] https://dev.mysql.com/doc/refman/8.0/en/group-replication-options.html#sysvar_group_replication_exit_state_action
[3] https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_super_read_only


SUMMARY OF THE APPROACH
=======================
The DBA will be able to configure actions that shall be triggered
when a primary election takes place in a group in single-primary
mode.

Approach:

- Internally, GR will intercept the primary election event and
  then run an action named
  `mysql_disable_super_read_only_if_primary`.
  The action will make the server read-writable if it has become
  the primary.

- This action can be enabled/disabled by a privileged user through
  UDFs.

- GR will conditionally call the action, depending on the
  configuration.


Disabling:
```
mysql> SELECT group_replication_disable_member_action("mysql_disable_super_read_only_if_primary", "AFTER_PRIMARY_ELECTION");
```

Enabling:
```
mysql> SELECT group_replication_enable_member_action("mysql_disable_super_read_only_if_primary", "AFTER_PRIMARY_ELECTION");
```


USER INTERFACE
==============
An action is configured with:
  name:           name
  enabled:        boolean
  type:           INTERNAL
  event:          AFTER_PRIMARY_ELECTION
  priority:       integer between 1 and 100
  error_handling: IGNORE, CRITICAL

`name` is:
  the name of the action.

`enabled`:
  whether the action is enabled.
  0 means disabled.
  1 means enabled.

`type` is:
  INTERNAL: actions provided by Group Replication.

`event` is:
  AFTER_PRIMARY_ELECTION:
    the action is triggered after the member role changes to
    PRIMARY.

`priority` is:
   an integer between 1 and 100 that specifies the order in which
   the action will be run, lower values first.

`error_handling` is one of:
 IGNORE:   errors will be ignored;
 CRITICAL: error will be handled according to
           --group_replication_exit_state_action option[1].
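
The field constraints above can be sketched as a small validation
model. This is an illustrative sketch only, not the plugin
implementation: `MemberAction` and `run_order` are hypothetical names,
the "mysql_" prefix check follows FR-11, and the ordering of actions
that share the same priority is an assumption (input order is kept).

```python
from dataclasses import dataclass

# Allowed values taken from the field descriptions above.
VALID_TYPES = {"INTERNAL"}
VALID_EVENTS = {"AFTER_PRIMARY_ELECTION"}
VALID_ERROR_HANDLING = {"IGNORE", "CRITICAL"}


@dataclass(frozen=True)
class MemberAction:
    """Hypothetical in-memory mirror of one configured action."""
    name: str
    event: str
    enabled: bool
    type: str
    priority: int
    error_handling: str

    def __post_init__(self):
        if self.type not in VALID_TYPES:
            raise ValueError(f"invalid type: {self.type!r}")
        if self.event not in VALID_EVENTS:
            raise ValueError(f"invalid event: {self.event!r}")
        if not 1 <= self.priority <= 100:
            raise ValueError(f"priority out of range: {self.priority}")
        if self.error_handling not in VALID_ERROR_HANDLING:
            raise ValueError(
                f"invalid error_handling: {self.error_handling!r}")
        # Internal action names carry the "mysql_" prefix (FR-11).
        if self.type == "INTERNAL" and not self.name.startswith("mysql_"):
            raise ValueError(f"missing mysql_ prefix: {self.name!r}")


def run_order(actions, event):
    """Return the enabled actions for `event`, lowest priority first.

    sorted() is stable, so actions with equal priority keep their
    input order; that tie-break is an assumption of this sketch.
    """
    selected = [a for a in actions if a.enabled and a.event == event]
    return sorted(selected, key=lambda a: a.priority)
```

Disabled actions never appear in the result of `run_order`, which
matches the requirement that only enabled actions are triggered.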

Enable an action
----------------
The DBA can enable actions through the UDF
```
Name: group_replication_enable_member_action
  Arguments:
   - name: string
   - event: string
  Return:
   - string
  Throws:
   - ER_UDF_ERROR
```

Example:
```
mysql> SELECT group_replication_enable_member_action("mysql_disable_super_read_only_if_primary", "AFTER_PRIMARY_ELECTION");
```

The following log message will be logged:
```
Name: ER_GRP_RPL_MEMBER_ACTION_ENABLED

Input message: Member action enabled: "%s", type: "%s", event: "%s", priority: "%d", error_handling: "%s".

Materialized message: 2020-10-01T11:29:31.296927Z 0 [System] [MY-013742] [Repl] Plugin group_replication reported: 'Member action enabled: "mysql_disable_super_read_only_if_primary", type: "INTERNAL", event: "AFTER_PRIMARY_ELECTION", priority: "1", error_handling: "IGNORE".'
```

Disable an action
-----------------
The DBA can disable actions through the UDF
```
Name: group_replication_disable_member_action
  Arguments:
   - name: string
   - event: string
  Return:
   - string
  Throws:
   - ER_UDF_ERROR
```

Example:
```
mysql> SELECT group_replication_disable_member_action("mysql_disable_super_read_only_if_primary", "AFTER_PRIMARY_ELECTION");
```

The following log message will be logged:
```
Name: ER_GRP_RPL_MEMBER_ACTION_DISABLED

Input message: Member action disabled: "%s", type: "%s", event: "%s", priority: "%d", error_handling: "%s".

Materialized message: 2020-10-01T11:29:31.296927Z 0 [System] [MY-013743] [Repl] Plugin group_replication reported: 'Member action disabled: "mysql_disable_super_read_only_if_primary", type: "INTERNAL", event: "AFTER_PRIMARY_ELECTION", priority: "1", error_handling: "IGNORE".'
```

Reset actions configuration
---------------------------
The DBA can reset the actions configuration through the UDF
```
Name: group_replication_reset_member_actions
  Arguments:
   - None
  Return:
   - string
  Throws:
   - ER_UDF_ERROR
```
It will reset the configuration to the default one, described in
FR-18, and set the version to 1.

Example:
```
mysql> SELECT group_replication_reset_member_actions();
```

The following log message will be logged:
```
Name: ER_GRP_RPL_MEMBER_ACTIONS_RESET

Input message: Member actions configuration was reset.

Materialized message: 2020-10-01T11:29:31.296927Z 0 [System] [MY-013744] [Repl] Plugin group_replication reported: 'Member actions configuration was reset.'
```

The DBA must have SUPER or GROUP_REPLICATION_ADMIN privilege to call
these UDFs.


CONFIGURATION
=============
The member actions configuration is stored in a system table.
In order to change it, the DBA must use the UDFs presented in the
USER INTERFACE section.

The UDFs will ensure:
  a) that the configuration is updated on the primary (or on a
     single server);
  b) its arguments correctness;
  c) its propagation to all group members;
  d) its persistence.

When the configuration is changed on a single server, it is not
propagated beyond that server.

The configuration propagation is done through group messages and not
through the binary log. Since it is not written into the binary log,
no GTIDs are consumed, nor will this configuration reach servers
outside the group.

All group members will have the same configuration; the change
propagation guarantees that.

Since the configuration change will not generate GTIDs and the
propagation through the group is eventually consistent, there will
be a dedicated table to keep track of the version of the
configuration: `mysql.replication_group_configuration_version`.
The table will have two columns:
 1) name: the configuration name;
 2) version: the version of the configuration.
Every configuration change will increase that version.
Example:
```
|----------------------------------+---------|
| NAME                             | VERSION |
|----------------------------------+---------|
| replication_group_member_actions |       1 |
|----------------------------------+---------|
```

The version default value is 1.
The version is propagated together with the configuration.

Every time the UDFs
  group_replication_enable_member_action
  group_replication_disable_member_action
are run, the version for the row `replication_group_member_actions`
on the table `mysql.replication_group_configuration_version` is
incremented.

Every time the UDF
  group_replication_reset_member_actions
is run, the version for the row `replication_group_member_actions`
on the table `mysql.replication_group_configuration_version` is
set to 1.

That version can be queried on the table
`performance_schema.replication_group_configuration_version`.

Configuration will first be stored locally and then propagated, more
specifically:
 1) open the configuration and version tables;
 2) update both tables;
 3) commit;
 4) propagate changes;
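
The versioning rules and the store-then-propagate order above can be
modelled as follows. This is a minimal in-memory sketch under stated
assumptions, not the server code: `ActionsConfigStore` and its
`propagate` callback are hypothetical, the dictionary stands in for the
two system tables, and reset is modelled without propagation because it
is only allowed outside a group.

```python
class ActionsConfigStore:
    """Hypothetical model of the actions configuration and its version."""

    DEFAULT_KEY = ("mysql_disable_super_read_only_if_primary",
                   "AFTER_PRIMARY_ELECTION")

    def __init__(self, propagate=None):
        # (name, event) -> enabled flag; stands in for the actions table.
        self.enabled = {self.DEFAULT_KEY: True}
        # Stands in for the configuration version row; default is 1.
        self.version = 1
        # Callback standing in for the group message send (assumption).
        self._propagate = propagate or (lambda enabled, version: None)

    def _set_enabled(self, name, event, value):
        key = (name, event)
        if key not in self.enabled:
            raise KeyError(f"unknown action: {name!r} on {event!r}")
        # Steps 1-3: update configuration and version together, locally.
        self.enabled[key] = value
        self.version += 1
        # Step 4: only then propagate the complete configuration.
        self._propagate(dict(self.enabled), self.version)

    def enable(self, name, event):
        self._set_enabled(name, event, True)

    def disable(self, name, event):
        self._set_enabled(name, event, False)

    def reset(self):
        # Back to the default configuration; the version is set to 1.
        self.enabled = {self.DEFAULT_KEY: True}
        self.version = 1
```

In this model every enable or disable call increments the version, even
when the action already has the requested state, which follows the rule
that the version is incremented every time those UDFs are run.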

In the case of an uncontrolled full group shutdown, the DBA can query
the table `performance_schema.replication_group_configuration_version`,
to identify the latest version of the actions configuration.
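
Once the DBA has collected the version value from each member, picking
the members that hold the latest configuration is a simple comparison.
The sketch below assumes the versions were already read from each
server; the function name and the server names in the usage example are
hypothetical.

```python
def members_with_latest_version(versions):
    """Return (latest_version, sorted member names holding it).

    `versions` maps a member name to the version value read from that
    member's configuration version table.
    """
    if not versions:
        raise ValueError("no versions collected")
    latest = max(versions.values())
    holders = sorted(m for m, v in versions.items() if v == latest)
    return latest, holders
```

For example, `members_with_latest_version({"s1": 3, "s2": 5, "s3": 5})`
returns `(5, ["s2", "s3"])`.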

When a member joins the group, it will override its configuration
with the one from the group.

The default primary election actions configuration consists of a
single action:
  name:           mysql_disable_super_read_only_if_primary
  enabled:        1
  type:           INTERNAL
  event:          AFTER_PRIMARY_ELECTION
  priority:       1
  error_handling: IGNORE
in order to keep the current read only behaviour, that is, once the
primary is elected, it becomes writable.
The default configuration version is 1.

The DBA can disable this action, meaning that after the primary is
elected it will remain read only.


SECURITY CONTEXT
================
The DBA must have the SUPER or GROUP_REPLICATION_ADMIN privilege to
call the UDFs presented in the USER INTERFACE section.


ACTIONS
=======

INTERNAL
--------
Actions of this type are provided by Group Replication.

The following INTERNAL actions exist:
  mysql_disable_super_read_only_if_primary
    Which disables @@GLOBAL.super_read_only[2] on the primary.


UPGRADE/DOWNGRADE AND CROSS-VERSION REPLICATION
===============================================
There are no repercussions on upgrade scenarios, since the default
configuration (described in FR-18) will provide the previous
behavior.

Group Replication will keep the current behaviour of enabling
super_read_only on join, primary elections and errors.
This behaviour is not configurable.

To prevent the following scenario:
  1) A group with 3 members of a version that does not support
     actions.
  2) A server (S4) that supports actions is configured with actions
     while outside of the group.
  3) S4 joins the group and does not receive the group actions
     configuration, so it joins with actions that may differ from
     those of the group.
when a joining member does not receive the group actions
configuration during the join, the joining member will reset its
actions configuration to the default one (described in FR-18).
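
The join rule above amounts to a two-way decision, sketched here with
plain dictionaries. This is an illustration under assumptions, not the
plugin code: `actions_after_join` is a hypothetical helper and
`received=None` stands for "no group configuration was received during
the join".

```python
def actions_after_join(local, received, default):
    """Return the configuration a joining member ends up with.

    The member's own `local` configuration is never kept: it is
    replaced either by the group configuration or by the default.
    """
    if received is None:
        # Older group members sent nothing: fall back to the default.
        return dict(default)
    # Otherwise the group configuration overrides the local one.
    return dict(received)
```

The `local` argument is intentionally unused in the result, which makes
explicit that a configuration set up outside the group does not survive
the join.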


OBSERVABILITY
=============
A performance schema table will be added to list the configured
actions.
This table is only selectable by design.
```
performance_schema.replication_group_member_actions (
  name CHAR(255) CHARACTER SET ASCII NOT NULL COMMENT 'The action name.',
  event CHAR(64) CHARACTER SET ASCII NOT NULL COMMENT 'The action event.',
  enabled BOOLEAN NOT NULL COMMENT 'Whether the action is enabled.',
  type CHAR(64) CHARACTER SET ASCII NOT NULL COMMENT 'The action type.',
  priority TINYINT UNSIGNED NOT NULL COMMENT 'The order on which the action will be run, value between 1 and 100, lower values first.',
  error_handling CHAR(64) CHARACTER SET ASCII NOT NULL COMMENT 'On errors during the action will be handled: IGNORE, CRITICAL.');
```

A performance schema table will be added to list the configuration
versions.
This table is only selectable by design.
```
performance_schema.replication_group_configuration_version (
  name CHAR(255) CHARACTER SET ASCII NOT NULL COMMENT 'The configuration name.',
  version BIGINT UNSIGNED NOT NULL COMMENT 'The version of configuration.');
```

Log messages will be logged before each action is triggered.

```
Name: ER_GRP_RPL_MEMBER_ACTION_TRIGGERED

Input message: The member action "%s" for event "%s" with priority "%d" will be run.

Materialized message: 2020-10-01T11:29:31.296927Z 0 [System] [MY-013731] [Repl] Plugin group_replication reported: 'The member action "mysql_disable_super_read_only_if_primary" for event "AFTER_PRIMARY_ELECTION" with priority "1" will be run.'
```


DEPLOYMENT AND INSTALLATION
===========================
There are no repercussions: the new tables and default configuration
will be automatically added by the upgrade step.


PROTOCOL
========
Not a protocol change, but the propagation of member actions
configuration will rely on the services `gr_message_service_send`
and `gr_message_service_recv` introduced by WL#12896: "Group
Replication: delivery message service".

These services use a generic message type `Group_service_message`
composed of a tag and a raw payload.

When the member actions configuration is changed on the
primary, the complete configuration is encoded with Protocol
Buffers[3], which will go in the raw payload with the tag
`mysql_replication_group_member_actions`.
This message is delivered to all members, each of which replaces its
configuration with the received one.
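
The tag-plus-payload flow can be illustrated as below. This is a sketch
under stated assumptions: JSON is used as a stand-in for the Protocol
Buffers encoding so the example stays self-contained, and
`encode_configuration`, `on_message` and the `member_state` dictionary
are hypothetical names, not the message service API.

```python
import json

# Tag named in the text above for member actions messages.
TAG = "mysql_replication_group_member_actions"


def encode_configuration(actions, version):
    """Build (tag, raw payload) carrying the complete configuration.

    JSON replaces Protocol Buffers here purely for illustration.
    """
    body = {"version": version, "actions": actions}
    return TAG, json.dumps(body).encode("utf-8")


def on_message(tag, payload, member_state):
    """Replace the member configuration if the tag matches.

    Returns True when the message was consumed, False when the tag
    belongs to some other user of the message service.
    """
    if tag != TAG:
        return False
    body = json.loads(payload.decode("utf-8"))
    # The complete configuration is replaced, not merged.
    member_state["actions"] = body["actions"]
    member_state["version"] = body["version"]
    return True
```

Sending the complete configuration, rather than a delta, means a
receiver only has to replace what it holds.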

In order to update the configuration on a new member, the state
exchanged during the join will be extended with the complete primary
election actions configuration from the primary.
The member actions configuration included on the state exchange
will also be encoded with Protocol Buffers.


FAILURE MODEL SPECIFICATION
===========================

The UDFs presented in the USER INTERFACE section can throw several errors
=========================================================================

Member role
-----------
The UDFs can only be executed on a primary that belongs to the group
majority, or on a server that does not belong to a group.
ER_UDF_ERROR will be thrown when those conditions are not met.
The `group_replication_reset_member_actions` UDF can only be
executed when the server does not belong to a group.
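
The role conditions, combined with the writable-server requirement from
FR-02 and FR-09, can be summarised in one predicate. This is a sketch
only: `may_run_udf` and its keyword arguments are hypothetical names
for the states described in this document.

```python
RESET_UDF = "group_replication_reset_member_actions"


def may_run_udf(udf, *, in_group, single_primary=False, is_primary=False,
                in_majority=False, read_only=False):
    """Return True when `udf` is allowed to run on this server."""
    if read_only:
        # The server must be writable in every case.
        return False
    if not in_group:
        # A server outside any group may run all three UDFs.
        return True
    if udf == RESET_UDF:
        # Reset is only allowed outside a group.
        return False
    # Inside a group: primary of the majority, in single-primary mode.
    return single_primary and is_primary and in_majority
```

A secondary, a member in a minority partition and any member in
multi-primary mode are all rejected by the last line.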

Invalid parameters
------------------
All parameters are mandatory and will be checked according to the
functional requirements.
ER_UDF_ERROR will be thrown when those conditions are not met.


Persistence error on a single server
------------------------------------
If there is an error while storing the configuration during the UDFs
  group_replication_enable_member_action,
  group_replication_disable_member_action or
  group_replication_reset_member_actions,
ER_GRP_RPL_UDF_ERROR will be thrown.


Persistence or receiving error on a group
-----------------------------------------
If there is an error while receiving the configuration update, either
on the message decoding or while persisting the configuration
locally, this member will move into ERROR state and follow the
--group_replication_exit_state_action option[1].
The following error message will be logged:
```
ER_GRP_RPL_MESSAGE_SERVICE_FATAL_ERROR: "A message sent through the Group Replication message deliver service was not delivered successfully. The server will now leave the group. Try to add the server back to the group and check if the problem persists, or check previous messages in the log for hints of what could be the problem."
```


Actions errors
==============
Each member action specifies how an error during the action is
handled:

  IGNORE: errors will be ignored;
    An error message will be logged:
```
Name: ER_GRP_RPL_MEMBER_ACTION_FAILURE_IGNORE
Input message: The member action "%s" for event "%s" with priority "%d" failed, this error is ignored as instructed. Please check previous messages in the error log for hints about what could have caused this failure.

Materialized message: 2020-10-01T11:29:31.296927Z 0 [Error] [MY-013732] [Repl] Plugin group_replication reported: 'The member action "mysql_disable_super_read_only_if_primary" for event "AFTER_PRIMARY_ELECTION" with priority "1" failed, this error is ignored as instructed. Please check previous messages in the error log for hints about what could have caused this failure.'
```

  CRITICAL: member will move into ERROR state and
            --group_replication_exit_state_action option[1]
            will be followed.
    An error message will be logged:
```
Name: ER_GRP_RPL_MEMBER_ACTION_FAILURE
Input message: The member action "%s" for event "%s" with priority "%d" failed. Please check previous messages in the error log for hints about what could have caused this failure.

Materialized message: 2020-10-01T11:29:31.296927Z 0 [Error] [MY-013733] [Repl] Plugin group_replication reported: 'The member action "mysql_disable_super_read_only_if_primary" for event "AFTER_PRIMARY_ELECTION" with priority "1" failed. Please check previous messages in the error log for hints about what could have caused this failure.'
```
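
The two error-handling modes can be sketched as a small run loop. This
is an illustrative model, not the plugin implementation: the `run`
callable on each action and the returned state strings are
hypothetical, and stopping at the first CRITICAL failure is an
assumption, since the text does not say whether later actions still
run.

```python
def run_member_actions(actions, log):
    """Run enabled actions, lowest priority value first.

    `actions` is a list of dicts with the keys name, enabled,
    priority, error_handling and run (a callable). Returns the
    resulting member state: "ONLINE" or "ERROR".
    """
    enabled = [a for a in actions if a["enabled"]]
    for action in sorted(enabled, key=lambda a: a["priority"]):
        # A message is logged before each action is triggered.
        log.append(f'will run {action["name"]}')
        try:
            action["run"]()
        except Exception as error:
            if action["error_handling"] == "IGNORE":
                log.append(f'ignored failure of {action["name"]}: {error}')
                continue
            # CRITICAL: the member moves into ERROR state.
            log.append(f'critical failure of {action["name"]}: {error}')
            return "ERROR"
    return "ONLINE"
```

With IGNORE the loop logs the failure and moves on to the next action;
with CRITICAL it returns immediately, leaving the exit state action to
the caller.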


[1] https://dev.mysql.com/doc/refman/8.0/en/group-replication-options.html#sysvar_group_replication_exit_state_action
[2] https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_super_read_only
[3] https://developers.google.com/protocol-buffers


SUMMARY OF CHANGES
==================

- Add the system table `mysql.replication_group_member_actions`
  to persist the member actions configuration.
```
CREATE TABLE mysql.replication_group_member_actions (
  name CHAR(255) CHARACTER SET ASCII NOT NULL COMMENT 'The action name.',
  event CHAR(64) CHARACTER SET ASCII NOT NULL COMMENT 'The action event.',
  enabled BOOLEAN NOT NULL COMMENT 'Whether the action is enabled.',
  type CHAR(64) CHARACTER SET ASCII NOT NULL COMMENT 'The action type.',
  priority TINYINT UNSIGNED NOT NULL COMMENT 'The order on which the action will be run, value between 1 and 100, lower values first.',
  error_handling CHAR(64) CHARACTER SET ASCII NOT NULL COMMENT 'On errors during the action will be handled: IGNORE, CRITICAL.',
PRIMARY KEY(name, event))
DEFAULT CHARSET=utf8mb4 STATS_PERSISTENT=0
COMMENT 'The member actions configuration.';
```

- Add the default content to `mysql.replication_group_member_actions`
  system table.
```
    name:           mysql_disable_super_read_only_if_primary
    enabled:        1
    type:           INTERNAL
    event:          AFTER_PRIMARY_ELECTION
    priority:       1
    error_handling: IGNORE
```

- Add the system table `mysql.replication_group_configuration_version`
  to persist the member actions configuration version.
```
CREATE TABLE mysql.replication_group_configuration_version (
  name CHAR(255) CHARACTER SET ASCII NOT NULL COMMENT 'The configuration name.',
  version BIGINT UNSIGNED NOT NULL COMMENT 'The version of configuration.',
PRIMARY KEY(name))
DEFAULT CHARSET=utf8mb4 STATS_PERSISTENT=0
COMMENT 'The group configuration version.';
```

- Add the performance schema table
  `performance_schema.replication_group_member_actions`.
```
CREATE TABLE performance_schema.replication_group_member_actions (
  name CHAR(255) CHARACTER SET ASCII NOT NULL COMMENT 'The action name.',
  event CHAR(64) CHARACTER SET ASCII NOT NULL COMMENT 'The action event.',
  enabled BOOLEAN NOT NULL COMMENT 'Whether the action is enabled.',
  type CHAR(64) CHARACTER SET ASCII NOT NULL COMMENT 'The action type.',
  priority TINYINT UNSIGNED NOT NULL COMMENT 'The order on which the action will be run, value between 1 and 100, lower values first.',
  error_handling CHAR(64) CHARACTER SET ASCII NOT NULL COMMENT 'On errors during the action will be handled: IGNORE, CRITICAL.');
```

- Add the performance schema table
  `performance_schema.replication_group_configuration_version`.
```
CREATE TABLE performance_schema.replication_group_configuration_version (
  name CHAR(255) CHARACTER SET ASCII NOT NULL COMMENT 'The configuration name.',
  version BIGINT UNSIGNED NOT NULL COMMENT 'The version of configuration.');
```

- Add the UDFs
    group_replication_enable_member_action;
    group_replication_disable_member_action;
    group_replication_reset_member_actions.

- Introduce the `Member_actions_handler` to:
    1) handle the actions configuration persistence on
       `mysql.replication_group_member_actions` and
       `mysql.replication_group_configuration_version` tables;
    2) handle the actions encoding/decoding with Protocol Buffers
       for group propagation;
    3) handle the actions triggering during primary election.

- Actions configuration Protocol Buffers specification:
```
syntax = "proto2";

option optimize_for = LITE_RUNTIME;

package protobuf_replication_group_member_actions;

message Action {
  required string name = 1;
  required string event = 2;
  required bool enabled = 3;
  required string type = 4;
  required uint32 priority = 5;
  required string error_handling = 6;
}

message ActionList {
  required string origin = 1;
  required uint64 version = 2;
  required bool force_update = 3 [default = false];
  repeated Action action = 4;
}
```

- Refactor the state exchange during a member join to include the
  member actions configuration.

- Refactor the primary election algorithms to engage
  `Member_actions_handler` after an election takes place.