WL#10655: Global notification for GR membership changes

Affects: Server-8.0   —   Status: Complete

Motivation

The classic protocol was designed so that the client side starts the conversation (authentication is the exception). Thus the client needs to request something to get information back (request-based communication). This means that to detect a state change the client needs to execute a query at intervals and compare the returned result with earlier results.

A good example is MySQL Router, which queries the group replication plugin state at intervals. One of the queries it executes is the following:

SELECT member_id, member_host, member_port, member_state,
       @@group_replication_single_primary_mode
 FROM performance_schema.replication_group_members
 WHERE channel_name = 'group_replication_applier';

Polling (as described above) has some disadvantages:

  • resource consumption (CPU, network),
  • queried information reaches the client with a delay of up to one query interval,
  • client code gets more complicated.

Goals

The purpose of this worklog is to supply the client with information about group replication view changes without the client having to poll the system.

The goal can be described more accurately by the following points:

  • give X Protocol the ability to broadcast information to all interested clients without them polling for it,
  • broadcast specific information about group replication state changes.

Group replication notices functional requirements

FR_S1
User must be able to enable/disable notices indicating that the view changed
FR_S2
User must be able to enable/disable notices indicating that the quorum was lost
FR_S3
Group replication notices must be delivered asynchronously when enabled
FR_S4
User must be able to observe the number of GR events and the number of events sent to clients

Non-functional requirements

NF1
Implementation must not degrade X Plugin performance
NF2
Subscription to a notice "topic" must not degrade X Plugin performance

Protocol

X Protocol global frames (global notices) were added to notify multiple clients that an event occurred. Global notices are not bound to any flow and can be sent at any time (asynchronously). This matches what group replication events require: they must be delivered as fast as the server can.

Mysqlx.Notice.Frame

The Frame message must be extended with GROUP_REPLICATION_STATE_CHANGED (a new type), which indicates that a Group Replication event occurred:

message Frame {
  enum Scope {
    GLOBAL = 1;
    LOCAL = 2;
  };
  enum Type {
    WARNING = 1;
    SESSION_VARIABLE_CHANGED = 2;
    SESSION_STATE_CHANGED = 3;
    GROUP_REPLICATION_STATE_CHANGED = 4;
  };
  required uint32 type = 1;
  optional Scope  scope = 2 [ default = GLOBAL ];
  optional bytes payload = 3;

  option (server_message_id) = NOTICE;
}

The payload of this frame type is a serialized GroupReplicationStateChanged message. It must tell the user the group replication event type. GR generates the following events:

  • quorum loss (a majority of the GR members was lost),
  • view changed (a new view was installed),
  • role change (member role changed between PRIMARY and SECONDARY),
  • state changed (member state changed: OFFLINE, ONLINE, RECOVERING, UNREACHABLE, ERROR),

which must be forwarded by X Plugin to the client, making the client aware that something changed inside the Group Replication setup. Those events must be sent with the current/newly installed view-id:

// Notify clients about group replication state changes
//
// ========================================== ==========
// :protobuf:msg:`Mysqlx.Notice::Frame` field value
// ========================================== ==========
// ``.type``                                  4
// ``.scope``                                 ``global``
// ========================================== ==========
//
// :param type: type of group replication event
// :param view_id: The view identifier
message GroupReplicationStateChanged {
  enum Type {
    MEMBERSHIP_QUORUM_LOSS = 1;
    MEMBERSHIP_VIEW_CHANGE = 2;
    MEMBER_ROLE_CHANGE = 3;
    MEMBER_STATE_CHANGE = 4;
  }
  required uint32 type = 1;

  optional string view_id = 2;
}
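
As an illustration of what a client would do with the payload, here is a minimal Python sketch that decodes a serialized GroupReplicationStateChanged message by hand, following the standard protobuf wire format. The helper names (read_varint, decode_gr_state_changed) are hypothetical and not part of any MySQL connector; a real client would normally use classes generated from the .proto file instead.

```python
# Hand-rolled protobuf wire-format decoder for GroupReplicationStateChanged.
# Illustrative only: helper names are assumptions of this sketch.

GR_EVENT_NAMES = {
    1: "MEMBERSHIP_QUORUM_LOSS",
    2: "MEMBERSHIP_VIEW_CHANGE",
    3: "MEMBER_ROLE_CHANGE",
    4: "MEMBER_STATE_CHANGE",
}


def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def decode_gr_state_changed(payload):
    """Return (event_name, view_id) from a serialized message."""
    event_type = None
    view_id = None
    pos = 0
    while pos < len(payload):
        key, pos = read_varint(payload, pos)
        field, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint: field 1 is the event type
            value, pos = read_varint(payload, pos)
            if field == 1:
                event_type = value
        elif wire_type == 2:  # length-delimited: field 2 is the view_id
            length, pos = read_varint(payload, pos)
            chunk = payload[pos:pos + length]
            pos += length
            if field == 2:
                view_id = chunk.decode("utf-8")
        else:
            raise ValueError("unexpected wire type %d" % wire_type)
    if event_type is None:
        raise ValueError("required field 'type' is missing")
    name = GR_EVENT_NAMES.get(event_type, "UNKNOWN(%d)" % event_type)
    return name, view_id
```

For example, the bytes 08 02 12 03 followed by the ASCII text "abc" (a made-up view id) decode to a MEMBERSHIP_VIEW_CHANGE event with view_id "abc". Since view_id is optional, a payload without it yields None for that field.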

Admin commands

All four group replication events must be delivered to the client only when the client explicitly requested them. X Plugin already implements two admin commands that control the same behavior for other notices:

  • enable_notice
  • disable_notice

The arguments of both commands are the names of the notices that the client asks to enable/disable. The following names should be defined for the corresponding event IDs:

  • group_replication/membership/quorum_loss
  • group_replication/membership/view
  • group_replication/status/role_change
  • group_replication/status/state_change

To enable one of those, the client must do the following:

client -> server: Mysqlx.Sql.StmtExecute(
  ns:"admin"
  stmt:"enable_notice"
  args: {...array("group_replication/membership/view")...})
client <- server: Mysqlx.Sql.StmtExecuteOk()

To enable several of those, the client must do the following:

client -> server: Mysqlx.Sql.StmtExecute(
  ns:"admin"
  stmt:"enable_notice"
  args: {...array("group_replication/membership/view",
                  "group_replication/membership/quorum_loss")...})
client <- server: Mysqlx.Sql.StmtExecuteOk()
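
The per-client opt-in described above can be pictured with the following Python sketch. It is a simplified model, not X Plugin code: the Session class and the broadcast function are hypothetical, and pairing the notice names with the GroupReplicationStateChanged.Type values by list order is an assumption of this sketch.

```python
# Notice names from this worklog mapped to GroupReplicationStateChanged.Type.
# Assumption: the names correspond to the enum values in listed order.
NOTICE_TO_EVENT = {
    "group_replication/membership/quorum_loss": 1,
    "group_replication/membership/view": 2,
    "group_replication/status/role_change": 3,
    "group_replication/status/state_change": 4,
}


class Session:
    """Hypothetical per-client state: which GR notices were enabled."""

    def __init__(self):
        self.enabled_events = set()
        self.outbox = []  # stands in for notices queued for this client

    def enable_notice(self, names):
        for name in names:
            self.enabled_events.add(NOTICE_TO_EVENT[name])

    def disable_notice(self, names):
        for name in names:
            self.enabled_events.discard(NOTICE_TO_EVENT[name])


def broadcast(sessions, event_type, view_id):
    """Queue the event for subscribed sessions; return how many got it."""
    delivered = 0
    for session in sessions:
        if event_type in session.enabled_events:
            session.outbox.append((event_type, view_id))
            delivered += 1
    return delivered
```

Keeping the enabled notices as a set per session makes the check during a broadcast a constant-time lookup, which is in the spirit of NF2. The sketch only knows the four group replication names; the real admin commands also accept other notice names.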

Async messages

Global notices can be sent by X Plugin at any time, with one exception: when the connection thread is blocked by the execution of SQL.

If a group replication event is delivered during the execution of "SELECT SLEEP(10);", the notice is postponed until the select/sleep ends.
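
The postponing behavior can be sketched as a small Python model. The Connection class and its method names are hypothetical and only illustrate the rule; they do not mirror the X Plugin implementation.

```python
from collections import deque


class Connection:
    """Hypothetical connection that defers global notices while SQL runs."""

    def __init__(self):
        self.executing_sql = False
        self.pending = deque()  # notices that arrived during a statement
        self.sent = []          # stands in for data written to the client

    def on_global_notice(self, notice):
        # While a statement is executing, the notice must wait.
        if self.executing_sql:
            self.pending.append(notice)
        else:
            self.sent.append(notice)

    def begin_statement(self):
        self.executing_sql = True

    def end_statement(self):
        # Statement finished: flush postponed notices in arrival order.
        self.executing_sql = False
        while self.pending:
            self.sent.append(self.pending.popleft())
```

A notice that arrives between begin_statement() and end_statement() stays in pending and is only moved to sent once the statement ends, so the worst-case delivery delay equals the remaining run time of the statement.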

Instrumentation

Status variables

  • Number of group replication notifications handled by X Plugin
Property       Value
Variable Name  mysqlx_notified_by_group_replication
Type           LONGLONG
Scope          GLOBAL
Default        0

  • Number of sent global notices. By comparing this variable with the previous one, the user can estimate the average number of clients that enabled group replication notifications.

Property       Value
Variable Name  mysqlx_notice_global_sent
Type           LONGLONG
Scope          GLOBAL/SESSION
Default        0
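
The estimate mentioned for the two status variables is a plain ratio. A minimal Python sketch, assuming the caller has already read the GLOBAL values of both counters and that every global notice sent so far was a group replication notice (the function name is hypothetical):

```python
def average_subscribed_clients(notified_by_gr, notices_sent):
    """Estimate how many clients received each GR notification.

    notified_by_gr: value of mysqlx_notified_by_group_replication
    notices_sent:   GLOBAL value of mysqlx_notice_global_sent
    """
    if notified_by_gr == 0:
        return 0.0  # no GR notification handled yet, nothing to average
    return notices_sent / notified_by_gr
```

For instance, 4 handled notifications and 12 sent notices suggest that on average 3 clients had the notices enabled.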

Networking

X Plugin uses VIO for networking, but VIO is not ready to handle asynchronous communication. In general this means that while the server waits for I/O, the operation can't be interrupted to deliver an asynchronous message.

There is a workaround: X Plugin can use shorter read I/O timeouts, and after each one the code can check whether a global notice is ready to be sent.

Timeouts shorter than mysqlx_read_timeout mean that the server needs to keep track of the cumulative waiting time and check that the total has not exceeded mysqlx_read_timeout.
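
The workaround can be sketched in Python with plain sockets standing in for VIO. The function name, the flush_notices callback, and the slice_timeout parameter are assumptions of this sketch; read_timeout plays the role of mysqlx_read_timeout.

```python
import select
import socket
import time


def read_with_notice_checks(sock, read_timeout, slice_timeout, flush_notices):
    """Wait for client data in short slices, flushing notices between them.

    Returns the received bytes, or None once the cumulative waiting time
    reaches read_timeout.
    """
    deadline = time.monotonic() + read_timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None  # total waiting time exceeded the read timeout
        # Never wait longer than what is left of the overall timeout.
        wait = min(slice_timeout, remaining)
        readable, _, _ = select.select([sock], [], [], wait)
        if readable:
            return sock.recv(4096)
        # The short timeout expired: send any pending global notices.
        flush_notices()
```

Using a fixed deadline rather than summing the individual slices keeps the accounting correct even when a slice returns early or flush_notices itself takes time. A quick check with socket.socketpair(): with no data sent, the call returns None after roughly read_timeout and flush_notices has run at least once; after the peer sends b"ping", the call returns b"ping".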