WL#8793: Group Replication: Integration with libmysql-gcs and XCom

Affects: Server-5.7   —   Status: Complete

MySQL Group Replication provides multi-master, update-everywhere
replication for MySQL.
Until now, the group communication toolkit used was Corosync, with which Group
Replication interacted through an API binding implemented inside Group Replication.
That API binding has evolved into an independent component that interacts with
both Corosync and XCom.
This worklog implements the changes needed in the Group Replication plugin to
use the new API binding and to support XCom.

FR1. Group Replication must preserve the user interface for the
build procedure.

FR2. Group Replication must not have known regressions.

FR3. Group Replication must build on all platforms on which MySQL GCS
builds.

FR4. When the GCS kernel is set to Corosync and the Corosync binding is
not available, START GROUP_REPLICATION shall fail gracefully.

NFR1. Performance with MySQL GCS will be comparable to GCS with
Corosync.

OVERVIEW
========

MySQL Group Replication (GR) provides multi-master, update-everywhere
replication for MySQL.
Until now, the group communication toolkit used was Corosync, with
which Group Replication interacted through an API binding implemented
inside Group Replication. That API binding has evolved into an
independent component that interacts with both Corosync and XCom.
This worklog implements the changes needed in the Group Replication
plugin to use the new API binding and to support XCom.


PLUGIN OPTIONS
==============

To support the XCom communication toolkit, new options were introduced
to configure it. Those options are:

group_replication_local_address
-------------------------------
TYPE: string
DYNAMIC: yes (only read when GR starts)
CONTEXT: global and/or configuration file
DESCRIPTION: The member's local address, i.e., the host:port that is
passed to MySQL GCS.
DEFAULT: empty string
EXAMPLE: 192.168.0.1:10001

group_replication_peer_addresses
--------------------------------
TYPE: string
DYNAMIC: yes (only read when GR starts)
CONTEXT: global and/or configuration file
DESCRIPTION: The comma-separated list of peers that also belong to
  the group, e.g., host1:port1,host2:port2.
  Unless this server bootstraps the group, it contacts one of
  the peers in the list in order to be added to the group.
  The list may contain the member's own local address, which
  is ignored.
DEFAULT: empty string
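
The two address options above can be illustrated with a minimal Python
sketch. It is not the plugin's parsing code: the function names
parse_address and effective_peers are hypothetical, and the validation
rules (numeric port in the range 1-65535) are assumptions. It only
shows a host:port list being split and the member's own address being
skipped.

```python
def parse_address(address):
    """Split 'host:port' into (host, port); raise ValueError if malformed."""
    host, sep, port_text = address.strip().rpartition(":")
    if not sep or not host or not port_text.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range in {address!r}")
    return host, port


def effective_peers(local_address, peer_addresses):
    """Return the parsed peers, skipping the member's own address."""
    local = parse_address(local_address)
    peers = []
    for item in peer_addresses.split(","):
        if not item.strip():
            continue  # tolerate an empty list or a trailing comma
        peer = parse_address(item)
        if peer != local:
            peers.append(peer)
    return peers
```

With the addresses used in the two-server example of this worklog,
effective_peers("192.168.0.1:10001",
"192.168.0.1:10001,192.168.0.2:10002") yields only the entry for
192.168.0.2.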

group_replication_bootstrap_group
---------------------------------
TYPE: boolean
DYNAMIC: yes (only read when GR starts)
CONTEXT: global and/or configuration file
DESCRIPTION: If set to true, the server bootstraps the group.
  group_replication_peer_addresses has no effect if this
  option is set to true.
DEFAULT: false

group_replication_gcs_engine
----------------------------
TYPE: string
DYNAMIC: yes (only read when GR starts and stops)
CONTEXT: global and/or configuration file
DESCRIPTION: Specifies the communication toolkit to be used.
  Currently must be one of: XCom, Corosync.
DEFAULT: XCom
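
As an illustration of the graceful failure required by FR4, the
following Python sketch checks an engine name before start. It is only
a model of the decision, not plugin code: check_engine and its
corosync_built flag are hypothetical names, the error texts are
invented for the example, and exact-case matching of the engine name
is an assumption.

```python
SUPPORTED_ENGINES = ("XCom", "Corosync")


def check_engine(engine, corosync_built):
    """Return an error message, or None when the engine can be used."""
    if engine not in SUPPORTED_ENGINES:
        expected = ", ".join(SUPPORTED_ENGINES)
        return f"unknown engine {engine!r}; expected one of: {expected}"
    if engine == "Corosync" and not corosync_built:
        # Report the problem instead of crashing (FR4).
        return "Corosync binding was not built (see WITH_COROSYNC)"
    return None
```

Returning a message rather than raising keeps the caller free to log
the reason and refuse to start.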

Note: this is not a new variable; group_replication_gcs_protocol was
renamed to group_replication_gcs_engine.

Two-server configuration example
................................
server 1:
> SET GLOBAL group_replication_group_name= "UUID";
> SET GLOBAL group_replication_local_address="192.168.0.1:10001";
> SET GLOBAL group_replication_peer_addresses="192.168.0.1:10001,192.168.0.2:10002";
> SET GLOBAL group_replication_bootstrap_group= 1;

server 2:
> SET GLOBAL group_replication_group_name= "UUID";
> SET GLOBAL group_replication_local_address="192.168.0.2:10002";
> SET GLOBAL group_replication_peer_addresses="192.168.0.1:10001,192.168.0.2:10002";

The group members must join in this order:
 1) server 1
 2) server 2
After server 1 bootstraps the group, its
group_replication_bootstrap_group option must be set back to 0,
so that it can leave and rejoin the *same* group instead of
starting a new one.
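
The join order and the bootstrap reset described above can be sketched
as a small Python generator of per-server statements. This is an
illustration only: configuration_statements is a hypothetical helper,
and it assumes that the first address in the list is the member that
bootstraps the group.

```python
def configuration_statements(group_name, addresses):
    """Return one list of SQL statements per server, in join order.

    addresses[0] is assumed to be the member that bootstraps the group.
    """
    peers = ",".join(addresses)
    plan = []
    for index, local in enumerate(addresses):
        statements = [
            f'SET GLOBAL group_replication_group_name= "{group_name}";',
            f'SET GLOBAL group_replication_local_address= "{local}";',
            f'SET GLOBAL group_replication_peer_addresses= "{peers}";',
        ]
        if index == 0:
            # Only the first member bootstraps; the flag is cleared right
            # after start so a later rejoin does not create a new group.
            statements.append("SET GLOBAL group_replication_bootstrap_group= 1;")
            statements.append("START GROUP_REPLICATION;")
            statements.append("SET GLOBAL group_replication_bootstrap_group= 0;")
        else:
            statements.append("START GROUP_REPLICATION;")
        plan.append(statements)
    return plan
```

Calling it with "UUID" and the two addresses from the example returns
two statement lists; only the first one touches
group_replication_bootstrap_group.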


MySQL GCS source
================

MySQL GCS source will be available in the
internal/remotes/mysql-gcs folder, and a snapshot of it containing
only the interface and C++ bindings will be copied to the
src/plugin/group_replication/gcs/ folder.

This replaces the MySQL GCS source currently in use, which
previously was an internal implementation.

MySQL GCS source will be built embedded in the plugin, without any
user interface change.

By default, the Corosync binding is not built. Users can force the
Corosync binding to be built by passing the WITH_COROSYNC option to
cmake. Example:

$ mkdir BIN
$ cd BIN
$ cmake .. -DWITH_MYSQL_SERVER_SOURCE_DIR="SERVER_SOURCE_PATH" \
           -DWITH_MYSQL_SERVER_CMAKE_ARGS="-DWITH_COROSYNC=ON"
$ make


SUMMARY OF CHANGES
==================

Plugin code
-----------

1. Add the new options to plugin.cc:
   group_replication_local_address
   group_replication_peer_addresses
   group_replication_bootstrap_group

2. Rename the option:
   group_replication_gcs_protocol to group_replication_gcs_engine

3. Adjust gcs_event_handlers.cc to the new MySQL GCS interface.


MySQL GCS code
--------------
The full MySQL GCS source will be available in the
internal/remotes/mysql-gcs folder, and a snapshot of it containing
only the interface and C++ bindings will be copied to the
src/plugin/group_replication/gcs/ folder.

MySQL GCS source will be built embedded in the plugin, without any
build procedure change.

By default, the Corosync binding is not built. Users can force the
Corosync binding to be built by passing the WITH_COROSYNC option to
cmake.


MTR infrastructure
------------------
To support the XCom configuration, new MTR include files will be
added:

  group_replication.inc
  .....................
    Sets up group replication for the current test.

  group_replication_end.inc
  .........................
    Shuts down and cleans up group replication for the current test.

  start_and_bootstrap_group_replication.inc
  .........................................
    Performs these 3 steps:
      1) Set the group_replication_bootstrap_group option to True;
      2) Start Group Replication;
      3) Reset the group_replication_bootstrap_group option on all
         servers.

  group_replication_reset_bootstrap_group.inc
  ...........................................
    Reset group_replication_bootstrap_group option on all
    servers.

  group_replication_reset_configuration.inc
  .........................................
    Reset all XCom configuration on all servers.

  group_replication_set_bootstrap_group.inc
  .........................................
    Sets the group_replication_bootstrap_group option on the current
    server.

  group_replication_configuration.inc
  ...................................
    Configures XCom on all members.
    The server with the greatest server_id is chosen as the
    bootstrap server. Servers join the group in descending
    server_id order, down to the one with server_id 1.
    The XCom local port is assigned according to the rule:
      port= (MySQL_server_port - 3000)
    So a two-member group will have the following XCom configuration
    (when the MySQL server port of server1 is 13000):
      server1: 127.0.0.1:10000
      server2: 127.0.0.1:10001
      group_replication_peer_addresses:
        127.0.0.1:10001,127.0.0.1:10000
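
    The port rule above can be checked with a small Python sketch.
    It is not part of the MTR include files: xcom_configuration and
    XCOM_PORT_OFFSET are hypothetical names, and the sketch assumes
    that server2 listens on MySQL port 13001, which is what the
    example addresses imply.

```python
XCOM_PORT_OFFSET = 3000  # port= (MySQL_server_port - 3000)


def xcom_configuration(server_ports, host="127.0.0.1"):
    """Return ({server_id: local_address}, peer_addresses).

    server_ports holds the MySQL ports ordered by server_id, starting at 1.
    """
    local = {
        server_id: f"{host}:{port - XCOM_PORT_OFFSET}"
        for server_id, port in enumerate(server_ports, start=1)
    }
    # Peers are listed from the greatest server_id down to 1,
    # mirroring the order shown in the example.
    peers = ",".join(local[i] for i in sorted(local, reverse=True))
    return local, peers
```

    For MySQL ports 13000 and 13001 this gives 127.0.0.1:10000 for
    server1, 127.0.0.1:10001 for server2, and the peer list
    127.0.0.1:10001,127.0.0.1:10000.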

  have_corosync.inc
  .................
    Include file that only allows the test to run if Corosync is
    the loaded communication toolkit.

  have_group_replication_plugin.inc
  .................................
    Checks that the GR plugin is loaded and sets up XCom by executing
    group_replication_configuration.inc.

  kill_and_restart_mysqld.inc
  ...........................
    Kill and restart mysqld without echoing the server parameters.


Important changes for developers
--------------------------------

Regular scenario
................
When servers are set up with group_replication.inc, everything is
done automatically: a group with the specified number of
servers (2 by default) is started, and the server with the greatest
server_id is the one that bootstraps the group.

Example:
  --source ../inc/have_group_replication_plugin.inc
  --source ../inc/group_replication.inc

  DO SOMETHING

  --source ../inc/group_replication_end.inc

To specify the number of servers in the group, use the
$rpl_server_count option and create a .cnf file with the required
number of servers. Example:
  --source ../inc/have_group_replication_plugin.inc
  --let $rpl_server_count= 4
  --source ../inc/group_replication.inc


Custom server join order
........................
If developers want to specify the group join order, they must do the
following:
  --source ../inc/have_group_replication_plugin.inc
  --let $rpl_skip_group_replication_start= 1
  --source ../inc/group_replication.inc

  --connection server1
  --source ../inc/start_and_bootstrap_group_replication.inc

  --connection server2
  --source include/start_group_replication.inc

  DO SOMETHING

  --source ../inc/group_replication_end.inc


Custom group name
.................
If developers want to specify the group name, it must be done before
including have_group_replication_plugin.inc, since the group name is
required for the XCom configuration.
  --let $group_replication_group_name= UUID
  --source ../inc/have_group_replication_plugin.inc
  --source ../inc/group_replication.inc

  DO SOMETHING

  --source ../inc/group_replication_end.inc


Start GR on server start
........................
If developers want to use the GR group_replication_start_on_boot
option, a server restart is required.
We cannot use group_replication_start_on_boot directly in the opt file,
since at that point the XCom configuration does not exist yet. The XCom
configuration depends on the server port value, which is only
available, in a scriptable way, after server start.
So, to use the group_replication_start_on_boot option, the following
steps must be done:
  --source include/force_restart.inc
  --let $rpl_skip_group_replication_start= 1
  --source ../inc/group_replication.inc

  --connection server1
  --let $_group_replication_local_address= `SELECT @@GLOBAL.group_replication_local_address`
  --let $_group_replication_peer_addresses= `SELECT @@GLOBAL.group_replication_peer_addresses`
  --let $restart_parameters=restart:--group_replication_local_address=$_group_replication_local_address --group_replication_peer_addresses=$_group_replication_peer_addresses --group_replication_start_on_boot=1
  --replace_result $_group_replication_local_address GROUP_REPLICATION_LOCAL_ADDRESS $_group_replication_peer_addresses GROUP_REPLICATION_PEER_ADDRESSES
  --source include/restart_mysqld.inc

  DO SOMETHING

  --source ../inc/group_replication_end.inc

If the group is being bootstrapped, the
--group_replication_bootstrap_group=1 option must also be included in
$restart_parameters.
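
How the value of $restart_parameters is composed, including the
optional bootstrap flag, can be sketched in Python. This only
illustrates the string layout used in the steps above;
restart_parameters is a hypothetical helper and is not part of MTR.

```python
def restart_parameters(local_address, peer_addresses, bootstrap=False):
    """Build the 'restart:...' value assigned to $restart_parameters."""
    options = [
        f"--group_replication_local_address={local_address}",
        f"--group_replication_peer_addresses={peer_addresses}",
        "--group_replication_start_on_boot=1",
    ]
    if bootstrap:
        # Needed only on the member that bootstraps the group.
        options.append("--group_replication_bootstrap_group=1")
    return "restart:" + " ".join(options)
```

Passing bootstrap=True appends the extra option; all other members use
the default.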


Uninstall/install plugin
........................
The XCom configuration is set using SET GLOBAL commands, whose values
are not persisted, so when the plugin is uninstalled and installed
again, all plugin variables set by SET GLOBAL commands are reset to
their default values.
To overcome this in MTR, the following steps are needed:
  SET @_group_replication_local_address= @@GLOBAL.group_replication_local_address;
  SET @_group_replication_peer_addresses= @@GLOBAL.group_replication_peer_addresses;
  UNINSTALL PLUGIN group_replication;

  --eval INSTALL PLUGIN group_replication SONAME '$GROUP_REPLICATION'
  SET @@GLOBAL.group_replication_local_address= @_group_replication_local_address;
  SET @@GLOBAL.group_replication_peer_addresses= @_group_replication_peer_addresses;
  --eval SET GLOBAL group_replication_group_name= '$group_replication_group_name'