WL#10379: Group Replication: consistent reads

Affects: Server-8.0   —   Status: Complete   —   Priority: Medium

EXECUTIVE SUMMARY
=================
Implement consistency guarantees on Group Replication, that is,
allow the user to configure globally or per transaction the
consistency provided by the group.

Four guarantees are provided:
- EVENTUAL (current behavior)
  A RO (read-only) or RW (read-write) transaction shall not wait
  for preceding transactions to be applied before executing.
  A RW transaction shall not wait for other members to apply a
  transaction.

- BEFORE
  A RW transaction shall wait for all preceding transactions to
  complete before execution takes place.
  A RO transaction shall wait for all preceding transactions to
  complete before execution takes place.

- AFTER
  A RW transaction will wait until its changes have been applied on
  other members.
  No effect on RO transactions.

- BEFORE_AND_AFTER
  A RW transaction will wait for 1) all preceding transactions to
  complete before execution takes place and 2) until its changes have
  been applied on other members.
  A RO transaction shall wait for all preceding transactions to
  complete before execution takes place.
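
The four guarantees above can be condensed into a small lookup. The
following is only an illustrative sketch: the names Consistency and
waits are hypothetical and are not part of the server or the plugin.

```python
from enum import Enum
from typing import Tuple


class Consistency(Enum):
    EVENTUAL = "EVENTUAL"
    BEFORE = "BEFORE"
    AFTER = "AFTER"
    BEFORE_AND_AFTER = "BEFORE_AND_AFTER"


def waits(guarantee: Consistency, read_write: bool) -> Tuple[bool, bool]:
    """Return (waits_before_execution, waits_for_other_members)."""
    # BEFORE-style guarantees delay the start of RO and RW transactions.
    before = guarantee in (Consistency.BEFORE, Consistency.BEFORE_AND_AFTER)
    # Only RW transactions have changes for the other members to apply.
    after = read_write and guarantee in (
        Consistency.AFTER, Consistency.BEFORE_AND_AFTER)
    return before, after
```

For instance, waits(Consistency.AFTER, read_write=False) yields
(False, False), matching "No effect on RO transactions".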


USER STORIES
============
- As a developer using MySQL I want specific transactions in my
  workload to always read up-to-date data from the group, so that
  whenever I update sensitive data (such as credentials for a file
  or similar data) I can enforce that subsequent reads return the
  most up-to-date value.

- As a developer using MySQL I want to load balance my reads without
  worrying about stale reads.

- As a developer using MySQL who has predominantly read-only data,
  I want my RW transactions to be applied everywhere once they
  commit, so that subsequent reads are done on up-to-date data that
  includes my latest write and I do not pay the synchronization cost
  on every RO transaction, but only on RW ones.


SCOPE
=====
This feature is about providing consistency guarantees; it is not
intended to provide fencing mechanisms for member failures.
A non-ONLINE member is already fenced by Router and by the
super_read_only and group_replication_exit_state_action system
variables.

This fits in the overall roadmap scheme of making a group provide
different distributed consistency guarantees to the application.


FUNCTIONAL REQUIREMENTS
=======================
FR-01: When group_replication_consistency=BEFORE, a transaction
       shall start its execution on the most up-to-date data.

FR-02: When group_replication_consistency=AFTER, after a RW
       transaction commits, following transactions shall read a
       database state that includes its write, from any ONLINE
       member.

FR-03: When group_replication_consistency=BEFORE_AND_AFTER, a
       transaction shall read a database state that includes
       all previous changes by any preceding RW transaction.
       After a RW transaction commits it shall be possible to read
       a database state that includes its write, from any ONLINE
       member.

FR-04: The guarantees BEFORE, AFTER and BEFORE_AND_AFTER can only
       be used on ONLINE members. If they are used on other member
       states - except OFFLINE - the transaction will be rolled
       back. On the OFFLINE state or with the plugin not installed,
       transactions are not intercepted by Group Replication.

FR-05: When group_replication_consistency=AFTER or BEFORE_AND_AFTER,
       a transaction will wait for an acknowledgement from all
       ONLINE members, informing that the transaction was prepared,
       before committing the transaction.

FR-06: When group_replication_consistency=AFTER or BEFORE_AND_AFTER,
       since the remote ONLINE members acknowledge the transaction
       on prepare, new transactions on those members shall be held
       until the preceding prepared transactions are committed.

FR-07: When group_replication_consistency=AFTER or BEFORE_AND_AFTER,
       if a member leaves the group, either by STOP GROUP_REPLICATION
       or due to an error, the transaction will continue on the group
       without waiting for the leaving member's consistency
       acknowledgement, even if the member that executed the
       transaction left.

FR-08: When group_replication_consistency=AFTER or BEFORE_AND_AFTER,
       if there are unreachable members but the group still has a
       reachable majority, the transaction will wait until those
       members are reachable or leave the group.

FR-09: If the group loses the majority and blocks, once the group is
       reestablished automatically or by the use of
       group_replication_force_members, the transaction will resume
       with the new membership, even if the member that executed the
       transaction left.

FR-10: The guarantees AFTER or BEFORE_AND_AFTER can only be used when
       all group members support them, that is from 8.0.14. If the
       group contains a member from a previous version the
       transaction will be rolled back.

FR-11: The guarantee BEFORE can only be used on members that do
       support it, though the other group members do not need to
       support it, that is, can be from lower versions.

FR-12: If STOP GROUP_REPLICATION is executed or the plugin is
       stopped due to error, which implies that the member left the
       group, all ongoing consistent transactions are rolled back
       locally, though they will continue on the group.
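
As a rough illustration of how FR-04, FR-10 and FR-11 combine, the
sketch below decides what happens to a transaction that requests a
consistent guarantee. The function name, the string return values
and the order of the checks are assumptions made for this example;
they are not taken from the plugin code.

```python
MIN_VERSION = (8, 0, 14)  # first version supporting the guarantees
CONSISTENT = ("BEFORE", "AFTER", "BEFORE_AND_AFTER")


def parse_version(text):
    """Turn '8.0.14' into the comparable tuple (8, 0, 14)."""
    return tuple(int(part) for part in text.split("."))


def admission(guarantee, member_state, plugin_installed, group_versions):
    """Return 'not_intercepted', 'rollback' or 'proceed'."""
    if guarantee not in CONSISTENT:
        raise ValueError("this sketch only covers the consistent guarantees")
    # FR-04: OFFLINE members or a missing plugin are not intercepted.
    if not plugin_installed or member_state == "OFFLINE":
        return "not_intercepted"
    # FR-04: any other non-ONLINE state rolls the transaction back.
    if member_state != "ONLINE":
        return "rollback"
    # FR-10: AFTER-style guarantees need support on every group member.
    if guarantee in ("AFTER", "BEFORE_AND_AFTER"):
        if any(parse_version(v) < MIN_VERSION for v in group_versions):
            return "rollback"
    # FR-11: BEFORE only needs support on the local member.
    return "proceed"
```

With a mixed group such as ["8.0.13", "8.0.14"], BEFORE proceeds on
an ONLINE 8.0.14 member while AFTER is rolled back.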


NON-FUNCTIONAL REQUIREMENTS
===========================
None


SUMMARY OF THE APPROACH
=======================
The user will be able to configure four guarantees:

- EVENTUAL
  The default guarantee: a transaction commits as soon as a
  majority of the group members has the transaction data in memory,
  before it is committed to the database.
  With this guarantee a client can write a transaction to tuple A on
  member 1, receive the commit acknowledgement, read tuple A on
  member 2 and receive an old value, since the transaction may not
  yet be committed there.

- BEFORE
  Before a transaction starts it will wait until all preceding
  writes are committed, that is: the current transaction is globally
  ordered on the message stream, gets the global GTID_EXECUTED,
  waits until that GTID_EXECUTED is committed on the local
  member and only then starts execution.
  This ensures that a transaction always executes on up-to-date
  data.

- AFTER
  Each member proceeds to commit a transaction only after it has
  collected acknowledgements from all ONLINE members that they are
  ready to commit it as well, that is, the transaction is prepared
  on all ONLINE members.
  After that, the client that executed the transaction receives the
  commit confirmation once the transaction is committed locally.
  On the other members, new transactions - independent of their
  guarantees - will be held until the preceding prepared
  transactions are committed.
  This ensures that once a client executes a transaction on a
  member it can read its write or following writes on any ONLINE
  member.
  Using the AFTER guarantee on one write transaction is equivalent
  to using the BEFORE guarantee on *ALL* other transactions.

- BEFORE_AND_AFTER
  BEFORE and AFTER guarantees combined on the same transaction.
  This ensures that a transaction always executes on up-to-date
  data and that, after its commit, its write or a following write
  will be read on any ONLINE member.


USER INTERFACE
==============
The user can specify the transaction consistency guarantee by
setting the system variable:
 - name: group_replication_consistency
 - values: { EVENTUAL, BEFORE, AFTER, BEFORE_AND_AFTER }
 - default: EVENTUAL
 - scope: session, global
 - dynamic: yes
 - replicated: no
 - persistable: PERSIST
 - credentials: session: none required
                global: SUPER/GROUP_REPLICATION_ADMIN
 - description: Transaction consistency guarantee


TRANSACTION ORDER
=================
Although the consistency can be set per transaction, since all
transactions are totally ordered on the group, a consistent
transaction will also wait for all ongoing EVENTUAL transactions
that precede it.

Example E1:
 - Group with 3 members: M1, M2 and M3
 - on M1 a client executes:
   > SET SESSION group_replication_consistency= EVENTUAL;
   > BEGIN;
   > INSERT INTO t1 VALUES (1); # T1
   > COMMIT;
   > SET SESSION group_replication_consistency= BEFORE;
   > SELECT * FROM t1;          # T2
   > SET SESSION group_replication_consistency= AFTER;
   > BEGIN;
   > INSERT INTO t1 VALUES (2); # T3
   > COMMIT;
 - on M2, a client executes:
   > SET SESSION group_replication_consistency= EVENTUAL;
   > SELECT * FROM t1;          # T4

Notes on E1:
 - T1 is ordered before T3.
 - T2 shall wait for T1 to be applied on the local server before executing.
 - T3 shall commit and externalize after all ONLINE members have prepared.
 - T4 shall wait for both T1 and T3 changes to have been externalized,
   before executing.

The transaction order, which is decided by the communication layer,
neither depends on nor is changed by the consistency guarantee.


GUARANTEE CONTEXT
=================
The guarantee BEFORE, apart from being ordered on the transaction
stream, only has impact on the local member. That is, it neither
requires coordination with the other members nor has repercussions
on their transactions.
In other words, BEFORE only impacts the member on which it is
executed.

The guarantee AFTER (and BEFORE_AND_AFTER) does have repercussions
on the other members' transactions: it makes the other members'
transactions wait until the AFTER transaction is committed on
that member, even if those transactions have the EVENTUAL
guarantee.
In other words, AFTER (and BEFORE_AND_AFTER) impacts all ONLINE
members.

Example E2:
 - Group with 3 members: M1, M2 and M3
 - on M1 a client executes:
   > SET SESSION group_replication_consistency= AFTER;
   > BEGIN;
   > INSERT INTO t1 VALUES (1); # T1
   > COMMIT;
 - on M2 a client executes:
   > SET SESSION group_replication_consistency= EVENTUAL;
   > SELECT * FROM t1; # T2
Although T2's guarantee is EVENTUAL, since T1 is AFTER, T2 will wait
until T1 is committed on M2 before starting its execution.
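
The hold that example E2 describes can be modelled per member as a
set of transactions that are prepared but not yet committed. This is
a deliberately simplified sketch; MemberGate and its methods are
hypothetical names, not plugin internals.

```python
class MemberGate:
    """Toy model of one member holding new transactions (e.g. M2)."""

    def __init__(self):
        # Transactions acknowledged as prepared but not yet committed.
        self.prepared = set()

    def on_prepared(self, txn):
        self.prepared.add(txn)

    def on_committed(self, txn):
        self.prepared.discard(txn)

    def begin(self):
        """Snapshot the prepared transactions a new one must wait for."""
        return frozenset(self.prepared)

    def released(self, waiting_for):
        """True once every transaction in the snapshot has committed."""
        return not (waiting_for & self.prepared)
```

In E2 terms: after on_prepared("T1") on M2, T2 calls begin() and
stays on hold until on_committed("T1") makes released() true,
regardless of T2's own guarantee.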


SECURITY CONTEXT
================
From the point of view of a malicious attack on the group: since a
transaction with group_replication_consistency=AFTER or
BEFORE_AND_AFTER waits for an acknowledgement from all ONLINE
members, an UNREACHABLE member will block the transaction until that
member is reachable or leaves the group.

A malicious user can set group_replication_consistency=AFTER or
BEFORE_AND_AFTER on long lived transactions, which may block new
transactions while those long lived transactions are being applied.


UPGRADE/DOWNGRADE AND CROSS-VERSION REPLICATION
===============================================
There are no repercussions on upgrade scenarios.

Obviously, after a downgrade to a version on which the consistency
guarantees are not implemented they cannot be used.

The guarantees AFTER or BEFORE_AND_AFTER can only be used when all
group members support them, that is from 8.0.14. If the group
contains a member from a previous version the transaction will
be rolled back.

The guarantee BEFORE can only be used on members that do support it,
though the other group members do not need to support it, that is,
can be from lower versions.


OBSERVABILITY
=============
When group_replication_consistency=BEFORE, while the transaction
is waiting for the up-to-date data to be committed the session state
will be "Executing hook on transaction begin" on
INFORMATION_SCHEMA.PROCESSLIST, and on all other sources that
show the session state.

When group_replication_consistency=AFTER or BEFORE_AND_AFTER, on the
other members, while the prepared transactions are being committed,
the new transactions that are on hold until those commits happen
will have the session state "Executing hook on transaction begin" on
INFORMATION_SCHEMA.PROCESSLIST, and on all other sources that
show the session state.


DEPLOYMENT AND INSTALLATION
===========================
There are no repercussions.


PROTOCOL
========
There are no repercussions.


FAILURE MODEL SPECIFICATION
===========================
There are no repercussions. How this feature handles the existing
failures is described in the requirements and in the summary of
the approach.


SUMMARY OF CHANGES
==================

Server core changes
-------------------
- Call after_commit hook on XA PREPARE and XA ROLLBACK since these
  commands do log a transaction to the binary log.

Group Replication changes
-------------------------
- The transaction hold mechanism used by the AFTER and
  BEFORE_AND_AFTER guarantees is the one from "Group Replication:
  hold reads and writes when the new primary has replication backlog
  to apply".

- Extend the system variable group_replication_consistency with
  three more values: BEFORE, AFTER, BEFORE_AND_AFTER.

- Introduce a consistency manager to handle the new guarantees.

- Introduce two new message types: CT_TRANSACTION_PREPARED_MESSAGE
  and CT_TRANSACTION_WITH_GUARANTEE_MESSAGE.

- Refactor the inversion of control of certification to also include
  the consistency handling.


GUARANTEES WORKFLOW
===================

EVENTUAL
--------
Please see page 1 on consistent-reads-workflow.pdf

Algorithm:
  1) transaction T1 starts on member M1;
  2) it is executed up to the commit point, at that point the
     transaction data is sent to all group members, including the
     one on which the transaction was executed (M1);
  3) on transaction delivery, every member will check for conflicts:
     3.1) if there is a conflict the transaction is rolled back;
     3.2) otherwise, the transaction is committed on the member
          that executed the transaction (M1), on the other
          members the transaction is queued for execution and
          commit.
  4) transaction T2 starts on member M3 before M3 has received T1
     transaction data, T2 will execute before T1 is executed on M3,
     which will make T2 read stale data.

BEFORE
------
Please see page 2 on consistent-reads-workflow.pdf

Algorithm:
  1) transaction T1, with EVENTUAL consistency, starts on member M1;
  2) it is executed up to the commit point, at that point the
     transaction data is sent to all group members, including the
     one on which the transaction was executed (M1);
  3) on transaction delivery, every member will check for conflicts:
     3.1) if there is a conflict the transaction is rolled back;
     3.2) otherwise, the transaction is committed on the member
          that executed the transaction (M1), on the other
          members the transaction is queued for execution and
          commit.
  4) Transaction T2, with BEFORE consistency, starts on member M3.
     Before the transaction execution, T2 will send a message to
     all members. That message gives T2 its global order before
     execution (1st message round on the workflow);
  5) When that message is received and processed in order, w.r.t.
     the message stream, on M3, M3 will fetch the Group Replication
     applier RECEIVED_TRANSACTION_SET, the set of remote
     transactions that were allowed to commit, regardless of
     whether they are already committed. This set gives us the
     remote transactions that exist before this point. We only need
     to track remote transactions since the server already ensures
     consistency for local transactions.
     Other members will ignore this message; it is sent to all
     members only to provide the global order.
  6) Transaction T2 on M3 will wait until all the transactions on
     the Group Replication applier RECEIVED_TRANSACTION_SET are
     committed; only after that will its execution start. This
     ensures that T2 will never read stale data relative to its
     global order, which in this example is: T1, T2.
     This wait only takes place on the server that executes the
     transaction with BEFORE consistency, in this case T2@M3. All
     other members are not affected by this wait.
  7) Once the transaction T2 execution starts, the next steps are
     the ones described in 2) and 3).
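
Steps 5) and 6) amount to "snapshot the received set, then wait
until it is contained in the committed set". The sketch below models
that with plain Python sets and a condition variable. LocalMember
and its methods are hypothetical, transactions are represented by
simple strings instead of GTIDs, and the snapshot is taken when
wait_before is called rather than on message delivery.

```python
import threading


class LocalMember:
    """Toy model of the member that runs the BEFORE transaction (M3)."""

    def __init__(self):
        self._cond = threading.Condition()
        self.received = set()   # remote transactions allowed to commit
        self.committed = set()  # remote transactions already committed

    def on_certified(self, gtid):
        with self._cond:
            self.received.add(gtid)

    def on_committed(self, gtid):
        with self._cond:
            self.committed.add(gtid)
            self._cond.notify_all()

    def wait_before(self, timeout=None):
        """Block until the snapshot of received transactions is committed.

        Returns False if the timeout expires first.
        """
        with self._cond:
            snapshot = frozenset(self.received)
            return self._cond.wait_for(
                lambda: snapshot <= self.committed, timeout)
```

Transactions certified after the snapshot do not extend the wait,
which mirrors the statement that T2 only waits for what precedes it
in the global order.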

AFTER
-----
Please see page 3 on consistent-reads-workflow.pdf

Algorithm:
  1) transaction T1, with AFTER consistency, starts on member M1;
  2) it is executed up to the commit point, at that point the
     transaction data is sent to all group members, including the
     one on which the transaction was executed (M1);
  3) on transaction delivery, every member will check for conflicts:
     3.1) if there is a conflict the transaction is rolled back;
     3.2) otherwise, it goes to step 4).
  4) On the other members the transaction is queued for execution.
     Once the transaction is prepared, the member will send an
     acknowledgement to all members.
  5) Once all members have received acknowledgements from all
     members - M1 is acknowledged implicitly since it has already
     prepared the transaction - they all proceed to transaction
     commit.
  6) Transaction T2, with EVENTUAL consistency, starts on member M3.
     Since T1 is still being committed, T2 execution will be held
     until T1 commit is completed. This will ensure that any
     transaction after T1 will read T1 data.
  7) Once the transaction T2 execution starts, the next steps are
     the ones described in EVENTUAL consistency 2) and 3).
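
The acknowledgement bookkeeping of steps 4) and 5), together with
the "member leaves" rule of FR-07, can be sketched as a set of
members still owing an acknowledgement. PreparedAckTracker is a
hypothetical name used only for this illustration.

```python
class PreparedAckTracker:
    """Toy model of the acks one member collects for one transaction."""

    def __init__(self, origin, online_members):
        # The originating member is acknowledged implicitly (step 5).
        self.pending = set(online_members) - {origin}

    def on_ack(self, member):
        """A member reported that it prepared the transaction."""
        self.pending.discard(member)

    def on_member_left(self, member):
        """FR-07: do not wait for members that left the group."""
        self.pending.discard(member)

    def can_commit(self):
        return not self.pending
```

For a group M1, M2, M3 with T1 executed on M1, the tracker allows
the commit once M2 and M3 have each either acknowledged or left.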

BEFORE_AND_AFTER
----------------
The BEFORE and AFTER workflows combined on the same transaction.