WL#10379: Group Replication: consistent reads
Affects: Server-8.0
—
Status: Complete
EXECUTIVE SUMMARY ================= Implement consistency guarantees on Group Replication, that is, allow the user to configure globally or per transaction the consistency provided by the group. Four guarantees are provided: - EVENTUAL (current behavior) A RO (read-only) or RW (read-write) transaction shall not wait for preceding transactions to be applied before executing. A RW transaction shall not wait for other members to apply a transaction. - BEFORE A RW transaction shall wait for all preceding transactions to complete before execution takes place. A RO transaction shall wait for all preceding transactions to complete before execution takes place. - AFTER A RW transaction will wait until its changes have been applied to on other members. No effect on RO transactions. - BEFORE_AND_AFTER A RW transaction will wait for 1) all preceding transactions to complete before execution takes place and 2) until its changes have been applied on other members. A RO transaction shall wait for all preceding transactions to complete before execution takes place. USER STORIES ============ - As a developer using MySQL I want specific transactions in my workload to always read up-to-date data from the group, so that whenever I update sensitive data (such as credentials for a file or similar data) I will enforce that reads shall read the most up to date value. - As a developer using MySQL I want to load balance my reads without worrying about stale reads. - As a developer using MySQL who has a predominantly read-only data, I want my RW transactions to be applied everywhere once they commit, so that subsequent reads are done on up-to-date data that includes my latest write and I do not pay the synchronization on every RO transaction, but only on RW ones. SCOPE ===== This feature is about providing consistency guarantees, it is not intended to provide fencing mechanisms for member failures. A non ONLINE member it is already fenced by Router, super_read_only and group_replication_exit_state_action system variables. This fits in the overall roadmap scheme of making a group provide different distributed consistency guarantees to the application.
FUNCTIONAL REQUIREMENTS
=======================
FR-01: When group_replication_consistency=BEFORE, a transaction
shall start its execution on the most up-to-date data.
FR-02: When group_replication_consistency=AFTER, after a RW
transaction commits following transactions shall read a
database state that includes its write, from any ONLINE
member.
FR-03: When group_replication_consistency=BEFORE_AND_AFTER, a
transaction shall read a database state that includes
all previous changes by any preceding RW transaction.
After a RW transaction commits it shall be possible to read
a database state that includes its write, from any ONLINE
member.
FR-04: The guarantees BEFORE, AFTER and BEFORE_AND_AFTER can only
be used on ONLINE members. If they are used on other member
states - except OFFLINE - the transaction will rollback.
On the OFFLINE state or with the plugin not installed
transactions are not intercepted by Group Replication.
FR-05: When group_replication_consistency=AFTER or BEFORE_AND_AFTER,
a transaction will wait for an acknowledge from all ONLINE
members to inform that the transaction was prepared before
committing the transaction.
FR-06: When group_replication_consistency=AFTER or BEFORE_AND_AFTER,
since the remote ONLINE members do acknowledge the
transaction on prepare, the new transactions on those members
shall be held until the preceding prepared are committed.
FR-07: When group_replication_consistency=AFTER or BEFORE_AND_AFTER,
if a member leaves the group, either by STOP GROUP_REPLICATION
or due to an error, the transaction will continue on the group
without waiting for the leaving member consistency
acknowledgement, even if the member that executed the
transaction left.
FR-08: When group_replication_consistency=AFTER or BEFORE_AND_AFTER,
if there are unreachable members but the group still has a
reachable majority, the transaction will wait until that
members are reachable or leave the group.
FR-09: If the group looses the majority and blocks, once the group is
reestablished automatically or by the use of
group_replication_force_members, the transaction will resume
with the new membership, even if the member that executed the
transaction left.
FR-10: The guarantees AFTER or BEFORE_AND_AFTER can only be used when
all group members do support it, that is from 8.0.14. If the
group contains a member from a previous version the
transaction will rollback.
FR-11: The guarantee BEFORE can only be used on members that do
support it, though the other group members do not need to
support it, that is, can be from lower versions.
FR-12: If STOP GROUP_REPLICATION is executed or the plugin is
stopped due to error, which implies that the member left the
group, all ongoing consistent transactions are locally
rollback, though they will continue on the group.
NON-FUNCTIONAL REQUIREMENTS
===========================
None
SUMMARY OF THE APPROACH
=======================
The user will be able to configure four guarantees:
- EVENTUAL
The default guarantee, a transaction does commit as soon as a
majority of the group members has the transaction data in memory,
before committing it to the database.
On this guarantee a client can write a transaction to tuple A on
member 1, receive the commit acknowledge, read tuple A on member 2
and receive an old value, since the transaction may not yet be
committed.
- BEFORE
Before a transaction starts it will wait until all writes are
committed, that is: the current transaction will be globally
ordered on the message stream, gets the global GTID_EXECUTED,
waits until that GTID_EXECUTED it is committed on the local
member and only then starts execution.
This does ensure that a transaction always executes on the
up-to-date data.
- AFTER
Each member proceeds to commit a transaction only after it has
collected acknowledgements from all ONLINE members that they are
ready to commit it as well, that is, the transaction is prepared
on all ONLINE members.
After that, the client that executed the transaction receives the
commit confirmation once the transaction it is committed locally,
on the other members new transactions - independent of their
guarantees - will be hold until the preceding prepared
transactions are committed.
This does ensure that once a client executes a transaction on a
member it can read its write or following writes on any ONLINE
member.
AFTER guarantee used on one write transaction it is equivalent to
use the BEFORE guarantee on *ALL* other transactions.
- BEFORE_AND_AFTER
BEFORE and AFTER guarantees combined on the same transaction.
This does ensure that a transaction always execute on the
up-to-date data and after its commit its write or a following
write will be read on any ONLINE member.
USER INTERFACE
==============
The user can specify the transaction consistency guarantee by
setting the system variable:
- name: group_replication_consistency
- values: { EVENTUAL, BEFORE, AFTER, BEFORE_AND_AFTER }
- default: EVENTUAL
- scope: session, global
- dynamic: yes
- replicated: no
- persistable: PERSIST
- credentials: session: none required
global: SUPER/GROUP_REPLICATION_ADMIN
- description: Transaction consistency guarantee
TRANSACTION ORDER
=================
Despite being possible to set the consistency per transaction, since
all transactions are totally ordered on the group, a
consistent transaction will also wait for all ongoing EVENTUAL
transactions that precede it.
Example E1:
- Group with 3 members: M1, M2 and M3
- on M1 a client executes:
> SET SESSION group_replication_consistency= EVENTUAL;
> BEGIN;
> INSERT INTO t1 VALUES (1); # T1
> COMMIT;
> SET SESSION group_replication_consistency= BEFORE;
> SELECT * FROM t1; # T2
> SET SESSION group_replication_consistency= AFTER;
> BEGIN;
> INSERT INTO t1 VALUES (2); # T3
> COMMIT;
- on M2, a client executes:
> SET SESSION group_replication_consistency= EVENTUAL;
> SELECT * FROM t1; # T4
Notes on E1:
- T1 is ordered before T3.
- T2 shall wait for T1 to be applied on the local server before executing.
- T3 shall commit and externalize after all ONLINE members have prepared.
- T4 shall wait for both T1 and T3 changes to have been externalized,
before executing.
The transaction order, which is decided by the communication layer,
does not depend nor it is changed by the consistency guarantee.
GUARANTEE CONTEXT
=================
The guarantee BEFORE, apart from being ordered on the transaction
stream, only has impact on the local member. That is, it does not
require coordination with the other members neither have
repercussions on their transactions.
In other words, BEFORE only does impact the member on which it is
executed.
The guarantee AFTER (and BEFORE_AND_AFTER) do have repercussions
on the other members transactions, it will make the other members
transactions to wait until the AFTER transaction is committed on
that member, even if the other members transactions have EVENTUAL
guarantee.
In other words, AFTER (and BEFORE_AND_AFTER) do impact all ONLINE
members.
Example E2:
- Group with 3 members: M1, M2 and M3
- on M1 a client executes:
> SET SESSION group_replication_consistency= AFTER;
> BEGIN;
> INSERT INTO t1 VALUES (1); # T1
> COMMIT;
- on M2 a client executes:
> SET SESSION group_replication_consistency= EVENTUAL;
> SELECT * FROM t1; # T2
Despite T2 guarantee it is EVENTUAL, since T1 is AFTER, T2 will wait
until T1 it is committed before start its execution.
SECURITY CONTEXT
================
From a point of view of malicious attack to the group, since when
group_replication_consistency=AFTER or BEFORE_AND_AFTER a
transaction will wait for a acknowledge from all ONLINE members, a
UNREACHABLE member will block a transaction execution until that
member is reachable or leaves the group.
A malicious user can set group_replication_consistency=AFTER or
BEFORE_AND_AFTER on long lived transactions, which may block new
transactions while those long lived transactions are being applied.
UPGRADE/DOWNGRADE AND CROSS-VERSION REPLICATION
===============================================
There are no repercussions on upgrade scenarios.
Obviously, after a downgrade to a version on which the consistency
guarantees are not implemented they cannot be used.
The guarantees AFTER or BEFORE_AND_AFTER can only be used when all
group members do support it, that is from 8.0.14. If the group
contains a member from a previous version the transaction will
rollback.
The guarantee BEFORE can only be used on members that do support it,
though the other group members do not need to support it, that is,
can be from lower versions.
OBSERVABILITY
=============
When group_replication_consistency=BEFORE, while the transaction it
is waiting for the up-to-date data to be committed the session state
will be "Executing hook on transaction begin" on
INFORMATION_SCHEMA.PROCESSLIST, and on all other sources that do
show the session state.
When group_replication_consistency=AFTER or BEFORE_AND_AFTER, on the
other members while the prepared transactions are being committed,
the new transactions that are on hold until that commits do happen
will have the session state "Executing hook on transaction begin" on
INFORMATION_SCHEMA.PROCESSLIST, and on all other sources that do
show the session state.
DEPLOYMENT AND INSTALLATION
===========================
There are no repercussions.
PROTOCOL
========
There are no repercussions.
FAILURE MODEL SPECIFICATION
===========================
There are no repercussions, how this feature handles the existing
failures it is expressed on the requirements and on the summary of
the approach.
SUMMARY OF CHANGES
==================
Server core changes
-------------------
- Call after_commit hook on XA PREPARE and XA ROLLBACK since these
commands do log a transaction to the binary log.
Group Replication changes
-------------------------
- The transactions hold mechanism on the AFTER and BEFORE_AND_AFTER
guarantees is the "Group Replication: hold reads and writes when
the new primary has replication backlog to apply".
- Extend the system variable group_replication_consistency with
three more values: BEFORE, AFTER, BEFORE_AND_AFTER.
- Introduce a consistency manager to handle the new guarantees.
- Introduce two new message types: CT_TRANSACTION_PREPARED_MESSAGE
and CT_TRANSACTION_WITH_GUARANTEE_MESSAGE.
- Refactor the inversion of control of certification to also include
the consistency handling.
GUARANTEES WORKFLOW
===================
EVENTUAL
--------
Algorithm:
1) transaction T1 starts on member M1;
2) it is executed up to the commit point, on that point the
transaction data is sent to all group members, including the
one on which the transaction was executed (M1);
3) on transaction deliver, every member will check for conflicts:
3.1) if there is a conflict the transaction is rolled back;
3.2) otherwise, the transaction is committed on the member
that did execute the transaction (M1), on the other
members the transaction is queued to execution and
commit.
4) transaction T2 starts on member M3 before M3 did receive T1
transaction data, T2 will execute before T1 is executed on M3,
which will make T2 read not up-to-date data.
BEFORE
------
Algorithm:
1) transaction T1, with EVENTUAL consistency, starts on member M1;
2) it is executed up to the commit point, on that point the
transaction data is sent to all group members, including the
one on which the transaction was executed (M1);
3) on transaction deliver, every member will check for conflicts:
3.1) if there is a conflict the transaction is rollback;
3.2) otherwise, the transaction is committed on the member
that did execute the transaction (M1), on the other
members the transaction is queued to execution and
commit.
4) Transaction T2, with BEFORE consistency, starts on member M3.
Before the transaction execution, T2 will send a message to
all members. That message will provide T2 global order before
execution (1st message round on the workflow);
5) When that message is received and processed in-order, w.r.t.
the message stream, on M3, M3 will fetch the Group Replication
applier RECEIVED_TRANSACTION_SET, the set of remote
transactions that were allowed to commit, independently of
being already committed or not. This set gives us the remote
transactions that do exist before this point. We only need to
track remote transactions since the server already ensures
consistency for local transactions.
Other members will ignore this message, the message is sent
to all to provide the global order.
6) Transaction T2 on M3 will wait until all the transactions on
Group Replication applier RECEIVED_TRANSACTION_SET are
committed, only after that its execution will start. This does
ensure that T2 will never read past data relatively to its
global order, which in this example is: T1, T2.
This wait only takes place on the server that executes the
transaction with BEFORE consistency, on this case T2@M3. All
others members are not affected by this wait.
7) Once the transaction T2 execution starts, the next steps are
the ones described on 2) and 3).
AFTER
-----
Algorithm:
1) transaction T1, with AFTER consistency, starts on member M1;
2) it is executed up to the commit point, on that point the
transaction data is sent to all group members, including the
one on which the transaction was executed (M1);
3) on transaction deliver, every member will check for conflicts:
3.1) if there is a conflict the transaction is rollback;
3.2) otherwise, it goes to step 4).
4) On the other members the transaction is queued to execution.
Once the transaction is prepared, it will send a acknowledge
to all members.
5) Once all members do receive acknowledges from all members - M1
is acknowledge implicitly since it did already prepare the
transaction - they all proceed to transaction commit.
6) Transaction T2, with EVENTUAL consistency, starts on member M3.
Since T1 is still being committed, T2 execution will be hold
until T1 commit is completed. This will ensure that any
transaction after T1 will read T1 data.
7) Once the transaction T2 execution starts, the next steps are
the ones described on EVENTUAL consistency 2) and 3).
BEFORE_AND_AFTER
----------------
It is BEFORE and AFTER workflows on the same transaction.
Copyright (c) 2000, 2025, Oracle Corporation and/or its affiliates. All rights reserved.