WL#10379: Group Replication: consistent reads
Affects: Server-8.0
Status: Complete
EXECUTIVE SUMMARY
=================

Implement consistency guarantees on Group Replication, that is, allow
the user to configure, globally or per transaction, the consistency
provided by the group.

Four guarantees are provided:

- EVENTUAL (current behavior)
  A RO (read-only) or RW (read-write) transaction shall not wait for
  preceding transactions to be applied before executing. A RW
  transaction shall not wait for other members to apply a transaction.

- BEFORE
  A RW transaction shall wait for all preceding transactions to
  complete before execution takes place. A RO transaction shall wait
  for all preceding transactions to complete before execution takes
  place.

- AFTER
  A RW transaction shall wait until its changes have been applied on
  the other members. No effect on RO transactions.

- BEFORE_AND_AFTER
  A RW transaction shall wait 1) for all preceding transactions to
  complete before execution takes place and 2) until its changes have
  been applied on the other members. A RO transaction shall wait for
  all preceding transactions to complete before execution takes place.

USER STORIES
============

- As a developer using MySQL, I want specific transactions in my
  workload to always read up-to-date data from the group, so that
  whenever I update sensitive data (such as credentials for a file or
  similar data) I can enforce that subsequent reads return the most
  up-to-date value.

- As a developer using MySQL, I want to load balance my reads without
  worrying about stale reads.

- As a developer using MySQL who has predominantly read-only data, I
  want my RW transactions to be applied everywhere once they commit, so
  that subsequent reads are done on up-to-date data that includes my
  latest write, and I do not pay the synchronization cost on every RO
  transaction, but only on RW ones.

SCOPE
=====

This feature is about providing consistency guarantees; it is not
intended to provide fencing mechanisms for member failures. A
non-ONLINE member is already fenced by Router and by the
super_read_only and group_replication_exit_state_action system
variables.

This fits in the overall roadmap of making a group provide different
distributed consistency guarantees to the application.
FUNCTIONAL REQUIREMENTS
=======================

FR-01: When group_replication_consistency=BEFORE, a transaction shall
       start its execution on the most up-to-date data.

FR-02: When group_replication_consistency=AFTER, after a RW transaction
       commits, following transactions shall read a database state
       that includes its write, from any ONLINE member.

FR-03: When group_replication_consistency=BEFORE_AND_AFTER, a
       transaction shall read a database state that includes all
       previous changes by any preceding RW transaction. After a RW
       transaction commits, it shall be possible to read a database
       state that includes its write, from any ONLINE member.

FR-04: The guarantees BEFORE, AFTER and BEFORE_AND_AFTER can only be
       used on ONLINE members. If they are used on a member in any
       other state, except OFFLINE, the transaction will be rolled
       back. In the OFFLINE state, or with the plugin not installed,
       transactions are not intercepted by Group Replication.

FR-05: When group_replication_consistency=AFTER or BEFORE_AND_AFTER, a
       transaction will wait, before committing, for an acknowledgment
       from all ONLINE members informing that the transaction was
       prepared.

FR-06: When group_replication_consistency=AFTER or BEFORE_AND_AFTER,
       since the remote ONLINE members acknowledge the transaction on
       prepare, new transactions on those members shall be held until
       the preceding prepared transactions are committed.

FR-07: When group_replication_consistency=AFTER or BEFORE_AND_AFTER, if
       a member leaves the group, either by STOP GROUP_REPLICATION or
       due to an error, the transaction will continue on the group
       without waiting for the leaving member's consistency
       acknowledgment, even if the member that executed the
       transaction is the one that left.

FR-08: When group_replication_consistency=AFTER or BEFORE_AND_AFTER, if
       there are unreachable members but the group still has a
       reachable majority, the transaction will wait until those
       members are reachable again or leave the group.

FR-09: If the group loses the majority and blocks, once the group is
       reestablished, automatically or by the use of
       group_replication_force_members, the transaction will resume
       with the new membership, even if the member that executed the
       transaction left.

FR-10: The guarantees AFTER and BEFORE_AND_AFTER can only be used when
       all group members support them, that is, from 8.0.14. If the
       group contains a member from a previous version, the
       transaction will be rolled back.

FR-11: The guarantee BEFORE can only be used on members that support
       it, though the other group members do not need to support it,
       that is, they can be from lower versions.

FR-12: If STOP GROUP_REPLICATION is executed or the plugin is stopped
       due to an error, which implies that the member left the group,
       all ongoing consistent transactions are rolled back locally,
       though they will continue on the group.

NON-FUNCTIONAL REQUIREMENTS
===========================

None
SUMMARY OF THE APPROACH
=======================

The user will be able to configure four guarantees:

- EVENTUAL
  The default guarantee. A transaction commits as soon as a majority
  of the group members has the transaction data in memory, before it
  is committed to the database on those members. With this guarantee
  a client can write a transaction to tuple A on member 1, receive the
  commit acknowledgment, read tuple A on member 2 and receive an old
  value, since the transaction may not yet be committed there.

- BEFORE
  Before a transaction starts, it waits until all preceding writes are
  committed, that is: the current transaction is globally ordered on
  the message stream, gets the global GTID_EXECUTED, waits until that
  GTID_EXECUTED is committed on the local member and only then starts
  execution. This ensures that a transaction always executes on
  up-to-date data.

- AFTER
  Each member proceeds to commit a transaction only after it has
  collected acknowledgments from all ONLINE members that they are
  ready to commit it as well, that is, the transaction is prepared on
  all ONLINE members. After that, the client that executed the
  transaction receives the commit confirmation once the transaction is
  committed locally. On the other members, new transactions,
  independently of their guarantees, are held until the preceding
  prepared transactions are committed. This ensures that once a client
  executes a transaction on a member, it can read its write, or
  following writes, on any ONLINE member.
  Using the AFTER guarantee on one write transaction is equivalent to
  using the BEFORE guarantee on *ALL* other transactions.

- BEFORE_AND_AFTER
  BEFORE and AFTER guarantees combined on the same transaction. This
  ensures that a transaction always executes on up-to-date data and
  that, after its commit, its write or a following write will be read
  on any ONLINE member.
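
The four guarantees above reduce to two independent waits: one before
execution and one for the remote prepare acknowledgments. The following
is a minimal sketch of that decomposition, not the server code; the
names Consistency and waits are hypothetical and chosen only for
illustration.

```python
from enum import Enum


class Consistency(Enum):
    """Hypothetical model of the four guarantees.

    Each value is (wait_before_execution, wait_for_remote_prepare).
    """
    EVENTUAL = (False, False)
    BEFORE = (True, False)
    AFTER = (False, True)
    BEFORE_AND_AFTER = (True, True)


def waits(level, read_only):
    """Return (wait_before, wait_after) for a single transaction.

    The "after" wait only concerns RW transactions: the executive
    summary states that AFTER has no effect on RO transactions.
    """
    wait_before, wait_after = level.value
    return (wait_before, wait_after and not read_only)
```

For instance, waits(Consistency.BEFORE_AND_AFTER, read_only=True) yields
the same pair as BEFORE, which matches the statement that a RO
transaction under BEFORE_AND_AFTER only waits before execution.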
USER INTERFACE
==============

The user can specify the transaction consistency guarantee by setting
the system variable:

- name: group_replication_consistency
- values: { EVENTUAL, BEFORE, AFTER, BEFORE_AND_AFTER }
- default: EVENTUAL
- scope: session, global
- dynamic: yes
- replicated: no
- persistable: PERSIST
- credentials:
    session: none required
    global: SUPER/GROUP_REPLICATION_ADMIN
- description: Transaction consistency guarantee

TRANSACTION ORDER
=================

Although the consistency can be set per transaction, all transactions
are totally ordered on the group, so a consistent transaction will
also wait for all ongoing EVENTUAL transactions that precede it.

Example E1:
- Group with 3 members: M1, M2 and M3
- on M1, a client executes:
  > SET SESSION group_replication_consistency= EVENTUAL;
  > BEGIN;
  > INSERT INTO t1 VALUES (1); # T1
  > COMMIT;
  > SET SESSION group_replication_consistency= BEFORE;
  > SELECT * FROM t1; # T2
  > SET SESSION group_replication_consistency= AFTER;
  > BEGIN;
  > INSERT INTO t1 VALUES (2); # T3
  > COMMIT;
- on M2, a client executes:
  > SET SESSION group_replication_consistency= EVENTUAL;
  > SELECT * FROM t1; # T4

Notes on E1:
- T1 is ordered before T3.
- T2 shall wait for T1 to be applied on the local server before
  executing.
- T3 shall commit and externalize after all ONLINE members have
  prepared.
- T4 shall wait for both T1 and T3 changes to have been externalized
  before executing.

The transaction order, which is decided by the communication layer,
neither depends on nor is changed by the consistency guarantee.

GUARANTEE CONTEXT
=================

The guarantee BEFORE, apart from being ordered on the transaction
stream, only has impact on the local member. That is, it neither
requires coordination with the other members nor has repercussions on
their transactions. In other words, BEFORE only impacts the member on
which it is executed.

The guarantees AFTER and BEFORE_AND_AFTER do have repercussions on the
other members' transactions: they make the other members' transactions
wait until the AFTER transaction is committed on that member, even if
those transactions have the EVENTUAL guarantee. In other words, AFTER
and BEFORE_AND_AFTER impact all ONLINE members.

Example E2:
- Group with 3 members: M1, M2 and M3
- on M1, a client executes:
  > SET SESSION group_replication_consistency= AFTER;
  > BEGIN;
  > INSERT INTO t1 VALUES (1); # T1
  > COMMIT;
- on M2, a client executes:
  > SET SESSION group_replication_consistency= EVENTUAL;
  > SELECT * FROM t1; # T2

Although the T2 guarantee is EVENTUAL, since T1 is AFTER, T2 will wait
until T1 is committed before starting its execution.

SECURITY CONTEXT
================

From the point of view of a malicious attack on the group: when
group_replication_consistency=AFTER or BEFORE_AND_AFTER, a transaction
waits for an acknowledgment from all ONLINE members, so an UNREACHABLE
member will block the transaction until that member is reachable again
or leaves the group.

A malicious user can set group_replication_consistency=AFTER or
BEFORE_AND_AFTER on long-lived transactions, which may block new
transactions while those long-lived transactions are being applied.

UPGRADE/DOWNGRADE AND CROSS-VERSION REPLICATION
===============================================

There are no repercussions on upgrade scenarios. After a downgrade to
a version in which the consistency guarantees are not implemented,
they cannot be used.

The guarantees AFTER and BEFORE_AND_AFTER can only be used when all
group members support them, that is, from 8.0.14. If the group
contains a member from a previous version, the transaction will be
rolled back.

The guarantee BEFORE can only be used on members that support it,
though the other group members do not need to support it, that is,
they can be from lower versions.
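
The cross-version rules above can be read as a simple predicate over
member versions. The sketch below is illustrative only: the function
name guarantee_usable and the tuple representation of versions are
hypothetical, and it assumes that the executing member supports BEFORE
from the same version (8.0.14) that the text gives for AFTER.

```python
# Version from which the text says AFTER and BEFORE_AND_AFTER are supported.
MIN_VERSION = (8, 0, 14)


def guarantee_usable(level, local_version, other_versions):
    """Return True if `level` may be used by a transaction on the
    local member. Versions are (major, minor, patch) tuples.

    Assumption: the local member supports BEFORE from the same version
    as AFTER; the text only states a version for AFTER.
    """
    if level == "EVENTUAL":
        return True
    if local_version < MIN_VERSION:
        return False
    if level == "BEFORE":
        # Only the executing member needs to support BEFORE.
        return True
    # AFTER and BEFORE_AND_AFTER need every group member to support them.
    return all(version >= MIN_VERSION for version in other_versions)
```

With one member still on (8, 0, 13), a BEFORE transaction on an
(8, 0, 14) member is accepted, whereas an AFTER transaction is not.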
OBSERVABILITY
=============

When group_replication_consistency=BEFORE, while the transaction is
waiting for the up-to-date data to be committed, the session state
will be "Executing hook on transaction begin" on
INFORMATION_SCHEMA.PROCESSLIST and on all other sources that show the
session state.

When group_replication_consistency=AFTER or BEFORE_AND_AFTER, on the
other members, while the prepared transactions are being committed,
the new transactions that are on hold until those commits happen will
have the session state "Executing hook on transaction begin" on
INFORMATION_SCHEMA.PROCESSLIST and on all other sources that show the
session state.

DEPLOYMENT AND INSTALLATION
===========================

There are no repercussions.

PROTOCOL
========

There are no repercussions.

FAILURE MODEL SPECIFICATION
===========================

There are no repercussions. How this feature handles the existing
failures is described in the requirements and in the summary of the
approach.
SUMMARY OF CHANGES
==================

Server core changes
-------------------

- Call the after_commit hook on XA PREPARE and XA ROLLBACK, since these
  commands log a transaction to the binary log.

Group Replication changes
-------------------------

- The mechanism that holds transactions on the AFTER and
  BEFORE_AND_AFTER guarantees is the one from "Group Replication: hold
  reads and writes when the new primary has replication backlog to
  apply".
- Extend the system variable group_replication_consistency with three
  more values: BEFORE, AFTER, BEFORE_AND_AFTER.
- Introduce a consistency manager to handle the new guarantees.
- Introduce two new message types: CT_TRANSACTION_PREPARED_MESSAGE and
  CT_TRANSACTION_WITH_GUARANTEE_MESSAGE.
- Refactor the inversion of control of certification to also include
  the consistency handling.

GUARANTEES WORKFLOW
===================

EVENTUAL
--------

Algorithm:
1) Transaction T1 starts on member M1;
2) It is executed up to the commit point. At that point the
   transaction data is sent to all group members, including the one on
   which the transaction was executed (M1);
3) On transaction delivery, every member checks for conflicts:
   3.1) if there is a conflict, the transaction is rolled back;
   3.2) otherwise, the transaction is committed on the member that
        executed it (M1); on the other members the transaction is
        queued for execution and commit.
4) Transaction T2 starts on member M3 before M3 has received the T1
   transaction data. T2 will execute before T1 is executed on M3,
   which makes T2 read data that is not up to date.
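
The stale read allowed by EVENTUAL can be reproduced with a toy model
of steps 2) to 4). This is a sketch under simplifying assumptions, not
the plugin logic: conflict detection (step 3.1) is omitted, the class
Member and its methods are hypothetical, and the stale read is shown
for a transaction that is queued on M3 but not yet applied there.

```python
from collections import deque


class Member:
    """Toy group member: committed data plus a queue of remote transactions."""

    def __init__(self, name):
        self.name = name
        self.committed = {}   # key -> value visible to local reads
        self.queue = deque()  # delivered remote transactions not yet applied

    def deliver(self, origin, writes):
        """Step 3.2: commit on the executing member, queue elsewhere."""
        if origin is self:
            self.committed.update(writes)
        else:
            self.queue.append(writes)

    def apply_next(self):
        """The applier commits the oldest queued remote transaction."""
        self.committed.update(self.queue.popleft())

    def read(self, key):
        """An EVENTUAL read: returns whatever is committed locally."""
        return self.committed.get(key)


m1, m3 = Member("M1"), Member("M3")
for member in (m1, m3):
    member.deliver(m1, {"a": 1})   # T1, executed on M1

stale = m3.read("a")               # T2 on M3 runs before T1 is applied
m3.apply_next()
fresh = m3.read("a")               # a later read sees T1's write
```

Here stale is None while m1.read("a") already returns 1, which is the
window that the BEFORE and AFTER guarantees close.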
BEFORE
------

Algorithm:
1) Transaction T1, with EVENTUAL consistency, starts on member M1;
2) It is executed up to the commit point. At that point the
   transaction data is sent to all group members, including the one on
   which the transaction was executed (M1);
3) On transaction delivery, every member checks for conflicts:
   3.1) if there is a conflict, the transaction is rolled back;
   3.2) otherwise, the transaction is committed on the member that
        executed it (M1); on the other members the transaction is
        queued for execution and commit.
4) Transaction T2, with BEFORE consistency, starts on member M3.
   Before the transaction execution, T2 sends a message to all
   members. That message gives T2 its global order before execution
   (1st message round on the workflow);
5) When that message is received and processed in order, with respect
   to the message stream, on M3, M3 fetches the Group Replication
   applier RECEIVED_TRANSACTION_SET, the set of remote transactions
   that were allowed to commit, regardless of whether they are
   already committed or not. This set gives the remote transactions
   that exist before this point. Only remote transactions need to be
   tracked, since the server already ensures consistency for local
   transactions. Other members ignore this message; it is sent to all
   members only to provide the global order.
6) Transaction T2 on M3 waits until all the transactions on the Group
   Replication applier RECEIVED_TRANSACTION_SET are committed; only
   after that does its execution start. This ensures that T2 will
   never read stale data relative to its global order, which in this
   example is: T1, T2. This wait only takes place on the server that
   executes the transaction with BEFORE consistency, in this case
   T2@M3. All other members are not affected by this wait.
7) Once the execution of transaction T2 starts, the next steps are the
   ones described in 2) and 3).
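
Steps 5) and 6) amount to taking a snapshot of the received
transactions and waiting until every one of them is committed locally.
The sketch below models that wait with plain Python sets standing in
for GTID sets; the class BeforeWaiter and its methods are hypothetical,
and a real implementation would block the session instead of exposing
a polling method.

```python
class BeforeWaiter:
    """Tracks what a BEFORE transaction still has to wait for."""

    def __init__(self, received_transaction_set, committed):
        # Step 5: snapshot taken when the ordering message is processed.
        # Transactions already committed need no wait.
        self.pending = set(received_transaction_set) - set(committed)

    def on_commit(self, gtid):
        """Called when the applier commits a remote transaction."""
        self.pending.discard(gtid)

    def may_execute(self):
        """Step 6: execution starts only when nothing is pending."""
        return not self.pending


# T1..T3 were received before T2's ordering message; only T1 is committed.
waiter = BeforeWaiter(received_transaction_set={"T1", "T2r", "T3"},
                      committed={"T1"})
blocked_at_start = not waiter.may_execute()
waiter.on_commit("T2r")
waiter.on_commit("T4")   # received after the snapshot: not waited for
still_blocked = not waiter.may_execute()
waiter.on_commit("T3")
ready = waiter.may_execute()
```

Transactions delivered after the snapshot, such as the hypothetical T4
above, do not delay the BEFORE transaction, since they are ordered
after it.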
AFTER
-----

Algorithm:
1) Transaction T1, with AFTER consistency, starts on member M1;
2) It is executed up to the commit point. At that point the
   transaction data is sent to all group members, including the one on
   which the transaction was executed (M1);
3) On transaction delivery, every member checks for conflicts:
   3.1) if there is a conflict, the transaction is rolled back;
   3.2) otherwise, it goes to step 4).
4) On the other members the transaction is queued for execution. Once
   the transaction is prepared, the member sends an acknowledgment to
   all members.
5) Once all members have received acknowledgments from all members
   (M1 is acknowledged implicitly, since it has already prepared the
   transaction), they all proceed to commit the transaction.
6) Transaction T2, with EVENTUAL consistency, starts on member M3.
   Since T1 is still being committed, the execution of T2 is held
   until the commit of T1 is completed. This ensures that any
   transaction after T1 will read T1 data.
7) Once the execution of transaction T2 starts, the next steps are the
   ones described in EVENTUAL consistency 2) and 3).

BEFORE_AND_AFTER
----------------

It is the BEFORE and AFTER workflows combined on the same transaction.
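
The acknowledgment collection in steps 4) and 5) of the AFTER workflow,
together with the membership rule from FR-07, can be sketched as a set
of members whose prepare acknowledgment is still missing. This is an
illustrative model only; the class PreparedTransaction and its method
names are hypothetical and do not correspond to the consistency
manager's actual interface.

```python
class PreparedTransaction:
    """Per-member bookkeeping for one AFTER transaction."""

    def __init__(self, origin, online_members):
        # Step 5: the executing member is acknowledged implicitly.
        self.waiting_for = set(online_members) - {origin}

    def on_prepared_ack(self, member):
        """A member reported that it prepared the transaction."""
        self.waiting_for.discard(member)

    def on_member_left(self, member):
        """FR-07: do not wait for members that left the group."""
        self.waiting_for.discard(member)

    def may_commit(self):
        """Commit proceeds once no acknowledgment is missing."""
        return not self.waiting_for


t1 = PreparedTransaction(origin="M1", online_members={"M1", "M2", "M3"})
before_acks = t1.may_commit()
t1.on_prepared_ack("M2")
after_one_ack = t1.may_commit()
t1.on_member_left("M3")
after_leave = t1.may_commit()
```

While may_commit() is false on a member, new transactions starting
there would be held, as in step 6). An unreachable member that has not
left the group stays in waiting_for, which corresponds to the blocking
behavior described in FR-08.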
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.