WL#9053: Group Replication: Push Group Replication Plugin
Affects: Server-5.7
—
Status: Complete
EXECUTIVE SUMMARY
=================
This worklog captures the work required to push the Group Replication
plugin to mysql-trunk. It also tracks some of the user stories and the
high level requirements for MySQL Group Replication.

Background
==========
MySQL Group Replication is a MySQL plugin that brings multi-master
update-everywhere replication to MySQL. The plugin ties together
concepts and technologies from distributed systems, such as group
communication, with traditional database replication. The ultimate
result is a seamlessly distributed and replicated database over a set
of MySQL servers that cooperate to keep the replicated state strongly
consistent.

This document lays out the user stories and the high level requirements
for the project. High level requirements are then mapped into worklogs
and test cases.

Definitions
===========
- Cluster - Throughout the following text, the word cluster means a
  group of Group Replication servers.
- Site - site and server are used interchangeably throughout the text
  below and refer to the same thing, a server.

User Stories
============
- As a MySQL DBA I want to deploy a cluster of MySQL servers that
  automatically withstands server failures and rearranges the cluster
  configuration, so that reconfiguration is done automatically and
  without human intervention.
- As a MySQL DBA I want to be able to log into a single server in a
  cluster of MySQL servers, which is responsible for a given partition
  of the data, so that I can discover the cluster membership and member
  status.
- As a MySQL user I want to execute transactions on a MySQL cluster
  where the failure of the server on which transactions execute shall
  not cause data loss.
- As a MySQL DBA I want to deploy a cluster of MySQL servers with a
  lossless Primary-Backups replication setup, so that any transaction
  that is executed against the primary server is never lost in the
  event of primary failure.
- As a MySQL DBA I want to deploy a cluster of MySQL servers with a
  lossless multi-master replication setup, so I don't have to deal with
  primary fail-over on the middleware in the event of primary server
  failures.
- As a MySQL DBA I want to deploy a cluster of MySQL servers with a
  lossless multi-master replication setup, so I can load balance my
  write load across different servers in the cluster.
- As a MySQL DBA I want to deploy a cluster of MySQL servers into which
  I can add and remove servers dynamically while the service is
  running, so I don't have to deal with server synchronization.
- As a MySQL DBA I want to connect two clusters across a wide-area
  network, so I can deploy multiple datacenters to address disaster
  recovery requirements.

References
==========
http://mysqlhighavailability.com/mysql-group-replication-hello-world/
http://mysqlhighavailability.com/category/replication/group-replication/
- Initialization
This section presents the high level requirements for the
initialization of the Group Replication plugin: in what form Group
Replication is loaded into the server, and when and how it can be
loaded and started or unloaded and stopped.
I1. Group Replication SHALL be a MySQL plugin that the user can
load/unload while the MySQL server is online.
I2. Group Replication SHALL be a MySQL plugin that the user can
start and stop while the MySQL server is online.
I3. The Group Replication plugin SHALL be able to start during MySQL
server boot and be available when the first user session is opened.
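As an illustration of I1-I3, the sketch below uses the standard MySQL
plugin statements; the plugin file name and startup options are example
values and may differ per platform and configuration.

  -- Load/unload the plugin while the server is online (I1).
  INSTALL PLUGIN group_replication SONAME 'group_replication.so';
  UNINSTALL PLUGIN group_replication;

  -- Start/stop Group Replication while the server is online (I2).
  START GROUP_REPLICATION;
  STOP GROUP_REPLICATION;

  -- For I3, the plugin can be loaded at server startup (e.g. with
  -- --plugin-load=group_replication.so) and started during boot by
  -- setting group_replication_start_on_boot=ON.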
- Deployment
These are the requirements for deploying a Group Replication cluster.
- General
D1. Group Replication SHALL support up to 9 servers in the group.
This is obviously dependent on the workload; write-intensive workloads
will REQUIRE high-bandwidth networks.
D2. A Group SHALL be identified by a UUID. This is the group_name.
- Provisioning
D3. A server SHALL be provisioned according to the current procedure
in place to provision, for instance, a slave server.
- Bootstrapping
D4. When starting the group, one server in the group SHALL be
responsible for bootstrapping the group. The user SHALL configure
which server is responsible for bootstrapping it.
D5. Should N servers in the group be marked as servers that
bootstrap the group, then N groups SHALL be formed.
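A minimal sketch of how bootstrapping might be done, assuming the
standard group_replication_* system variables; the group name UUID is
an example value (D2, D4, D5).

  -- Only on the single server chosen to bootstrap the group (D4):
  SET GLOBAL group_replication_group_name = 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa';
  SET GLOBAL group_replication_bootstrap_group = ON;
  START GROUP_REPLICATION;
  SET GLOBAL group_replication_bootstrap_group = OFF;

  -- On every other server the bootstrap flag stays OFF; marking N
  -- servers as bootstrap servers would form N separate groups (D5).
  SET GLOBAL group_replication_bootstrap_group = OFF;
  START GROUP_REPLICATION;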
- Adding New Server to the Group
D6. A server MUST be provisioned before it is added to the group.
D7. Servers SHALL be added to the group one by one and not all in
one go.
- Distributed Recovery
D8. Once added to the group, a server SHALL fetch any missing state
from a donor before it starts processing new transactions from the
user and from the group itself - this is distributed recovery.
D9. The recovery donor SHALL be picked randomly until a good donor is
found. Then recovery MUST succeed.
D10. The newly added server SHALL start in the RECOVERING state and
then move to ONLINE after distributed recovery terminates.
D11. A server SHALL remain in read-only mode until it is declared
ONLINE.
D12. A server SHALL remain in read-only mode if recovery fails.
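The member state transitions in D10-D12 can be observed from SQL; a
sketch assuming the performance_schema tables described later in the
Observability section and the server read-only flags.

  -- Right after joining, the new member shows up as RECOVERING and the
  -- server remains read-only (D10, D11).
  SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE
    FROM performance_schema.replication_group_members;

  -- When distributed recovery terminates the member moves to ONLINE
  -- and read-only mode is lifted; if recovery fails, the server stays
  -- read-only (D12).
  SELECT @@GLOBAL.super_read_only, @@GLOBAL.read_only;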
- Conflicts: Detection and Resolution
These are the high level requirements for Group Replication
regarding conflict detection and resolution. Note though that DDL in
MySQL is not transactional (e.g., one cannot roll back DDL), hence
some special rules apply to DDL.
In group replication, transactions executing at different servers
execute and modify their local view of the data, i.e., they execute
on their own snapshot. Before committing, they need to synchronize
with all the other servers in the group to validate that their
changes don't actually conflict with changes that happened on other
snapshots/servers concurrently. One can think of this as a
distributed multi-version concurrency control system with optimistic
locking, where transactions execute optimistically and without a
priori synchronization at each server and then, on commit, they
establish the global ordering of their operations by consulting the
group. If two transactions write to the same row concurrently, the
first one to be totally ordered will be the one that commits
successfully; the second one will roll back/abort. One can see this
as an extended definition of snapshot isolation concurrency control;
in fact, this type of approach has been named generalized snapshot
isolation.
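As a concrete illustration of the optimistic execution described above,
consider two sessions on different members updating the same row
concurrently; the table and values are hypothetical, and the error
reported maps to the cases listed in C2 below.

  -- On server 1:
  BEGIN;
  UPDATE t1 SET val = 10 WHERE id = 1;
  COMMIT;   -- first to be totally ordered: certifies and commits

  -- On server 2, concurrently:
  BEGIN;
  UPDATE t1 SET val = 20 WHERE id = 1;
  COMMIT;   -- fails certification and is rolled back, e.g. with
            -- ER_TRANSACTION_ROLLBACK_DURING_COMMIT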
C1. Conflicts SHALL be resolved by aborting one of the conflicting
transactions.
C2. Any two DML transactions, executing at different servers and
updating the same data row, SHALL conflict, and an error SHALL be
returned to the user on one of them (depending on the stage of
execution the transaction is at when the conflict occurs).
- ER_ERROR_DURING_COMMIT - when a local transaction is aborted
due to a high-priority transaction during the commit stage.
- ER_LOCK_DEADLOCK - when a local transaction is aborted due to
a high-priority transaction during the update stage.
- ER_TRANSACTION_ROLLBACK_DURING_COMMIT - a conflict is detected
on certification.
C3. Any two operations, one DML and one DDL, operating on different
objects (e.g., table t1 and table t2), SHALL execute concurrently
and correctly at different servers in the group.
C4. Any two operations, one DML and one DDL, operating on the same
object (e.g., table t1), SHALL execute correctly at different
servers in the group, provided that they do NOT execute
concurrently.
C5. Any two operations, one DML and one DDL, operating on the same
object (e.g., table t1), SHALL lead to data inconsistencies if they
execute concurrently on different servers in the group.
C6. Any two operations, one DML and one DDL, operating on the same
object (e.g., table t1), SHALL execute correctly if they both
execute on the same server in the group.
- Network Partitioning
At its core, Group Replication relies on a variant of Paxos to
provide distributed agreement before a transaction commits. The
algorithm implemented by XCom is very close to Mencius. This means
that the system will operate correctly and remain in a synchronized
state as long as servers in the group can reach a quorum among
themselves. Obviously, if the quorum is lost, then the system will
not allow modifications, since a majority is unable to decide on
the fate of a given operation.
In distributed systems, when a server becomes unreachable, it is
impossible to say whether the server/process has crashed or whether
it is simply unreachable from the given origin. The only thing one
knows is that the server is not reachable within a specific time
boundary. Hence, when we refer to a failed server, we mean a server
that is unreachable.
N1. A quorum requires a majority of servers, hence to tolerate f
failures a Group Replication deployment MUST have 2f+1 servers in
the group.
- Assuming f=1 (3 servers deployed):
N1.1 Should 1 server crash, then the system SHALL remain
available for reads and writes.
N1.2 Should 2 servers crash at once, then the system SHALL
remain available only for reads. Write operations SHALL block.
N1.3 Should 2 servers crash consecutively, i.e., the group
notices that one server has been removed and then the
other, then the cluster SHALL reconfigure the group and
allow the group to shrink automatically.
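For the blocking scenario in N1.2, the DBA can restore availability on
a surviving member by overriding the membership; a sketch assuming the
group_replication_force_members variable, with example addresses.

  -- Check which members the surviving server still sees (failed ones
  -- typically show up as UNREACHABLE).
  SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE
    FROM performance_schema.replication_group_members;

  -- Force a new membership containing only the reachable member(s) so
  -- that a quorum exists again and writes unblock.
  SET GLOBAL group_replication_force_members = '198.51.100.10:33061';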
- View Changes, Failure Detection and Reconfiguration
This section presents the requirements on failure detection and
reconfigurations of the group. A view is the status of the
membership at a given logical moment in time. Servers keep track of
the membership and a distributed failure detection mechanism is in
place to find which servers are alive. If the failure detector
suspects that a server is no longer reachable, then a
reconfiguration of the membership may be triggered
automatically. Conversely, a new server is added to the group by
having a delegate intercede for it, i.e., by having a delegate
trigger a reconfiguration and thus a view change. Servers need to
agree on the new view after a suspicion and/or a reconfiguration is
triggered.
V1. Servers in the group SHALL automatically check every N seconds
whether other servers in the group should be suspected of having
failed.
V2. Should a server in the group crash or become unreachable, then
the failure detection mechanism SHALL hint that the server has
failed. A server SHALL then trigger a reconfiguration automatically,
ultimately resulting in the installation of a new view.
V3. Should a server be added to the group, then a new view SHALL
be installed.
- Observability
This section presents the high level requirements for the
observability of the dynamic group replication stats.
O1. The user SHALL be able to inspect the state of members in the
group by checking performance schema tables:
O1.1 replication_group_member_stats - This table SHALL show
statistics on the certification procedure, including the view
identifier of the given group.
O1.2 replication_group_members - This table SHALL present the
membership of the group at a given point in time and the
status of the members in the group.
O1.3 replication_connection_status - This table SHALL present the
status of the group replication connection.
O1.4 replication_applier_status - This table SHALL present the
status of the group replication applier.
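A sketch of how the user might inspect these tables (O1.1-O1.4); the
table names are the ones listed above.

  SELECT * FROM performance_schema.replication_group_member_stats;   -- O1.1
  SELECT * FROM performance_schema.replication_group_members;        -- O1.2
  SELECT * FROM performance_schema.replication_connection_status;    -- O1.3
  SELECT * FROM performance_schema.replication_applier_status;       -- O1.4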
- Performance
This section provides the high level requirements for Group
Replication performance.
P1. The performance of Group Replication deployed with just a
single master writing and several backups copying the changes SHALL
be comparable/competitive to Galera in Single Leader mode.
P2. The performance of Group Replication when deployed in
multi-master mode without hotspots does not have a boundary or a
baseline to compare against in the first release, hence the
requirement is that it performs well enough. However, the stretch
goal is that it is competitive with Galera in multi-master mode.
P3. The performance of Group Replication when deployed in
multi-master mode with hotspots (i.e., different servers updating the
same rows) SHALL be very LOW.
P4. Enabling SSL WILL impose an overhead on throughput.
- Interoperability
These are the high level requirements for interoperability of Group
Replication with other MySQL components, in particular with MySQL
async replication.
IO1. It SHALL be possible to replicate between a server in a group
and a standalone server using regular replication.
IO2. It SHALL be possible to replicate from a regular MySQL master
to a server in a group.
IO3. Group Replication SHALL use the regular replication
infrastructure - relay logs, binary logs, applier threads, receiver
thread.
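As an illustration of IO1/IO2, a standalone server can be pointed at a
group member (or vice versa) with a regular asynchronous replication
channel; host names and credentials below are placeholders.

  -- On the standalone server, replicate from one of the group members
  -- (IO1); the reverse direction (IO2) is configured the same way on a
  -- group member, pointing at the standalone master.
  CHANGE MASTER TO
    MASTER_HOST = 'group-member-1.example.com',
    MASTER_PORT = 3306,
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'repl_password',
    MASTER_AUTO_POSITION = 1;
  START SLAVE;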
- Security
These are the high level requirements regarding security in Group
Replication.
S1. It SHALL be possible to interconnect servers in the group using
SSL, thus using secure channels.
S2. If two servers are not using the same certificates, then they
SHALL fail to communicate.
S3. Recovery credentials SHALL NOT be exposed on any log or command
line option.
S4. Group Replication SHALL use the same approach as regular
replication to set up recovery credentials (CHANGE MASTER TO ...).
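A sketch of S4 and S1, assuming the group_replication_recovery channel
and the group_replication_ssl_mode / group_replication_recovery_use_ssl
variables; user, password and values are examples.

  -- Recovery credentials are set up with the regular replication
  -- syntax (S4), avoiding credentials in logs or on the command line (S3).
  CHANGE MASTER TO
    MASTER_USER = 'rpl_user',
    MASTER_PASSWORD = 'rpl_password'
    FOR CHANNEL 'group_replication_recovery';

  -- Interconnect members over SSL (S1); members with mismatching
  -- certificates fail to communicate (S2).
  SET GLOBAL group_replication_ssl_mode = 'REQUIRED';
  SET GLOBAL group_replication_recovery_use_ssl = ON;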
WL#6829: Group Replication: Generic Interface for accessing communication infrastructure
WL#6833: Group Replication: Read-set free Certification Module (DBSM Snapshot Isolation)
WL#6837: Group Replication: Online distributed recovery
WL#6842: Group Replication: Transaction Applier Module
WL#7060: Group Replication: Plugin
WL#7332: Group Replication: cluster consistent view-id and node attributes propagation
WL#7334: Group Replication: Read-set free Certification Module - Part 2
WL#7732: GCS Replication: GCS API
WL#8443: Group Replication: Plugin version handshake on member join
WL#8445: Group Replication: Auto-increment configuration/handling
WL#8446: Group Replication: Implement error detection and sleep routines on donor selection
WL#8793: Group Replication: Integration with libmysql-gcs and XCom
WL#9017: Group Replication: Secure distributed recovery credentials
WL#9050: Group Replication: MySQL GCS majority loss handling integration
By Design Behaviour and Requirements
====================================
- Row-based Replication.
  Why? Write-sets are calculated at the time record extraction for RBR
  is performed.
- InnoDB tables only.
  Why? Operations need to be transactional so that the group decides
  whether to commit or roll back an operation at commit time. Moreover,
  transactions that are executing during the optimistic stage can be
  aborted because a certified transaction is invalidating them, hence
  there is no need to continue execution of the optimistic one.
- Primary Key on each table.
  Why? An item in the write set relies on the hash of a primary key,
  which is therefore unique across the cluster.
- On conflicts, abort on the commit operation.
  Why? Because Group Replication implements a replication protocol
  which is based on the concept of generalized snapshot isolation.

Known Limitations
=================

Multi-master
------------
- DML/DDL update everywhere is not possible.
  - DDL and DML for the object being altered need to go through the
    same server.
  - MySQL does not have transactional nor even atomic DDL. Server
    limitation.
- Transaction savepoints are not supported.
  - The write-set extraction procedure does not consider savepoints,
    hence there is no way to deal with them at the moment.
- Binlog event checksums are not supported.
- Reduced performance on large transactions.
  - Transactions are cached and sent in one message atomically to all
    servers in the group.
- GR can generate inconsistencies if there are multiple FK
  dependencies.
- GR does not take into account Key-Range locking in the certification.
- GR does not take into account table locks in the certification.

Single Master
-------------
- Transaction savepoints are not supported.
- Binlog event checksums are not supported.
- Reduced performance on large transactions.
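Tying together the by-design requirements above, a minimal sketch of a
table and binary log setting that satisfy them (row-based replication,
InnoDB only, primary key on each table); table and column names are
illustrative.

  SET GLOBAL binlog_format = 'ROW';  -- row-based replication

  CREATE TABLE t1 (
    id   INT NOT NULL,
    data VARCHAR(64),
    PRIMARY KEY (id)                 -- primary key on each table
  ) ENGINE = InnoDB;                 -- InnoDB tables only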
Copyright (c) 2000, 2025, Oracle Corporation and/or its affiliates. All rights reserved.