WL#9053: Group Replication: Push Group Replication Plugin
Affects: Server-5.7
Status: Complete
EXECUTIVE SUMMARY
=================

This worklog captures the work required to push the Group Replication plugin to mysql-trunk. It also tracks some of the user stories and the high level requirements for MySQL Group Replication.

Background
==========

MySQL Group Replication is a MySQL plugin that brings multi-master update everywhere to MySQL. This plugin ties together concepts and technologies from distributed systems, such as group communication, with traditional database replication. The ultimate result is a seamlessly distributed and replicated database over a set of MySQL servers cooperating together to keep the replicated state strongly consistent.

This document lays out the user stories and the high level requirements for the project. High level requirements are then mapped into worklogs and test cases.

Definitions
===========

- Cluster - Throughout the following text, the word cluster means a group of Group Replication servers.

- Site - Site or server are used interchangeably throughout the text below and refer to the same thing, a server.

User Stories
============

- As a MySQL DBA I want to deploy a cluster of MySQL servers that automatically withstands server failures and rearranges the cluster configuration, so that reconfiguration is done automatically and without human intervention.

- As a MySQL DBA I want to be able to log into a single server in a cluster of MySQL servers, which is responsible for a given partition of the data, so that I can discover the cluster membership and member status.

- As a MySQL user I want to execute transactions on a MySQL cluster where the failure of the server where transactions execute shall not result in data loss.

- As a MySQL DBA I want to deploy a cluster of MySQL servers with a lossless Primary-Backups replication setup, so that any transaction that is executed against the primary server is never lost in the event of primary failure.

- As a MySQL DBA I want to deploy a cluster of MySQL servers with a lossless multi-master replication setup, so I don't have to deal with primary fail-over on the middleware in the event of primary server failures.

- As a MySQL DBA I want to deploy a cluster of MySQL servers with a lossless multi-master replication setup, so I can load balance my write load across different servers in the cluster.

- As a MySQL DBA I want to deploy a cluster of MySQL servers into which I can add and remove servers dynamically while the service is running, so I don't have to deal with server synchronization.

- As a MySQL DBA I want to connect two clusters across a wide-area network, so I can deploy multiple datacenters to address disaster recovery requirements.

References
==========

http://mysqlhighavailability.com/mysql-group-replication-hello-world/
http://mysqlhighavailability.com/category/replication/group-replication/
- Initialization

This section presents the high level requirements for the initialization of the Group Replication plugin, namely in what form Group Replication is loaded into the server, and when and how it can be loaded and started or unloaded and stopped.

I1. Group Replication SHALL be a MySQL plugin that the user can load/unload while the MySQL server is online.

I2. Group Replication SHALL be a MySQL plugin that the user can start and stop while the MySQL server is online.

I3. The Group Replication plugin SHALL be able to start during the MySQL server boot and be available when the first user session is opened.

- Deployment

These are the requirements for deploying a Group Replication cluster.

- General

D1. Group Replication SHALL support up to 9 servers in the group. This is obviously dependent on the workload: write intensive workloads will REQUIRE high bandwidth networks.

D2. A group SHALL be identified by a UUID. This is the group_name.

- Provisioning

D3. A server SHALL be provisioned according to the current procedure in place to provision, for instance, a slave server.

- Bootstrapping

D4. When starting the group, one server in the group SHALL be responsible for bootstrapping the group. The user SHALL configure which server is responsible for bootstrapping it.

D5. Should N servers in the group be marked as servers that bootstrap the group, then N groups SHALL be formed.

- Adding a New Server to the Group

D6. A server MUST be provisioned before it is added to the group.

D7. Servers SHALL be added to the group one by one and not all in one go.

- Distributed Recovery

D8. Once added to the group, a server SHALL fetch any missing state from a donor before starting to process new transactions from the user and the group itself - this is the distributed recovery.

D9. The recovery donor SHALL be picked randomly until a good donor is found. Then recovery MUST succeed.

D10. The newly added server SHALL start in RECOVERING state and then move to ONLINE after distributed recovery terminates.

D11. A server SHALL remain in read-only mode until it is declared ONLINE.

D12. A server SHALL remain in read-only mode if recovery fails.
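To make the initialization and deployment requirements above more concrete, the following is a minimal, illustrative sketch of how one member could be configured and started using the plugin's statements and system variables. Host names, ports, the group UUID and the replication credentials are placeholders, not values mandated by this worklog.

  -- I1: load the plugin while the server is online (I3 would instead rely on
  -- loading it at boot together with group_replication_start_on_boot).
  INSTALL PLUGIN group_replication SONAME 'group_replication.so';

  -- D2: the group is identified by a UUID, the group_name.
  SET GLOBAL group_replication_group_name    = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee';
  SET GLOBAL group_replication_local_address = 'host1:33061';
  SET GLOBAL group_replication_group_seeds   = 'host1:33061,host2:33061,host3:33061';

  -- D8/S4: recovery credentials for distributed recovery go through the
  -- regular replication channel.
  CHANGE MASTER TO MASTER_USER = 'rpl_user', MASTER_PASSWORD = 'rpl_pass'
    FOR CHANNEL 'group_replication_recovery';

  -- D4/D5: exactly one member bootstraps the group.
  SET GLOBAL group_replication_bootstrap_group = ON;
  START GROUP_REPLICATION;
  SET GLOBAL group_replication_bootstrap_group = OFF;

  -- D7: the remaining members join one by one with a plain START GROUP_REPLICATION.
  -- I2: the plugin can be stopped and started again while the server stays online.
  STOP GROUP_REPLICATION;
  START GROUP_REPLICATION;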
- Conflicts: Detection and Resolution

These are the high level requirements for Group Replication regarding conflict detection and resolution. Note though that MySQL is not fully transactional (e.g., one cannot roll back DDL), thence some special rules apply to DDL.

In Group Replication, transactions executing at different servers execute and modify their local view of the data, i.e., they execute on their own snapshot. Before committing, they need to synchronize with all the other servers in the group to validate that their changes do not conflict with changes that happened on other snapshots/servers concurrently. One can think of this as a distributed multi-version concurrency control system with optimistic locking, where transactions execute optimistically and without a priori synchronization at each server and then, on commit, establish the global ordering of their operations by consulting the group. Then, if there are two transactions writing to the same row concurrently, the first one to be totally ordered will be the one committing successfully; the second one will roll back/abort. We can see this as an extended definition of snapshot isolation concurrency control; in fact, this type of approach has been named generalized snapshot isolation.

C1. Conflicts SHALL be resolved by aborting one of the transactions that conflict.

C2. Any two DML transactions, executing at different servers and updating the same data row, SHALL conflict, and an error SHALL be returned to the user on one of them (which error depends on the stage of execution the transaction is at when the conflict occurs):

  - ER_ERROR_DURING_COMMIT - when a local transaction is aborted due to a high-priority transaction during the commit stage.
  - ER_LOCK_DEADLOCK - when a local transaction is aborted due to a high-priority transaction during the update stage.
  - ER_TRANSACTION_ROLLBACK_DURING_COMMIT - when a conflict is detected on certification.

C3. Any two operations, one DML and one DDL, operating on different objects (e.g., table t1 and table t2), SHALL execute concurrently and correctly at different servers in the group.

C4. Any two operations, one DML and one DDL, operating on the same object (e.g., table t1), SHALL execute correctly at different servers in the group, provided that they do NOT execute concurrently.

C5. Any two operations, one DML and one DDL, operating on the same object (e.g., table t1), SHALL lead to data inconsistencies if they execute concurrently on different servers in the group.

C6. Any two operations, one DML and one DDL, operating on the same object (e.g., table t1), SHALL execute correctly if they both execute on the same server in the group.
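As an illustration of C1/C2, consider the sketch below of two members updating the same row at roughly the same time. Table t1, the values used and which member wins are arbitrary choices made only for the example, and the exact error text may differ from what is shown in the comments.

  -- Executed concurrently on member A and on member B:
  BEGIN;
  UPDATE t1 SET val = val + 1 WHERE pk = 10;   -- same row on both members

  -- Both sessions now commit. The first transaction to be totally ordered
  -- by the group commits successfully:
  COMMIT;   -- member A: Query OK

  -- The other transaction fails certification and is rolled back (C1/C2):
  COMMIT;   -- member B: ERROR: ER_TRANSACTION_ROLLBACK_DURING_COMMIT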
- Network Partitioning

At its core, Group Replication relies on a variant of Paxos to provide distributed agreement before a transaction commits. The algorithm implemented by XCom is very close to Mencius. This means that the system will operate correctly and remain in a synchronized state as long as the servers in the group can reach a quorum among themselves. Obviously, if the quorum is lost, then the system will not allow modifications, since a majority is unable to decide on the fate of a given operation.

In distributed systems, when a server becomes unreachable, it is impossible to say whether the server/process has crashed or whether it is simply unreachable from the given origin. The only thing one knows is that the server is not reachable within a specific time boundary. Thence, when we refer to a failed server, we mean a server that is unreachable.

N1. A quorum requires a majority of nodes, thence to tolerate f failures a Group Replication system MUST have 2f+1 servers in the group. Assuming f=1 (3 servers deployed):

N1.1 Should 1 server crash, then the system SHALL remain available for reads and writes.

N1.2 Should 2 servers crash at once, then the system SHALL remain available only for reads. Write operations SHALL block.

N1.3 Should 2 servers crash consecutively, i.e., the group notices that one server has been removed and then the other, then the cluster SHALL reconfigure the group and allow the group to shrink automatically.

- View Changes, Failure Detection and Reconfiguration

This section presents the requirements on failure detection and reconfigurations of the group. A view is the status of the membership at a given logical moment in time. Servers keep track of the membership, and a distributed failure detection mechanism is in place to find which servers are alive. If the failure detector suspects that a server is no longer reachable, then a reconfiguration of the membership may be triggered automatically. Conversely, a new server is added to the group by having a delegate intercede for it, i.e., by having a delegate trigger a reconfiguration and thus a view change. Servers need to agree on the new view after a suspicion and/or a reconfiguration is triggered.

V1. Servers in the group SHALL automatically suspect other servers every N seconds.

V2. Should a server in the group crash or become unreachable, then the failure detection will hint that the server has failed. A server SHALL then trigger a reconfiguration automatically, ultimately resulting in installing a new view.

V3. Should a server be added to the group, then a new view SHALL be installed.

- Observability

This section presents the high level requirements for the observability of the dynamic Group Replication state and statistics.

O1. The user SHALL be able to inspect the state of members in the group by checking performance schema tables:

O1.1 replication_group_member_stats - This table SHALL show statistics on the certification procedure, including the view identifier of the given group.

O1.2 replication_group_members - This table SHALL present the membership of the group at a given point in time and the status of the members in the group.

O1.3 replication_connection_status - This table SHALL present the status of the cluster connection.

O1.4 replication_applier_status - This table SHALL present the status of the group replication applier.

- Performance

This section provides the high level requirements for Group Replication performance.

P1. The performance of Group Replication deployed with just a single master writing and several backups copying the changes SHALL be comparable/competitive to Galera in single leader mode.

P2. The performance of Group Replication when deployed in multi-master without hotspots does not have a boundary or a baseline to compare to on the first release, thence the requirement is that it performs well enough. However, the stretch goal is that it is competitive with Galera in multi-master mode.

P3. The performance of Group Replication when deployed in multi-master with hotspots (i.e., different servers updating the same rows) SHALL be very LOW.

P4. Enabling SSL WILL present an overhead on the throughput.

- Interoperability

These are the high level requirements for interoperability of Group Replication with other MySQL components, in particular with MySQL asynchronous replication.

IO1. It SHALL be possible to replicate between a server in a group and a standalone server using regular replication.

IO2. It SHALL be possible to replicate from a regular MySQL master to a server in a group.

IO3. Group Replication SHALL use the regular replication infrastructure - relay logs, binary logs, applier threads, receiver thread.

- Security

These are the high level requirements regarding security in Group Replication.

S1. It SHALL be possible to interconnect servers in the group using SSL, thus using secure channels.

S2. If two servers are not using the same certificates, then they SHALL fail to communicate.

S3. Recovery credentials SHALL NOT be exposed in any log or command line option.

S4. Group Replication SHALL use the same approach as regular replication to set up recovery credentials (CHANGE MASTER TO ...).
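As a sketch of how S1-S4 are typically exercised, the statements below use the plugin's SSL and recovery options. The certificate paths and the replication account are placeholders, and the exact option set may vary by version.

  -- S1/S2: group communication over SSL; members need compatible certificate
  -- configuration for the connections to succeed.
  SET GLOBAL group_replication_ssl_mode = 'REQUIRED';

  -- S3/S4: recovery credentials are configured through CHANGE MASTER TO on the
  -- dedicated channel, not through command line options or logs.
  CHANGE MASTER TO MASTER_USER = 'rpl_user', MASTER_PASSWORD = 'rpl_pass'
    FOR CHANNEL 'group_replication_recovery';

  -- Distributed recovery connections can also be forced onto SSL.
  SET GLOBAL group_replication_recovery_use_ssl  = ON;
  SET GLOBAL group_replication_recovery_ssl_ca   = '/path/to/ca.pem';
  SET GLOBAL group_replication_recovery_ssl_cert = '/path/to/client-cert.pem';
  SET GLOBAL group_replication_recovery_ssl_key  = '/path/to/client-key.pem';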
WL#6829: Group Replication: Generic Interface for accessing communication infrastructure
WL#6833: Group Replication: Read-set free Certification Module (DBSM Snapshot Isolation)
WL#6837: Group Replication: Online distributed recovery
WL#6842: Group Replication: Transaction Applier Module
WL#7060: Group Replication: Plugin
WL#7332: Group Replication: cluster constistent view-id and node attributes propagation
WL#7334: Group Replication: Read-set free Certification Module - Part 2
WL#7732: GCS Replication: GCS API
WL#8443: Group Replication: Plugin version handshake on member join
WL#8445: Group Replication: Auto-increment configuration/handling
WL#8446: Group Replication: Implement error detection and sleep routines on donor selection
WL#8793: Group Replication: Integration with libmysql-gcs and XCom
WL#9017: Group Replication: Secure distributed recovery credentials
WL#9050: Group Replication: MySQL GCS majority loss handling integration
By Design Behaviour and Requirements
====================================

- Row-based Replication. Why? Write-sets are calculated at the time record extraction for RBR is performed.

- InnoDB tables only. Why? Operations need to be transactional so that the group decides whether to commit or rollback an operation at commit time. Moreover, transactions that are executing during the optimistic stage can be aborted because a certified transaction is invalidating them, thence there is no need to continue execution of the optimistic one.

- Primary Key on each table. Why? An item in the write set relies on the hash of a primary key, thence unique across the cluster.

- On conflicts, abort on the commit operation. Why? Because Group Replication implements a replication protocol which is based on the concept of generalized snapshot isolation.

Known Limitations
=================

Multi-master
------------

- DML/DDL update everywhere is not possible.
  - DDL and DML for the object being altered need to go through the same server.
  - MySQL does not have transactional nor even atomic DDL. Server limitation.
- Transaction savepoints are not supported.
  - The write-set extraction procedure does not consider savepoints, thence there is no way to deal with them at the moment.
- Binlog event checksums are not supported.
- Reduced performance on large transactions.
  - Transactions are cached and sent in one message atomically to all servers in the group.
- GR can generate inconsistencies if there are multiple FK dependencies.
- GR does not take into account key-range locking in the certification.
- GR does not take into account table locks in the certification.

Single Master
-------------

- Transaction savepoints are not supported.
- Binlog event checksums are not supported.
- Reduced performance on large transactions.
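As an illustration of the by-design requirements above (row-based replication, InnoDB only, a primary key on every table), the sketch below shows a table layout that satisfies them. The schema is an example only, not something mandated by this worklog.

  -- Row-based replication is required: binlog_format = ROW (typically set in my.cnf).

  CREATE TABLE t1 (
    pk  INT PRIMARY KEY,      -- mandatory: write-set items hash the primary key
    val VARCHAR(64)
  ) ENGINE = InnoDB;          -- mandatory: only transactional InnoDB tables

  -- A table without a primary key, or one using a non-transactional engine,
  -- violates these requirements.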