MySQL 5.6 Replication - Enabling the Next Generation of Web & Cloud Services
Global Transaction IDs & Utilities for Self-Healing Replication Clusters
Note: This article was originally published in April 2012, and has now been updated to include the new Binary Log Group Commit functionality introduced in June 2012
The new MySQL 5.6.5 Development Milestone Release (DMR) introduces a much anticipated feature - Global Transaction Identifiers (GTIDs) for MySQL replication.
GTIDs make it simple to track and compare replication progress between the master and slave servers. This enables much simpler recovery from failures of the master, while also introducing great flexibility in the provisioning and on-going management of multi-tier or ring (circular) replication topologies.
To complement GTIDs, two new GTID Replication Utilities have also been released. These utilities provide administration of GTID-enabled slaves, and monitoring with automatic failover and on-demand switchover, coupled with slave promotion, enabling self-healing recovery. Users no longer need to rely solely on 3rd party frameworks to provide High Availability (HA) mechanisms - these instead are integrated directly within the MySQL distribution.
While GTIDs and the associated utilities - discussed in more detail below - are significant enhancements to MySQL 5.6, they are just one of many new replication features that are designed to ensure MySQL extends its position as the leading database for the web and for the cloud.
New Demands of Web and Cloud Services
To meet the demands of new generations of services, replication development needs to be focused on delivering the following:
- Improves consistency of reads from slaves (i.e. reduces the risk of reading "stale" data)
- Reduces the risk of data loss should the master fail before replicating all events in its binary log (binlog)
- Automatic failure detection with self healing recovery (preventing unplanned downtime)
- Controlled switchover in the event of maintenance to the master (preventing planned downtime)
- Proactive monitoring of the replication topology to identify issues before they can cause an outage
- Ensuring replicated data is correct, consistent and accessible
- Developer / Operations (DevOps) Agility
- Automation to reduce administrative overhead
- Flexibility to support the rapid evolution of business requirements
- Maintaining low Total Cost of Ownership (TCO).
As illustrated in the figure below, replication in MySQL 5.6 has been designed to address these demands:
Figure 1: Summary of new replication features in MySQL 5.6
The rest of this article provides an overview of each of these new features, with resources to learn more about their implementation, and how you can get started with them today.
The goal of any performance enhancement is to replicate and apply data from the master to the slave(s) as quickly as possible while minimizing overhead on all of the servers involved in the topology. Performance enhancements have been a key design focus for replication in MySQL 5.6.
Replication performance is improved by using multiple execution threads to apply replication events to the slave(s).
The multi-threaded slave splits processing between worker threads based on schema, allowing updates to be applied in parallel, rather than sequentially. This delivers benefits to those workloads that isolate application data using databases - e.g. multi-tenant systems
To demonstrate performance benefits of Multi-Threaded Slaves, the MySQL Engineering team recently completed a benchmark that compared slave throughput when using single and multi-threaded replication. As the results below demonstrate, slave throughput was increased by ~ 5x based on a configuration with 10 databases/schemas, directly translating to improved read consistency and reduced risk of data loss in the event of an outage of the master.
Slaves are better able to keep up with the master, and so users are much less likely to need to throttle the sustained throughput of writes, just so that the slaves don't indefinitely fall further and further behind (at the moment some users have to reduce the capacity of their systems in order to reduce slave lag).
Figure 2: 5x Higher Performance with Multi-Threaded Slaves
Multi-Threaded Slaves were merged into the MySQL Server 5.6 mainline with the DMR2 release. Learn more about the implementation of Multi-Threaded Slaves.
Binary Log Group Commit
A significant improvement to replication performance has been enabled by the introduction of Binary Log Group Commit which groups writes to the Binlog, rather than applying them one at a time.
This enhancement significantly improves the performance of the replication master, as demonstrated by the benchmark results below.
Running the Sysbench R/W benchmark, we can see that the MySQL master is able to deliver close to 4x higher performance (Queries Per Second) running the MySQL 5.6 June 2012 labs release when compared to the previous MySQL 5.6.5. Even with sync_binlog=1, the latest release is able to deliver over 2x higher performance than a server without replication configured!
Additional benchmarks - documented in the blog below - show that at above 60 threads, configuring sync_binlog = 1 is able to match the performance of a server configured with sync_binlog = 0, or within a 10% delta. So users get higher data safety, without compromising performance. Read more about sync_binlog configurations
The result of Binary Log Group Commit is that MySQL replication is much better able to keep pace with the demands of write-intensive workloads. Gone are the days when configuring replication to achieve scalability and HA, resulted in a 50%+ hit to your server’s performance.
Note that Binlog Group Commit is not part of MySQL 5.6.5 DMR, but you can try it out today by downloading the latest mysql-5.6-labs-June-2012 build, available on http://labs.mysql.com/. We do plan to merge the feature with the MySQL 5.6.6 DMR.
Optimized Row Based Replication
By only replicating those elements of the row image that have changed following INSERT, UPDATE and DELETE operations, replication throughput for both the master and slave(s) can be increased while binary log disk space, network resource and server memory footprint are all reduced.
Failover and Recovery
Replication has long been used as a foundation for High Availability (HA) in MySQL environments. However, the lack of any integrated mechanism to detect and initiate failover, or reliably promote a slave to master has forced many users to either develop their own custom solutions or rely on 3rd party HA frameworks.
In cloud environments, where the underlying hardware infrastructure can be best described as "ephemeral", ensuring HA becomes even more of a priority, and a challenge.
Global Transaction Identifiers (GTIDs) and Utilities
As discussed in the introduction to this article, the combination of GTIDs and associated utilities provides a solution for failover/switchover and self-healing within the base MySQL distribution.
Global Transaction IDs are unique identifiers comprising the server UUID (of the original master) and sequence number that are automatically generated for every replication event and written with each transaction to the binary log. As Global Transaction IDs are unique across the master and slaves, it is simple to track and compare replication progress among the slaves - and if those slaves are also acting as masters in a multi-tier replication topology, then to their slaves as well.
The use of Global Transaction IDs makes it straightforward to reliably failover from the master to the most current slave automatically in the event of a failure, or the need to perform maintenance to the master (switchover). DBAs no longer need to manually analyze the status of each of their slaves to identify the most current when seeking a target to promote to the new master.
Risks of data corruption caused by applications specifying invalid binary log positions or failures when deploying multi-master ring (circular) replication topologies are also eliminated.
The power of GTIDs are fully realized when used with the new MySQL replication utilities:
Provides continuous monitoring of the replication topology, enabling failover to a slave in the event of an outage on the master.
The default behavior is to promote the first viable slave in the configured list, but this can be overridden. Therefore, a user can nominate a specific candidate slave to become the new master (i.e. because it is configured in a certain way or has better performing hardware), and the utility will automatically poll the other slaves to retrieve any events so far not replicated to the candidate slave's relay log.
This ensures no replicated transactions are lost, even if the candidate is not the most current slave when failover is initiated.
If a user needs to take a master offline for scheduled maintenance, mysqlrpladmin can perform a switchover to a slave, defaulting to the most recent slave, or to a nominated candidate slave. It can also perform a manual failover in the event a master server has gone offline.
The utility can also be used to start and stop slaves, as well as provide slave discovery and basic monitoring, including
- Status of the slave and its I/O and SQL threads;
- Status of replication processing through the topology, including any lag or errors;
- Configuration of slave promotion policies.
With either of these utilities, the user may register scripts to be called in the event of a slave promotion - for example to redirect an application to use the new master for writes and for reads that must always be consistent.
- Learn more about Global Transaction Identifiers.
- Learn more about the MySQL GTID replication utilities.
- Learn more about client-side failover with PHP connector.
Crash Safe Slaves and Binlog
Also known as Transactional Replication, Crash Safe Slaves and Binlog extend the robustness, availability and ease-of-use of MySQL replication by making both the binary log and the slaves crash safe.
The binlog and table data are transactionally consistent when using a transactional storage engine, and so the slave can automatically roll back replication to the last committed event before a crash, and resume replication without administrator intervention. Not only does this reduce operational overhead, it also eliminates the risk of data loss or corruption caused by the failure of a slave.
If a crash to the master causes corruption of the binary log, the server will automatically recover it to a position where it can be read correctly.
Data Integrity - Replication Event Checksums
Data integrity within a replication environment assures data is correct, consistent and accessible.
Replication Event Checksums ensure the integrity of data being replicated to a slave by detecting data corruption and returning an error, preventing the slave itself from becoming corrupt.
Checksums are implemented in the binary and relay logs as well as to individual replication events, allowing errors to be detected whether they are caused by memory, disk or network failures, or by the database itself. Checksums can be implemented on a per-slave basis, giving maximum flexibility in how and where it is deployed.
With growing pressure on time to market, coupled with creating a flexible environment to support increasingly rapid business evolution and the need to reduce overhead, Dev/Ops agility has become a greater priority.
MySQL has always been known for ease-of-use, and with the enhancements described above, replication also becomes more self-administering. A number of additional enhancements have been made to increase that flexibility and reduce overhead:
MySQL Replication Utilities
This article has already highlighted the new GTID utilities, but would not be complete without mentioning the existing utilities for MySQL replication, and how they can be used in a workflow for verifying, provisioning and managing replication:
Figure 3: Utility workflow for MySQL Replication
mysqlrplcheck utility: verifies configuration parameters to ensure fast and reliable deployment of replication.
mysqlreplicate utility: manages the provisioning of a new slave.
mysqlrplshow utility: auto-discovery of the replication topology.
mysqlrpladmin: manages failover and switchover from a master to a slave, and provides status monitoring of the replication topology.
These utilities are implemented as:
- Available under the GPLv2 license.
- Library of Python Scripts. Easily customizable and extensible by users.
Refer to the MySQL Utilities documentation to learn more.
Time Delayed Replication
The user can define a time delay for events to be replicated from a master to each slave, defined in increments of one second, up to a maximum of 68 years!
Time-Delayed Replication affords protection against operational errors made on the master, for example accidently dropping tables, in which event the slave can be promoted to the new master in order to restore the database to its previous state. It also becomes possible to inspect the state of a database without reloading a back up.
Time-Delayed Replication is implemented at a per-slave level (via holding execution of the SQL_THREAD), so a user could configure multiple slaves to apply replication events immediately, and another slave to apply only after a delay of 10 minutes, therefore providing deployment flexibility - as illustrated in the figure below:
Figure 4: Time Delayed Replication enables a user to control when replication events are applied to slaves
Learn more about Time Delayed Replication in the MySQL documentation.
Remote Binlog Backup
Enhances operational efficiency by using the replication channel to create real-time back-ups from the binary log.
Users can run the backup from a remote (non-production) host; the backup command will connect to the production database as if it were a slave and request the contents of the binary log. By adding a "raw" flag, the binlog is written out to remote back-up servers, without having a MySQL database instance translating it into SQL statements, and without the DBA needing SSH access to each master server. The backup command will then dump the data to a local file.
Server UUIDs enhance auto-discovery and analysis of replication topologies when using the MySQL Enterprise Monitor by automatically generating a Universally Unique Identifier (UUID) for each server.
This is especially useful in large and highly dynamic replication environments, making auto-discovery more reliable and simplifying systems management.
Informational Log Events
Enhances auditing and debugging when using Row-Based Replication by writing the original query to the binary log, which is then replicated with its associated row-based event to the slave.
Learn more about Informational Log Events.
GTIDs and the associated utilities significantly enhance MySQL replication, and they are undoubtedly the highlights of the latest MySQL 5.6.5 DMR.
When combined with those features that have already been previewed in MySQL 5.6, it is clear to see that MySQL replication is setting the new standard for next generation web and cloud services demanding high performance, high availability, data integrity and Dev/Ops agility.
You can get started with all of these capabilities today by downloading MySQL 5.6 from our MySQL Developer Zone (select the Development Releases tab)
Let us know what you think of these enhancements directly in comments for each blog, or here. We look forward to working with the community to perfect these new features.