Semi-synchronous Replication Performance in MySQL 5.7

With MySQL 5.7 becoming GA, it’s a good time to highlight how much replication performance has improved since the 5.6 era. A previous blog post focused on the performance of the multi-threaded slave applier; this one looks at the semi-synchronous replication plug-in (SemiSYNC), whose performance has improved greatly.

1. Semi-synchronous replication performance

During the development of MySQL 5.7, many improvements were introduced in semi-synchronous replication. For more information about some of the most relevant ones, please follow these links:

Better concurrency and loss-less replication;
Make data durable on more than one slave;
Separate acknowledgment receiving thread.

The great news is that the throughput gap to asynchronous replication is now relatively small. While there is the expected additional latency per transaction, the “pipeline” is now better optimized and makes effective use of a significantly larger number of concurrent threads. As will be shown below, this reduces the effect on the system’s throughput, which benefits not only local network deployments but also wide-area network deployments, provided user applications can tolerate the added latency.

In some circumstances the throughput of SemiSYNC is in fact higher than that of asynchronous replication with the durability settings enabled on the master (sync-binlog=1, innodb_flush_log_at_trx_commit=1), even when fast storage systems are used. Together with loss-less SemiSYNC, this allows users to move durability away from the master and into the network (see Yoshinori Matsunobu’s suggestions here), with a significant throughput gain on the master.

2. Benchmarking SemiSYNC

The performance of SemiSYNC, as always, depends on the type of workload executed on the master. What we present below is not a replacement for proper testing on the users’ own workload and system.

To test the performance of SemiSYNC we used the Sysbench 0.5 RW benchmark. The benchmark was executed on three configurations: i) asynchronous replication with durability, ii) semi-synchronous replication without durability and iii) asynchronous replication without durability. The master and slaves each ran on a server with two six-core Xeon E7540 processors (for a total of 24 HW threads) and SSD-based storage, and the servers were interconnected by a 1 Gbps LAN.
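The three configurations differ only in a handful of server options. The sketch below renders one possible `[mysqld]` fragment for the master in each case. It is an illustration, not the exact settings used in the benchmark: the post only states the durable values (sync-binlog=1, innodb_flush_log_at_trx_commit=1), so the "without durability" values, the configuration names and the `render_mycnf` helper are assumptions. The slaves would also need the slave-side semi-sync plugin enabled, which is not shown.

```python
# Hypothetical option sets for the master in the three benchmark configurations.
# Only the DURABLE values come from the post; the RELAXED values are assumed.
DURABLE = {"sync_binlog": 1, "innodb_flush_log_at_trx_commit": 1}
RELAXED = {"sync_binlog": 0, "innodb_flush_log_at_trx_commit": 2}  # assumption
SEMISYNC = {
    "plugin_load": "rpl_semi_sync_master=semisync_master.so",
    "rpl_semi_sync_master_enabled": 1,
    "rpl_semi_sync_master_wait_point": "AFTER_SYNC",  # loss-less mode in 5.7
}

CONFIGS = {
    "async_with_durability": {**DURABLE},
    "semisync_without_durability": {**RELAXED, **SEMISYNC},
    "async_without_durability": {**RELAXED},
}


def render_mycnf(options):
    """Render an option dict as a [mysqld] section of a my.cnf file."""
    lines = ["[mysqld]"]
    lines += [f"{name} = {value}" for name, value in options.items()]
    return "\n".join(lines)


for name, options in CONFIGS.items():
    print(f"# {name}")
    print(render_mycnf(options))
    print()
```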

The number of clients (threads in this case) in the workload is important, as concurrency is the mechanism that hides the additional latency. In our tests we used 1, 10, 30, 100, 300, 1000, 3000 and 10000 clients, deliberately going up to a very high number to show that handling that many sessions is no longer an issue with SemiSYNC.
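Why more clients hide latency can be illustrated with a simple closed-loop model based on Little's law: each client has one transaction in flight, so throughput is at most clients divided by latency, capped by what the server can process. The function name and all numbers below are hypothetical and chosen only for illustration; they are not measurements from this benchmark.

```python
def modeled_tps(clients, latency_s, capacity_tps):
    """Closed-loop model: every client issues one transaction at a time.

    Little's law bounds throughput by clients / latency; the server itself
    cannot exceed capacity_tps.
    """
    return min(clients / latency_s, capacity_tps)


# Hypothetical numbers, not taken from the benchmark.
CAPACITY_TPS = 10_000
ASYNC_LATENCY_S = 0.002
SEMISYNC_LATENCY_S = 0.003  # assumes 1 ms extra for the slave acknowledgment

for clients in (1, 10, 30, 100):
    async_tps = modeled_tps(clients, ASYNC_LATENCY_S, CAPACITY_TPS)
    semi_tps = modeled_tps(clients, SEMISYNC_LATENCY_S, CAPACITY_TPS)
    print(f"{clients:>4} clients: async {async_tps:8.0f} TPS, "
          f"semisync {semi_tps:8.0f} TPS")
```

In this model the extra latency costs throughput only while the number of clients is too small to saturate the server; once both configurations reach the capacity limit, the gap disappears.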

2.1 Throughput

The following chart shows the number of transactions per second (TPS) that each of the three configurations is able to reach. The throughput of asynchronous replication (without durability) is dashed, representing the upper bound one would expect for either of the other two configurations. The chart is scaled so that 100% represents the highest throughput achieved, which makes it easier to see how far each combination of options falls short of that 100%.

Some observations:

In the system tested, the highest throughput was achieved with between 30 and 300 client sessions (the bars with explicit percentages);
In that range, using the durability settings imposed a penalty on the master in excess of 20%, approaching 40% at the top of the range;
Using SemiSYNC the gap to the highest performance is around 10% in that same range;
In every combination of sessions tested, SemiSYNC produced higher throughput than asynchronous replication with durability.
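The scaling used for the chart, where 100% is the highest throughput achieved by any configuration at any client count, can be sketched as follows. The TPS values and the function name are placeholders made up for the example; they are not the benchmark results.

```python
def as_percent_of_peak(tps_by_config):
    """Scale every TPS value so that the overall maximum becomes 100%."""
    peak = max(
        tps for series in tps_by_config.values() for tps in series.values()
    )
    return {
        config: {clients: 100.0 * tps / peak for clients, tps in series.items()}
        for config, series in tps_by_config.items()
    }


# Placeholder data: {configuration: {client count: TPS}}, not real results.
sample = {
    "async_without_durability": {30: 7000, 100: 8000, 300: 7600},
    "semisync_without_durability": {30: 6300, 100: 7200, 300: 6900},
    "async_with_durability": {30: 5500, 100: 5600, 300: 4800},
}

for config, series in as_percent_of_peak(sample).items():
    row = ", ".join(f"{clients}: {pct:.0f}%" for clients, pct in series.items())
    print(f"{config:<28} {row}")
```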

2.2 Transaction latency

The main price to pay for SemiSYNC is added transaction latency, but on a fast network the overhead may be insignificant for most purposes. The following chart shows the minimum, average and 95th-percentile transaction latency observed in each of the combinations tested.

Some observations:

The latency behavior is similar to what was observed in the throughput: both asynchronous replication with durability and SemiSYNC present higher latency than asynchronous replication without durability;
On the LAN tested, the latency for SemiSYNC is always lower than what is incurred when syncing to disk;
In fact, although SemiSYNC latency falls between the two, it is always closer to that of asynchronous replication without durability than to that of asynchronous replication with durability.
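For readers who want to produce the same three figures from their own measurements, here is a minimal sketch of the minimum, average and 95th-percentile computation. It uses the nearest-rank definition of a percentile, which is an assumption and may differ from how Sysbench approximates it; the function name and sample values are illustrative only.

```python
def latency_summary(samples_ms):
    """Return min, average and 95th-percentile latency (nearest-rank method)."""
    if not samples_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(samples_ms)
    # Nearest rank: smallest 1-based rank covering 95% of the samples,
    # computed with integer arithmetic as ceil(0.95 * n).
    rank = (95 * len(ordered) + 99) // 100
    return {
        "min": ordered[0],
        "avg": sum(ordered) / len(ordered),
        "p95": ordered[rank - 1],
    }


# Made-up per-transaction latencies in milliseconds.
samples = [2.1, 2.4, 2.2, 9.8, 2.3, 2.6, 2.2, 3.0, 2.5, 2.4]
print(latency_summary(samples))
```

The 95th percentile is usually more informative than the average here, because a small number of slow commits can dominate what applications actually experience.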

3. Final considerations

There are many scenarios where SemiSYNC may be useful, but it used to come with a significant performance penalty. We believe this is no longer the case, so please try it out on your own workload; you may be pleasantly surprised.

Enjoy!