MySQL :: Contention-Aware Transaction Scheduling Arriving in InnoDB to Boost Performance

Authors: Sunny Bains, Jiamin Huang (University of Michigan)

What is Transaction Scheduling?

Locking is one of the most popular mechanisms for concurrency control in most database systems, including Oracle MySQL. One major question, however, seems to have been overlooked by all database vendors:

Q: When multiple transactions are waiting for a lock on the same object, which one(s) should get the lock first?

Nearly all database systems, including previous versions of MySQL, rely on some variant of the First-In-First-Out (FIFO) policy. In a nutshell, FIFO grants locks to those who requested it first (i.e., transactions that are ahead of the queue, unless they’re incompatible with the locks that are currently granted). Because of its simplicity, most database vendor have settled for FIFO and its variants without considering other strategies.

Recently, a research group at the University of Michigan pointed out that behind this seemingly simple question lies an excellent opportunity for performance improvement. Professor Mozafari and his PhD students showed that different methods for lock scheduling have significant implications on the overall performance of a transactional database. They invented a new algorithm, Contention-Aware Transaction Scheduling (CATS), which can dramatically reduce latency and increase throughput when used in place of a FIFO approach.

The MySQL team at Oracle has been working closely with the inventors of CATS (Mozafari and his team), making MySQL the first database to integrate this new technology. Starting with MySQL 8.0.3, CATS will now be the default scheduling algorithm in InnoDB, which means all MySQL users will experience a dramatic boost in performance, especially for contentious workloads.

How Does CATS Work?

The CATS algorithm is based on a simple intuition: not all transactions are equal, and not all objects are equal. When a transaction already has a lock on many popular objects, it should get priority when it requests a new lock. In other words, unblocking such a transaction will indirectly contribute to unblocking many more transactions in the system, which means higher throughput and lower latency overall.

Here’s an analogy: if a cab driver and a bus driver are waiting in a line for coffee, making coffee for the bus driver first (even if s/he arrived after the cab driver) will allow more people to arrive at their destination sooner (since there are more passengers on the bus than in the cab). While this might seem unfair for the cab driver, because of the complex inter-dependencies, this strategy would make the entire system so much faster that everyone (even the cab driver!) would benefit from a more efficient service.

Of course, we’re dealing with locks and transactions and not busy transportation drivers. Let’s use a toy example to illustrate how CATS works in the world of databases. We know that transactions must acquire locks on every data object before reading or updating it, depending on the isolation level. If there are incompatible locks already held on that object (say by other transactions), the current transaction will be blocked until those other locks are released. The transactions that already have a lock on this object may themselves get blocked on other objects (the deadlock detection/prevention mechanism will ensure that there is no loop here). Consider the following figure.

In this situation, FIFO would decide by simply looking at which transaction requested O1 first. However, the CATS algorithm does something smarter: it counts the total number of transactions that are blocked by each transaction (directly or indirectly) and grants the lock to the one that is blocking more transactions. In this case, there are four transactions blocked by t1 and three transactions blocked by t2. So, CATS would grant the lock to t1 first, which will enable more transactions (four, rather than three) to be unblocked sooner, and thus allow for greater and faster progress in the system.

For shared locks, CATS will also try to grant as many locks as possible. Here, the difference between FIFO and CATS is that FIFO stops granting locks when it encounters an exclusive lock according to their time of arrival, whereas CATS sorts transactions in the descending order of the number of transactions that they block.

Performance Improvement

Dimitri Kravtchuk from the Oracle tram has tested this new algorithm extensively against the Sysbench workload under the “OLTP_RW-pareto” scenario on both high-end and low-end hardware. Results show that, when there is contention in the system, CATS delivers significantly higher performance than FIFO in terms of transactions per second (TPS), mean latency, and 95^th percentile latency (see below). Interestingly, even when there is no contention, CATS can still deliver the same performance as FIFO. This is because when there is no contention, there is no scheduling decision to be made (i.e., there is at most one transaction waiting for each lock). In other words, by using CATS instead of FIFO, you won’t lose anything, but stand to gain a lot as soon as there is contention in the system.

CATS vs. FIFO in TPS, mean latency and 95th percentile (up to 5.05x improvement)

# of users	TPS			Mean Latency (ms)			95th Percentile (ms)
# of users	FIFO	CATS	Improvement	FIFO	CATS	Improvement	FIFO	CATS	Improvement
32	13187	13127	1.00x	2.42	2.43	1.00x	2.82	2.79	1.01x
64	20109	20155	1.00x	3.18	3.18	1.00x	5.95	5.75	1.03x
128	11705	19741	1.69x	10.93	6.49	1.68x	42.53	18.78	2.26x
256	6237	18770	3.01x	41.01	13.63	3.01x	211.44	80.83	2.62x
512	2936	14836	5.05x	174.07	34.49	5.05x	1007.32	214.89	4.69x

Conclusion

MySQL is one of the first databases in the world to adopt this state-of-the-art transaction scheduling technology, Contention-Aware Transaction Scheduling (CATS). This algorithm solves the issue of rapid performance degradation that many face when the load on their database system increases, which is one of the main problems MySQL 8.0 aims to resolve.

CATS (or VATS as it was previously called) kicks in once the number of waiting transactions exceeds 32. There is no configuration option. The value of 32 was chosen after empirical testing.

About Jiamin Huang – Jiamin Huang is a Ph.D. student from the database group of University of Michigan. His research focuses on improving the performance of database systems. He’s under the supervision of professor Barzan Mozafari.