MySQL Shell Dump & Load part 4: Dump Instance & Schemas

MySQL Shell 8.0.21 comes with two new utilities that can be used to perform logical dumps of all the schemas from an instance (util.dumpInstance()) or of selected schemas (util.dumpSchemas()). These tools offer major performance improvements over the mysqldump utility; some of our benchmarks show a throughput of up to 3GB/s! Let’s see how this is achieved.
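Both utilities are available from the Shell’s JavaScript (and Python) modes; the output directories and schema names below are placeholders:

    // Dump every schema from the connected instance into a local directory.
    util.dumpInstance("/backups/full");

    // Dump only the selected schemas.
    util.dumpSchemas(["sakila", "world"], "/backups/partial");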

This is part 4 of the blog post series about MySQL Shell Dump & Load.

Parallelization

The dump process uses multiple threads to perform its tasks (the number of threads can be specified using the threads option), allowing time-consuming operations to be executed in parallel. Each thread opens its own connection to the target server and can be dumping data, dumping DDL, or splitting table data into chunks.
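For example, to dump with more worker threads than the default of four (the directory is a placeholder):

    // Dump the instance using 8 parallel threads.
    util.dumpInstance("/backups/full", {threads: 8});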

When the consistent option is set to true (the default), the dump is consistent, as long as the dumped tables use the InnoDB engine. When the dump process starts, the global Shell session acquires a global read lock using the FLUSH TABLES WITH READ LOCK statement. Next, all threads establish their connections to the server and start transactions using:

  • SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
  • START TRANSACTION WITH CONSISTENT SNAPSHOT

Once all of the threads have started their transactions, the instance is locked for backup (via the LOCK INSTANCE FOR BACKUP statement) and the global read lock is released.
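Whether this locking is performed is controlled by the consistent option; a minimal sketch (the output directory is a placeholder):

    // The default behavior, shown explicitly for illustration.
    util.dumpInstance("/backups/full", {consistent: true});

    // Skip the consistency locking, e.g. when the server receives no
    // writes for the duration of the dump.
    util.dumpInstance("/backups/full", {consistent: false});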

Chunking

It’s usually the case that the size of tables in a schema varies a lot: there are a few very large tables and many smaller ones. Adding more threads alone will not speed up the dumping process in such a case. In order to overcome this problem, the data in tables can be divided into smaller chunks, each of which is dumped to a separate file by one of the threads.

Our chunking algorithm strives to find the primary key (or unique index) values that partition the table rows into chunks of approximately equal size. It uses EXPLAIN statements to determine the chunk sizes, acquiring row count estimates from the optimizer, as opposed to e.g. SELECT COUNT(*), which would need to scan the rows one by one. These estimates are often not very accurate, but obtaining them is significantly faster, and they are good enough for our purposes.
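As a rough illustration of where such estimates come from (this is not the dumper’s actual code; the table mydb.orders and its integer primary key id are hypothetical, and a classic-protocol global session is assumed):

    // EXPLAIN returns an optimizer estimate without scanning the table.
    var res = session.runSql(
        "EXPLAIN SELECT * FROM mydb.orders WHERE id BETWEEN 1 AND 1000000");
    var row = res.fetchOne();
    print(row.rows);  // estimated number of rows in this candidate chunk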

Chunking is turned on by default and results in files containing roughly 32MB of uncompressed data each. These behaviors can be configured using the chunking and bytesPerChunk options respectively; it’s usually a good idea to increase the latter if your data set is very large.
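Both knobs are regular dump options; for example (the values shown are illustrative choices, not defaults):

    // Produce larger files: ~128MB of uncompressed data per chunk.
    util.dumpInstance("/backups/full", {bytesPerChunk: "128M"});

    // Disable chunking entirely: one data file per table.
    util.dumpInstance("/backups/full", {chunking: false});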

Output format

The mysqldump, mysqlpump and mydumper utilities write the data as a series of INSERT statements, while the new Shell utilities use the default format expected by the LOAD DATA statement. This results in output files that are smaller on average, meaning that they can be produced more quickly. As an additional benefit, loading such files is much faster.
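Dumps in this format are loaded with the companion util.loadDump() utility; a minimal call:

    // Load a dump produced by util.dumpInstance() or util.dumpSchemas().
    util.loadDump("/backups/full");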

Compression

By default, all data files are compressed using the zstd algorithm, which offers a good balance between compression ratio and encoding performance. The other supported algorithm is gzip. Compression can also be disabled by setting the compression option to none.
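The algorithm is selected through the compression option (the directory is a placeholder):

    // Use gzip instead of the default zstd.
    util.dumpInstance("/backups/full", {compression: "gzip"});

    // Store the data files uncompressed.
    util.dumpInstance("/backups/full", {compression: "none"});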

Code optimization

The code of the dumper has been heavily benchmarked. In the course of this process we identified and fixed various issues throughout the Shell’s code base, such as unnecessary data copies, redundant function calls, and superfluous virtual calls. The “hot” function, the one that writes data to the output files, has been carefully analyzed and optimized.

Performance

We compared the performance of the two new Shell utilities with that of mysqldump, mysqlpump and mydumper, and the Shell utilities came out well ahead.

For the detailed benchmark results, please see MySQL Shell Dump & Load part 2: Benchmarks.

What’s next?

The util.dumpInstance() and util.dumpSchemas() utilities provide a major performance boost over the existing tools, but there is still some room for improvement. In particular, our chunking algorithm could certainly be optimized, and this is something we plan to tackle in the future. Stay tuned!