MySQL Blog Archive
For the latest blogs go to blogs.oracle.com/mysql
Atomic DDL in MySQL 8.0

The new data dictionary in MySQL 8.0 is mostly transparent to users upgrading, in that an import process is automatically run on first-start, and the semantics of DDL (creating tables, adding indexes etc) remain the same. There is however one major exception, and that is how failure cases are handled. In MySQL 8.0 DDL is now atomic.

Introduction

Some might wonder: “why has atomic DDL not been implemented before”? The matter is complicated, and parts of an answer for this is that in previous MySQL versions, persistent storage of data dictionary information have been scattered across files, non-transactional storage and transaction storage, as mentioned in MySQL 8.0 Data Dictionary: Background and Motivation .

But now that we are storing all MySQL meta data in a uniform way in  InnoDB tables,  see MySQL 8.0: Data Dictionary Architecture and Design, we are able to implement Atomic DDLs in MySQL 8.0 . This topic was presented at Percona Live 17 Dublin, in the talk: MySQL 8.0: Atomic DDLs – Implementation and Impact

Behavioral changes visible to the user

The user visible changes with the introduction of atomic DDL is tightly connected to the way crash-safeness is improved and handled in MySQL 8.0. We will discuss this later with some detailed examples below. At high level we can say that that we atomic DDL it about creating a single atomic transaction by:

  1. Do updates to the Data Dictionary (DD SE)
  2. Do necessary changes in the Storage Engine
  3. Do necessary writes to binary log

Since it is a single transaction, all updates to Data Dictionary can be rolled back and are crash safe

So for the user this will then mean that we have some changes to behavior in the following DDLs for engines supporting  atomic DDL:

  1. DROP TABLES <tables> – All tables will be dropped, or none are dropped
  2. DROP SCHEMA – All entities in the schema will be dropped, or none are dropped
  3. DROP VIEW – All views are dropped, or none are dropped
  4. CREATE USER – All users are created, or none are created
  5. DROP USER – All users are dropped, or none are dropped
  6. GRANT – All users/roles in list are granted privileges, or none

There are also other user visible changes, but they are connected to the crash-safeness handling in MySQL 8.0, that should not leave any orphaned files or index-trees. In MySQL 8.0 this is achieved by  InnoDB implementing a special internal  DDL_LOG, that during DDL execution keeps track of creation of  files and structures, which is then used at commit/rollback to clean up properly.  See below for more details on DDL_LOG.

NOTE: All of the above do require that the storage engine that is storing this information has implemented atomic DDL support. So storage engines that do not support atomic DDL, are exempted from  DDL atomicity, and partial updates to tables may still happen, as you will see in examples below. The probability for partial updates for storage engines that do NOT support atomic DDL is also significantly reduced.

The DDL_LOG in InnoDB

The DDL_LOG is a protected table in mysql tablespace to which  no DDL and no user DML is allowed. For all entries added to the  DDL_LOG a trx Id is attached, so at commit/rollback, these entried can be identified with the transaction, and proper actions can be taken. After a commit/rollback and appropriate actions by InnoDB, all entries for the transaction will be deleted from the DDL_LOG.

Changes to DDL_LOG are persisted ASAP. This prevents data files updates and a situation at a potential crash scenario where the DDL_LOG entries are not flushed. Also note that flushing of DDL_LOG is exempted from innodb_flush_log_at_trx_commit control for delayed flush.

Why is this a big deal, and how is it connected to user visible behavioral changes? Let us consider the following, related to orphaned files and index trees after a crash scenario:

Before MySQL 8.0 and pre atomic and crash-safe DDL implementation, a potential MySQL server crash could leave “orphan” InnoDB ibd files, “orphan” index trees/pages. Orphaned files/trees could then potentially cause problems for subsequent DDLs and they would also waste space in the tablespace.

With MySQL 8.0, and the new atomic and crash-safe DDL implementation, no ibd files/index trees will be orphaned if there is a crash and there is no commit of a transaction.  At commit, the old ibd files/index trees will be dropped. See examples below for more details.

Examples of implementation of Atomic and Crash Safe DDL

Let us first look at the implementation details for  CREATE TABLE as an example where there are no user visible changes:

  1. At SQL layer,  create table object, and then make an initial function call to the SE, so the SE can initiate it’s handling of the DDL.
  2. The SE adds the SE private data it has at this point in time, and returns the control to the SQL layer.
  3. Then the TABLE object is stored in the DD tables. Note that for SEs supporting atomic DDL, there is no commit at this point in time.
  4. The SQL layer can now do a an “Open table” operation, that will construct all internal structures, like TABLE_SHARE etc. The SQL layer then calls the SE handler::create(name,…).
  5. The SE creates the physical tablespaces/files(file-per-table) or Cluster index tree and other index trees. The SE then logs the above physical file and indexes creations in the DDL_LOG, and updates SE private data, and returns control to the SQL layer. Note: No separate SE transaction, all info for new tablespace/indexes are passed back to server with DD objects.
  6. The SQL layer now writes the statement to the binary log, and based on the status of the execution, it will commit or rollback the transaction.
  7. After commit/rollback, the SQL layer will call a post_ddl() hook in the SE, which enables the SE to do a proper cleanup of files and trees, and will also remove entries in the DDL_LOG for the transaction. If the transaction is rolled back, the post_ddl() physically delete the tablespace/ibd (file-per-table) and drop the index trees for the table. Note: If a crash occurs during the DDL execution, during recovery,  the SE will use the  info in DDL_LOG to delete the ibd/index trees to ensure a consistent state.

 

Then let us look at a more complex example DROP TABLES, which also has user visible changes, both in scenarios under normal operation, but also for crash safeness. This implementation is rather complex, as there are many parameters that influences the execution:

  1. The SQL layer for examines the list of tables in the DROP TABLES statements, and categorizes the in the following categories:
    1. Non-existent table
    2. Base table in non-atomic SE
    3. Base table in atomic SE
    4. Temporary table in a non-transactional SE
    5. Temporary table in a transactional SE
  2. If there are tables in category 1, regardless of number of files in category 2 and 3, and no IF EXISTS clause is involved, the DDL statement will err out and give a proper error message. This means there is a change in behavior even for storage engines that do not support atomic DDL.
  3. The SQL layer will then start to handle tables in category 2, if any.
    1. Call to SE handler:ha_delete_table() and the SE will the delete the necessary files, and return to SQL layer
    2. The SQL layer will drop the information about the table in the data dictionary tables and commit (this intermediate commit is to help out in a potential crash scenario)
    3. If there are no GTID assigned to the statement, a DROP TABLE statement is written to the binlog. This is done to improve crash safety for engines that do not support atomic DROP TABLE. Unlike in 5.7 this means that the difference between binary log and DD/SE contents in case of crash will be limited to at most 1 table.
  4. This is THE MAIN usecase, where the DROP TABLE statement will drop tables in an engine that supports atomic DROP TABLE, and in combination with 5 and 6 below, ensures that the statement is atomic and crash safe. The SQL layer will handle tables in category 3, if any
    1. Call to SE handler:ha_delete_table(), where InnoDB only logs actions to delete ibd file or index trees in DDL_LOG. So there is no physical action at this pint in time (No InnoDB system tables needs to be updated), return to SQL layer
    2. The SQL layer will drop the information about the table in the data dictionary tables, but will not commit at this point in time
  5. If the statement is not assigned GTID or it is assigned GTID but we have only tables from categories 1 and 3 we write DROP TABLES statement for tables from categories 1 and 3 to the binary log (category 1 can be non-empty at this point only if DROP TABLES has IF EXISTS clause).  The latter is to ensure atomicity for the case when we execute on slave in GTID mode and all tables belong to SEs supporting atomic DDL.
  6. The SQL layer then decides if the statements should be committed or rolled back, and the calls the post_ddl() hook in the SE. Note that concurrent DDL on the tables is blocked now. Note2, if a crash happens during this phase,  the post_ddl() hook actions below will be executed during recovery.
    1. In the post_ddl() hook: If commit, InnoDB physically deletes ibd and drops index trees. If rollback, delete entries in DDL_LOG, files and trees are left as they are. Note: If crash and DROP transaction is committed, the recovery process will retrieve info from the DDL_log, and delete files/trees during crash recovery.
  7. If the statement is assigned GTID and there are tables in category 2, so write to binlog is therefore postponed from item 5 above,  DROP TABLES statement with all tables we dropped is written to binlog. This is done now to handle GTID mode from pre-8.0 servers.  Note that this a compromise that is not crash safe wrt binlog/GTID update, since we cannot split a statement with GTID assigned.
  8. The SQL layer will then handle tables in category 4, if any
    1. For each temp table call SE handler::delete_table(), and the SE will delete necessary files.
    2. The SQL layer will remove the information about the temporary table from it’s internal structures
  9. Now the SQL layer will write DROP TEMPORARY TABLES for all temp tables above to the binary log
  10. The SQL layer will then handle tables in category 5, if any
    1. For each temp table call SE handler::delete_table(), and the SE will delete necessary files and trees.
    2. The SQL layer will remove the information about the temporary table from it’s internal structures
  11. Then at last the SQL layer will write DROP TEMPORARY TABLES for all temp tables above to the binary log

In the presentation MySQL 8.0: Atomic DDLs – Implementation and Impact you will find a comparison on how CREATE TABLE and DROP TABLES are handled in MySQL 5.7 . In this presentation you will also see examples on how user management atomic DDL differs from the user management DDLs in MySQL 5.7 .

Summary

As we can see from the descriptions and examples above, atomic DDL and improved crash safeness in MySQL 8.0 is a great improvement wrt to intuitive behavior and consistency. In a replicated environment this is especially important since the probability for slave drift caused by crashes during execution of DDL on master/slave is eliminated if only InnoDB is used, and  significantly reduced for SEs not supporting atomic DDL. With this new implementation in MySQL 8.0 we have eliminated possibilities for internal consistencies in the server and SE after crash.