WL#7488: InnoDB startup refactoring

Affects: Server-8.0   —   Status: Complete

The InnoDB startup code is currently performing the equivalent of "create if not exists" for a number of objects in its internal data dictionary.

We may introduce 3 fundamentally different modes of startup:

  1. Create a new instance
  2. Normal startup (existing instance with matching dictionary version)
  3. Upgrading the data dictionary from an older version to the newest supported version (not implemented yet)

This worklog will split the InnoDB startup code a little, so that the individual parts of it can be invoked by the server-layer changes later.

On Windows, setting innodb_flush_method will be decoupled from innodb_use_native_aio. The value innodb_flush_method=async_unbuffered will be removed; innodb_flush_method=unbuffered can be used instead.

This work is mainly a non-functional change. We are refactoring the startup code by creating a clear framework that will be used by future tasks.

The only functional changes should be related to two areas:

Startup parameter validation

Some erroneous parameters could be detected earlier, and MYSQL_SYSVAR_ENUM will be used for validating innodb_flush_method and innodb_change_buffering. Among other things, this means the following:

  • Both parameters will allow numeric arguments, starting from 0.
  • The default value of innodb_flush_method will no longer be NULL, but unbuffered on Windows, and fsync on other systems.

Parameter validation on Windows

On Windows, setting innodb_flush_method will no longer affect innodb_use_native_aio. The values of innodb_flush_method used to be interpreted as follows:

(empty by default)
like async_unbuffered
async_unbuffered
unbuffered I/O, AIO can be enabled by innodb_use_native_aio=ON (which is the default)
normal
buffered I/O, AIO will be disabled
unbuffered
AIO will be disabled

With this work, there will be only 2 values of innodb_flush_method on Windows, both detached from innodb_use_native_aio:

unbuffered (default)
unbuffered I/O
normal
buffered I/O

So, with this work it is possible to enable asynchronous I/O with buffered I/O. This new combination (innodb_use_native_aio=ON and innodb_flush_method=normal) was tested by running mtr --big-test --mysqld=--innodb-flush-method=normal --bootstrap=--innodb-undo-tablespaces=2 --mysqld=--innodb-undo-tablespaces=2 --suite=innodb_undo.
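The difference between the old and the new interpretation on Windows can be sketched as two small functions. IoConfig, old_windows_io() and new_windows_io() are hypothetical names introduced for this illustration; the old value unbuffered is assumed to select unbuffered I/O, which the list above implies but does not state.

```cpp
#include <string>

// Hypothetical summary of how data files would be accessed.
struct IoConfig {
  bool buffered;  // true: buffered I/O, false: unbuffered I/O
  bool aio;       // true: asynchronous I/O enabled
};

// Old interpretation: innodb_flush_method could silently disable AIO.
IoConfig old_windows_io(const std::string& flush_method, bool use_native_aio) {
  if (flush_method.empty() || flush_method == "async_unbuffered") {
    return {false, use_native_aio};
  }
  if (flush_method == "normal") {
    return {true, false};
  }
  return {false, false};  // "unbuffered" (unbuffered I/O is assumed here)
}

// New interpretation: the two parameters are independent of each other.
IoConfig new_windows_io(const std::string& flush_method, bool use_native_aio) {
  return {flush_method == "normal", use_native_aio};
}
```

Only the second function can return the combination of buffered I/O with asynchronous I/O enabled.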

Crash recovery

Undo log processing is a little cleaner. Tables will not be looked up before the entire data dictionary is fully accessible.

InnoDB crash recovery and start-up consists of several phases, which are currently part of the plugin->init() aka innobase_init().

WL#7488 will split or remove some functions, and move code around a little. This will allow us to split the InnoDB initialization by introducing a few handlerton methods. Later, innobase_init() would only parse the command-line parameters and set up some data structures. The actual file I/O would be done in new handlerton methods introduced later.

With this work innobase_init() will be calling all refactored steps, with almost no functional change.

Executive summary:

srv_dict_recover_on_restart()
New function, to execute crash recovery after the redo log has been applied. This used to be part of recv_recovery_from_checkpoint_finish().
trx_resurrect_table_locks()
Replaced with trx_resurrect_table_ids(), which will buffer the IDs in a new data structure resurrected_trx_tables. The tables will be looked up and locked in trx_resurrect_locks(), called by srv_dict_recover_on_restart().
srv_start_threads()
New function, to be invoked as a last step before accepting user connections.
recv_recovery_rollback_active()
Removed. The trx_rollback_or_clean_all_recovered thread will be created by srv_start_threads().

Bootstrap, Startup, Recovery and Upgrade with the Global Data Dictionary

The description below is a forward-looking statement, showing how the pieces will fall in place after the Global DD worklogs have been implemented.

PHASE 0: Recover the data dictionary and all undo logs

innobase_hton->dict_init=innobase_dict_init;

innobase_dict_init() will also hard-code the definitions of its internal tables.

STEP 0: Scan all redo log since the latest checkpoint

This used to be executed in innobase_init(), which is the plugin_init method for InnoDB. The new entry point is innobase_init_files(dict_init_mode).

If there is no MLOG_CHECKPOINT marker, the redo log is corrupted and we must refuse startup.

If there are missing or duplicate *.ibd files referred to by the redo log, refuse startup. The DBA can delete or rename files, and restart, or use --innodb-force-recovery=1 to discard the redo log records for the affected tablespaces.

If the redo log files are missing, then depending on dict_init_mode we can issue a warning and start up, assuming and hoping that the data files are in a consistent state.
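The three checks above could be summarized as one decision function. This is a sketch only: RedoScan, ScanOutcome and evaluate_redo_scan() are hypothetical names, the flag tolerate_missing_log stands in for whatever dict_init_mode values allow startup without redo log files, and treating every innodb_force_recovery value of 1 or higher alike is an assumption.

```cpp
// Hypothetical outcome of STEP 0.
enum class ScanOutcome { kProceed, kProceedWithWarning, kRefuse };

// Hypothetical summary of what the redo log scan found.
struct RedoScan {
  bool log_files_present;            // redo log files exist
  bool checkpoint_marker_found;      // an MLOG_CHECKPOINT marker was seen
  bool tablespace_files_consistent;  // no missing or duplicate *.ibd files
};

ScanOutcome evaluate_redo_scan(const RedoScan& scan, bool tolerate_missing_log,
                               int force_recovery) {
  if (!scan.log_files_present) {
    // Start up on the assumption that the data files are consistent.
    return tolerate_missing_log ? ScanOutcome::kProceedWithWarning
                                : ScanOutcome::kRefuse;
  }
  if (!scan.checkpoint_marker_found) {
    return ScanOutcome::kRefuse;  // the redo log is corrupted
  }
  if (!scan.tablespace_files_consistent) {
    // With forced recovery, records for the affected tablespaces are discarded.
    return force_recovery >= 1 ? ScanOutcome::kProceedWithWarning
                               : ScanOutcome::kRefuse;
  }
  return ScanOutcome::kProceed;
}
```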

STEP 1: Apply all redo log records

STEP 2: Recover uncommitted transactions from undo logs

trx_resurrect_table_locks() will have to be split, because it depends on PHASE 1 below. Note: this will recover both XA PREPARE and incomplete transactions.

PHASE 1: Bootstrap/startup the data dictionary

This will be driven by the server layer. ha_innobase methods will be invoked for accessing DD tables.

We will hardcode the information of some tables, including storage-engine private data. The first one is a version discriminator table that will contain a single version number:

CREATE TABLE dd.version (number BIGINT UNSIGNED PRIMARY KEY);

In InnoDB, this table will be located in page 3 of the DD tablespace, so we will hardcode dd.tables.se_private_data='root=3' in the InnoDB code for this table when this has been implemented.

If the version number is too high, we will refuse startup at this point.
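A minimal sketch of the version check follows, relating the stored dd.version number to the startup modes listed at the top of this worklog. StartupMode and classify_dd_version() are hypothetical names; creating a new instance is not covered because there is no stored version to compare in that case.

```cpp
#include <cstdint>

// Hypothetical result of comparing the stored version with the supported one.
enum class StartupMode { kNormal, kUpgrade, kRefuse };

StartupMode classify_dd_version(std::uint64_t stored, std::uint64_t supported) {
  if (stored > supported) {
    return StartupMode::kRefuse;  // written by a newer server: refuse startup
  }
  if (stored < supported) {
    return StartupMode::kUpgrade;  // older dictionary: upgrade path
  }
  return StartupMode::kNormal;  // matching dictionary version
}
```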

The bare minimum is that InnoDB hard-codes the root page number of the clustered index of dd.indexes, indexed by ID. This could be page 4 in the DD tablespace. We would then be able to do the equivalent of:

SELECT * FROM dd.indexes ORDER BY id;

to get the dd.indexes.se_private_data for some of the DD tables.

For now, the following InnoDB table definitions will be hard-coded:

innodb_index_stats
index-level persistent statistics
innodb_table_stats
table-level persistent statistics

PHASE 2: Roll back incomplete transactions that performed DDL

srv_dict_recover_on_restart(), called by innobase_dict_recover(), will cover this.

STEP 4: Resurrect InnoDB table locks

This is the second part of what used to be trx_resurrect_table_locks(). The new function trx_resurrect_locks() will look up the tables based on the IDs that were collected by trx_resurrect_table_ids() during STEP 2 (and optionally STEP 3).

This will have to perform READ COMMITTED of the DD tables in order to look up each table definition by dd::Table::se_private_id and to look up the tablespace file names.

STEP 5: Roll back any incomplete DDL-only or DDL+DML transactions

NOTE: The tables affected by rollback will have been looked up in STEP 4.

Traditionally, the DML-only transactions (intentionally skipped here) have been rolled back in a background thread (STEP 8 in PHASE 4 below), to allow user connections to be accepted sooner.
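The split between STEP 5 and STEP 8 amounts to classifying each recovered transaction. The sketch below assumes a hypothetical RecoveredTrx record with a flag that says whether the transaction modified DD tables; the real transaction objects are not shaped like this.

```cpp
// Hypothetical state of a transaction recovered from the undo logs.
enum class TrxState { kActive, kPrepared };

struct RecoveredTrx {
  unsigned long id;
  TrxState state;
  bool performed_ddl;  // DDL-only or DDL+DML
};

enum class Disposition {
  kRollbackNow,           // STEP 5: before the DD is used any further
  kRollbackInBackground,  // STEP 8: DML-only, handled by a background thread
  kKeepPrepared           // XA PREPARE: waits for XA COMMIT or XA ROLLBACK
};

Disposition classify_recovered_trx(const RecoveredTrx& trx) {
  if (trx.state == TrxState::kPrepared) {
    return Disposition::kKeepPrepared;
  }
  return trx.performed_ddl ? Disposition::kRollbackNow
                           : Disposition::kRollbackInBackground;
}
```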

STEP 6: Apply any operations from the DDL_LOG

NOTE: The DDL_LOG table is a layer on top of transactions that execute DDL. For example, CREATE TABLE would write DDL_LOG records in subtransactions, and then do DELETE FROM DDL_LOG in the main DDL transaction. If the server was killed before the CREATE TABLE was committed, STEP 5 would roll back any changes to the DD tables, and it would roll back the deletion from DDL_LOG. The DDL_LOG apply in STEP 6 would execute the actual deletion of the incomplete structures.
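The interplay of STEP 5 and STEP 6 can be simulated with a toy model. Everything here is hypothetical: the Instance structure, the use of a file name as the content of a DDL_LOG record, and the function names. The model only tracks committed state, so a crash before commit is represented by never applying the changes of the main DDL transaction, which is the state that the rollback in STEP 5 would leave behind.

```cpp
#include <set>
#include <string>

// Toy model of the persistent state of one server instance.
struct Instance {
  std::set<std::string> files;      // structures that exist on disk
  std::set<std::string> ddl_log;    // committed DDL_LOG records
  std::set<std::string> dd_tables;  // committed DD table entries
};

void create_table(Instance& db, const std::string& name, bool crash_before_commit) {
  const std::string file = name + ".ibd";
  db.files.insert(file);
  db.ddl_log.insert(file);  // written in a subtransaction, committed at once

  if (crash_before_commit) {
    // The main transaction is rolled back in STEP 5: neither its DD changes
    // nor its DELETE FROM DDL_LOG become visible.
    return;
  }
  db.dd_tables.insert(name);  // main DDL transaction commits ...
  db.ddl_log.erase(file);     // ... together with the DDL_LOG deletion
}

// STEP 6: remove the incomplete structures named by the surviving records.
void apply_ddl_log(Instance& db) {
  for (const std::string& file : db.ddl_log) {
    db.files.erase(file);
  }
  db.ddl_log.clear();
}
```

After a crashed CREATE TABLE, apply_ddl_log() removes the orphaned file. After a committed one, the log is already empty and the file is left alone.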

STEP 7: Update dd::Tablespace_file entries

If a rewrite of file names has been requested on startup, the contents of dd::Tablespace_file objects and underlying data dictionary records will be updated at this point.

PHASE 3: Upgrade of DD tables in case of dd.version mismatch

This will be driven by the server layer using the handler API.

NOTE: At this point, there may exist transactions that are holding locks on data dictionary objects:

  1. Recovered DML-only transactions (until STEP 8 covers them)
  2. Transactions that are in XA PREPARE state

When implementing upgrade between Global DD based versions of MySQL, we may want to modify STEP 5 so that all recovered transactions will be rolled back in case an upgrade is going to be performed.

Note that when we start to support arbitrary DML+DDL transactions, there could be XA PREPARE transactions that are holding locks on DD tables. These locks could conflict with the upgrade. So, at this point we should probably also require that all such transactions be terminated with either XA COMMIT or XA ROLLBACK by the operator.

PHASE 4: Start non-critical background processes

Currently, innobase_hton->dict_recover will cover this by invoking srv_dict_recover_on_restart() (on restart only), followed by srv_start_threads().

STEP 8: Start the background tasks on transactions

This is part of srv_dict_recover_on_restart(). The most essential changes are:

  1. Start the rollback of incomplete DML-only transactions (skipped in STEP 5).
  2. Start the purge of delete-marked records and undo logs.

NOTE: Rollback and purge will have to perform READ COMMITTED of the DD tables in order to look up the table definitions by dd::Table::se_private_id.

PHASE 5: Start accepting normal connections

The changes will both adapt to the API changes and refactor the InnoDB startup.

innobase_init() will perform all validation of start-up parameters, initialize some main-memory data structures, and will not access or open any files.

Hard-coding internal tables

We will invoke a new method innobase_dict_init() that is mapped to innobase_hton->dict_init at startup. It will hard-code the metadata for some InnoDB-internal tables, and then access the InnoDB files by invoking the new function innobase_init_files(dict_init_mode). The dict_init_mode parameter of innobase_dict_init() takes one of the following values:

DICT_INIT_CREATE_FILES
Create all required SE files
DICT_INIT_CREATE_MISSING_FILES
Use files that already exist
DICT_INIT_CHECK_FILES
Verify existence of expected files
DICT_INIT_IGNORE_FILES
Don't care about files at all

Decoupling undo log and redo log processing

Now that InnoDB will depend on a separate subsystem for populating the data dictionary cache, the InnoDB undo log processing will have to be detached from the redo log based recovery.

A new function srv_dict_recover_on_restart() will be invoked after the Global Data Dictionary subsystem has been started up, by innobase_dict_recover(DICT_RECOVERY_RESTART_SERVER, version) which is mapped to innobase_hton->dict_recover.

The srv_dict_recover_on_restart() will conduct a few tasks:

  1. Resurrect table IX locks for recovered transactions (table ID lookup).
  2. Special crash recovery for DDL operations.
  3. Initiate background rollback of any incomplete DML transactions.

recv_recovery_from_checkpoint_finish() must not initiate rollback, because the data dictionary will not have been started up yet. Instead, srv_dict_recover_on_restart() will invoke trx_rollback_or_clean_recovered().

recv_recovery_rollback_active() will be removed, and the background rollback thread will be created in srv_start_threads().

The procedure trx_resurrect_table_locks() will be split into trx_resurrect_table_ids() and trx_resurrect_locks(). During the undo log scan before starting up the data dictionary, trx_resurrect_table_ids() will record the transactions and table IDs in a new structure resurrected_trx_tables. When the data dictionary is available, srv_dict_recover_on_restart() will invoke trx_resurrect_locks().
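The two-phase scheme can be sketched as follows. The function names come from this worklog, but the signatures, the map-based shape of resurrected_trx_tables and the representation of the data dictionary as a map from table ID to table name are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>

using trx_id_t = std::uint64_t;
using table_id_t = std::uint64_t;

// Assumed shape of the buffer: for each transaction, the IDs of the tables
// that it modified according to its undo log.
using resurrected_trx_tables_t = std::map<trx_id_t, std::set<table_id_t>>;

// Phase 1 (undo log scan, data dictionary not available): only buffer the IDs.
void trx_resurrect_table_ids(resurrected_trx_tables_t& buffer, trx_id_t trx,
                             table_id_t table) {
  buffer[trx].insert(table);
}

// Phase 2 (data dictionary available): look up each table and lock it.
// Returns the number of table locks taken.
std::size_t trx_resurrect_locks(
    resurrected_trx_tables_t& buffer,
    const std::map<table_id_t, std::string>& dictionary,
    std::map<trx_id_t, std::set<std::string>>& table_locks) {
  std::size_t locked = 0;
  for (const auto& entry : buffer) {
    for (table_id_t id : entry.second) {
      auto found = dictionary.find(id);
      if (found == dictionary.end()) continue;  // this sketch skips unknown IDs
      if (table_locks[entry.first].insert(found->second).second) ++locked;
    }
  }
  buffer.clear();  // the buffered IDs are not needed after this point
  return locked;
}
```

The point of the split is that phase 1 never touches a table object, so it can run before any dictionary lookup is possible.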

Thread creation at startup

Because InnoDB startup will no longer implement any ‘create if not exists’ semantics, we will need a special startup code path when initializing a new server instance with innobase_dict_recover(DICT_RECOVERY_INITIALIZE_SERVER, version). In this mode, srv_dict_recover_on_restart() will not be invoked, because we know that all internal data structures will be empty, or initialized to a predefined state.

In both startup modes (initialize or restart), at the end of innobase_dict_recover() a new procedure srv_start_threads() will start the InnoDB maintenance threads listed below. Only the I/O and page cleaner threads will have been started before this.

trx_rollback_or_clean_all_recovered
rollback of recovered transactions
buf_resize_thread
buffer pool resizer
srv_master_thread
the master background thread
srv_purge_coordinator_thread
the purge coordinator thread
srv_purge_worker_thread
the purge worker threads
buf_dump_thread
buffer pool dumper or loader
dict_stats_thread
statistics gatherer
fts_optimize_thread
fulltext index optimizer

Both srv_dict_recover_on_restart() and srv_start_threads() are split out from innobase_start_or_create_for_mysql(), which will only open the files, apply the redo log and scan the undo log. (PHASE 0: Recover the data dictionary and all undo logs.)
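The control flow of the two startup modes can be sketched as below. The enum values and function names are taken from this worklog; the signature of innobase_dict_recover(), its return convention (false on success) and the call_trace vector used to make the order of calls observable are assumptions for illustration.

```cpp
#include <string>
#include <vector>

enum dict_recovery_mode_t {
  DICT_RECOVERY_INITIALIZE_SERVER,  // a new instance is being created
  DICT_RECOVERY_RESTART_SERVER      // an existing instance is being started
};

// Illustration only: records which steps ran, in order.
static std::vector<std::string> call_trace;

void srv_dict_recover_on_restart() {
  call_trace.push_back("srv_dict_recover_on_restart");
}

void srv_start_threads() {
  call_trace.push_back("srv_start_threads");
}

// Returns false on success (assumed convention).
bool innobase_dict_recover(dict_recovery_mode_t mode, unsigned version) {
  (void)version;  // not used in this sketch
  if (mode == DICT_RECOVERY_RESTART_SERVER) {
    // Only an existing instance can have transactions to recover.
    srv_dict_recover_on_restart();
  }
  srv_start_threads();  // last step in both modes
  return false;
}
```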

Shutdown

Because the call to srv_start_threads() may be omitted for example when mysql-test-run invokes the server with mysqld --verbose --help, the InnoDB shutdown has to tolerate partially initialized data structures. The flag srv_was_started will be removed, and the shutdown of subsystems will be idempotent (‘close if not closed’).
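The 'close if not closed' idea can be shown with a minimal sketch. Subsystem is a hypothetical type; the actual InnoDB subsystems each keep their own state, and the counter exists only to make the behavior observable.

```cpp
// Hypothetical subsystem whose shutdown is safe to call in any state.
struct Subsystem {
  bool open = false;
  int teardown_count = 0;  // how many times real teardown work was done

  void init() { open = true; }

  // Idempotent: does nothing if the subsystem was never initialized or has
  // already been closed.
  void close() {
    if (!open) return;
    open = false;
    ++teardown_count;
  }
};
```

Because each subsystem tracks its own state, no global flag such as srv_was_started is needed to decide what to shut down.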

Startup cleanup

  1. Move all startup parameter checks from innobase_start_or_create_for_mysql() to innobase_init(), except those that cannot be checked before accessing the file system.
  2. Rename innobase_start_or_create_for_mysql() to srv_start(bool create_new_db).