WL#7488: InnoDB startup refactoring
The InnoDB startup code is currently performing the equivalent of "create if not exists" for a number of objects in its internal data dictionary.
We may introduce 3 fundamentally different modes of startup:
- Create a new instance
- Normal startup (existing instance with matching dictionary version)
- Upgrading the data dictionary from an older version to the newest supported version (not implemented yet)
This worklog will split the InnoDB startup code a little, so that the individual parts of it can be invoked by the server-layer changes later.
On Windows, setting innodb_flush_method will be decoupled from innodb_use_native_aio. The value innodb_flush_method=async_unbuffered will be removed; innodb_flush_method=unbuffered can be used instead.
This work is mainly a non-functional change. We are refactoring the startup code by creating a clear framework that will be used by future tasks.
The only functional changes should be related to two areas:
Startup parameter validation
Some erroneous parameters could be detected earlier, and
MYSQL_SYSVAR_ENUM will be used for validating
Among other things, this means the following:
- Both parameters will allow numeric arguments, starting from 0.
- The default value of innodb_flush_method will no longer be NULL, but unbuffered on Windows, and fsync on other systems.
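The effect of enum-style validation can be sketched as follows. This is only an illustration, not the server's MYSQL_SYSVAR_ENUM implementation: the helper parse_enum_option() is hypothetical, and the value list {"unbuffered", "normal"} with its ordering is an assumption made for the example.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not InnoDB code): resolve an enum-style option value
// given either as one of the allowed names or as a zero-based numeric index.
std::optional<std::size_t> parse_enum_option(
    const std::vector<std::string>& names, const std::string& value) {
  // A name match takes priority.
  for (std::size_t i = 0; i < names.size(); ++i) {
    if (names[i] == value) return i;
  }
  // Otherwise accept only decimal digits forming an index below names.size().
  if (value.empty()) return std::nullopt;
  std::size_t index = 0;
  for (char c : value) {
    if (c < '0' || c > '9') return std::nullopt;
    index = index * 10 + static_cast<std::size_t>(c - '0');
    if (index >= names.size()) return std::nullopt;
  }
  return index;
}

// Default described in the text: unbuffered on Windows, fsync elsewhere.
const char* default_flush_method() {
#ifdef _WIN32
  return "unbuffered";
#else
  return "fsync";
#endif
}
```

With the assumed list, both "normal" and "1" would resolve to the same setting, while "2" or an unknown name would be rejected at parameter validation time instead of later during startup.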
Parameter validation on Windows
On Windows, setting innodb_flush_method will no longer affect innodb_use_native_aio. The values of innodb_flush_method used to be interpreted as follows:
- (empty by default)
- unbuffered I/O, AIO can be enabled by innodb_use_native_aio=ON (which is the default)
- buffered I/O, AIO will be disabled
- AIO will be disabled
With this work, there will be only 2 values of
innodb_flush_method on Windows, both detached from
- unbuffered I/O
- buffered I/O
So, with this work it is possible to enable asynchronous I/O with buffered I/O.
This new combination was tested by running
mtr --big-test --mysqld=--innodb-flush-method=normal --bootstrap=--innodb-undo-tablespaces=2 --mysqld=--innodb-undo-tablespaces=2 --suite=innodb_undo.
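The decoupling on Windows can be pictured as two independent inputs. The sketch below is illustrative only: WindowsIoMode and resolve_io_mode() are hypothetical names, and it assumes that the value normal selects buffered I/O, as the test command above suggests.

```cpp
#include <string>

// Hypothetical summary of the effective I/O configuration on Windows.
struct WindowsIoMode {
  bool buffered;  // true: buffered I/O, false: unbuffered I/O
  bool async_io;  // true: native asynchronous I/O is used
};

// Assumption: "normal" means buffered I/O; anything else is treated as
// unbuffered here. Validation of the value is out of scope for this sketch.
WindowsIoMode resolve_io_mode(const std::string& flush_method,
                              bool use_native_aio) {
  WindowsIoMode mode;
  mode.buffered = (flush_method == "normal");
  // The flush method no longer overrides the AIO setting.
  mode.async_io = use_native_aio;
  return mode;
}
```

All four combinations of the two settings are reachable in this model, including buffered I/O with asynchronous I/O enabled.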
Undo log processing is a little cleaner. Tables will not be looked up before the entire data dictionary is fully accessible.
InnoDB crash recovery and start-up consist of several phases, which
are currently part of the
WL#7488 will split or remove some functions, and move code around a little.
This will allow us to split the InnoDB initialization by introducing handlerton methods. Later, innobase_init() would only parse the command-line parameters and set up some data structures. The actual file I/O would be done in new handlerton methods introduced later.
With this work, innobase_init() will call all refactored steps, with almost no functional change.
- New function, to execute crash recovery after the redo log has been applied. This used to be part of
- Replaced with
trx_resurrect_table_ids(), which will buffer the IDs in a new data structure
resurrected_trx_tables. The tables will be looked up and locked in
trx_resurrect_locks(), called by
- New function, to be invoked as a last step before accepting user connections.
- Removed. The
trx_rollback_or_clean_all_recovered thread will be created by
Bootstrap, Startup, Recovery and Upgrade with the Global Data Dictionary
The description below is a forward-looking statement, showing how the pieces will fall in place after the Global DD worklogs have been implemented.
PHASE 0: Recover the data dictionary and all undo logs
innobase_dict_init() will also hard-code the definitions of
its internal tables.
STEP 0: Scan all redo log records since the latest checkpoint
This used to be executed in
which is the
plugin_init method for InnoDB.
The new entry point is
If there is no
MLOG_CHECKPOINT marker, the redo log is corrupted
and we must refuse startup.
If there are missing or duplicate
*.ibd files referred to by
the redo log, refuse startup. The DBA can delete or rename files, and restart,
--innodb-force-recovery=1 to discard the redo log records
for the affected tablespaces.
If the redo log files are missing, depending on
we can issue a warning and start up assuming and hoping that the data files
are in consistent state.
STEP 1: Apply all redo log records
STEP 2: Recover uncommitted transactions from undo logs
trx_resurrect_table_locks() will have to be split, because
it depends on PHASE 1 below.
Note: this will recover both
XA PREPARE and incomplete
PHASE 1: Bootstrap/startup the data dictionary
This will be driven by the server layer.
ha_innobase methods will be invoked for accessing DD tables.
We will hardcode the information of some tables, including storage-engine private data. The first one is a version discriminator table that will contain a single version number:
CREATE TABLE dd.version (number BIGINT UNSIGNED PRIMARY KEY);
In InnoDB, this table will be located in page 3 of the DD tablespace, so we will hardcode dd.tables.se_private_data='root=3' in the InnoDB code for this table when this has been implemented.
If the version number is too high, we will refuse startup at this point.
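The version check ties back to the three startup modes listed at the top of this worklog. A minimal sketch of the decision, assuming the stored number from dd.version is compared against the newest version the server supports (the names StartupMode and choose_startup_mode() are hypothetical):

```cpp
#include <cstdint>
#include <optional>

// Hypothetical outcome of inspecting dd.version at startup.
enum class StartupMode { CreateNew, Normal, Upgrade, Refuse };

// stored: version found in dd.version, or empty when no instance exists yet.
// supported: newest dictionary version this server understands.
StartupMode choose_startup_mode(std::optional<std::uint64_t> stored,
                                std::uint64_t supported) {
  if (!stored) return StartupMode::CreateNew;            // new instance
  if (*stored == supported) return StartupMode::Normal;  // matching version
  if (*stored < supported) return StartupMode::Upgrade;  // older dictionary
  return StartupMode::Refuse;                            // version too high
}
```

How a missing instance is actually detected is not specified here; the empty optional is only a stand-in for that condition.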
The bare minimum is that InnoDB hard-codes the root page number of the clustered index of dd.indexes, indexed by ID. This could be page 4 in the DD tablespace. We would then be able to do the equivalent of:
SELECT * FROM dd.indexes ORDER BY id;
to get the dd.indexes.se_private_data for some of the DD tables.
For now, the following InnoDB table definitions will be hard-coded:
- index-level persistent statistics
- table-level persistent statistics
PHASE 2: Roll back incomplete transactions that performed DDL
srv_dict_recover_on_restart(), called by
innobase_dict_recover() will cover this.
STEP 4: Resurrect InnoDB table locks
This is the second part of what used to be
trx_resurrect_table_locks(). The new function
trx_resurrect_locks() will look up the tables based on
the IDs that were collected by
during STEP 2 (and optionally STEP 3).
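The split into an ID-collecting pass and a later lookup pass can be sketched as follows. This is not the InnoDB implementation: the container type standing in for resurrected_trx_tables, the lookup callback, and the choice to skip tables that cannot be found are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

using trx_id_t = std::uint64_t;
using table_id_t = std::uint64_t;

// Hypothetical stand-in for resurrected_trx_tables: table IDs per transaction.
using ResurrectedTrxTables = std::map<trx_id_t, std::set<table_id_t>>;

// First pass (undo log scan): only remember the IDs, no dictionary access.
void resurrect_table_ids(ResurrectedTrxTables& pending, trx_id_t trx,
                         table_id_t table) {
  pending[trx].insert(table);
}

struct ResurrectedLock {
  trx_id_t trx;
  std::string table_name;
};

// Dictionary lookup by table ID; returns empty if the table is unknown.
using TableLookup = std::function<std::optional<std::string>(table_id_t)>;

// Second pass (data dictionary available): resolve each buffered ID and
// produce one lock entry per (transaction, table). The buffer is emptied.
std::vector<ResurrectedLock> resurrect_locks(ResurrectedTrxTables& pending,
                                             const TableLookup& lookup) {
  std::vector<ResurrectedLock> locks;
  for (const auto& [trx, tables] : pending) {
    for (table_id_t id : tables) {
      if (auto name = lookup(id)) locks.push_back({trx, *name});
    }
  }
  pending.clear();
  return locks;
}
```

The point of the design is that the first pass never touches the data dictionary, so it can run before the dictionary has been started.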
This will have to perform
READ COMMITTED of the DD tables
in order to look up each table definition by
dd::Table::se_private_id and to look up the tablespace
STEP 5: Roll back any incomplete DDL-only or DDL+DML transactions
NOTE: The tables affected by rollback will have been looked up in STEP 4.
Traditionally, the DML-only transactions (intentionally skipped here) have been rolled back in a background thread (STEP 8 in PHASE 4 below), to allow user connections to be accepted sooner.
STEP 6: Apply any operations from the DDL_LOG
DDL_LOG table is a layer on top of transactions
that execute DDL. For example,
CREATE TABLE would write
DDL_LOG records in subtransactions, and then do
DDL_LOG in the main DDL transaction. If the server was killed
CREATE TABLE was committed, STEP 5 would roll
back any changes to the DD tables, and it would roll back the deletion
DDL_LOG apply in STEP 6
would execute the actual deletion of the incomplete structures.
STEP 7: Update
If a rewrite of file names has been requested on startup, the contents
dd::Tablespace_file objects and underlying data dictionary
records will be updated at this point.
PHASE 3: Upgrade of DD tables in case of dd.version mismatch
This will be driven by the server layer using the handler API.
NOTE: At this point, there may exist transactions that are holding locks on data dictionary objects:
- Recovered DML-only transactions (until STEP 8 covers them)
- Transactions that are in
When implementing upgrade between Global DD based versions of MySQL, we may want to modify STEP 5 so that all recovered transactions will be rolled back in case an upgrade is going to be performed.
Note that when we start to support arbitrary DML+DDL transactions, there could be
XA PREPARE transactions that are holding locks on DD tables. These locks could conflict with the upgrade. So, at this point we should probably also require that all such transactions be terminated with either
XA COMMIT or
XA ROLLBACK by the operator.
PHASE 4: Start non-critical background processes
innodb_hton->dict_recover will cover this by
srv_start_threads() and optionally before it,
STEP 8: Start the background tasks on transactions
This is part of
The most essential changes are:
- Start the rollback of incomplete DML-only transactions (skipped in STEP 5).
- Start the purge of delete-marked records and undo logs.
NOTE: Rollback and purge will have to perform
COMMITTED of the DD tables in order to look up the table
PHASE 5: Start accepting normal connections
The changes will both adapt to the API changes and refactor the InnoDB startup.
innobase_init() will perform all validation of start-up
parameters, initialize some main-memory data structures, and will not
access or open any files.
Hard-coding internal tables
We will invoke a new method
innobase_dict_init() that is
innobase_hton->dict_init at startup. It
will hard-code the metadata for some InnoDB-internal tables, and then
access the InnoDB files by invoking the new function
dict_init_mode parameter of
innobase_dict_init() takes one of the following values:
- Create all required SE files
- Use files that already exist
- Verify existence of expected files
- Don't care about files at all
Decoupling undo log and redo log processing
Now that InnoDB will depend on a separate subsystem for populating the data dictionary cache, the InnoDB undo log processing will have to be detached from the redo log based recovery.
A new function
srv_dict_recover_on_restart() will be
invoked after the Global Data Dictionary subsystem has been started
version) which is mapped to
srv_dict_recover_on_restart() will conduct a few tasks:
- Resurrect table IX locks for recovered transactions (table ID lookup).
- Special crash recovery for DDL operations.
- Initiate background rollback of any incomplete DML transactions.
recv_recovery_from_checkpoint_finish() must not initiate
rollback, because the data dictionary will not have been started up
srv_dict_recover_on_restart() will invoke
recv_recovery_rollback_active() will be removed, and the
background rollback thread will be created in
trx_resurrect_table_locks() will be split
trx_resurrect_locks(). During the undo log scan before
starting up the data dictionary,
trx_resurrect_table_ids() will record the transactions
and table IDs in a new structure
resurrected_trx_tables. When the data dictionary is
srv_dict_recover_on_restart() will invoke
Thread creation at startup
Because InnoDB startup will no longer implement any ‘create if not
exists’ semantics, we will need a special startup code path when
initializing a new server instance with
In this mode,
srv_dict_recover_on_restart() will not be
invoked, because we know that all internal data structures will be
empty, or initialized to a predefined state.
In both startup modes (initialize or restart), at the end of
innobase_dict_recover() a new procedure
srv_start_threads() will start the following
InnoDB maintenance threads (only the I/O and page cleaner threads
will have been started before this):
- rollback of recovered transactions
- buffer pool resizer
- the master background thread
- the purge coordinator thread
- the purge worker threads
- buffer pool dumper or loader
- statistics gatherer
- fulltext index optimizer
srv_start_threads() are split out from
innobase_start_or_create_for_mysql(), which will
only open the files, apply the redo log and scan the undo log.
(PHASE 0: Recover the data dictionary and all undo logs.)
Because the call to
srv_start_threads() may be omitted
for example when
mysql-test-run invokes the server with
mysqld --verbose --help, the InnoDB shutdown has to tolerate
partially initialized data structures. The flag
will be removed, and the shutdown of subsystems will be idempotent
(‘close if not closed’).
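A minimal sketch of 'close if not closed' shutdown, assuming each subsystem tracks its own initialization state instead of relying on a global flag. The Subsystem class and shutdown_all() are hypothetical and only illustrate the idea.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical subsystem whose init() and close() are both idempotent.
class Subsystem {
 public:
  Subsystem(std::string name, std::vector<std::string>& log)
      : name_(std::move(name)), log_(log) {}

  void init() {
    if (initialized_) return;  // already initialized: nothing to do
    initialized_ = true;
    log_.push_back("init " + name_);
  }

  void close() {
    if (!initialized_) return;  // never started or already closed
    initialized_ = false;
    log_.push_back("close " + name_);
  }

 private:
  std::string name_;
  std::vector<std::string>& log_;
  bool initialized_ = false;
};

// Shut down in reverse order of startup. Subsystems that were never
// initialized are skipped by their own close().
void shutdown_all(std::vector<Subsystem*>& subsystems) {
  for (auto it = subsystems.rbegin(); it != subsystems.rend(); ++it) {
    (*it)->close();
  }
}
```

In this model a server that exits before starting its background threads can call the same shutdown path as a fully started one, and calling it twice is harmless.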
- Move all startup parameter checks from
innobase_init(), except those that cannot be checked before accessing the file system.