WL#7488: InnoDB startup refactoring
The InnoDB startup code is currently performing the equivalent of "create if not exists" for a number of objects in its internal data dictionary.
We may introduce 3 fundamentally different modes of startup:
- Create a new instance
- Normal startup (existing instance with matching dictionary version)
- Upgrading the data dictionary from an older version to the newest supported version (not implemented yet)
This worklog will split the InnoDB startup code a little, so that the individual parts of it can be invoked by the server-layer changes later.
On Windows, setting innodb_flush_method will be decoupled from innodb_use_native_aio. The value innodb_flush_method=async_unbuffered will be removed; innodb_flush_method=unbuffered can be used instead.
This work is mainly a non-functional change. We are refactoring the startup code by creating a clear framework that will be used by future tasks.
The only functional changes should be related to two areas:
Startup parameter validation
Some erroneous parameters could be detected earlier, and MYSQL_SYSVAR_ENUM will be used for validating innodb_flush_method and innodb_change_buffering.
Among other things, this means the following:
- Both parameters will allow numeric arguments, starting from 0.
- The default value of innodb_flush_method will no longer be NULL, but unbuffered on Windows and fsync on other systems.
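The effect of enum-style validation (a symbolic name or a numeric index starting from 0) can be illustrated with a minimal sketch. This is not the MYSQL_SYSVAR_ENUM implementation: the helper parse_enum_option() is hypothetical, the matching here is simplified to exact, case-sensitive comparison, and the order of the value names (and hence their numeric indexes) is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of the MySQL plugin API: resolve an
// enum-style option that is given either as a name or as a zero-based
// decimal index. Returns the index of the value, or -1 if it is invalid.
int parse_enum_option(const std::vector<std::string>& names,
                      const std::string& input) {
  // First try an exact match on the symbolic name.
  for (std::size_t i = 0; i < names.size(); ++i) {
    if (names[i] == input) return static_cast<int>(i);
  }
  // Otherwise accept only a short, purely decimal index.
  if (input.empty() || input.size() > 9) return -1;
  std::size_t index = 0;
  for (char c : input) {
    if (c < '0' || c > '9') return -1;
    index = index * 10 + static_cast<std::size_t>(c - '0');
  }
  // The index must refer to an existing value.
  return index < names.size() ? static_cast<int>(index) : -1;
}
```

With an assumed value list {"unbuffered", "normal"}, both "normal" and "1" would resolve to the same setting, while a removed name such as "async_unbuffered" would be rejected up front.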
Parameter validation on Windows
On Windows, setting innodb_flush_method will no longer affect innodb_use_native_aio. The values of innodb_flush_method used to be interpreted as follows:
- (empty by default): like async_unbuffered
- async_unbuffered: unbuffered I/O, AIO can be enabled by innodb_use_native_aio=ON (which is the default)
- normal: buffered I/O, AIO will be disabled
- unbuffered: AIO will be disabled
With this work, there will be only 2 values of innodb_flush_method on Windows, both detached from innodb_use_native_aio:
- unbuffered (default): unbuffered I/O
- normal: buffered I/O
So, with this work it is possible to enable asynchronous I/O with buffered I/O. This new combination (innodb_use_native_aio=ON and innodb_flush_method=normal) was tested by running:
mtr --big-test --mysqld=--innodb-flush-method=normal --bootstrap=--innodb-undo-tablespaces=2 --mysqld=--innodb-undo-tablespaces=2 --suite=innodb_undo
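The difference between the old and the new interpretation on Windows can be summarized in a small sketch. The struct IoSettings and both functions are hypothetical names used only for illustration. It is also an assumption that the old value unbuffered meant unbuffered I/O, because the list above only states that AIO was disabled for it.

```cpp
#include <string>

// Hypothetical summary of the effective I/O configuration.
struct IoSettings {
  bool buffered;    // true: buffered file I/O; false: unbuffered file I/O
  bool native_aio;  // true: asynchronous I/O is enabled
};

// Old interpretation: innodb_flush_method also decided whether AIO was used.
IoSettings old_windows_io(const std::string& flush_method,
                          bool use_native_aio) {
  if (flush_method.empty() || flush_method == "async_unbuffered") {
    return {false, use_native_aio};  // AIO follows innodb_use_native_aio
  }
  if (flush_method == "normal") {
    return {true, false};  // buffered I/O, AIO forced off
  }
  return {false, false};  // "unbuffered": AIO forced off (I/O mode assumed)
}

// New interpretation: the two settings are independent of each other.
IoSettings new_windows_io(const std::string& flush_method,
                          bool use_native_aio) {
  return {flush_method == "normal", use_native_aio};
}
```

In this model, new_windows_io("normal", true) yields buffered I/O with AIO enabled, which is the combination that the old interpretation could not express.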
Crash recovery
Undo log processing is a little cleaner. Tables will not be looked up before the entire data dictionary is fully accessible.
InnoDB crash recovery and start-up consist of several phases, which are currently part of the plugin->init() method, also known as innobase_init().
WL#7488 will split or remove some functions, and move code around a little.
This will allow us to split the InnoDB initialization by introducing a few handlerton methods. Later, innobase_init() would only parse the command-line parameters and set up some data structures. The actual file I/O would be done in new handlerton methods introduced later.
With this work, innobase_init() will be calling all refactored steps, with almost no functional change.
Executive summary:
- srv_dict_recover_on_restart(): New function, to execute crash recovery after the redo log has been applied. This used to be part of recv_recovery_from_checkpoint_finish().
- trx_resurrect_table_locks(): Replaced with trx_resurrect_table_ids(), which will buffer the IDs in a new data structure resurrected_trx_tables. The tables will be looked up and locked in trx_resurrect_locks(), called by srv_dict_recover_on_restart().
- srv_start_threads(): New function, to be invoked as a last step before accepting user connections.
- recv_recovery_rollback_active(): Removed. The trx_rollback_or_clean_all_recovered thread will be created by srv_start_threads().
Bootstrap, Startup, Recovery and Upgrade with the Global Data Dictionary
The description below is a forward-looking statement, showing how the pieces will fall in place after the Global DD worklogs have been implemented.
PHASE 0: Recover the data dictionary and all undo logs
innobase_hton->dict_init=innobase_dict_init;
innobase_dict_init() will also hard-code the definitions of its internal tables.
STEP 0: Scan all redo log since the latest checkpoint
This used to be executed in innobase_init(), which is the plugin_init method for InnoDB. The new entry point is innobase_init_files(dict_init_mode).
If there is no MLOG_CHECKPOINT marker, the redo log is corrupted and we must refuse startup.
If there are missing or duplicate *.ibd files referred to by the redo log, refuse startup. The DBA can delete or rename files and restart, or use --innodb-force-recovery=1 to discard the redo log records for the affected tablespaces.
If the redo log files are missing, then depending on dict_init_mode we can issue a warning and start up, assuming and hoping that the data files are in a consistent state.
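The three rules above amount to a small decision procedure, sketched here under stated assumptions. All names are hypothetical. Because the text does not say which dict_init_mode values tolerate missing redo log files, that aspect is reduced to a boolean parameter, and refusing startup in the other case is an assumption. Treating any --innodb-force-recovery value of 1 or more like 1 is also an assumption.

```cpp
// Hypothetical outcome of the redo log scan in STEP 0.
enum class ScanDecision {
  PROCEED,
  PROCEED_DISCARDING_REDO,  // records for the affected tablespaces are dropped
  PROCEED_WITH_WARNING,     // data files are assumed to be consistent
  REFUSE_STARTUP
};

// Hypothetical summary of what the scan found.
struct RedoScanResult {
  bool redo_files_present;
  bool checkpoint_marker_found;  // an MLOG_CHECKPOINT marker was seen
  bool ibd_files_consistent;     // no missing or duplicate *.ibd files
};

ScanDecision decide_after_redo_scan(const RedoScanResult& scan,
                                    int force_recovery,
                                    bool mode_tolerates_missing_redo) {
  if (!scan.redo_files_present) {
    // Assumption: modes that do not tolerate this refuse startup.
    return mode_tolerates_missing_redo ? ScanDecision::PROCEED_WITH_WARNING
                                       : ScanDecision::REFUSE_STARTUP;
  }
  if (!scan.checkpoint_marker_found) {
    return ScanDecision::REFUSE_STARTUP;  // the redo log is corrupted
  }
  if (!scan.ibd_files_consistent) {
    return force_recovery >= 1 ? ScanDecision::PROCEED_DISCARDING_REDO
                               : ScanDecision::REFUSE_STARTUP;
  }
  return ScanDecision::PROCEED;
}
```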
STEP 1: Apply all redo log records
STEP 2: Recover uncommitted transactions from undo logs
trx_resurrect_table_locks() will have to be split, because it depends on PHASE 1 below.
Note: this will recover both XA PREPARE and incomplete transactions.
PHASE 1: Bootstrap/startup the data dictionary
This will be driven by the server layer.
ha_innobase methods will be invoked for accessing DD tables.
We will hardcode the information of some tables, including storage-engine private data. The first one is a version discriminator table that will contain a single version number:
CREATE TABLE dd.version (number BIGINT UNSIGNED PRIMARY KEY);
In InnoDB, this table will be located in page 3 of the DD tablespace, so we will hardcode dd.tables.se_private_data='root=3' in the InnoDB code for this table when this has been implemented.
If the version number is too high, we will refuse startup at this point.
The bare minimum is that InnoDB hard-codes the root page number of the clustered index of dd.indexes, indexed by ID. This could be page 4 in the DD tablespace. We would then be able to do the equivalent of:
SELECT * FROM dd.indexes ORDER BY id;
to get the dd.indexes.se_private_data for some of the DD tables.
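The role of the dd.version number can be sketched as a simple comparison that selects between the startup modes. This is only an illustration: the enum and function names are hypothetical, and the upgrade branch reflects the mode that the worklog describes as not implemented yet.

```cpp
#include <cstdint>

// Hypothetical result of comparing dd.version with the supported version.
enum class StartupAction { NORMAL_STARTUP, UPGRADE, REFUSE_STARTUP };

StartupAction check_dd_version(std::uint64_t stored_version,
                               std::uint64_t supported_version) {
  if (stored_version > supported_version) {
    // The dictionary was written by a newer server: refuse startup.
    return StartupAction::REFUSE_STARTUP;
  }
  if (stored_version < supported_version) {
    // Older dictionary: an upgrade would be required.
    return StartupAction::UPGRADE;
  }
  return StartupAction::NORMAL_STARTUP;
}
```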
For now, the following InnoDB table definitions will be hard-coded:
- innodb_index_stats: index-level persistent statistics
- innodb_table_stats: table-level persistent statistics
PHASE 2: Roll back incomplete transactions that performed DDL
srv_dict_recover_on_restart(), called by innobase_dict_recover(), will cover this.
STEP 4: Resurrect InnoDB table locks
This is the second part of what used to be trx_resurrect_table_locks(). The new function trx_resurrect_locks() will look up the tables based on the IDs that were collected by trx_resurrect_table_ids() during STEP 2 (and optionally STEP 3).
This will have to perform READ COMMITTED of the DD tables in order to look up each table definition by dd::Table::se_private_id and to look up the tablespace file names.
STEP 5: Roll back any incomplete DDL-only or DDL+DML transactions
NOTE: The tables affected by rollback will have been looked up in STEP 4.
Traditionally, the DML-only transactions (intentionally skipped here) have been rolled back in a background thread (STEP 8 in PHASE 4 below), to allow user connections to be accepted sooner.
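How recovered transactions are divided between STEP 5 and the background rollback can be sketched as a classification. The struct, the enum and the function are hypothetical names. Keeping XA PREPARE transactions until the operator decides their outcome is inferred from the notes about such transactions in this document, not from existing code.

```cpp
// Hypothetical description of a transaction recovered from the undo logs.
struct RecoveredTrx {
  bool xa_prepared;    // the transaction is in XA PREPARE state
  bool performed_ddl;  // the transaction modified data dictionary tables
};

// Hypothetical plan for each recovered transaction.
enum class RecoveryAction {
  KEEP_UNTIL_XA_DECISION,  // wait for XA COMMIT or XA ROLLBACK
  ROLL_BACK_IN_STEP_5,     // DDL-only or DDL+DML: roll back during startup
  ROLL_BACK_IN_BACKGROUND  // DML-only: background thread (STEP 8)
};

RecoveryAction plan_recovery(const RecoveredTrx& trx) {
  if (trx.xa_prepared) return RecoveryAction::KEEP_UNTIL_XA_DECISION;
  if (trx.performed_ddl) return RecoveryAction::ROLL_BACK_IN_STEP_5;
  return RecoveryAction::ROLL_BACK_IN_BACKGROUND;
}
```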
STEP 6: Apply any operations from the DDL_LOG
NOTE: The DDL_LOG table is a layer on top of transactions that execute DDL. For example, CREATE TABLE would write DDL_LOG records in subtransactions, and then do DELETE FROM DDL_LOG in the main DDL transaction. If the server was killed before the CREATE TABLE was committed, STEP 5 would roll back any changes to the DD tables, and it would roll back the deletion from DDL_LOG. The DDL_LOG apply in STEP 6 would execute the actual deletion of the incomplete structures.
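The interplay between the subtransactions, the main DDL transaction and the DDL_LOG apply can be modelled with a toy sketch. Everything here is hypothetical: the Server struct holds only committed state, so an uncommitted main transaction is modelled by simply never applying its changes, which has the same visible effect as the rollback in STEP 5.

```cpp
#include <set>
#include <string>

// Toy model of committed server state (all names are hypothetical).
struct Server {
  std::set<std::string> files;      // physical structures, not transactional
  std::set<std::string> ddl_log;    // committed DDL_LOG records
  std::set<std::string> dd_tables;  // committed data dictionary entries
};

// Simulates CREATE TABLE, optionally "killing the server" before commit.
void create_table(Server& s, const std::string& name,
                  bool crash_before_commit) {
  // Subtransaction: record the cleanup action and commit it right away.
  s.ddl_log.insert(name);
  // Create the physical structure.
  s.files.insert(name);
  if (crash_before_commit) {
    // The main transaction never commits: its DD changes and its
    // DELETE FROM DDL_LOG are rolled back, so the log record survives.
    return;
  }
  // Main transaction commit: the DD entry appears and the log record
  // disappears atomically.
  s.dd_tables.insert(name);
  s.ddl_log.erase(name);
}

// STEP 6: delete the incomplete structures named by surviving records.
void apply_ddl_log(Server& s) {
  for (const std::string& name : s.ddl_log) {
    s.files.erase(name);
  }
  s.ddl_log.clear();
}
```

After a simulated crash, the file of the half-created table is the only trace left behind, and apply_ddl_log() removes it while leaving committed tables untouched.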
STEP 7: Update dd::Tablespace_file entries
If a rewrite of file names has been requested on startup, the contents of dd::Tablespace_file objects and underlying data dictionary records will be updated at this point.
PHASE 3: Upgrade of DD tables in case of dd.version mismatch
This will be driven by the server layer using the handler API.
NOTE: At this point, there may exist transactions that are holding locks on data dictionary objects:
- Recovered DML-only transactions (until STEP 8 covers them)
- Transactions that are in XA PREPARE state
When implementing upgrade between Global DD based versions of MySQL, we may want to modify STEP 5 so that all recovered transactions will be rolled back in case an upgrade is going to be performed.
Note that when we start to support arbitrary DML+DDL transactions, there could be XA PREPARE transactions that are holding locks on DD tables. These locks could conflict with the upgrade. So, at this point we should probably also require that all such transactions be terminated with either XA COMMIT or XA ROLLBACK by the operator.
PHASE 4: Start non-critical background processes
Currently, innobase_hton->dict_recover will cover this by invoking srv_start_threads() and, optionally before it, srv_dict_recover_on_restart().
STEP 8: Start the background tasks on transactions
This is part of srv_dict_recover_on_restart().
The most essential changes are:
- Start the rollback of incomplete DML-only transactions (skipped in STEP 5).
- Start the purge of delete-marked records and undo logs.
NOTE: Rollback and purge will have to perform READ COMMITTED of the DD tables in order to look up the table definitions by dd::Table::se_private_id.
PHASE 5: Start accepting normal connections
The changes will both adapt to the API changes and refactor the InnoDB startup.
innobase_init() will perform all validation of start-up parameters, initialize some main-memory data structures, and will not access or open any files.
Hard-coding internal tables
We will invoke a new method innobase_dict_init() that is mapped to innobase_hton->dict_init at startup. It will hard-code the metadata for some InnoDB-internal tables, and then access the InnoDB files by invoking the new function innobase_init_files(dict_init_mode). The dict_init_mode parameter of innobase_dict_init() takes one of the following values:
- DICT_INIT_CREATE_FILES: Create all required SE files
- DICT_INIT_CREATE_MISSING_FILES: Use files that already exist
- DICT_INIT_CHECK_FILES: Verify existence of expected files
- DICT_INIT_IGNORE_FILES: Don't care about files at all
Decoupling undo log and redo log processing
Now that InnoDB will depend on a separate subsystem for populating the data dictionary cache, the InnoDB undo log processing will have to be detached from the redo log based recovery.
A new function srv_dict_recover_on_restart() will be invoked after the Global Data Dictionary subsystem has been started up, by innobase_dict_recover(DICT_RECOVERY_RESTART_SERVER, version), which is mapped to innobase_hton->dict_recover.
The function srv_dict_recover_on_restart() will perform a few tasks:
- Resurrect table IX locks for recovered transactions (table ID lookup).
- Special crash recovery for DDL operations.
- Initiate background rollback of any incomplete DML transactions.
recv_recovery_from_checkpoint_finish() must not initiate rollback, because the data dictionary will not have been started up yet. Instead, srv_dict_recover_on_restart() will invoke trx_rollback_or_clean_recovered().
recv_recovery_rollback_active() will be removed, and the background rollback thread will be created in srv_start_threads().
The procedure trx_resurrect_table_locks() will be split into trx_resurrect_table_ids() and trx_resurrect_locks(). During the undo log scan before starting up the data dictionary, trx_resurrect_table_ids() will record the transactions and table IDs in a new structure resurrected_trx_tables. When the data dictionary is available, srv_dict_recover_on_restart() will invoke trx_resurrect_locks().
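The two-pass split can be sketched as follows. This is not the InnoDB code: the type ResurrectedTrxTables merely stands in for resurrected_trx_tables, the function names are shortened hypothetical counterparts, the dictionary is reduced to a map from table ID to table name, and silently skipping IDs that cannot be resolved is an assumption.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using trx_id_t = std::uint64_t;
using table_id_t = std::uint64_t;

// Stand-in for resurrected_trx_tables: table IDs buffered per transaction.
using ResurrectedTrxTables = std::map<trx_id_t, std::set<table_id_t>>;

// First pass (undo log scan): the dictionary is not available yet, so
// only the IDs are remembered.
void resurrect_table_id(ResurrectedTrxTables& buffered, trx_id_t trx,
                        table_id_t table) {
  buffered[trx].insert(table);
}

// Second pass (dictionary available): resolve each buffered ID and record
// a table lock as a (transaction, table name) pair. Unknown IDs are
// skipped, and the buffer is emptied afterwards.
std::vector<std::pair<trx_id_t, std::string>> resurrect_locks(
    ResurrectedTrxTables& buffered,
    const std::map<table_id_t, std::string>& dictionary) {
  std::vector<std::pair<trx_id_t, std::string>> locks;
  for (const auto& entry : buffered) {
    for (table_id_t id : entry.second) {
      auto it = dictionary.find(id);
      if (it != dictionary.end()) {
        locks.emplace_back(entry.first, it->second);
      }
    }
  }
  buffered.clear();
  return locks;
}
```

The point of the split is that the first function needs nothing but the undo log contents, so it can run before the data dictionary has been started up.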
Thread creation at startup
Because InnoDB startup will no longer implement any ‘create if not exists’ semantics, we will need a special startup code path when initializing a new server instance with innobase_dict_recover(DICT_RECOVERY_INITIALIZE_SERVER, version).
In this mode, srv_dict_recover_on_restart() will not be invoked, because we know that all internal data structures will be empty, or initialized to a predefined state.
In both startup modes (initialize or restart), at the end of innobase_dict_recover() a new procedure srv_start_threads() will start the following InnoDB maintenance threads. Only the I/O and page cleaner threads will have been started before this point.
- trx_rollback_or_clean_all_recovered: rollback of recovered transactions
- buf_resize_thread: buffer pool resizer
- srv_master_thread: the master background thread
- srv_purge_coordinator_thread: the purge coordinator thread
- srv_purge_worker_thread: the purge worker threads
- buf_dump_thread: buffer pool dumper or loader
- dict_stats_thread: statistics gatherer
- fts_optimize_thread: fulltext index optimizer
Both srv_dict_recover_on_restart() and srv_start_threads() are split out from innobase_start_or_create_for_mysql(), which will only open the files, apply the redo log and scan the undo log. (PHASE 0: Recover the data dictionary and all undo logs.)
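The difference between the two code paths through innobase_dict_recover() can be sketched by listing the calls made in each mode. The mode names come from the text above. The type name dict_recovery_mode_t and the function dict_recover_call_order() are hypothetical, and the real functions are represented only by their names.

```cpp
#include <string>
#include <vector>

// Mode names as used in this worklog; the type name is an assumption.
enum dict_recovery_mode_t {
  DICT_RECOVERY_INITIALIZE_SERVER,
  DICT_RECOVERY_RESTART_SERVER
};

// Hypothetical model: returns the functions that would be called, in order.
std::vector<std::string> dict_recover_call_order(dict_recovery_mode_t mode) {
  std::vector<std::string> calls;
  if (mode == DICT_RECOVERY_RESTART_SERVER) {
    // Only an existing instance can have transactions to recover.
    calls.push_back("srv_dict_recover_on_restart");
  }
  // The maintenance threads are started in both modes, as the last step.
  calls.push_back("srv_start_threads");
  return calls;
}
```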
Shutdown
Because the call to srv_start_threads() may be omitted, for example when mysql-test-run invokes the server with mysqld --verbose --help, the InnoDB shutdown has to tolerate partially initialized data structures. The flag srv_was_started will be removed, and the shutdown of subsystems will be idempotent (‘close if not closed’).
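The 'close if not closed' idea can be illustrated with a minimal sketch of a subsystem that tracks its own state instead of relying on a global flag. The class Subsystem and its members are hypothetical and do not correspond to an InnoDB type.

```cpp
// Hypothetical subsystem with idempotent shutdown.
class Subsystem {
 public:
  // Acquire resources; calling this twice has no additional effect.
  void open() { open_ = true; }

  // Release resources only if they were acquired. This is safe to call
  // on a subsystem that was never opened, and safe to call repeatedly.
  void close() {
    if (!open_) return;
    open_ = false;
    ++releases_;
  }

  bool is_open() const { return open_; }

  // Number of times resources were actually released.
  int releases() const { return releases_; }

 private:
  bool open_ = false;
  int releases_ = 0;
};
```

With this shape, a shutdown routine can call close() on every subsystem unconditionally, regardless of how far startup progressed.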
Startup cleanup
- Move all startup parameter checks from innobase_start_or_create_for_mysql() to innobase_init(), except those that cannot be checked before accessing the file system.
- Rename innobase_start_or_create_for_mysql() to srv_start(bool create_new_db).