WL#7743: New data dictionary: changes to DDL-related parts of SE API
Affects: Server-8.0
Status: Complete
As part of the New Data-Dictionary project, in order to allow InnoDB to get rid
of its internal data-dictionary and to support atomic/crash-safe DDL, we need to
extend the SQL-layer code and the parts of the SE API which are related to
opening tables and to DDL. The following needs to be supported:

1) Auxiliary columns and keys (hidden system columns and keys which InnoDB
   adds to tables implicitly).
2) Access to se_private_* values for DD objects while opening tables, and
   updating them during DDL.
3) Atomic/crash-safe DDL. While implementing this item it also makes sense
   to make the user-visible behavior of DDL more atomic.

InnoDB also needs to be able to store information about auxiliary tables
(needed for FTS) in the DD. However, the existing support for this in the
DD API seems to be sufficient for now.
F-1) DROP TABLES statement which fails due to missing table should not
have any side-effects. DROP TABLES statement which fails due to
other errors is allowed to have side-effects (but see FutureF-1).
NF-1) Crash-safety of multi-table DROP TABLES should be improved. I.e.
discrepancy between SEs, the DD and binary log should be limited
to one table at most.
F-2) It should be possible to replicate successful DROP TABLES statement
from the older servers even in GTID mode.
F-3) DROP DATABASE should be mostly atomic regarding stored routines and
     events. I.e. it should either succeed and drop all of them, or fail
     and drop none of them (except for cases when the problem is related
     to removal of the database directory). A failing DROP DATABASE is
     allowed to have side effects on tables and files (but see FutureF-2).
NF-2) Crash-safety of DROP DATABASE should be improved. I.e. discrepancy
between SEs, the DD and binary log should be limited to one table at
most.
F-4) It should be possible to replicate DROP DATABASE statement from older
servers even in GTID mode.
F-5) It should be still possible to use DROP TABLES IF EXISTS to delete
tables for which there are entries in the data-dictionary, but which
for some reason are absent from SE. Note that this functionality is
currently undocumented.
Requirements to be supported once InnoDB implements WL#7016:
FutureF-1) DROP TABLES which involves only tables in SEs supporting atomic
           DDL should be fully atomic from the user's point of view. I.e. it
           should either succeed and drop all the tables, or fail and have
           no side-effects.
FutureF-2) DROP DATABASE on a database which contains only tables in SEs
           supporting atomic DDL should be mostly atomic towards database
           objects from the user's point of view. I.e. it should either
           succeed and drop the whole database, or fail without any
           side-effects on database objects (except for cases when the
           problem is related to removal of the database directory). Some
           effects on ignorable objects in the database directory, like
           .TMD files, are allowed in the latter case.
FutureF-3) Other table-related DDL which concerns only tables in SEs
supporting atomic DDL should be fully atomic from user point of
view.
FutureN-1) Table-related DDL statements concerning tables in SEs supporting
atomic DDL should be fully crash-safe. I.e. there should be no
discrepancy between the SE, the DD and binary log in case of
crashes.
Support for auxiliary columns and keys
======================================
Auxiliary columns are special columns which are automatically added by the
SE to a table for its internal purposes and are not visible to users (e.g.
DB_TRX_ID and DB_ROW_ID for InnoDB). Similarly, auxiliary keys are hidden,
internal keys which are automatically added to a table by the SE (e.g. the
implicit primary key for InnoDB).
Even though such columns/keys are not visible to users and to the SQL-layer,
it might still be convenient for the SE to store information about them in
the DD. We support this by allowing the SE to adjust the table definition
represented by the dd::Table object and to add such hidden columns/keys to
it during CREATE/ALTER TABLE operations.
A new

int handler::add_extra_columns_and_keys(const HA_CREATE_INFO *create_info,
                                        const List<Create_field> *create_fields,
                                        const KEY *key_info, uint key_count,
                                        dd::Table *table_def)

method is introduced which is called during dd::create_table() execution.
The SE adds hidden columns/keys by adjusting the dd::Table object passed as
the method's in/out parameter; the adjusted table definition is then saved
to the DD.
Note that the SQL-layer won't do any validation of the added columns/keys;
it is the responsibility of the SE to ensure that they are valid.
QQ) Marko says he doesn't need any arguments except dd::Table, should we
omit them?
QQ) At which point exactly should this method be called? Currently it is called
before handling partitions, because we want to create Partition_index
objects for hidden indexes too. But this feels wrong as the dd::Table
object passed to add_extra_columns_and_keys() is half-constructed.
When the table is opened, the SE is passed the dd::Table object as an
argument to the handler::open() method and can get information about these
hidden objects from it.
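As an illustration, here is a minimal sketch of how an SE might implement
this hook. The ha_example class, the column/key names and the exact dd::
mutator signatures used below are assumptions made for the purpose of this
sketch, not something mandated by this WL:

int ha_example::add_extra_columns_and_keys(const HA_CREATE_INFO *,
                                           const List<Create_field> *,
                                           const KEY *, uint,
                                           dd::Table *table_def)
{
  // Add a hidden, SE-maintained row identifier column (similar in spirit
  // to InnoDB's DB_ROW_ID). The SQL-layer performs no validation here;
  // the SE is responsible for keeping the definition consistent.
  dd::Column *row_id= table_def->add_column();
  row_id->set_name("EXAMPLE_ROW_ID");
  row_id->set_type(dd::enum_column_types::LONGLONG);
  row_id->set_nullable(false);
  row_id->set_hidden(true);            // assumed form of the hidden setter

  // Add a hidden key over that column which the SE will use as its
  // internal clustered index.
  dd::Index *hidden_pk= table_def->add_index();
  hidden_pk->set_name("EXAMPLE_HIDDEN_PK");
  hidden_pk->set_hidden(true);
  hidden_pk->add_element(row_id);
  return 0;
}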
Providing access to se_private_* values and methods to update them during DDL
=============================================================================
In order to get rid of its internal DD, InnoDB needs to be able to store in
the DD some engine-private information and identifiers associated with
various objects. The DD has provision for this in the form of the
se_private_data and se_private_id attributes associated with these objects,
and methods for reading/setting these attributes.
We extend the DDL-related SE API methods to allow SEs to adjust these
attributes for tables (and their subobjects) and tablespaces.
We also add a "const dd::Table *" argument to the handler::open() call so
the SE will be able to get the values of these attributes when the table is
opened. Similar arguments are to be added to DDL-related methods which
operate on tables without opening them.
Let us sum up the changes described above:
int handlerton::alter_tablespace(handlerton *hton, THD *thd,
st_alter_tablespace *ts_info,
+ const dd::Tablespace *old_ts_def,
+ dd::Tablespace *new_ts_def);
The new old_ts_def in-argument contains a pointer to the dd::Tablespace
object describing the old version of the tablespace being altered (NULL in
case the tablespace is being created).
The new new_ts_def in/out-argument contains a pointer to the dd::Tablespace
object describing the new version of the tablespace being altered (NULL in
case the tablespace is being dropped). The SE is allowed to adjust this
object (e.g. set the se_private_data/id attributes for the tablespace).
int handler::open(TABLE *table, const char *name, int mode,
int test_if_locked,
+ const dd::Table *table_def);
The new table_def in-argument contains a pointer to the dd::Table object
describing the table being opened.
Note that the optimizer uses custom code for the creation of its temporary
tables; as a result, such tables do not have proper dd::Table objects
associated with them. Therefore handler::open() will get NULL as the
table_def argument for them.
int handler::truncate(
+ dd::Table *table_def);
New table_def in/out-argument points to dd::Table object describing
the table being truncated. SE is allowed to adjust this object.
int handler::rename_table(const char *from, const char *to,
+ const dd::Table *from_table_def,
+ dd::Table *to_table_def);
New from_table_def in-argument contains pointer to dd::Table object
describing the table prior to rename. to_table_def is in/out-parameter
which points to dd::Table object describing the table after rename.
The latter object can be adjusted by SE.
int handler::delete_table(const char *name,
+ const dd::Table *table_def);
New table_def in-argument contains pointer to dd::Table object
for table being dropped.
int handler::create(const char *name, TABLE *form, HA_CREATE_INFO *info,
+ dd::Table *table_def);
New table_def in/out-argument contains pointer to dd::Table object
for table being created. SE is allowed to adjust this object.
bool handler::prepare_inplace_alter_table(TABLE *altered_table,
Alter_inplace_info *ha_alter_info,
+ const dd::Table *old_table_def,
+ dd::Table *new_table_def);
bool handler::inplace_alter_table(TABLE *altered_table,
Alter_inplace_info *ha_alter_info,
+ const dd::Table *old_table_def,
+ dd::Table *new_table_def)
bool handler::commit_inplace_alter_table(TABLE *altered_table,
Alter_inplace_info *ha_alter_info,
bool commit,
+ const dd::Table *old_table_def,
+ dd::Table *new_table_def)
For the above 3 methods the new old_table_def in-arguments point to the
dd::Table describing the old version of the table being altered.
The new new_table_def in/out-arguments point to the dd::Table for the new
version. The latter can be adjusted by the SE.
int Partition_handler::truncate_partition_low(
+ dd::Table *table_def);
The new table_def in/out-argument points to the dd::Table object describing
the table whose partition is being truncated. The SE is allowed to adjust
this object.
We also introduce a new Partition_handler method to encapsulate the
SE-specific details of partition exchange in the SE:
+ int Partition_handler::exchange_partition_low(const char *part_table_path,
+ const char *swap_table_path,
+ uint part_id,
+ dd::Table *part_table_def,
+ dd::Table *swap_table_def)
This new method has part_table_def and swap_table_def in/out-parameters
which point to the dd::Table objects describing the partitioned table and
the table being swapped with the partition, respectively. SEs are allowed
to adjust these objects.
After calling any of the above methods which allow adjustment of the table
definition, the SQL-layer will save the updated definition to the DD.
To avoid problems with the DD and SE information getting out of sync,
long-term we will allow such adjustments only for engines which carry out
the DD update and the changes in the SE as a single atomic transaction
(i.e. for engines supporting atomic DDL). The SQL-layer will enforce this
by simply not storing adjusted objects in the DD for other engines.
However, to make it possible to work on WL#7141 independently of WL#7016,
we might allow such adjustments for all SEs short-term.
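To illustrate the intended usage, here is a hedged sketch of how an SE
supporting atomic DDL might use the new arguments. The ha_example class,
the "root_page" property key and the literal values are assumptions used
only for this example; the se_private_* accessors are the DD attributes
described above:

int ha_example::create(const char *name, TABLE *form, HA_CREATE_INFO *info,
                       dd::Table *table_def)
{
  // Create the table inside the SE and remember its internal identifier.
  dd::Object_id se_id= 42;   // value illustrative, assigned by the SE
  table_def->set_se_private_id(se_id);
  table_def->se_private_data().set("root_page", "3");
  // The SQL-layer stores the adjusted dd::Table object afterwards.
  return 0;
}

int ha_example::open(TABLE *table, const char *name, int mode,
                     int test_if_locked, const dd::Table *table_def)
{
  if (table_def != nullptr)  // NULL for optimizer-internal tmp tables
  {
    // Locate the table inside the SE using the stored identifier.
    dd::Object_id se_id= table_def->se_private_id();
    (void) se_id;
  }
  return 0;
}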
Storing info about auxiliary tables and other similar objects
=============================================================
InnoDB needs to be able to store information about auxiliary tables,
special internal tables created to support FTS, in the DD.
It might also need to store information about other implicitly created
objects which are not tightly coupled to the main table, e.g. about the
tablespace created for the table when the innodb-file-per-table mode is
on.
It is possible to do so using the existing DD API by:
1) acquiring X MDL on the object using dd::acquire_exclusive_tablespace_mdl(),
2) creating the appropriate DD object using dd::create_object<>,
3) filling the object according to the SE's needs and marking it as hidden
   if necessary,
4) calling dd::get_dd_client()->store() to save it in the DD
during the execution of the appropriate SE method (like handler::create()).
The SQL-layer will then commit this change along with the adjusted table
definition. Other operations, like deletion or updates of auxiliary/implicit
objects, can be handled in a similar fashion.
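For example, an SE might create and store an implicit file-per-table
tablespace from within handler::create() roughly as follows. This is a
sketch only: ha_example, the tablespace/file names, the error handling and
the exact signatures are illustrative assumptions, and the dictionary
client is accessed here via thd->dd_client(), corresponding to the
dd::get_dd_client()->store() step listed above:

int ha_example::create(const char *name, TABLE *form, HA_CREATE_INFO *info,
                       dd::Table *table_def)
{
  THD *thd= ha_thd();

  // 1) Acquire X metadata lock on the implicit tablespace to be created.
  if (dd::acquire_exclusive_tablespace_mdl(thd, "example_ts_t1", false))
    return HA_ERR_GENERIC;

  // 2) Create the DD object for the implicit tablespace.
  std::unique_ptr<dd::Tablespace> ts(dd::create_object<dd::Tablespace>());

  // 3) Fill it in according to the SE's needs.
  ts->set_name("example_ts_t1");
  ts->set_engine("EXAMPLE");
  ts->add_file()->set_filename("./test/example_ts_t1.ext");

  // 4) Store it in the DD; the SQL-layer commits it together with the
  //    adjusted table definition (for engines supporting atomic DDL).
  if (thd->dd_client()->store(ts.get()))
    return HA_ERR_GENERIC;

  return 0;
}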
Supporting atomic/crash-safe DDL
================================
At a high level, to make DDL atomic/crash-safe we need to pack its updates
to the DD, its changes in the SE and its writes to the binary log into a
single atomic transaction (i.e. it should either commit and have its effect
properly reflected in the DD, the SE and the binary log, or roll back and
have no effect at all).
To implement this we need to ensure that:
1) There are no intermediate commits on SQL-layer during DDL (to be
addressed by this WL)
2) There are no intermediate commits in SE methods called by DDL.
Also SEs should register themselves as part of ongoing transaction.
(Both items to be addressed by WL#7016 in InnoDB.)
3) SE can do redo/rollback of DDL (to be addressed by WL#7016)
This WL supports this capability by introducing new handlerton
post_ddl() hook to be called after DDL is committed or rolled
back to let SE do necessary post-commit/rollback work (see
examples below).
4) Write to binary log happens as part of the DDL transaction
(addressed by this WL. WL#9175 is necessary to ensure that
binlog supports correct crash recovery for DDL statements).
While adding atomicity/crash-safety to DDL at the implementation level, it
also makes sense to:
5) Change the behavior of some DDL statements (e.g. DROP TABLES) to make
   their user-visible behavior more atomic (e.g. try to avoid side-effects
   from failed statements when possible).
We also need to keep in mind that not all SEs will support DDL atomicity.
Such SEs should be accounted for while implementing the above changes.
To differentiate SEs which do and which do not support atomic DDL, a new
handlerton flag HTON_SUPPORTS_ATOMIC_DDL is introduced.
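For example, an engine announcing this capability might do so at plugin
initialization roughly as follows (example_init, example_post_ddl and
example_hton are illustrative names; HTON_SUPPORTS_ATOMIC_DDL and the
post_ddl hook are the flag and hook introduced by this WL):

// Called by the SQL-layer after the DDL transaction has been committed
// or rolled back; finishes work which must happen post commit/rollback,
// e.g. physically removing files of dropped tables.
static void example_post_ddl(THD *)
{
}

static int example_init(void *p)
{
  handlerton *example_hton= static_cast<handlerton *>(p);
  example_hton->flags|= HTON_SUPPORTS_ATOMIC_DDL;
  example_hton->post_ddl= example_post_ddl;
  return 0;
}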
Let us discuss the changes for each of the DDL statements in detail.
Note that below we don't discuss the ALTER TABLE variants related to
partitioning which are currently implemented through the "fast alter
partitioning" code path, as this code is to be removed soon.
A) CREATE TABLE (including CREATE TABLE LIKE)
---------------------------------------------
Currently the process of table creation looks like this:
1) Create a dd::Table object describing the table to be created.
2) Store the dd::Table object in the DD tables and commit this change.
3) "Open" the table, i.e. construct TABLE_SHARE, TABLE and handler objects
   for it.
4) Call the handler::create(name, TABLE, HA_CREATE_INFO) method to create
   the table in the SE.
5) The statement is written to the binary log.
This is to be replaced with:
1) Create dd::Table object describing table to be created
2) Use dummy handler object to call new handler::add_extra_columns_and_keys()
method to add additional hidden columns and keys which will be created
by SE to the dd::Table object.
3) Store dd::Table object in DD tables.
4) Commit this change if engine is not capable of atomic DDL.
The latter is necessary to ensure that in case of crash we
won't get "orphan" tables in SE which do not have entries
in DD.
5) "open" table, construct TABLE_SHARE, TABLE and handler objects for it.
6) Call handler::create(name, TABLE, HA_CREATE_INFO, dd::Table*) method
for the table.
Note that this method can update se_private_* fields in in-memory DD
object. It also can create additional objects in DD like dd::Tablespace
for file-per-table tablespaces or hidden dd::Table for auxiliary tables
needed for FTS. These additional changes are not to be committed yet.
Long-term such updates will be allowed only for engines which support
atomic DDL.
7) Store dd::Table object (which was possibly adjusted on previous step)
into DD tables. Long-term this step will be executed only for engines
supporting atomic DDL. Short-term, for engines not capable of atomic
DDL this change will be committed.
8) Write statement to the binary log (to the cache if SE supports atomic DDL).
9) Transaction is committed or rolled back.
10) Call new handlerton post_ddl hook to let engines which are capable
of atomic DDL do necessary post-commit changes (e.g. we might want
to remove files in SE on rollback).
Note that for engines supporting atomic DDL the above steps 1) ... 9) are
going to be part of a single transaction, i.e. they will be atomic even if
a crash occurs.
This also means that such engines should not commit the transaction
internally during DDL until the SQL-layer requests them to do so.
For engines which are incapable of atomic DDL we still try to execute the
statement in a manner which reduces the risk of ugly side-effects in case
of a crash - e.g. the DD and SE getting out of sync, having "orphan" tables
in the SE but not in the DD, etc.
Long-term the plan is to get rid of "name", TABLE and HA_CREATE_INFO
parameters in handler::create() call and be able to create table only
from its DD representation.
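The conditional commits in the flow above can be summarized by the
following simplified sketch. The helper functions store_in_dd() and
write_binlog() are hypothetical placeholders for steps 3)/7) and 8); only
the handlerton flag check, the transaction calls and the post_ddl() hook
are real interfaces:

static bool store_in_dd(THD *thd, dd::Table *table_def);   // hypothetical
static bool write_binlog(THD *thd, bool use_trx_cache);    // hypothetical

static bool create_table_flow(THD *thd, handlerton *hton, handler *file,
                              TABLE *table, HA_CREATE_INFO *create_info,
                              const char *path, dd::Table *table_def)
{
  const bool atomic= (hton->flags & HTON_SUPPORTS_ATOMIC_DDL);

  if (store_in_dd(thd, table_def))                    // steps 1) - 3)
    return true;
  if (!atomic && trans_commit_stmt(thd))              // step 4)
    return true;

  if (file->ha_create(path, table, create_info, table_def) ||  // step 6)
      store_in_dd(thd, table_def) ||                  // step 7)
      write_binlog(thd, atomic) ||                    // step 8)
      trans_commit_stmt(thd) || trans_commit(thd))    // step 9)
  {
    trans_rollback_stmt(thd);
    trans_rollback(thd);
    if (hton->post_ddl) hton->post_ddl(thd);          // step 10), rollback
    return true;
  }
  if (hton->post_ddl) hton->post_ddl(thd);            // step 10), commit
  return false;
}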
B. DROP TABLES
--------------
Current approach to dropping tables looks like (simplified):
1) For each table in the table list:
1.1) Try to drop table in SE by calling handler::ha_delete_table(),
if error either proceed to next table or goto 2) depending
on error type.
1.2) Remove table from the DD and commit this change.
2) Write up to 3 artificial DROP TABLES statements for the tables which were
   successfully dropped to the binary log - we write separate statements for
   all transactional temporary tables, all non-transactional temporary
   tables and all base tables we have managed to drop.
While this scheme is not crash-safe, it at least ensures that we get a
correct binary log in case the DROP TABLES statement cannot be completed
fully due to an inability to drop some table (e.g. due to foreign keys or
some other error).
It works OK in cases when we are executing a DROP TEMPORARY TABLES
statement in the middle of a transaction.
It also works correctly in GTID mode: in that mode we never split a DROP
TABLES statement into several statements in the binary log, because we
prohibit DROP TABLES statements which mix temporary and non-temporary
tables, or temporary transactional and temporary non-transactional tables.
Of course, the above means that the statement's user-visible behavior is
not atomic, i.e. that it can be partially executed and, even when it
fails, still have some side-effects. This is counter-intuitive for many
users and doesn't play well with replication.
With the advent of atomic DDL it becomes possible to improve the DROP TABLES
implementation. Some important points to consider while working on this
are:
a) DROP TABLES should be atomic both from the crash-safety and the
   user-visible behavior points of view when all tables which are dropped
   are in SEs which support atomic DDL.
b) When we have a mix of engines we should still try to be as crash-safe
   as possible. Atomicity from the user-visible perspective is also nice.
   It is probably a bad idea to have side-effects from a failed DROP TABLES
   on tables in SEs which support atomic DDL.
c) It should be possible to replicate DROP TABLES statements even from
older servers, possibly sacrificing some corner cases and/or
crash-safety for them.
d) GTID mode should work, even when we replicate from older servers.
Again some compromises are possible.
After a discussion with the Replication Team the following algorithm for
improved DROP TABLES was suggested (somewhat simplified):
1) For each table in the table list check to which one of 5 classes it
   belongs (a sketch of this classification appears at the end of this
   section):
   a) non-existent table
   b) base table in an SE which doesn't support atomic DDL
   c) base table in an SE which supports atomic DDL
   d) temporary non-transactional
   e) temporary transactional
   In the process check whether the temporary tables to be dropped are used
   by some outer statement. Report an error and abort execution if there
   are any.
2) If this DROP TABLES doesn't have an IF EXISTS clause and there are
   non-existent tables, report an appropriate error. This could have been
   done on the previous step if we didn't need to include the list of all
   missing tables in the error message.
2') Once WL#6929 is implemented we can check whether we are trying to drop
    parent tables in some FK without dropping the children in the same
    statement, and report an error here.
Note that this way DROP TABLES will be able to handle the most common error
cases without having any side-effects.
3) For each table from class b) (base table in SE which doesn't support
atomic DDL).
3.1) Call the handler::delete_table(const dd::Table) to delete
the table in SE.
3.2) Remove table description from DD and commit the change
3.3) If {we are not in GTID mode} OR
{we are in GTID mode AND there is only one table in class b) AND
classes a) and c) are empty}
write DROP TABLE statement for the table to binary log.
Else we need to construct a single DROP TABLES statement for the
GTID and write it to the binary log later.
In case of error during any of the above steps, report it and abort
statement execution.
4) For each table from class c) (in SE supporting atomic DDL) and class a)
(non-existent).
4.1) Call the handler::delete_table(const dd::Table) method to mark
table as dropped in SE.
4.2) Update DD to remove the table from it. Do not commit this change.
5) If {we are not in GTID mode} OR
   {we are in GTID mode AND class b) is empty}
   write a DROP TABLES statement including all tables from classes a) and c)
   to the binary log.
   Else we will need to construct a single DROP TABLES statement for the
   GTID and write it to the binary log later.
6) Commit or rollback
7) Call the new handlerton post_ddl() method in order to wait until the SE
   completes the real removal of the tables in SEs supporting atomic DDL.
   Concurrent DDL operations on these tables should be blocked at this
   stage.
If any error occurs during steps 4) .. 6), report it and abort statement
execution immediately.
Note that we handle non-existent tables in the same way as tables in SEs
supporting atomic DDL in order to have a single nice DROP TABLES statement
for the "main" InnoDB-only case.
Note that we handle non-atomic tables first and tables in SEs supporting
atomic DDL afterwards, in order to avoid the situation when DROP TABLES
fails while dropping a non-atomic table and also drops some atomic tables
as a side-effect.
Also note that, with the exception of problems with writing to the binary
log, the DROP TABLES statement can't really fail after this point (see the
comments explaining why below).
8) If we are in GTID mode and had to postpone writing to the binary log
   in steps 3.3) and 5) because of this, write a DROP TABLES statement
   containing all tables we have managed to drop to the binary log.
   The above is necessary to handle replication in GTID mode from older
   servers, or cases when master and slave have different SEs for the
   same tables. Obviously, we sacrifice crash-safety for compatibility here.
9) For each table from class d) (non-transactional temporary) call the
   close_temporary_table() function to drop the table (this function
   will call handler::delete_table() in the SE).
   Note that close_temporary_table() can't fail if the check which was done
   in step 1) was successful.
10) Construct a DROP TEMPORARY TABLES statement for the tables from class d)
    and write it to the binary log.
    Note that we don't have a problem with GTIDs here since the DROP TABLES
    statement doesn't allow mixing tables from class d) with any others in
    GTID mode.
11) For each table from class e) (transactional temporary) call the
    close_temporary_table() function to drop the table (this function
    will call handler::delete_table() in the SE).
    Again, close_temporary_table() can't fail here if the check which was
    done in step 1) was successful.
12) Construct a DROP TEMPORARY TABLES statement for the tables from class e)
    and write it to the binary log (actually to its transaction cache).
    The same comment about the absence of problems with GTIDs as above
    applies here.
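As promised in step 1), here is a sketch of the classification. The enum
and the classify() helper are hypothetical and only illustrate the
decision; the HTON_SUPPORTS_ATOMIC_DDL check is the real criterion for
class c):

enum class Drop_class
{
  NON_EXISTENT,      // a) no entry in the DD
  BASE_NON_ATOMIC,   // b) base table, SE without atomic DDL support
  BASE_ATOMIC,       // c) base table, SE with HTON_SUPPORTS_ATOMIC_DDL
  TMP_NON_TRANS,     // d) temporary, non-transactional SE
  TMP_TRANS          // e) temporary, transactional SE
};

static Drop_class classify(const dd::Table *table_def, handlerton *hton,
                           bool is_temporary, bool is_transactional)
{
  if (is_temporary)
    return is_transactional ? Drop_class::TMP_TRANS
                            : Drop_class::TMP_NON_TRANS;
  if (table_def == nullptr)
    return Drop_class::NON_EXISTENT;
  return (hton->flags & HTON_SUPPORTS_ATOMIC_DDL) ? Drop_class::BASE_ATOMIC
                                                  : Drop_class::BASE_NON_ATOMIC;
}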
C) CREATE TABLE ... SELECT
--------------------------
New approach to implementing this statement:
1) Create dd::Table object describing table to be created
2) Use dummy handler object to call new handler::add_extra_columns_and_keys()
method to add additional hidden columns and keys which will be created
by SE to the dd::Table object.
3) Store dd::Table object in DD tables.
4) Commit this change if engine is not capable of atomic DDL.
The latter is necessary to ensure that in case of crash we
won't get "orphan" tables in SE which do not have entries
in DD.
5) "open" table, construct TABLE_SHARE, TABLE and handler objects for it.
6) Call handler::create(name, TABLE, HA_CREATE_INFO, dd::Table*) method
for the table.
Note that this method can update se_private_* fields in in-memory DD
object. It also can create additional objects in DD like dd::Tablespace
for file-per-table tablespaces or hidden dd::Table for auxiliary tables
needed for FTS. These additional changes are not to be committed yet.
Long-term such updates will be allowed only for engines which support
atomic DDL.
7) Store dd::Table object (which was possibly adjusted on previous step)
into DD tables. Long-term this step will be executed only for engines
supporting atomic DDL. Short-term, for engines not capable of atomic
DDL this change will be committed.
The above steps are the same as for simple CREATE TABLE.
8) If we are in RBR mode, write a CREATE TABLE statement describing the
   table structure to the binary log (note that in reality at this point
   the statement should end up in the transactional cache and not in the
   on-disk binary log).
9) Insert data into table (by reading from source tables and doing
handler::write_row() on newly created table).
This should be part of the same transaction as above calls to
handler::create() and upcoming writes to the binary log.
In RBR mode this also writes events to binary log transactional
cache.
10) In SBR mode write statement in binary log (for engines supporting
atomic DDL to transactional cache).
11) Transaction is committed or rolled back. (Once support for atomic DDL
in InnoDB is implemented handler::create() call, changes to on-disk DD,
writing to binary log are going to be part of the same transaction,
i.e. will be atomic even if crash occurs).
12) Handlerton post_ddl() hook is called to let SE do the necessary
steps which should happen after transaction commit (e.g. in case
of rollback we might want to wait for deletion of files belonging
to table we failed to create).
Note that to handle an error (e.g. during the row insertion phase) for
engines supporting atomic DDL it is enough to roll back the transaction.
For engines without such support the table needs to be dropped explicitly
by calling handler::delete_table(), removing it from the DD and committing
this change.
D) ALTER TABLE ALGORITHM=COPY
-----------------------------
New approach to implementing this statement:
1) Create dd::Table object describing new version of the table
2) Use dummy handler object to call new handler::add_extra_columns_and_keys()
method to add additional hidden columns and keys which will be created
by SE for new table version to the dd::Table object.
3) Store dd::Table object in DD tables.
4) Commit this change if engine of new version is not capable of atomic
DDL. The latter is necessary to ensure that in case of crash we
won't get "orphan" tables in SE which do not have entries in DD.
5) "open" table, construct TABLE_SHARE, TABLE and handler objects for it.
6) Call handler::create(name, TABLE, HA_CREATE_INFO, dd::Table*) method
for the new version of the table.
Note that this method can update se_private_* fields in in-memory DD
object. It also can create additional objects in DD like dd::Tablespace
for file-per-table tablespaces or hidden dd::Table for auxiliary tables
needed for FTS. These additional changes are not to be committed yet.
Long-term such updates will be allowed only for engines which support
atomic DDL.
7) Store the dd::Table object (which was possibly adjusted on the previous
   step) into the DD tables. Long-term this step will be executed only if
   the engine of the new version supports atomic DDL. Short-term, for
   engines not capable of atomic DDL this change will be committed.
Again the above is pretty similar to the first part of CREATE TABLE
implementation.
8) Copy the contents from the old version of the table to the new version.
   Note that, unlike in the current code, this should not commit the
   transaction if the engine of the new version supports atomic DDL.
   Note that if the engine of the new table version supports atomic DDL,
   an error in any of the above steps can be handled by simply rolling back
   the transaction. For other engines explicit deletion will be required.
9) Replace old table version with a new table version. To do this
we need:
9.1) Commit the transaction if engine of the old version of the
table is not capable of atomic DDL.
9.2) Inform engines about old version of table being replaced with
new version. This is done through a series of
handler::rename_table() calls. Update data in DD tables in the
process accordingly.
9.3) If the engine of the old version or of the new version doesn't
     support atomic DDL, commit the changes after each rename operation
     during step 9.2).
10) Call handler::delete_table() for the old version of the table. Remove
    it from the DD.
11) Again, if either of the engines doesn't support atomic DDL it makes
    sense to commit the above change to minimize DD <-> SE discrepancy in
    case of a crash.
12) Write the statement to the binary log.
    Again, if the engines of both the old and the new table version support
    atomic DDL, it is possible to handle errors during the above steps by a
    simple rollback. If at least one of them does not, then we need to take
    explicit actions, like reverting the renames and deleting the new
    version. Moreover, after point 10) totally correct error handling
    becomes impossible.
13) Commit or rollback (with the advent of atomic DDL all of the above
    should be part of a single atomic transaction).
14) Call handlerton post_ddl() hook to wait until SE completes real
removal of old version of the table (or new version if rollback
has happened). Concurrent DDL on the table should be blocked at
this stage.
E) ALTER TABLE ALGORITHM=INPLACE
--------------------------------
1) Create dd::Table object describing new version of the table
2) Use dummy handler object to call new handler::add_extra_columns_and_keys()
method to add additional hidden columns and keys which will be created
by SE for new table version to the dd::Table object.
3) Store dd::Table object in DD tables.
4) Commit this change if engine of new version is not capable of atomic
DDL. The latter is necessary to ensure that in case of crash we
won't get "orphan" tables in SE which do not have entries in DD.
5) "open" table, construct TABLE_SHARE, TABLE and handler objects for it.
6) Construct Alter_inplace_info object by comparing old and new versions
of table.
7) Call handler::check_if_inplace_alter_supported() to figure out if
in-place algorithm is applicable.
8) Call handler::ha_prepare_inplace_alter_table(Alter_inplace_info).
This method can adjust se_private_* fields in dd::Table object
describing new version of the table and do other modifications to DD
if necessary. Long-term this will be allowed only if SE supports
atomic DDL.
9) Store dd::Table object (which was possibly adjusted on previous step)
into DD tables. Long-term this step will be executed only if engine
supports atomic DDL and should not commit transaction. Short-term, for
engines not capable of atomic DDL this change will be committed.
10) Call handler::inplace_alter_table() method for the table.
11) Call handler::commit_inplace_alter_table() method for the table.
Similarly to step 8) this method can adjust dd::Table object and DD
in general (long-term only for engines supporting atomic DDL).
12) Store dd::Table object (which was possibly adjusted on previous step)
into DD tables. Again long-term this step will be executed only if engine
supports atomic DDL and should not commit transaction.
Short-term, for engines not capable of atomic DDL this change will be
committed immediately.
13) Replace old table version in DD with a new version.
14) If the table's engine doesn't support atomic DDL, the above change
    needs to be committed to reduce the chances of the DD and SE getting
    out of sync.
15) Inform storage engine about possibly required table rename by calling
handler::rename_table().
16) Update the DD accordingly. Again, if the SE is not atomic-DDL-capable
    this change should be committed.
17) Write statement to binary log
18) Commit or rollback transaction. Note that once atomic DDL is supported
for InnoDB all of the above steps will be part of one atomic transaction.
19) Call handlerton post_ddl() method to wait until SE completes real removal
of indexes which were dropped and other similar operations which should
happen post commit. Concurrent DDL on the table should be blocked at
this stage.
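As an illustration of how the new arguments can be used on the commit path,
here is a hedged sketch (ha_example and the "version" key are illustrative;
real engines would transfer whatever SE-private attributes they maintain):

bool ha_example::commit_inplace_alter_table(TABLE *altered_table,
                                            Alter_inplace_info *ha_alter_info,
                                            bool commit,
                                            const dd::Table *old_table_def,
                                            dd::Table *new_table_def)
{
  if (!commit)
  {
    // Roll back whatever the prepare/inplace phases did inside the SE.
    return false;
  }
  // Carry the SE-private identifier over to the new table version and
  // record anything that changed during the inplace operation.
  new_table_def->set_se_private_id(old_table_def->se_private_id());
  new_table_def->se_private_data().set("version", "2");
  // Step 12) above stores the adjusted new_table_def in the DD.
  return false;
}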
F. TRUNCATE TABLE
-----------------
There are two paths in the TRUNCATE TABLE implementation, one for
HTON_CAN_RECREATE engines and another for other engines.
Here we will cover the latter, as it is the only one which is relevant
for engines which will support atomic DDL, i.e. InnoDB:
1) Call handler::truncate() for the table. SE is allowed to adjust
"se_private_*" attributes for the table and do other DD modifications
during this call. Long-term this will be allowed only for SEs which
support atomic DDL.
2) Store dd::Table object (which was possibly adjusted on previous step)
into DD tables. Long-term this step will be executed only if engine
supports atomic DDL and should not commit transaction. Short-term, for
engines not capable of atomic DDL this change will be committed.
3) Write statement to the binary log
4) Commit transaction or rollback it.
5) Call the handlerton post_ddl() method in order to wait until the SE
   really finishes the truncation (e.g. removes the old tablespace in case
   of commit, removes the new tablespace in case of rollback). Concurrent
   DDL operations on the table should be blocked at this stage.
As in the previous cases, once support for atomic DDL is implemented in
InnoDB, steps 1) .. 4) will become part of a single atomic and crash-safe
transaction from the SQL-layer point of view.
Note that the new implementation of TRUNCATE PARTITION will be pretty
similar to the one described above.
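For illustration, here is an SE-side sketch of the new truncate() signature
(ha_example and the "root_page" key are illustrative assumptions):

int ha_example::truncate(dd::Table *table_def)
{
  // Recreate the table inside the SE (new internal id / tablespace) and
  // reflect this in the DD object; the SQL-layer stores it in step 2) and
  // the old data is physically removed in post_ddl() after commit.
  dd::Object_id new_id= 43;   // value illustrative, assigned by the SE
  table_def->set_se_private_id(new_id);
  table_def->se_private_data().set("root_page", "4");
  return 0;
}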
G. RENAME TABLES
----------------
1) For each element in rename list
1.1) Call handler::rename_table(). Again SE is allowed to adjust
dd::Table object describing new version of table and do other
DD modifications during this call. And again long-term this
will be allowed for engines supporting atomic DDL only.
1.2) Store the dd::Table object describing the new version of the table in
     the DD (including the updates made to it during the previous step). If
     the engine doesn't support atomic DDL, or we have met such an engine
     in previous iterations of the loop, commit the transaction.
2) Write to binary log
3) Commit or rollback transaction (this is only relevant if all
engines participating in RENAME support atomic DDL).
4) Use handlerton post_ddl() method to complete renaming in the storage
engine (might be no-op).
Note that if all engines involved in RENAME TABLE support atomic DDL
steps 1) - 3) become part of single atomic and crash-safe transaction
from SQL-layer point of view. Also error handling in such case boils
down to simple transaction rollback.
If at least one engine involved doesn't support atomic DDL, RENAME TABLE
becomes non-atomic. Handling of errors requires renaming the tables back
in reverse order by calling handler::rename_table() and updating the DD
accordingly.
H. CREATE/ALTER/DROP TABLESPACE
-------------------------------
1) Prepare DD objects for operation:
1.1) If we are processing CREATE TABLESPACE construct dd::Tablespace
object for tablespace being created. Save the object in the DD.
Do not commit this change if SE supports atomic DDL. Commit the
change otherwise.
1.2) If we are processing ALTER TABLESPACE prepare dd::Tablespace
objects describing new and old versions of tablespace.
1.3) In case of DROP TABLESPACE prepare dd::Tablespace object
describing tablespace to be dropped.
2) Call handlerton::alter_tablespace() method. SE is allowed to adjust
attributes of tablespace being created/altered during it. Long-term
this will be allowed only for SEs which support atomic DDL.
3) Store updated version of dd::Tablespace object (this includes
adjustments during step 2)). Delete the tablespace from the DD
if it is DROP TABLESPACE. Commit the changes right away if SE
doesn't support atomic DDL.
4) Write statement to the binary log.
5) Commit or rollback transaction.
6) Use handlerton post_ddl() method to complete operation in SE
(e.g. to remove files of tablespace being dropped).
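A hedged sketch of the SE side of this flow follows (example_alter_tablespace
and the "id" property key are illustrative; the NULL conventions for
old_ts_def/new_ts_def are the ones described earlier):

static int example_alter_tablespace(handlerton *hton, THD *thd,
                                    st_alter_tablespace *ts_info,
                                    const dd::Tablespace *old_ts_def,
                                    dd::Tablespace *new_ts_def)
{
  if (new_ts_def == nullptr)
  {
    // DROP TABLESPACE: mark it as dropped inside the SE; the file itself
    // is removed in post_ddl() after the transaction commits.
    return 0;
  }
  // CREATE or ALTER: record the SE-private id of the (new version of the)
  // tablespace so that it can be found again after restart.
  new_ts_def->se_private_data().set("id", "7");   // key/value illustrative
  return 0;
}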
I. DROP DATABASE
----------------
Similarly to DROP TABLES, it makes sense to change the user-visible
behavior of DROP DATABASE to a more atomic one. Indeed, replication
compatibility considerations are important for DROP DATABASE as well.
Here is a description of the new DROP DATABASE implementation:
1) Check if the database directory contains any extra files which are not
   safe to remove directly and which will not be removed by dropping
   tables; fail if it does.
   Check if the server has enough privileges to remove the database
   directory; fail if it does not.
1') Once WL#6929 is implemented we can check whether we will be trying to
    drop parent tables in some FK without dropping the children, and report
    an error here.
2) Remove files which do not belong to tables and which are known to
be safe to delete.
3) Drop all tables in SEs which don't support atomic DDL one-by-one:
3.1) Call handler::delete_table() to remove table in SE
3.2) Remove table from the DD and commit the change immediately.
3.3) Unless we are in GTID mode write DROP TABLES IF EXISTS statement
for the table dropped to binary log.
Note that the goal of item 3.3) is to improve crash-safety. One possible
alternative which sacrifices it but makes the binary log more compact is to
delay writing to the binary log until we can write a successful DROP
DATABASE statement to it, or until we know that there was some error during
it and can write an artificial DROP TABLES IF EXISTS statement for all
tables which we have managed to drop.
4) In a single atomic transaction:
4.1) Drop all tables belonging to SE supporting atomic DDL by calling
for each table handler::delete_table() and then removing it from
the DD.
4.2) Remove all stored functions and procedures in the database.
4.3) Remove all events in the database.
4.4) Write DROP DATABASE statement to the binary log
4.5) Commit or rollback the transaction
Any error in the process is handled by rolling back the transaction.
If this happens and, because of GTID mode, we have delayed writing the
deletion of some table in an SE not capable of atomic DDL to the binary
log, report a special error (this is what happens now in a similar
situation).
5) Call post_ddl() handlerton method to let SEs finalize deletion of the
tables.
6) Delete database directory from the filesystem.
Of course, the above means that there is a hole in atomicity if a crash
occurs after 4.5) and before 6). This problem requires the introduction of
redo logging for database directory removal and will be solved outside of
this WL.
J. ALTER TABLE EXCHANGE PARTITION
---------------------------------
There is an additional problem with the current implementation of this
statement. It breaks the encapsulation of partitioning support in SEs,
since it swaps the table and the partition by a simple rename of tables in
the SE, thus disclosing the fact that partitions are just another kind of
table.
We solve this problem by introducing a new
Partition_handler::exchange_partition[_low](const char *part_table_path,
const char *swap_table_path,
uint part_id,
dd::Table *part_table_def,
dd::Table *swap_table_def)
method. SEs which support native partitioning need to implement this
method. Non-native partitioning will no longer be supported thanks to
WL#8971.
With that, the new implementation of this statement looks like this:
1) Check if table and partition have compatible metadata and can be
exchanged.
2) Call Partition_handler::exchange_partition() method to exchange
table and partition. SE can adjust dd::Table objects for both
non-partitioned and partitioned table as well as do other DD
modifications during this step. Long-term this will be allowed
only for SEs which support atomic DDL.
3) Save the adjusted table definitions to the DD. Long-term this will be
   done only for SEs which support atomic DDL. Short-term, for other
   SEs we will commit these changes immediately.
4) Write statement to the binary log
5) Commit or rollback the transaction
6) Call handlerton post_ddl() method to let SE finalize
exchange (might be no-op).
Similarly to other statements, if the SE supports atomic DDL any error can
be handled by a simple rollback. For SEs which do not support it, an
exchange of the table and partition in the opposite direction might be
required to handle errors.
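A hedged sketch of an SE-side implementation of the new hook follows
(ha_example_part is an illustrative class name; swapping the table-level
se_private_id below merely stands in for the real partition-level metadata
exchange an SE would perform):

int ha_example_part::exchange_partition_low(const char *part_table_path,
                                            const char *swap_table_path,
                                            uint part_id,
                                            dd::Table *part_table_def,
                                            dd::Table *swap_table_def)
{
  // Swap the SE-internal objects behind partition 'part_id' of the
  // partitioned table and the stand-alone table without exposing the
  // "partitions are just tables" detail to the SQL-layer.
  // Here we only swap an illustrative table-level attribute; the adjusted
  // definitions are stored by the SQL-layer in step 3) above.
  const dd::Object_id tmp= part_table_def->se_private_id();
  part_table_def->set_se_private_id(swap_table_def->se_private_id());
  swap_table_def->set_se_private_id(tmp);
  return 0;
}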