WL#7743: New data dictionary: changes to DDL-related parts of SE API
Affects: Server-8.0
—
Status: Complete
As part of the New Data Dictionary project, to allow InnoDB to get rid of its internal data dictionary and to support atomic/crash-safe DDL, we need to extend the SQL-layer code and the parts of the SE API which are related to opening tables and DDL. The following needs to be supported:

1) Auxiliary columns and keys (hidden system columns and keys which InnoDB adds to tables implicitly).
2) Access to the se_private_* values of DD objects when opening tables, and updating them during DDL.
3) Atomic/crash-safe DDL. While implementing this item, it also makes sense to make user-visible DDL behavior more atomic.

InnoDB also needs to be able to store information about auxiliary tables (needed for FTS) in the DD. However, the existing support for this in the DD API seems sufficient for now.
F-1) A DROP TABLES statement which fails due to a missing table should not have any side effects. A DROP TABLES statement which fails due to other errors is allowed to have side effects (but see FutureF-1).

NF-1) Crash-safety of multi-table DROP TABLES should be improved, i.e. the discrepancy between SEs, the DD and the binary log should be limited to one table at most.

F-2) It should be possible to replicate a successful DROP TABLES statement from older servers, even in GTID mode.

F-3) DROP DATABASE should be mostly atomic with regard to stored routines and events, i.e. it should either succeed and drop all of them, or fail and drop none of them (except when the problem is related to removal of the database directory). A failing DROP DATABASE is allowed to have side effects on tables and files (but see FutureF-2).

NF-2) Crash-safety of DROP DATABASE should be improved, i.e. the discrepancy between SEs, the DD and the binary log should be limited to one table at most.

F-4) It should be possible to replicate a DROP DATABASE statement from older servers, even in GTID mode.

F-5) It should still be possible to use DROP TABLES IF EXISTS to delete tables which have entries in the data dictionary but which are, for some reason, absent from the SE. Note that this functionality is currently undocumented.

Requirements to be supported once InnoDB implements WL#7016:

FutureF-1) DROP TABLES which involves only tables in SEs supporting atomic DDL should be fully atomic from the user's point of view, i.e. it should either succeed and drop all the tables, or fail without any side effects.

FutureF-2) DROP DATABASE on a database which contains only tables in SEs supporting atomic DDL should be mostly atomic with regard to database objects from the user's point of view, i.e. it should either succeed and drop the whole database, or fail without any side effects on database objects (except when the problem is related to removal of the database directory). Some effects on ignorable objects in the database directory, like .TMD files, are allowed in the latter case.

FutureF-3) Other table-related DDL which concerns only tables in SEs supporting atomic DDL should be fully atomic from the user's point of view.

FutureN-1) Table-related DDL statements concerning tables in SEs supporting atomic DDL should be fully crash-safe, i.e. there should be no discrepancy between the SE, the DD and the binary log in case of crashes.
Support for auxiliary columns and keys
======================================

Auxiliary columns are special columns which the SE automatically adds to a table for its internal purposes and which are not visible to users (e.g. DB_TRX_ID, DB_ROW_ID for InnoDB). Similarly, auxiliary keys are hidden internal keys which the SE automatically adds to a table (e.g. the implicit primary key for InnoDB).

Even though such columns/keys are not visible to users or to the SQL layer, it might still be convenient for the SE to store information about them in the DD. We support this by allowing the SE to adjust the table definition represented by the dd::Table object and to add such hidden columns/keys to it during CREATE/ALTER TABLE operations.

A new method

  int handler::add_extra_columns_and_keys(const HA_CREATE_INFO *create_info,
                                          const List *create_fields,
                                          const KEY *key_info, uint key_count,
                                          dd::Table *table_def)

is introduced, which is called during dd::create_table() execution. The SE adds hidden columns/keys by adjusting the dd::Table object passed as an in/out parameter, and the adjusted table definition is then saved to the DD. Note that the SQL layer won't do any validation of the columns/keys added; it is the responsibility of the SE to ensure that they are valid.

QQ) Marko says he doesn't need any arguments except dd::Table; should we omit them?

QQ) At which point exactly should this method be called? Currently it is called before handling partitions, because we want to create Partition_index objects for hidden indexes too. But this feels wrong, as the dd::Table object passed to add_extra_columns_and_keys() is then only half-constructed.

When a table is opened, the SE is passed the dd::Table object as an argument to the handler::open() method and can get information about these hidden objects from it.
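To make the mechanism more concrete, here is a minimal sketch of what an InnoDB-like SE might do inside add_extra_columns_and_keys(). It is an illustration under stated assumptions, not InnoDB's implementation: DdColumn, DdIndex and DdTable are hypothetical, heavily simplified stand-ins for the real DD classes, and only the dd::Table argument is modelled. The names DB_ROW_ID, DB_TRX_ID, DB_ROLL_PTR and GEN_CLUST_INDEX are the ones InnoDB uses for its system columns and implicit clustered index.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for dd::Column, dd::Index and dd::Table.
struct DdColumn {
  std::string name;
  bool hidden;
};

struct DdIndex {
  std::string name;
  bool is_primary;
  bool hidden;
  std::vector<std::string> columns;
};

struct DdTable {
  std::vector<DdColumn> columns;
  std::vector<DdIndex> indexes;

  bool has_primary_key() const {
    for (const DdIndex &idx : indexes)
      if (idx.is_primary) return true;
    return false;
  }
};

// Sketch of an SE hook that appends hidden system columns and, when the user
// declared no primary key, an implicit one on a hidden row-id column.
// Returns 0 on success, like the int-returning handler method.
int add_extra_columns_and_keys(DdTable *table_def) {
  if (!table_def->has_primary_key()) {
    table_def->columns.push_back({"DB_ROW_ID", true});
    table_def->indexes.push_back(
        {"GEN_CLUST_INDEX", true, true, {"DB_ROW_ID"}});
  }
  table_def->columns.push_back({"DB_TRX_ID", true});
  table_def->columns.push_back({"DB_ROLL_PTR", true});
  return 0;
}
```

The hidden row-id column and its key depend on the user's key definitions, which is one reason the SQL layer asks the SE to add these objects rather than predicting them itself.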
Providing access to se_private_* values and methods to update them during DDL
=============================================================================

In order to get rid of its internal DD, InnoDB needs to be able to store some engine-private information and identifiers associated with various objects in the DD. The DD provides for this with the se_private_data and se_private_id attributes associated with these objects, and with methods for reading/setting these attributes.

We extend the SE API methods related to DDL to allow SEs to adjust these attributes for tables (and their subobjects) and tablespaces. We also add a "const dd::Table *" argument to the handler::open() call, so the SE will be able to get the values of these attributes when a table is opened. Similar arguments are to be added to DDL-related methods which operate on tables without opening them.

Let us sum up the changes described above:

  int handlerton::alter_tablespace(handlerton *hton, THD *thd,
                                   st_alter_tablespace *ts_info,
+                                  const dd::Tablespace *old_ts_def,
+                                  dd::Tablespace *new_ts_def);

The new old_ts_def in-argument points to the dd::Tablespace object describing the old version of the tablespace being altered (NULL when the tablespace is being created). The new new_ts_def in/out-argument points to the dd::Tablespace object describing the new version of the tablespace being altered (NULL when the tablespace is being dropped). The SE is allowed to adjust this object (e.g. set the se_private_data/id attributes for the tablespace).

  int handler::open(TABLE *table, const char *name, int mode,
                    int test_if_locked,
+                   const dd::Table *table_def);

The new table_def in-argument points to the dd::Table object describing the table being opened. Note that the optimizer uses custom code to create its temporary tables; as a result, such tables do not have proper dd::Table objects associated with them. Therefore handler::open() will get NULL as the table_def argument for them.
  int handler::truncate(
+                       dd::Table *table_def);

The new table_def in/out-argument points to the dd::Table object describing the table being truncated. The SE is allowed to adjust this object.

  int handler::rename_table(const char *from, const char *to,
+                           const dd::Table *from_table_def,
+                           dd::Table *to_table_def);

The new from_table_def in-argument points to the dd::Table object describing the table prior to the rename. to_table_def is an in/out-argument which points to the dd::Table object describing the table after the rename. The latter object can be adjusted by the SE.

  int handler::delete_table(const char *name,
+                           const dd::Table *table_def);

The new table_def in-argument points to the dd::Table object for the table being dropped.

  int handler::create(const char *name, TABLE *form, HA_CREATE_INFO *info,
+                     dd::Table *table_def);

The new table_def in/out-argument points to the dd::Table object for the table being created. The SE is allowed to adjust this object.

  bool handler::prepare_inplace_alter_table(TABLE *altered_table,
                                            Alter_inplace_info *ha_alter_info,
+                                           const dd::Table *old_table_def,
+                                           dd::Table *new_table_def);

  bool handler::inplace_alter_table(TABLE *altered_table,
                                    Alter_inplace_info *ha_alter_info,
+                                   const dd::Table *old_table_def,
+                                   dd::Table *new_table_def);

  bool handler::commit_inplace_alter_table(TABLE *altered_table,
                                           Alter_inplace_info *ha_alter_info,
                                           bool commit,
+                                          const dd::Table *old_table_def,
+                                          dd::Table *new_table_def);

For the above 3 methods, the new old_table_def in-argument points to the dd::Table object describing the old version of the table being altered, and the new new_table_def in/out-argument points to the dd::Table object for the new version. The latter can be adjusted by the SE.

  int Partition_handler::truncate_partition_low(
+                       dd::Table *table_def);

The new table_def in/out-argument points to the dd::Table object describing the table whose partition is being truncated. The SE is allowed to adjust this object.
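The in/out dd::Table parameters listed above are the channel through which an SE can record its private identifiers. The following is a minimal sketch using a hypothetical ToyEngine and a simplified DdTable stand-in (neither is part of the real API); its methods only loosely mirror handler::create(), handler::truncate() and handler::open(), and the "space_id" key is an arbitrary example of se_private_data content.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for dd::Table, reduced to the two SE-private
// attributes discussed in this section.
struct DdTable {
  std::string name;
  std::uint64_t se_private_id = 0;                     // 0 = not assigned yet
  std::map<std::string, std::string> se_private_data;  // engine key/value pairs
};

// Hypothetical engine illustrating in/out dd::Table parameters.
class ToyEngine {
 public:
  // Loosely mirrors handler::create(..., dd::Table *table_def): the SE stores
  // its own identifiers in the object; the SQL layer then saves it to the DD.
  int create(DdTable *table_def) {
    table_def->se_private_id = next_table_id_++;
    table_def->se_private_data["space_id"] = std::to_string(next_space_id_++);
    return 0;
  }

  // Loosely mirrors handler::truncate(dd::Table *table_def) for an engine
  // that truncates by recreating the table under fresh identifiers.
  int truncate(DdTable *table_def) { return create(table_def); }

  // Loosely mirrors handler::open(..., const dd::Table *table_def): the SE
  // only reads the stored values. table_def is null for optimizer temporary
  // tables, in which case this sketch returns 0.
  std::uint64_t open(const DdTable *table_def) const {
    return table_def != nullptr ? table_def->se_private_id : 0;
  }

 private:
  std::uint64_t next_table_id_ = 1;
  std::uint64_t next_space_id_ = 100;
};
```

After create() or truncate() returns, the SQL layer would persist the adjusted object; open() only reads the stored values back and has to cope with a null table_def.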
Also, we introduce a new method in Partition_handler to encapsulate the SE-specific details of partition exchange in the SE:

+ int Partition_handler::exchange_partition_low(const char *part_table_path,
+                                               const char *swap_table_path,
+                                               uint part_id,
+                                               dd::Table *part_table_def,
+                                               dd::Table *swap_table_def);

This new method has the part_table_def and swap_table_def in/out-parameters, which point to the dd::Table objects describing the partitioned table and the table being swapped with the partition, respectively. SEs are allowed to adjust these objects.

After calling any of the above methods which allow adjustment of the table definition, the SQL layer will save the updated definition to the DD. To avoid problems with DD and SE information getting out of sync, long-term we will allow such adjustments only for engines which carry out the DD update and the changes in the SE as a single atomic transaction (i.e. engines supporting atomic DDL). The SQL layer will enforce this by simply not storing adjusted objects in the DD for other engines. However, to make it possible to work on WL#7141 independently of WL#7016, we might allow such adjustments for all SEs short-term.

Storing info about auxiliary tables and other similar objects
=============================================================

InnoDB needs to be able to store information about auxiliary tables (special internal tables created to support FTS) in the DD. It might also need to store information about other implicitly created objects which are not tightly coupled to the main table, e.g. the tablespace created for the table when innodb-file-per-table mode is on. It is possible to do so using the existing DD API by:

1) acquiring an X MDL on the object using dd::acquire_exclusive_tablespace_mdl(),
2) creating the appropriate DD object using dd::create_object<>,
3) filling in the object according to the SE's needs and marking it as hidden if necessary,
4) calling dd::get_dd_client()->store() to save it in the DD during execution of the appropriate SE method (like handler::create()).
Then the SQL layer will commit this change along with the adjusted table definition. Other operations, like deletion or update of auxiliary/implicit objects, can be handled in a similar fashion.

Supporting atomic/crash-safe DDL
================================

At a high level, to make DDL atomic/crash-safe we need to pack its updates to the DD, its changes in the SE and its writes to the binary log into a single atomic transaction (i.e. one which either commits and has its effect properly reflected in the DD, the SE and the binary log, or rolls back and has no effect at all). To implement this we need to ensure that:

1) There are no intermediate commits in the SQL layer during DDL (to be addressed by this WL).
2) There are no intermediate commits in SE methods called by DDL. Also, SEs should register themselves as part of the ongoing transaction. (Both items are to be addressed by WL#7016 in InnoDB.)
3) The SE can redo/roll back DDL (to be addressed by WL#7016). This WL supports this capability by introducing a new handlerton post_ddl() hook, to be called after DDL is committed or rolled back, which lets the SE do the necessary post-commit/rollback work (see examples below).
4) The write to the binary log happens as part of the DDL transaction (addressed by this WL; WL#9175 is necessary to ensure that the binary log supports correct crash recovery for DDL statements).

While adding atomicity/crash-safety to DDL from the implementation point of view, it also makes sense to:

5) Change the behavior of some DDL statements (e.g. DROP TABLES) to make user-visible behavior more atomic (e.g. avoid side effects from failed statements when possible).

We also need to keep in mind that not all SEs will support DDL atomicity, and such SEs should be accounted for while implementing the above changes. To differentiate between SEs which do and do not support atomic DDL, a new handlerton flag, HTON_SUPPORTS_ATOMIC_DDL, is introduced.

Let us discuss the changes for each DDL statement in detail.
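The general pattern that the per-statement descriptions follow can be sketched with a small, self-contained model. Everything in it is hypothetical (Handlerton, Txn, run_ddl() and the string labels are for illustration only, and the flag value is arbitrary); it shows only how HTON_SUPPORTS_ATOMIC_DDL decides between one atomic transaction and intermediate commits, and when post_ddl() would be invoked.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical flag value; only the name mirrors the new handlerton flag.
constexpr unsigned HTON_SUPPORTS_ATOMIC_DDL = 1u << 0;

// Hypothetical, much-reduced handlerton: a flags word plus a post_ddl() hook
// that just records how it was called.
struct Handlerton {
  unsigned flags = 0;
  std::vector<std::string> post_ddl_calls;

  void post_ddl(bool committed) {
    post_ddl_calls.push_back(committed ? "post_ddl:commit"
                                       : "post_ddl:rollback");
  }
};

// Toy transaction: changes stay pending until commit() makes them durable.
struct Txn {
  std::vector<std::string> pending;
  std::vector<std::string> durable;

  void add(const std::string &change) { pending.push_back(change); }
  void commit() {
    durable.insert(durable.end(), pending.begin(), pending.end());
    pending.clear();
  }
  void rollback() { pending.clear(); }
};

// Runs one toy DDL statement. For simplicity the SE change is recorded in the
// same Txn; for a real non-atomic engine it would take effect immediately.
bool run_ddl(Handlerton &hton, Txn &txn, bool se_step_fails) {
  const bool atomic = (hton.flags & HTON_SUPPORTS_ATOMIC_DDL) != 0;

  txn.add("dd:store");
  // Non-atomic engine: commit the DD entry first, so that a crash cannot
  // leave an "orphan" SE object without a DD entry.
  if (!atomic) txn.commit();

  if (se_step_fails) {
    txn.rollback();
    // Atomic engine: nothing is left behind. Non-atomic engine: the committed
    // DD entry remains and needs explicit cleanup (not modelled here).
    if (atomic) hton.post_ddl(false);
    return false;
  }

  txn.add("se:change");
  txn.add("binlog:write");  // goes to the transaction cache in the real flow
  txn.commit();
  if (atomic) hton.post_ddl(true);  // post-commit work, e.g. file removal
  return true;
}
```

With an atomic-DDL engine a failure leaves nothing durable and post_ddl() learns about the rollback; with a non-atomic engine the early DD commit survives, which is why the explicit error handling described for each statement is still needed.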
Note that we don't discuss the ALTER TABLE variants related to partitioning which are currently implemented through the "fast alter partitioning" code path, as this code is to be removed soon.

A) CREATE TABLE (including CREATE TABLE LIKE)
---------------------------------------------

Currently, the process of table creation looks like this:

1) Create a dd::Table object describing the table to be created.
2) Store the dd::Table object in the DD tables and commit this change.
3) "Open" the table: construct TABLE_SHARE, TABLE and handler objects for it.
4) Call the handler::create(name, TABLE, HA_CREATE_INFO) method to create the table.
5) Write the statement to the binary log.

This is to be replaced with:

1) Create a dd::Table object describing the table to be created.
2) Use a dummy handler object to call the new handler::add_extra_columns_and_keys() method, which adds the hidden columns and keys that the SE will create to the dd::Table object.
3) Store the dd::Table object in the DD tables.
4) Commit this change if the engine is not capable of atomic DDL. This is necessary to ensure that in case of a crash we won't get "orphan" tables in the SE which do not have entries in the DD.
5) "Open" the table: construct TABLE_SHARE, TABLE and handler objects for it.
6) Call the handler::create(name, TABLE, HA_CREATE_INFO, dd::Table*) method for the table. Note that this method can update se_private_* fields in the in-memory DD object. It can also create additional objects in the DD, like a dd::Tablespace for file-per-table tablespaces or hidden dd::Table objects for auxiliary tables needed for FTS. These additional changes are not committed yet. Long-term, such updates will be allowed only for engines which support atomic DDL.
7) Store the dd::Table object (possibly adjusted in the previous step) in the DD tables. Long-term, this step will be executed only for engines supporting atomic DDL. Short-term, for engines not capable of atomic DDL, this change will be committed.
8) Write the statement to the binary log (to the transaction cache if the SE supports atomic DDL).
9) Commit or roll back the transaction.
10) Call the new handlerton post_ddl() hook to let engines which are capable of atomic DDL do the necessary post-commit changes (e.g. we might want to remove files in the SE on rollback).

Note that for engines supporting atomic DDL, steps 1) ... 9) above are part of a single transaction, i.e. they are atomic even if a crash occurs. This also means that such engines should not commit the transaction internally during DDL until the SQL layer requests it. For engines which are incapable of atomic DDL, we still try to execute the statement in a manner which reduces the risk of ugly side effects in case of a crash, e.g. the DD and the SE getting out of sync, or "orphan" tables present in the SE but not in the DD.

Long-term, the plan is to get rid of the "name", TABLE and HA_CREATE_INFO parameters in the handler::create() call and to be able to create a table from its DD representation alone.

B) DROP TABLES
--------------

The current approach to dropping tables looks like this (simplified):

1) For each table in the table list:
1.1) Try to drop the table in the SE by calling handler::ha_delete_table(). On error, either proceed to the next table or go to 2), depending on the error type.
1.2) Remove the table from the DD and commit this change.
2) Write up to 3 artificial DROP TABLES statements for the successfully dropped tables to the binary log: separate statements for all transactional temporary tables, all non-transactional temporary tables, and all base tables we have managed to drop.

While this scheme is not crash-safe, it at least ensures that we get a correct binary log when the DROP TABLES statement cannot be completed fully because some table cannot be dropped (e.g. due to foreign keys or some other error). It works OK when we execute a DROP TEMPORARY TABLES statement in the middle of a transaction. It also works correctly in GTID mode.
In GTID mode we never split a DROP TABLES statement into several statements in the binary log, because in this mode we prohibit DROP TABLES statements which mix temporary and non-temporary tables, or temporary transactional and temporary non-transactional tables.

Of course, the above means that the statement's user-visible behavior is not atomic, i.e. it can be partially executed and fail while still having some side effects. This is counter-intuitive for many users and doesn't play well with replication.

With the advent of atomic DDL it becomes possible to improve the DROP TABLES implementation. Some important points to consider while working on this are:

a) DROP TABLES should be atomic, from both the crash-safety and the user-visible behavior points of view, when all tables being dropped are in SEs which support atomic DDL.
b) When we have a mix of engines, we should still try to be as crash-safe as possible. Atomicity from the user-visible perspective is also nice. It is probably a bad idea for a failed DROP TABLES to have side effects on tables in SEs which support atomic DDL.
c) It should be possible to replicate DROP TABLES statements even from older servers, possibly sacrificing some corner cases and/or crash-safety for them.
d) GTID mode should work, even when we replicate from older servers. Again, some compromises are possible.

After discussion with the Replication Team, the following algorithm for improved DROP TABLES was suggested (somewhat simplified):

1) For each table in the table list, check which of 5 classes it belongs to:
   a) non-existent table
   b) base table in an SE which doesn't support atomic DDL
   c) base table in an SE which supports atomic DDL
   d) temporary non-transactional table
   e) temporary transactional table
   In the process, check whether any temporary tables to be dropped are used by some outer statement. If there are any, report an error and abort execution.
2) If this DROP TABLES doesn't have an IF EXISTS clause and there are non-existent tables, report the appropriate error.
   This could have been done in the previous step if we didn't need to include the list of all missing tables in the error message.
2') Once WL#6929 is implemented, we can check whether we are trying to drop the parent table of some FK without dropping the child in the same statement, and report an error here.
   Note that this way DROP TABLES will be able to handle the most common error cases without any side effects.
3) For each table from class b) (base table in an SE which doesn't support atomic DDL):
3.1) Call the handler::delete_table() method, passing the table's dd::Table object, to delete the table in the SE.
3.2) Remove the table description from the DD and commit the change.
3.3) If {we are not in GTID mode} OR {we are in GTID mode AND there is only one table in class b) AND classes a) and c) are empty}, write a DROP TABLE statement for the table to the binary log. Otherwise we need to construct a single DROP TABLES statement for the GTID and write it to the binary log later.
   In case of an error during any of the above steps, report it and abort statement execution.
4) For each table from class c) (in an SE supporting atomic DDL) and class a) (non-existent):
4.1) Call the handler::delete_table() method, passing the table's dd::Table object, to mark the table as dropped in the SE.
4.2) Update the DD to remove the table from it. Do not commit this change.
5) If {we are not in GTID mode} OR {we are in GTID mode AND class b) is empty}, write a DROP TABLES statement including all tables from classes a) and c) to the binary log. Otherwise we will need to construct a single DROP TABLES statement for the GTID and write it to the binary log later.
6) Commit or roll back.
7) Call the new handlerton post_ddl() method in order to wait until the SE completes the real removal of the tables in SEs supporting atomic DDL. Concurrent DDL operations on these tables should be blocked at this stage.
   If any error occurs in steps 4) .. 6), report it and abort statement execution immediately.
   Note that we handle non-existent tables in the same way as tables in SEs supporting atomic DDL, in order to have a single nice DROP TABLES statement for the "main" InnoDB-only case.
   Note that we handle tables in non-atomic SEs first, and only then tables in SEs supporting atomic DDL, in order to avoid a situation where DROP TABLES fails while dropping a non-atomic table and also drops some atomic tables as a side effect.
   Also note that, with the exception of problems with writing to the binary log, the DROP TABLES statement can't really fail after this point (see the comments explaining why below).
8) If we are in GTID mode and, because of it, had to postpone writing to the binary log in steps 3.3) and/or 5), write a DROP TABLES statement containing all tables we have managed to drop to the binary log. This is necessary to handle replication in GTID mode from older servers, or cases where master and slave have different SEs for the same tables. Obviously we sacrifice crash-safety for compatibility here.
9) For each table from class d) (non-transactional temporary), call the close_temporary_table() function to drop the table (this function will call handler::delete_table() in the SE). Note that close_temporary_table() can't fail if the check done in step 1) was successful.
10) Construct a DROP TEMPORARY TABLES statement for the tables from class d) and write it to the binary log. Note that there is no problem with GTIDs here, since in GTID mode the DROP TABLES statement doesn't allow mixing tables from class d) with any others.
11) For each table from class e) (transactional temporary), call the close_temporary_table() function to drop the table (this function will call handler::delete_table() in the SE). Again, close_temporary_table() can't fail here if the check done in step 1) was successful.
12) Construct a DROP TEMPORARY TABLES statement for the tables from class e) and write it to the binary log (actually to its transaction cache). The same comment about the absence of problems with GTIDs applies here.

C) CREATE TABLE ... SELECT
--------------------------

New approach to implementing this statement:

1) Create a dd::Table object describing the table to be created.
2) Use a dummy handler object to call the new handler::add_extra_columns_and_keys() method, which adds the hidden columns and keys that the SE will create to the dd::Table object.
3) Store the dd::Table object in the DD tables.
4) Commit this change if the engine is not capable of atomic DDL. This is necessary to ensure that in case of a crash we won't get "orphan" tables in the SE which do not have entries in the DD.
5) "Open" the table: construct TABLE_SHARE, TABLE and handler objects for it.
6) Call the handler::create(name, TABLE, HA_CREATE_INFO, dd::Table*) method for the table. Note that this method can update se_private_* fields in the in-memory DD object. It can also create additional objects in the DD, like a dd::Tablespace for file-per-table tablespaces or hidden dd::Table objects for auxiliary tables needed for FTS. These additional changes are not committed yet. Long-term, such updates will be allowed only for engines which support atomic DDL.
7) Store the dd::Table object (possibly adjusted in the previous step) in the DD tables. Long-term, this step will be executed only for engines supporting atomic DDL. Short-term, for engines not capable of atomic DDL, this change will be committed.
   The above steps are the same as for a simple CREATE TABLE.
8) In RBR mode, write a CREATE TABLE statement describing the table structure to the binary log (note that at this point the statement should in reality end up in the transaction cache and not in the on-disk binary log).
9) Insert data into the table (by reading from the source tables and calling handler::write_row() on the newly created table). This should be part of the same transaction as the above calls to handler::create() and the upcoming writes to the binary log. In RBR mode, this also writes events to the binary log transaction cache.
10) In SBR mode, write the statement to the binary log (for engines supporting atomic DDL, to the transaction cache).
11) Commit or roll back the transaction. (Once support for atomic DDL is implemented in InnoDB, the handler::create() call, the changes to the on-disk DD and the writes to the binary log will be part of the same transaction, i.e. atomic even if a crash occurs.)
12) Call the handlerton post_ddl() hook to let the SE do the necessary steps which should happen after the transaction commit (e.g. in case of rollback, we might want to wait for deletion of the files belonging to the table we failed to create).

Note that to handle an error (e.g. during the row insertion phase) for engines supporting atomic DDL, it is enough to roll back the transaction. For engines without such support, the table needs to be dropped explicitly by calling handler::delete_table(), removing it from the DD and committing this change.

D) ALTER TABLE ALGORITHM=COPY
-----------------------------

New approach to implementing this statement:

1) Create a dd::Table object describing the new version of the table.
2) Use a dummy handler object to call the new handler::add_extra_columns_and_keys() method, which adds the hidden columns and keys that the SE will create for the new table version to the dd::Table object.
3) Store the dd::Table object in the DD tables.
4) Commit this change if the engine of the new version is not capable of atomic DDL. This is necessary to ensure that in case of a crash we won't get "orphan" tables in the SE which do not have entries in the DD.
5) "Open" the table: construct TABLE_SHARE, TABLE and handler objects for it.
6) Call the handler::create(name, TABLE, HA_CREATE_INFO, dd::Table*) method for the new version of the table. Note that this method can update se_private_* fields in the in-memory DD object. It can also create additional objects in the DD, like a dd::Tablespace for file-per-table tablespaces or hidden dd::Table objects for auxiliary tables needed for FTS. These additional changes are not committed yet. Long-term, such updates will be allowed only for engines which support atomic DDL.
7) Store the dd::Table object (possibly adjusted in the previous step) in the DD tables. Long-term, this step will be executed only if the engine of the new version supports atomic DDL. Short-term, for engines not capable of atomic DDL, this change will be committed.
   Again, the above is very similar to the first part of the CREATE TABLE implementation.
8) Copy the contents of the old version of the table to the new version. Note that, unlike the current code, this should not commit the transaction if the engine of the new version supports atomic DDL.
   Note that if the engine of the new table version supports atomic DDL, an error in any of the above steps can be handled by simply rolling back the transaction. For other engines, explicit deletion will be required.
9) Replace the old table version with the new table version. To do this we need to:
9.1) Commit the transaction if the engine of the old version of the table is not capable of atomic DDL.
9.2) Inform the engines that the old version of the table is being replaced with the new version. This is done through a series of handler::rename_table() calls. Update the data in the DD tables accordingly in the process.
9.3) If the engine of the old version or of the new version doesn't support atomic DDL, commit the changes after each rename operation in step 9.2).
10) Call handler::delete_table() for the old version of the table. Remove it from the DD.
11) Again, if either of the engines doesn't support atomic DDL, it makes sense to commit the above change to minimize the DD <-> SE discrepancy in case of a crash.
12) Write to the binary log.
   Again, if the engines of both the old and the new table versions support atomic DDL, errors during the above steps can be handled by a simple rollback. If at least one of them doesn't, we need to take explicit actions, like reverting renames and deleting the new version. Moreover, after step 10), fully correct error handling becomes impossible.
13) Commit or roll back (with the advent of atomic DDL, all of the above should be part of a single atomic transaction).
14) Call the handlerton post_ddl() hook to wait until the SE completes the real removal of the old version of the table (or of the new version, if a rollback has happened). Concurrent DDL on the table should be blocked at this stage.

E) ALTER TABLE ALGORITHM=INPLACE
--------------------------------

1) Create a dd::Table object describing the new version of the table.
2) Use a dummy handler object to call the new handler::add_extra_columns_and_keys() method, which adds the hidden columns and keys that the SE will create for the new table version to the dd::Table object.
3) Store the dd::Table object in the DD tables.
4) Commit this change if the engine of the new version is not capable of atomic DDL. This is necessary to ensure that in case of a crash we won't get "orphan" tables in the SE which do not have entries in the DD.
5) "Open" the table: construct TABLE_SHARE, TABLE and handler objects for it.
6) Construct an Alter_inplace_info object by comparing the old and new versions of the table.
7) Call handler::check_if_supported_inplace_alter() to figure out whether the in-place algorithm is applicable.
8) Call handler::ha_prepare_inplace_alter_table(Alter_inplace_info). This method can adjust the se_private_* fields in the dd::Table object describing the new version of the table, and make other modifications to the DD if necessary. Long-term, this will be allowed only if the SE supports atomic DDL.
9) Store the dd::Table object (possibly adjusted in the previous step) in the DD tables. Long-term, this step will be executed only if the engine supports atomic DDL, and it should not commit the transaction. Short-term, for engines not capable of atomic DDL, this change will be committed.
10) Call the handler::inplace_alter_table() method for the table.
11) Call the handler::commit_inplace_alter_table() method for the table. Similarly to step 8), this method can adjust the dd::Table object and the DD in general (long-term, only for engines supporting atomic DDL).
12) Store the dd::Table object (possibly adjusted in the previous step) in the DD tables. Again, long-term this step will be executed only if the engine supports atomic DDL, and it should not commit the transaction. Short-term, for engines not capable of atomic DDL, this change will be committed immediately.
13) Replace the old table version in the DD with the new version.
14) If the table's engine doesn't support atomic DDL, the above change needs to be committed to reduce the chances of the DD and the SE getting out of sync.
15) Inform the storage engine about a possibly required table rename by calling handler::rename_table().
16) Update the DD accordingly. Again, if the SE is not atomic-DDL-capable, this change should be committed.
17) Write the statement to the binary log.
18) Commit or roll back the transaction. Note that once atomic DDL is supported for InnoDB, all of the above steps will be part of one atomic transaction.
19) Call the handlerton post_ddl() method to wait until the SE completes the real removal of the indexes which were dropped, and other similar operations which should happen after commit. Concurrent DDL on the table should be blocked at this stage.

F) TRUNCATE TABLE
-----------------

There are two paths in the TRUNCATE TABLE implementation: one for HTON_CAN_RECREATE engines and another for all other engines. Here we cover only the latter, as it is the only one relevant for engines which will support atomic DDL (i.e. InnoDB):

1) Call handler::truncate() for the table. The SE is allowed to adjust the "se_private_*" attributes of the table and to make other DD modifications during this call. Long-term, this will be allowed only for SEs which support atomic DDL.
2) Store the dd::Table object (possibly adjusted in the previous step) in the DD tables. Long-term, this step will be executed only if the engine supports atomic DDL, and it should not commit the transaction. Short-term, for engines not capable of atomic DDL, this change will be committed.
3) Write the statement to the binary log.
4) Commit or roll back the transaction.
5) Call the handlerton post_ddl() method to wait until the SE really finishes the truncation (e.g. removes the old tablespace in case of commit, or the new tablespace in case of rollback). Concurrent DDL operations on the table should be blocked at this stage.

As in the previous cases, once support for atomic DDL is implemented in InnoDB, steps 1) .. 4) will become part of a single atomic and crash-safe transaction from the SQL-layer point of view.

Note that the new implementation of TRUNCATE PARTITION will be quite similar to the one described above.

G. RENAME TABLES
----------------
1) For each element in the rename list:
   1.1) Call handler::rename_table(). Again, the SE is allowed to adjust the dd::Table object describing the new version of the table and to make other DD modifications during this call. And again, long-term this will be allowed only for engines supporting atomic DDL.
   1.2) Store the dd::Table object describing the new version of the table in the DD (including any updates made in the previous step). If the engine doesn't support atomic DDL, or we have met such an engine in a previous iteration of the loop, commit the transaction.
2) Write to the binary log.
3) Commit or roll back the transaction (this is only relevant if all engines participating in RENAME support atomic DDL).
4) Use the handlerton post_ddl() method to complete the renaming in the storage engine (might be a no-op).

Note that if all engines involved in RENAME TABLE support atomic DDL, steps 1) - 3) become part of a single atomic and crash-safe transaction from the SQL-layer point of view. Error handling in this case also boils down to a simple transaction rollback. If at least one engine involved doesn't support atomic DDL, RENAME TABLE becomes non-atomic. Handling an error then requires renaming the tables back in reverse order by calling handler::rename_table(), and updating the DD accordingly.

H. CREATE/ALTER/DROP TABLESPACE
-------------------------------
1) Prepare DD objects for the operation:
   1.1) If we are processing CREATE TABLESPACE, construct a dd::Tablespace object for the tablespace being created. Save the object in the DD. Do not commit this change if the SE supports atomic DDL; commit it otherwise.
   1.2) If we are processing ALTER TABLESPACE, prepare dd::Tablespace objects describing the new and old versions of the tablespace.
   1.3) In case of DROP TABLESPACE, prepare a dd::Tablespace object describing the tablespace to be dropped.
2) Call the handlerton::alter_tablespace() method. The SE is allowed to adjust the attributes of the tablespace being created/altered during this call. Long-term, this will be allowed only for SEs which support atomic DDL.
3) Store the updated version of the dd::Tablespace object (including the adjustments made during step 2)). Delete the tablespace from the DD if this is DROP TABLESPACE. Commit the changes right away if the SE doesn't support atomic DDL.
4) Write the statement to the binary log.
5) Commit or roll back the transaction.
6) Use the handlerton post_ddl() method to complete the operation in the SE (e.g. to remove the files of a tablespace being dropped).

I. DROP DATABASE
----------------
As with DROP TABLES, it makes sense to change the user-visible behavior of DROP DATABASE to a more atomic one. Replication compatibility considerations are important for DROP DATABASE as well. Here is the description of the new DROP DATABASE implementation:

1) Check if the database directory contains any extra files which are not safe to remove directly and which will not be removed by dropping tables; fail if it does. Check if the server has enough privileges to remove the database directory; fail if it does not.
1') Once WL#6929 is implemented, we can check whether we would be dropping the parent table of some FK without dropping its child, and report an error here.
2) Remove files which do not belong to tables and which are known to be safe to delete.
3) Drop all tables in SEs which don't support atomic DDL, one by one:
   3.1) Call handler::delete_table() to remove the table in the SE.
   3.2) Remove the table from the DD and commit the change immediately.
   3.3) Unless we are in GTID mode, write a DROP TABLES IF EXISTS statement for the dropped table to the binary log.

Note that the goal of item 3.3) is to improve crash-safety. One possible alternative, which sacrifices crash-safety but makes the binary log more compact, is to delay writing to the binary log until we can write a successful DROP DATABASE to it, or until we know that an error occurred and can write an artificial DROP TABLES IF EXISTS statement for all tables which we have managed to drop.

4) In a single atomic transaction:
   4.1) Drop all tables belonging to SEs supporting atomic DDL by calling handler::delete_table() for each table and then removing it from the DD.
   4.2) Remove all stored functions and procedures in the database.
   4.3) Remove all events in the database.
   4.4) Write the DROP DATABASE statement to the binary log.
   4.5) Commit or roll back the transaction.
   Any error in the process is handled by rolling back the transaction. If this happens, and because of GTID mode we have delayed writing the deletion of some atomic-DDL-non-capable table to the binary log, report a special error (this is what happens now in a similar situation).
5) Call the post_ddl() handlerton method to let SEs finalize the deletion of the tables.
6) Delete the database directory from the filesystem.

Of course, the above means that there is a hole in atomicity if a crash occurs after 4.5) and before 6). This problem requires the introduction of a redo log for database directory removal and will be solved outside of this WL.

J. ALTER TABLE EXCHANGE PARTITION
---------------------------------
There is an additional problem with the current implementation of this statement. It breaks the encapsulation of partitioning support in SEs, since it swaps the table and the partition by a simple rename of tables in the SE, thus disclosing the fact that partitions are just another kind of table.

We solve this problem by introducing a new method, Partition_handler::exchange_partition[_low](const char *part_table_path, const char *swap_table_path, uint part_id, dd::Table *part_table_def, dd::Table *swap_table_def). SEs which support native partitioning need to implement this method. Non-native partitioning will no longer be supported, thanks to WL#8971. With that in place, the new implementation of this statement looks like this:

1) Check if the table and the partition have compatible metadata and can be exchanged.
2) Call the Partition_handler::exchange_partition() method to exchange the table and the partition. During this step the SE can adjust the dd::Table objects for both the non-partitioned and the partitioned table, and it can make other DD modifications. Long-term, this will be allowed only for SEs which support atomic DDL.
3) Save the adjusted table definitions to the DD. Long-term, this will be done only for SEs which support atomic DDL. Short-term, for other SEs, we will commit these changes immediately.
4) Write the statement to the binary log.
5) Commit or roll back the transaction.
6) Call the handlerton post_ddl() method to let the SE finalize the exchange (might be a no-op).

As with the other statements, if the SE supports atomic DDL, any error can be handled by a simple rollback. For SEs which do not support it, an exchange of the table and the partition in the opposite direction might be required instead.
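The following minimal C++ sketch models the EXCHANGE PARTITION sequence and its two error-handling strategies. It is a toy model under stated assumptions, not server code. TableDef, DataDictionary, make_dd() and exchange_partition() are hypothetical names, and se_private_space_id stands in for the se_private_* attributes. The Partition_handler::exchange_partition() call of step 2) is reduced to swapping that id between the two definitions. For an atomic-DDL-capable SE, an injected failure is undone by a plain rollback. Otherwise, the already-committed exchange is reversed by exchanging in the opposite direction.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for dd::Table: only what the sketch needs.
struct TableDef {
  std::string columns;          // stands in for column/key metadata
  int se_private_space_id = 0;  // stands in for se_private_* attributes
};

// Hypothetical single-transaction data dictionary. Changes are staged and
// become visible in part/swap only on commit().
struct DataDictionary {
  TableDef part, swap;                // committed definitions
  TableDef staged_part, staged_swap;  // uncommitted definitions
  std::vector<std::string> binlog;    // statements written to the binlog

  void begin() { staged_part = part; staged_swap = swap; }
  void commit() { part = staged_part; swap = staged_swap; }
  void rollback() { staged_part = part; staged_swap = swap; }

  // Step 3): store adjusted definitions. For SEs without atomic DDL the
  // change is committed immediately (the short-term behavior).
  void store(const TableDef &p, const TableDef &s, bool atomic_se) {
    staged_part = p;
    staged_swap = s;
    if (!atomic_se) commit();
  }
};

// Hypothetical helper: the partition lives in space 5, the table in space 7.
DataDictionary make_dd(const std::string &part_cols,
                       const std::string &swap_cols) {
  DataDictionary dd;
  dd.part = {part_cols, 5};
  dd.swap = {swap_cols, 7};
  return dd;
}

// Sketch of steps 1)-6). `fail_after_step3` injects an error after the
// definitions are stored, so both error-handling strategies can be seen.
bool exchange_partition(DataDictionary &dd, bool atomic_se,
                        bool fail_after_step3) {
  // 1) Table and partition must have compatible metadata.
  if (dd.part.columns != dd.swap.columns) return false;
  dd.begin();

  // 2) Stand-in for the SE call: swap the storage and adjust both defs.
  TableDef p = dd.part, s = dd.swap;
  std::swap(p.se_private_space_id, s.se_private_space_id);

  // 3) Save the adjusted definitions to the DD.
  dd.store(p, s, atomic_se);

  if (fail_after_step3) {
    if (atomic_se) {
      // Nothing committed yet: a plain rollback undoes everything.
      dd.rollback();
    } else {
      // Already committed: undo by exchanging in the opposite direction.
      std::swap(p.se_private_space_id, s.se_private_space_id);
      dd.store(p, s, atomic_se);
    }
    return false;
  }

  // 4) Write the statement to the binary log, then 5) commit.
  dd.binlog.push_back("ALTER TABLE pt EXCHANGE PARTITION p0 WITH TABLE nt");
  dd.commit();
  // 6) post_ddl() would let the SE finalize here; a no-op in this model.
  return true;
}
```

In this model the atomic path never exposes an intermediate state. The non-atomic path briefly commits the swapped definitions before reversing them. A crash inside that window would leave the exchange half-undone, which is why such SEs get only the weaker, non-atomic guarantee.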
Copyright (c) 2000, 2025, Oracle Corporation and/or its affiliates. All rights reserved.