WL#9525: InnoDB_New_DD: update InnoDB specific metadata into new DD for DDL
Affects: Server-8.0 — Status: Complete
This worklog propagates InnoDB-specific metadata into the new DD for DDL operations. More specifically, it fills in the se_private_data of the relevant system tables for create table, create tablespace, alter table, etc. This metadata, which must be written for both non-partitioned and partitioned tables, is filled in at create table and alter table time so that it is propagated into the DD tables.

This worklog also covers creating the new DD tablespace objects, such as dd::tablespaces and dd::Tablespace_files, for "implicit" (file-per-table) tables at create table or alter table time.

This worklog also "fixes" the se_private_data of the new DD system tables so that their root page, space id, etc. point to the right place. The system can then fetch this basic information when the new DD system tables themselves are opened.

The whole work depends on WL#7743, which implements proper DDL APIs for InnoDB by passing the needed dd::Table/dd::Tablespace objects to InnoDB. One issue is that partitioned tables do not have native ha_innopart::delete_table/rename_table APIs and instead rely on ha_innobase::delete_table/rename_table. Since the server cannot pass a valid dd::Table for a partitioned table to these two ha_innobase APIs, WL#7743 should also support ha_innopart::delete_table/rename_table and pass a valid dd::Table to them. This worklog implements these two new APIs.

To verify that this worklog works, opening a table by accessing the passed-in dd::Table is also implemented here, so that once the InnoDB internal table object is no longer in memory (evicted, or after a restart), it can be reconstructed from the dd::Table, etc.
FR-1: The following metadata should be filled in when the objects are created or changed.

  dd::Table
  1. se_private_data:
     1.1 "autoinc" for the current last used AUTO_INCREMENT counter. A "version" of the "autoinc" may be needed for crash-safe DDL.
     1.2 "data_directory" to indicate whether the user specified DATA DIRECTORY.
  2. se_private_id:
     2.1 InnoDB internal table id
  3. options:
     3.1 "key_block_size" for the compressed page size
     3.2 "row_type" for the record format

  dd::Index (for each index)
  4. dd::Index::tablespace_id:
     4.1 The tablespace id generated by the Server (not the internal one)
  5. se_private_data:
     5.1 "id" for the internal index id
     5.2 "root" for the index root page number
     5.3 "trx_id" for the id of the transaction that last modified the index
  6. options:
     6.1 "block_size" exists if key_block_size is specified for the table. However, key_block_size is sometimes ignored entirely; in that case every such "block_size" option should be removed too. Note: Setting "block_size" to 0 doesn't help, because the server assumes it is either non-zero or absent.

  dd::Tablespace (for file_per_table tablespace)
  7. dd::Tablespace::name:
     7.1 For a file-per-table tablespace, this is "innodb_file_per_table.x", where x is the internal tablespace id
     7.2 For a shared tablespace, this is the tablespace name specified at create tablespace time
  8. se_private_data:
     8.1 "id" for the internal tablespace id
     8.2 "flags" for the current fil_space_t::flags

  dd::Tablespace_files (for file_per_table tablespace files)
  9. dd::Tablespace_files::filename:
     9.1 The path of the corresponding tablespace

FR-2: All this metadata should be kept up to date after DDLs.

FR-3: Correct se_private_data, se_private_id and internal tablespace id should be calculated for all DD tables. With this feature, all non-SYS_* tables should be opened by accessing dd::* tables instead of SYS_* tables.

FR-4: Hidden columns like DB_ROW_ID, DB_TRX_ID and DB_ROLL_PTR will be appended by ha_innobase::get_extra_columns_and_keys().

FR-5: This worklog writes metadata for all tables except temporary tables and the to-be-deprecated SYS_* tables.

FR-6: Support ha_innopart::delete_table/rename_table().

Note: When opening tables, all necessary metadata should be obtained from DD tables, as before.
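The key names listed in FR-1 can be illustrated with a small sketch. It is only an illustration: the `Properties` alias and the helper functions below are hypothetical stand-ins, not the server's real dd::Properties interface, and values are stored as plain strings for simplicity. The key names and the "innodb_file_per_table.x" naming rule are taken from FR-1.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a DD key/value bag (se_private_data or options).
// This is NOT the real dd::Properties interface.
using Properties = std::map<std::string, std::string>;

// FR-1 5.1-5.3: per-index se_private_data ("id", "root", "trx_id").
Properties index_private_data(std::uint64_t index_id, std::uint32_t root_page,
                              std::uint64_t trx_id) {
  return {{"id", std::to_string(index_id)},
          {"root", std::to_string(root_page)},
          {"trx_id", std::to_string(trx_id)}};
}

// FR-1 8.1-8.2: per-tablespace se_private_data ("id", "flags").
Properties tablespace_private_data(std::uint32_t space_id,
                                   std::uint32_t flags) {
  return {{"id", std::to_string(space_id)},
          {"flags", std::to_string(flags)}};
}

// FR-1 7.1: name of a file-per-table tablespace, derived from the
// internal tablespace id.
std::string file_per_table_name(std::uint32_t space_id) {
  return "innodb_file_per_table." + std::to_string(space_id);
}

// FR-1 6.1: "block_size" must be non-zero or absent, never 0.
void set_block_size(Properties &options, std::uint32_t key_block_size) {
  if (key_block_size != 0) {
    options["block_size"] = std::to_string(key_block_size);
  } else {
    options.erase("block_size");  // ignored key_block_size: remove the key
  }
}
```

Removing the key instead of writing 0 mirrors the note in FR-1 6.1 that the server treats "block_size" as either non-zero or absent.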
Basically, on every DDL, metadata should be read from the InnoDB internal dict_table_t and dict_index_t objects, etc., and filled into the global DD objects, such as dd::Table, dd::Index and dd::Tablespace.

On opening a table, metadata should be read from dd::Table, dd::Index and dd::Tablespace, then filled into dict_table_t and dict_index_t, etc. This applies to all tables except the original SYS_* tables, because those are only created internally and will be removed in the future.

A partitioned table's metadata should be in dd::Partition and dd::Partition_index. The dd::Partition objects have to be obtained from the dd::Table, so the APIs for partitioned tables have to change accordingly. For example, create() has to iterate over all dd::Partition objects in the dd::Table and store the metadata in the proper dd::Partition for every partition.

There is no need to update metadata for temporary tables, because:
  1. Temporary tables are dropped after a crash or shutdown, so there is no need to persist their metadata.
  2. Temporary table objects in InnoDB are never evicted, so they can be found by name.
  3. The Server handles the dd::Table for a temporary table: it is created during table creation and kept until shutdown, when the same object is deleted along with the table.

InnoDB has to create the dd::Tablespace itself when a new file_per_table table is created, because the Server doesn't do so. In this case, the tablespace name is as described in FR-1 7.1. If the user creates a shared tablespace, the tablespace name should be exactly the one specified.

Any operation that has to access DD tables should be done without holding any InnoDB mutex or lock. Accessing DD tables goes through InnoDB too, so this is necessary to prevent deadlock.

Let's go through the DDL operations:

1. ha_innobase::create()
   1.1 Write back the AUTOINC counter to dd::Table
   1.2 If it's a file-per-table table, create the dd::Tablespace accordingly
   1.3 If the table is in a shared tablespace, check that the dd::Tablespace exists
   1.4 If not a temporary table, write back the table options, as described in FR-1 3
   1.5 If not a temporary table, write back metadata to dd::Table and dd::Index

2. ha_innobase::delete_table()
   2.1 If this is a file-per-table table and not a temporary table, drop the dd::Tablespace object accordingly
   2.2 Other metadata of this table is dropped by the Server

3. ha_innobase::rename_table()
   3.1 If this is a file-per-table table, rename the datafile name in dd::Tablespace_files
   3.2 No need to change the dd::Tablespace name, because the internal tablespace id doesn't change

4. ha_innobase::truncate()
   4.1 Write back the AUTOINC counter
   4.2 Since the table is re-created, old metadata like se_private_id and the se_private_data of dd::Index have to be cleared

5. ha_innobase::*inplace_alter_table() (both the old and the new dd::Table are passed in)
   5.1 If this is a no-op operation, just copy the metadata from the old dd::Table to the new dd::Table; nothing further to do
   5.2 If a rebuild is necessary
       5.2.1 If the old table is file-per-table, drop the dd::Tablespace
       5.2.2 If the new table is file-per-table, create the new dd::Tablespace
       5.2.3 If this is not a temporary table, set the dd::Table options
       5.2.4 If this is not a temporary table, write metadata to dd::Table and dd::Index
   5.3 If no rebuild
       5.3.1 Copy the se_private_id from the old dd::Table
       5.3.2 Write metadata for all dd::Index objects of the new dd::Table, which requires searching for the proper index first
   5.4 In commit_inplace_alter_table(), write back the AUTOINC counter

6. ha_innopart::create()
   6.1 This is nearly the same as it is now, except that data_file_name and index_file_name can be obtained from the dd::Table options
   6.2 The original for-loop over partition_element should be replaced by a loop over the dd::Partition objects of the dd::Table
   6.3 Write back the AUTOINC counter to dd::Table
   6.4 To write back other metadata for each partition, do the same as 1.2-1.5

7. ha_innopart::delete_table()
   7.1 Iterate over all dd::Partition objects of the dd::Table, and do 2.1-2.2 for each partition

8. ha_innopart::rename_table()
   8.1 Iterate over all dd::Partition objects of the dd::Table, and do 3.1-3.2 for each partition

9. ha_innopart::truncate_table()
   9.1 Iterate over all dd::Partition objects of the dd::Table, and do 4.2 for each partition
   9.2 Write back the AUTOINC counter if necessary

10. ha_innopart::*inplace_alter_table()
   10.1 Iterate over all dd::Partition objects of both the old and the new dd::Table, and do nearly the same as 5 for each partition

11. To open a table, we will
   11.1 Check whether the table is already in the in-memory cache; if so, return it
   11.2 If not, create it by extracting metadata from dd::Table, etc.
   11.3 Re-check whether the table is now cached in memory; if so, use the cached one, otherwise use the newly created one. The re-check is needed because dict_sys_t::mutex is released in step 11.2

12. Hidden columns should be added to dd::Table when creating a table, which includes:
   12.1 If a fulltext index exists, add the FTS_DOC_ID column if necessary
   12.2 Add a hidden unique index on the hidden column DB_ROW_ID if necessary
   12.3 Add the proper PRIMARY KEY columns to each secondary index
   12.4 Add the InnoDB system columns as hidden columns, including DB_TRX_ID and DB_ROLL_PTR
   12.5 Add all non-virtual columns to the clustered index unless they are already part of the PRIMARY KEY

To get correct se_private_data for DD tables, ha_innobase::get_se_private_data() should fill in the se_private_id of the dd::Table and the se_private_data of every dd::Index for every DD table.
Some index root pages need to be adjusted for different page sizes.
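The open-table procedure in step 11 (look up, build outside the mutex, re-check) can be sketched as follows. This is a minimal illustration under assumptions: `DdTable`, `CachedTable` and `TableCache` are hypothetical stand-ins for dd::Table, dict_table_t and the dictionary cache, and a plain std::mutex stands in for dict_sys_t::mutex. The real InnoDB code is considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for the metadata carried by dd::Table.
struct DdTable {
  std::string name;
  std::uint64_t se_private_id;  // InnoDB internal table id (FR-1 2.1)
};

// Hypothetical stand-in for the in-memory dict_table_t.
struct CachedTable {
  std::string name;
  std::uint64_t id;
};

class TableCache {
 public:
  std::shared_ptr<CachedTable> open(const DdTable &dd) {
    {
      // 11.1: return the cached object if it is already there.
      std::lock_guard<std::mutex> guard(mutex_);
      auto it = tables_.find(dd.name);
      if (it != tables_.end()) return it->second;
    }
    // 11.2: build the object from DD metadata WITHOUT holding the mutex,
    // since reading DD tables goes through InnoDB itself.
    auto fresh = std::make_shared<CachedTable>(
        CachedTable{dd.name, dd.se_private_id});

    // 11.3: re-check under the mutex. emplace() does not overwrite an
    // existing entry, so a concurrently inserted object wins and the
    // freshly built one is simply discarded.
    std::lock_guard<std::mutex> guard(mutex_);
    return tables_.emplace(dd.name, fresh).first->second;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> guard(mutex_);
    return tables_.size();
  }

 private:
  std::mutex mutex_;  // stands in for dict_sys_t::mutex
  std::map<std::string, std::shared_ptr<CachedTable>> tables_;
};
```

Opening the same table twice returns the same cached object, and a thread that loses the race in step 11.3 ends up using the object inserted by the winner.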