WL#9525: InnoDB_New_DD: update InnoDB specific metadata into new DD for DDL

Affects: Server-8.0   —   Status: Complete

This worklog propagates InnoDB-specific metadata into the new DD for DDL
operations. More specifically, it fills in the se_private_data of the various
system tables for create table, create tablespace, alter table, etc.

All of this metadata must be written for both non-partitioned and partitioned
tables. It is filled in at create table and alter table time, so that it is
propagated into the DD tables.

This worklog also includes creating the new DD tablespace objects, such as
dd::Tablespace and dd::Tablespace_files, for "implicit" (file-per-table) tables
at create table or alter table time.

This worklog also "fixes" the se_private_data of the new DD system tables so
that their root page, space id, etc. point to the right place. The system can
then fetch this basic information when the new DD system tables themselves are
opened.

The whole work depends on WL#7743, which implements proper DDL APIs for InnoDB
by passing the needed dd::Table/dd::Tablespace objects to InnoDB. One issue is
that a partitioned table does not have native
ha_innopart::delete_table/rename_table APIs, but relies on
ha_innobase::delete_table/rename_table instead. Since the server is not able to
pass a valid dd::Table for a partitioned table to these two ha_innobase APIs,
WL#7743 should also support ha_innopart::delete_table/rename_table and pass a
valid dd::Table to them. This worklog implements these two new APIs.

To verify that this worklog works, opening a table by accessing the passed-in
dd::Table is also implemented in this worklog. Once the InnoDB internal table
object is no longer in memory (evicted, or after a restart), it can be
reconstructed from the dd::Table, etc.

FR-1: The following metadata should be filled in when the objects are created
or changed:

dd::Table

1. se_private_data:
1.1 "autoinc" for the last used AUTO_INCREMENT counter value; a "version" of
the "autoinc" may also be needed for crash-safe DDL.
1.2 "data_directory" to indicate whether the user specified DATA DIRECTORY.

2. se_private_id:
2.1 InnoDB internal table id

3. options:
3.1 "key_block_size" for compressed page size
3.2 "row_type" for record format

dd::Index (for each index)

4. dd::Index::tablespace_id:
4.1 The tablespace id generated by the Server (not the internal one)

5. se_private_data:
5.1 "id" for internal index id
5.2 "root" for index root page no
5.3 "trx_id" for the id of the transaction that most recently modified the
index

6. options:
6.1 "block_size" exists if key_block_size is specified for the table.
However, key_block_size is sometimes ignored entirely; in that case, the
"block_size" option should be removed from every index too.
Note: Setting "block_size" to 0 doesn't help, because the server assumes the
option is either non-zero or absent.

dd::Tablespace (for file_per_table tablespace)

7. dd::Tablespace::name:
7.1 If it's a file-per-table tablespace, this is
"innodb_file_per_table.x", where x is the internal tablespace id
7.2 If it's a shared tablespace, then this would be the tablespace name
specified at create tablespace time

8. se_private_data:
8.1 "id" for internal tablespace id
8.2 "flags" for current fil_space_t::flags

dd::Tablespace_files (for file_per_table tablespace files)

9. dd::Tablespace_files::filename:
9.1 The path of corresponding tablespace

FR-2: All of this metadata should be kept up to date after DDLs.

FR-3: Correct se_private_data, se_private_id and internal tablespace id should
be calculated for all DD tables. With this feature, all non-SYS_* tables should
be opened by accessing dd::* tables instead of SYS_* tables.

FR-4: Hidden columns like DB_ROW_ID, DB_TRX_ID and DB_ROLL_PTR will be appended
by ha_innobase::get_extra_columns_and_keys().

FR-5: This worklog writes metadata for all tables except temporary tables and
the to-be-deprecated SYS_* tables.

FR-6: Support ha_innopart::delete_table/rename_table().

Note: When opening tables, all necessary metadata should be obtained from DD
tables, as before.

Basically, on every DDL, metadata can (and should) be read from the InnoDB
internal dict_table_t and dict_index_t objects, etc., and filled into the global
DD objects, such as dd::Table, dd::Index and dd::Tablespace. On opening a table,
metadata should be read from dd::Table, dd::Index and dd::Tablespace, and then
filled into dict_table_t, dict_index_t, etc. This applies to all tables except
the original SYS_* tables, because those are only created internally and will be
removed in the future.
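
The two directions described above (engine object to DD on DDL, DD to engine
object on open) can be sketched as a simple round trip. This is only a minimal
model: IndexInfo stands in for dict_index_t and PrivateData for the
se_private_data of a dd::Index; both types and both function names are
hypothetical, and the real objects carry far more state.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the se_private_data of a dd::Index.
using PrivateData = std::map<std::string, std::string>;

// Hypothetical stand-in for the parts of dict_index_t that get persisted.
struct IndexInfo {
  std::uint64_t id;
  std::uint32_t root;
  std::uint64_t trx_id;
};

// DDL path: copy the engine-side values into the DD-side bag.
PrivateData write_index_metadata(const IndexInfo &index) {
  return {{"id", std::to_string(index.id)},
          {"root", std::to_string(index.root)},
          {"trx_id", std::to_string(index.trx_id)}};
}

// Open path: rebuild the engine-side descriptor from the DD-side bag.
// Throws std::out_of_range if an expected key is missing.
IndexInfo read_index_metadata(const PrivateData &data) {
  IndexInfo index;
  index.id = std::stoull(data.at("id"));
  index.root = static_cast<std::uint32_t>(std::stoul(data.at("root")));
  index.trx_id = std::stoull(data.at("trx_id"));
  return index;
}
```

Reading back what was written should give the same values, which is the
property the "open from dd::Table" verification in this worklog relies on.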

A partitioned table's metadata should be in dd::Partition and
dd::Partition_index. The dd::Partition objects have to be obtained from the
dd::Table, so the APIs for partitioned tables have to change accordingly. For
example, create() must iterate over all dd::Partition objects in the dd::Table
and store the metadata of each partition in its own dd::Partition.
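
A minimal sketch of that per-partition loop follows. PartitionEntry and
TableEntry are hypothetical stand-ins for dd::Partition and dd::Table, and the
name-to-id map stands in for whatever the engine produced while creating each
partition; none of these names are the real API.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for dd::Partition.
struct PartitionEntry {
  std::string name;
  std::uint64_t se_private_id;  // 0 is used here to mean "not filled in yet"
};

// Hypothetical stand-in for a dd::Table that owns its partitions.
struct TableEntry {
  std::vector<PartitionEntry> partitions;
};

// Walk every partition of the table and store the engine-side id that belongs
// to it. Returns how many partitions were filled in.
std::size_t fill_partition_ids(
    TableEntry &table,
    const std::map<std::string, std::uint64_t> &engine_ids) {
  std::size_t filled = 0;
  for (PartitionEntry &part : table.partitions) {
    auto it = engine_ids.find(part.name);
    if (it == engine_ids.end()) continue;  // no engine object for this one
    part.se_private_id = it->second;
    ++filled;
  }
  return filled;
}
```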

There is no need to update metadata for temporary tables, because:
1. Temporary tables are dropped after a crash or shutdown, so there is no need
to persist their metadata
2. Temporary table objects in InnoDB are not evicted, so they can be found by
name
3. The server handles the dd::Table for a temporary table: it is created
during table creation and kept until shutdown, when the same object is
deleted along with the table

InnoDB has to create the dd::Tablespace itself when a new file-per-table table
is created, because the Server doesn't do so. In this case, the tablespace name
is as described in FR-1 7.1. If the user creates a shared tablespace, the
tablespace name should be exactly the one specified.

Any operation that has to access DD tables should be done without holding any
InnoDB mutex or lock. Accessing DD tables goes through InnoDB too, so this is
necessary to prevent deadlocks.

Let's go through the DDL operations:

1. ha_innobase::create()
1.1 Write back AUTOINC counter to dd::Table
1.2 If it's file-per-table table, create the dd::Tablespace accordingly
1.3 If the table is in a shared tablespace, check if the dd::Tablespace exists
1.4 If not a temporary table, write back table options, as described in FR-1 3
1.5 If not a temporary table, write back metadata to dd::Table and dd::Index

2. ha_innobase::delete_table()
2.1 If this is a file-per-table table and not a temporary table, drop the 
dd::Tablespace object accordingly
2.2 Other metadata of this table would be dropped by Server

3. ha_innobase::rename_table()
3.1 If this is a file-per-table table, rename the datafile name in 
3.2 No need to change the dd::Tablespace name, because the internal tablespace 
id doesn't get changed

4. ha_innobase::truncate()
4.1 Write back the AUTOINC counter
4.2 Since the table would be re-created, old metadata like se_private_id and 
se_private_data of dd::Index have to be cleared

5. ha_innobase::*inplace_alter_table(), both old and new dd::Table would be 
passed in
5.1 If this is a no-op operation, just copy the metadata from the old dd::Table
to the new dd::Table; nothing further needs to be done
5.2 If rebuild is necessary
5.2.1 If the old table is file-per-table, drop the dd::Tablespace
5.2.2 If the new table is file-per-table, create the new dd::Tablespace
5.2.3 If this is not a temporary table, set dd::Table options
5.2.4 If this is not a temporary table, write metadata to dd::Table and
dd::Index
5.3 If no rebuild
5.3.1 Copy the se_private_id from the old dd::Table
5.3.2 Write metadata for all dd::Index objects of the new dd::Table, which
requires finding the matching index first
5.4 In commit_inplace_alter_table(), write back the AUTOINC counter
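
The branching in step 5 can be summarized as a small decision function. This
is only a sketch of the control flow: AlterContext and plan_dd_updates are
hypothetical names, the steps are returned as descriptive strings rather than
performed, and the early return for the no-op case follows the reading that
5.1 needs nothing further.

```cpp
#include <string>
#include <vector>

// Hypothetical summary of what an in-place ALTER is about to do.
struct AlterContext {
  bool no_op;
  bool rebuild;
  bool old_file_per_table;
  bool new_file_per_table;
  bool temporary;
};

// Returns the DD updates from step 5, in order, for the given ALTER.
std::vector<std::string> plan_dd_updates(const AlterContext &ctx) {
  std::vector<std::string> steps;
  if (ctx.no_op) {  // 5.1
    steps.push_back("copy metadata from old dd::Table");
    return steps;
  }
  if (ctx.rebuild) {  // 5.2
    if (ctx.old_file_per_table) steps.push_back("drop old dd::Tablespace");
    if (ctx.new_file_per_table) steps.push_back("create new dd::Tablespace");
    if (!ctx.temporary) {
      steps.push_back("set dd::Table options");
      steps.push_back("write dd::Table and dd::Index metadata");
    }
  } else {  // 5.3
    steps.push_back("copy se_private_id from old dd::Table");
    steps.push_back("write dd::Index metadata");
  }
  steps.push_back("write back AUTOINC counter at commit");  // 5.4
  return steps;
}
```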

6. ha_innopart::create()
6.1 This stays nearly the same as it is, except that data_file_name and
index_file_name can be obtained from the dd::Table options
6.2 The original for-loop over partition_element should be replaced by a loop
over the dd::Partition(s) of the dd::Table
6.3 Write back AUTOINC counter to dd::Table
6.4 To write back other metadata for each partition, do the same as 1.2-1.5.

7. ha_innopart::delete_table()
7.1 Iterate over all dd::Partition(s) of the dd::Table, and do 2.1-2.2 for each 

8. ha_innopart::rename_table()
8.1 Iterate over all dd::Partition(s) of the dd::Table, and do 3.1-3.2 for each 

9. ha_innopart::truncate_table()
9.1 Iterate over all dd::Partition(s) of the dd::Table, and do 4.2 for each 
9.2 Write back the AUTOINC counter if necessary

10. ha_innopart::*inplace_alter_table()
10.1 Iterate over all dd::Partition(s) of both the old and the new dd::Table,
and do nearly the same as in 5 for each partition

11. To open a table, we will
11.1 Check whether the table is already in the in-memory cache; if so, return it
11.2 If not, create it by extracting metadata from dd::Table, etc.
11.3 Re-check whether the table is already cached in memory; if so, use the
cached one, otherwise use the newly created one. The re-check is needed because
dict_sys_t::mutex is released in step 11.2
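
The check, build, re-check sequence of step 11 can be sketched with a generic
cache. This is a minimal model under stated assumptions: TableCache and
TableObject are hypothetical, a std::mutex stands in for dict_sys_t::mutex,
and the loader callback stands in for building the object from dd::Table. The
loader runs with the mutex released, in line with the rule that DD tables must
not be accessed while holding InnoDB mutexes.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the InnoDB in-memory table object.
struct TableObject {
  std::string name;
};

class TableCache {
 public:
  // Returns the cached object for `name`, building it with `load` if needed.
  template <typename Loader>
  std::shared_ptr<TableObject> open(const std::string &name, Loader load) {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      auto it = tables_.find(name);
      if (it != tables_.end()) return it->second;  // 11.1: already cached
    }
    // 11.2: build the object without holding the mutex.
    std::shared_ptr<TableObject> fresh = load(name);

    // 11.3: re-check under the mutex. emplace() keeps an entry that another
    // thread may have added in the meantime and inserts ours otherwise.
    std::lock_guard<std::mutex> guard(mutex_);
    return tables_.emplace(name, std::move(fresh)).first->second;
  }

 private:
  std::mutex mutex_;
  std::map<std::string, std::shared_ptr<TableObject>> tables_;
};
```

If two threads race on the same table, both may run the loader, but both end up
with the same cached object; the extra copy is simply discarded.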

12. Hidden columns should be added to dd::Table when creating a table, which 
12.1 If fulltext index exists, add the FTS_DOC_ID column if necessary
12.2 Add hidden unique index with the hidden column DB_ROW_ID if necessary
12.3 Add proper PRIMARY KEY columns to each secondary index
12.4 Add InnoDB system columns as hidden column, including DB_TRX_ID and 
12.5 Add all non-virtual columns to the clustered index unless they are already 
part of the PRIMARY KEY

To get the correct se_private_data for DD tables,
ha_innobase::get_se_private_data() should fill in the se_private_id of the
dd::Table and the se_private_data of every dd::Index for every DD table. Some
index root pages need to be adjusted for different page sizes.