WL#9534: InnoDB_New_DD: Instantiate InnoDB in-memory metadata with newDD objects

Affects: Server-8.0   —   Status: Complete   —   Priority: Medium

This worklog is to instantiate InnoDB in-memory metadata from the
newDD objects. The work is used to be performed by dict_load_* code to read
metadata from InnoDB system tables, now with "InnoDB system table" being
replaced by newDD system tables, the instantiation work would now be done
through reading the newDD metadata objects.

There are two scenario such instantiation happens:

1. Requested by ha_innobase::open for SQL operation
2. Requested by InnoDB purge thread, rollback etc. internal operation

In short, this is the work mainly serverd as a translation layer that translates
DD in-memory metadata to that of InnoDB.

1. Tables/indexes newly created continue to work after server reboot

2. Purge operation continues to work as expected, espcially during crash recovery

3. DML rollback continues to work as expected

4. Proper MDL lock are taken for all DML/DDL and background operation, and
DML/DDL works as expected (except FK & partition)
Essentially, this work is to replace dict_table_open_* (on_id/on_name) with
their newDD counterpart. It will replaces metadata instantiation
(dict_load_table etc.) with newDD counterpart.

InnoDB metadata structures (dict_table_t) are engine wide, this means they are
used by multiple sessions, and live in memory for a longer period of time than
their server counterpart. There are limited scenario the InnoDB metadata needs
to be intantiated from scratch:

1) with ha_innobase::open() for fulfilling SQL query requests

2) directly table open for internal operation, such as purge, and crash recovery

In other scenario, such as those used by rollback, the in-memory metadata should
already be instantiated and only need to locate them from Hash structures.

The normal procedure to open a table would be:

1) Check if the InnoDB table metadata is already cached, if so, use it

2) Check if MDL needs to be placed on the table, if so, request MDL on the table.

3) If dict_table_t is not found in-memory , fetch information from MySQL TABLE,
TABLE_SHARE or dd::table objects and instantiate it (it used to be done through
read InnoDB system tables).

The last step is essentially a process of creating a set of in-memory
metadata(for InnoDB) from another set of in-memory metadata (from server). It
never directly pass from the new DD system tables.

Some examples are:

1) InnoDB table row format is now getting from TABLE_SHARE::row_type and
TABLE_SHARE::real_row_type (ROW_TYPE_DYNAMIC, ROW_TYPE_COMPRESSED etc.)

2) key_block_size is now getting from TABLE_SHARE::key_block_size

3) encryption is now getting from TABLE::encrypt_type

4) Index info from TABLE::key_info

5) InnoDB table ID from dd::Table::se_private_id()

6) temp table is from dd::Table::is_persistent()

7) Column info is from  TABLE::field[]

8) Virtual column info would also extracted from TABLE::field[]

9) Index Id is from Index::se_private_data (DD_INDEX_ID)

10) Index spaceid is also from Index::se_private_data (DD_SPACE_ID)

11) Index root page is also from Index::se_private_data (DD_INDEX_ROOT)

we will enumerate these mappings more in detail in LLD.

Another notable difference with 5.7 is that InnoDB can request proper MDL lock
when the table is opened directly from InnoDB. Shared MDL is taken by background
operations such as purge when they opens a table. So with the MDL lock, we will
no long need dict_operation_lock to synchronize work in purge and drop table.

One exception is the SDI internal tables/indexes, they are InnoDB specific, so
they are currently still uses dict_operation_lock.

This worklog also makes virtual column/index work as expected. The Sys_virtuals
table is no longer present. And all virtual column/index related information are
parsed directly from server TABLE structure, when the table is opened in the
first time.
A major observation on the project is that server/runtime has kept the TABLE and
TABLE_SHARE objects, so have InnoDB kept its dict_table_t and other dict_*_t
objects. The reason is mainly:
1. The project time constraint
2. Try to minimize the impact on other modules(such as optimizer) as much as
possible.

For server, new DD classes are added to accomodate new information to be stored
in this project, noteablely, the dd::Table, dd::Column etc. classes. In the
future, these objects could be merged with TABLE and TABLE_SHARE classes.

For this project, the main purpose is to translate the dictionary information
from server objects to InnoDB in-memory objects, so we will get information from
dd::Table, TABLE and TABLE_SHARE and then fill the dict_table_t etc.

Following is a simple list and examples on how we
extract each in memory objects from DD/server in-memory classes:

==================================================================
Section 1: Mapping between DD in-memory objects/funcs with InnoDB in-memory
structure members:

1) dict_table_t:

dict_table_t::id <== dd::Table::se_private_id()

dict_table_t::n_cols <== TABLE_SHARE::fields

dict_table_t::cols <== TABLE::field[]

dict_table_t::flags <== TABLE_SHARE::row_type, TABLE_SHARE::key_block_size and
TABLE_SHARE::key_block_size

dict_table_t::flags2  
     DICT_TF2_TEMPORARY <== dd::Table::is_persistent()
     DICT_HAS_DOC_ID <== Look for FTS_DOC_ID in with dd_find_column() in
dd::TABLE, and check if it is hidden
     DICT_TF2_FTS <== Look for Fulltext index in dd::TABLE (dd::Index::type() ==
dd::Index::IT_FULLTEXT)
     DICT_TF2_DISCARD <== dd_part->table().options()dd::Index
     DICT_TF2_ENCRYPTION <== dd_part->table().options()


dict_table_t::data_dir_path <== dd::Table::se_private_data (data_directory)

dict_table_t::space  <== dd::Tablespace::se_praivate_data

dict_table_t::autoinc <== dd::Table::se_private_data (autoinc)

2) dict_index_t
dict_index_t::name <== TABLE::key_info::name

dict_index_t::type <== TABLE::key_info::flags

dict_index_t::id <== dd::Index::se_private_data() (DD_INDEX_ID)

dict_index_t::space <== dd::Tablespace::se_private_data() (DD_SPACE_ID)

dict_index_t::root <== dd::Index::se_private_data()  (DD_INDEX_ROOT)

dict_index_t::merge_threshold <== TABLE_SHARE::comment::str()

dict_index_t::n_fields <== TABLE::key_info::user_defined_key_parts

dict_field_t::prefix_len <== TABLE::key_info::key_part::length,
TABLE::key_info::key_part::key_part_flag (HA_PART_KEY_SEG)

3) dict_col_t

dict_col_t::prtype <==> TABLE::field::type(), TABLE::field::real_maybe_null()

dict_col_t::len <==> TABLE::field::pack_length(), TABLE::field::length_bytes

dict_col_t::mtype <==> TABLE::field::type(), TABLE::field::real_type(),
TABLE::field::binary(), TABLE::field::flags

dict_v_col_t::base_col <=== TABLE::field::gcol_info

4) dict_foreign_t

dict_foreign_t::type <==   dd::Foreign_key::update_rule(),
dd::Foreign_key::delete_rule()

dict_foreign_t::n_fields <==  dd::Foreign_key::elements().size()

dict_foreign_t::referenced_col_names <== 
dd::Foreign_key::elements::referenced_column_name()

dict_foreign_t::foreign_col_names <== dd::Foreign_key::elements::column()

dict_foreign_t::referenced_table_name <==
dd::Foreign_key::referenced_table_schema_name(),
dd::Foreign_key::referenced_table_name()

So as shown in those examples, the in-memory metadata info can be extracted from
TABLE, TABLE_SHARE or dd::Table, dd:Index objects.

==================================================================
Section 2: Steps for dd_table_open_on_[id,name]

As mentioned earlier in HLS, there are a couple of ways to open table,
1) ha_innobase::open for SQL queries
2) dd_open_table_on_[id,name] for internal operation, DDL, FTS tables and other
bk operations.

For 2), if it is dd_open_table_on_id, then we will use following APIs to fetch
the table name and open table

a) Get the table name by one of following, depending on if it is partition table
dd::cache::Dictionary_client::get_table_name_by_se_private_id()
dd::cache::Dictionary_client::get_table_name_by_partition_se_private_id()

b) Once we have the name, call dd_mdl_acquire()-> dd::acquire_shared_table_mdl()
to acquire MDL on the table (note, to acquire dd::Table, tehre must be a MDL on
the table))

c) Then use dd::get_dd_client(thd)->acquire() to get dd::Table with table names.

d) Call dd_table_open_on_dd_obj() with dd::Table, in this function, TABLE and
TABLE_SHARE are also fetched with following APIs:

open_table_def() to get TABLE_SHARE
open_table_from_share() to get TABLE object.

Then with all three objects available (dd::Table, TABLE, TABLE_SHARE), we can
now get all information needed to fill dict_table_t and dict_index_t etc.

For dd_table_opne_on_name(), it will skip the step a) and directly go to step b)
and onwards.

==================================================================
Section 3: Mapping from InnoDB System Table to DD System table
 
Following is a list of where each InnoDB system table column end with newDD
system tables

1. InnoDB SYS Tables Mapping
TABLE: SYS_TABLES

1.1 SYS_TABLES::NAME

InnoDB: Stores database name & table name. For example like "test/t1"

New DD: mysql.tables.name
stores only table name ("t1"). Database name is stored in mysql.schemata.

Name is usually passed from server in the case of opening a table for SQL.
However, during DDL, queries involving FTS etc. and internal operations, we
might also need to open the table with ID. In these cases, APIs to fetch Table
Name by InnoDB Object ID (se_private_id):

dd::cache::Dictionary_client::get_table_name_by_se_private_id()
dd::cache::Dictionary_client::get_table_name_by_partition_se_private_id()

And Vice Versa, to we can obtain a dd::Table using following API:

dd::cache::Dictionary_client::acquire()

1.2 SYS_TABLES::ID

InnoDB: known as InnoDB table id. Uniquely identifies a table. (Also used in
purge, import, etc)

New DD: mysql.tables.se_private_id

The se_private_id is obtained from dd::Table with following API:

dd::Table::se_private_id()

1.3 SYS_TABLES::N_COLS

InnoDB: Stores the number of cols in a table. Also encodes the virtual columns
31st bit of this field is used to determine ROW_FORMAT, 1 -COMPACT, 0 - REDUNDANT

New DD: Equivalent doesn't exist. Instead we have to iterate over mysql.columns
for a given mysql table id.

Obtained from TABLE_SHARE::fields

1.4 SYS_TABLES::TYPE

InnoDB: Stores the table flags.
Determines: Compact or redundnant, zip size (compressed page size),
atomic blobs (768 byte prefix in-page or not),
has_data_dir (DATA-DIRECTORY remote tablespace), has_shared_space (General
tablespace).   

New DD:
a) row_type, Compact or redundant : mysql.table.row_format  dd::Table::RF_DYNAMIC,
dd::Table::RF_COMPACT etc.
This is obtained from TABLE_SHARE::row_type

b) Zip size             : mysql.table.options key_block_size
This is obtained from TABLE_SHARE::key_block_size

datadir              : the exact data directory is stored in
dd::Table::se_private_data, shown as
mysql.tables.se_private_data.data_directory; if a partitioned table, it's in
dd::Partition::options
has_data_dir         : removed
has_shared_space     : FIL_TYPE_SHARED or FIL_TYPE_IMPLICIT,

1.5 SYS_TABLES::MIX_ID

InnoDB: Unused

1.6 SYS_TABLES::MIX_LEN

InnoDB: Stores DICT_TF2 flags
- temporary, has_doc_id, has_fts_index, use_file_per_table, fts aux format,
is_intrinsic, encryption

New DD:
is_temporary       : removed, temporary table metadata will only be in cache,
not on system tables
has_doc_id         : Not yet implemented
use_file_per_table : This now can be found in mysqql.tablespaces.name, if it's
innodb_file_per_table.x
fts_aux-format     : Not yet implimented
is_intrinsic       : removed
encryption         : mysql.table.options encrypt_type,
mysql.tablespaces.se_private_data flags

1.7 SYS_TABLES::CLUSTER_NAME (unused)
1.8 SYS_TABLES::SPACE

InnoDB: table space id. The id of the tablespace where table resides.

New DD: m_tablespace (not the same as InnoDB tablespace id)

==================================================================
2. TABLE: SYS_COLUMNS


2.1 SYS_COLUMNS::TABLE_ID

InnoDB: table_id that uniquely identifies a InnoDB table

New DD: mysql.columns.table_id (but this is not InnoDB table_id). No such equivalent
exist and not necessary

2.2 SYS_COLUMNS::POS

InnoDB: ordinal position of a column in table, also encodes vcol

New DD: mysql.columns.oridinal_position
Vcol info: - virtual column number &  virtual column sequence (the "nth" virtual
column) ?

2.3 SYS_COLUMNS::NAME

InnoDB: column name

New DD: mysql.columns.name

In-Memory: TABLE::field::name

2.4 SYS_COLUMNS::MTYPE

InnoDB: main type:1-varchar, 2- char, 3- fixbinary etc

New DD: mysql.columns.type

2.5 SYS_COLUMNS::PRTYPE

InnoDB: determines mysql data type,charset, nullability, precision

New DD: combination of mysql.columns.is_nullable, is_zerofill,
is_unsigned,numeric_precision etc

2.6 SYS_COLUMNS::LEN

InnoDB: col length, 4-int, 8-bigint, 	for multi-byte includes the charset len
(2*N, 3*N , 4*N etc)

New DD: mysql.columns.char_length

==================================================================
3. TABLE: SYS_INDEXES
3.1 SYS_INDEXES::TABLE_ID

InnoDB: table_id of the table where index belongs

New DD: mysql.indexes.table_id (this is not innodb table id)

3.2 SYS_INDEXES::ID

InnoDB: index_id that is unique within a tablespace

New DD: mysql.indexes.se_private_data: Ex: id=111;root=3;trx_id=1803; 

3.3 SYS_INDEXES::NAME

InnoDB: PRIMARY, GEN_CLUST_INDEX, index name

New DD: mysql.indexes.name

3.4 SYS_INDEXES::N_FIELDS

InnoDB: Number of fields in the index (0 for GEN_CLUST_INDEX)

New DD: counting the number of rows from mysql.index_column_usage with specific
index id

Obtained from TABLE_SHARE::keys

3.5 SYS_INDEXES::TYPE

InnoDB:  clustered, secondary, unique, primary, FTS, spatial, vcol

New DD: derive from mysql.indexes.type + mysql.indexes.algorithm

Obtained from TABLE::key_info::flags

3.6 SYS_INDEXES::SPACE

InnoDB: space_id where the index resides

New DD: Can be only found in the se_private_data of corresponding dd::Tablespace

3.7 SYS_INDEXES::PAGE_NO

InnoDB: Root page number of the index

New DD: Stored as part of se_private_data. id=111;root=3;trx_id=1803

3.8 SYS_INDEXES::MERGE_THRESHOLD

InnoDB: thresold for merging pages

New DD: mysql.indexes.comment (ex. MERGE_THRESHOLD=40)

==================================================================
4. TABLE: SYS_FIELDS
4.1 SYS_FIELDS::INDEX_ID

InnoDB: index_id of the index where this index field belongs to

New DD: mysql.index_column_usage.index_id (note, this is not innodb index_id)

4.2 SYS_FIELDS::POS

InnoDB: position of key field in index.

New DD: mysql.index_column_usage.ordinal_position

4.3 SYS_FIELDS::COL_NAME

InnoDB: field name

New DD: mysql.index_column_usage.column_id (which in turn can be used to get the
name from mysql.columns)

Note: mysql.index_column_usage.order stores the ASC/DESC property of index field


==================================================================
5. TABLE: SYS_TABLESPACES
5.1 SYS_TABLESPACES::SPACE

InnoDB: tablespace id (space_id)

New DD: mysql.tablespaces.se_private_data "id" is the space_id. (flags=353;id=5;)
        Note: mysql.tablespaces.id is not InnoDB space_id.

5.2 SYS_TABLESPACES::NAME

InnoDB: Tablespace name

New DD: mysql.tablespaces.name. For implicit tablespaces (file-per-table
tablespaces): innodb_file_per_table.6

5.3 SYS_TABLESPACES::FLAGS

InnoDB: tablespace flags

New DD: mysql.tablesapces.se_private_data "flags" (flags=353;id=5;)

==================================================================
6. TABLE: SYS_DATAFILES
6.1 SYS_DATAFILES::SPACE

InnoDB: tablespace id

New DD: mysql.tablespace_files.tablespace_id (equivalent. This is not innodb
space_id)
The InnoDB tablespace id is stored in dd::Tablespace::se_private_data, shown as
mysql.tablespaces.se_private_data.

6.2 SYS_DATAFILES::PATH

InnoDB: path of the tablespace file

New DD: mysql.tablespace_files.file_name

==================================================================
7. TABLE: SYS_VIRTUAL

There is no corresponding system table for SYS_VIRTUAL in DD. Metadata are
parsed from mysql.columns.generation_expression. There is prebuilt::vcols and
prebuilt::bcols cache the indexed virtual column and base column. If properly
done, the relationship between virtual column and base column should be stored
in a system table after parsing, rather than keeping a generation clause.