WL#9534: InnoDB_New_DD: Instantiate InnoDB in-memory metadata with newDD objects
Affects: Server-8.0
—
Status: Complete
This worklog instantiates InnoDB in-memory metadata from the newDD objects. This work used to be performed by the dict_load_* code, which read metadata from InnoDB system tables. Now that the InnoDB system tables are replaced by newDD system tables, the instantiation is done by reading the newDD metadata objects. There are two scenarios in which such instantiation happens:
1. Requested by ha_innobase::open for SQL operations
2. Requested by InnoDB internal operations, such as the purge thread and rollback
In short, this work mainly serves as a translation layer that translates DD in-memory metadata to that of InnoDB.
1. Newly created tables/indexes continue to work after server reboot
2. Purge operation continues to work as expected, especially during crash recovery
3. DML rollback continues to work as expected
4. Proper MDL locks are taken for all DML/DDL and background operations, and DML/DDL works as expected (except FK & partition)
Essentially, this work replaces dict_table_open_* (on_id/on_name) with their newDD counterparts. It also replaces metadata instantiation (dict_load_table etc.) with its newDD counterpart.
InnoDB metadata structures (dict_table_t) are engine wide: they are used by multiple sessions and live in memory for a longer period of time than their server counterparts. There are limited scenarios in which the InnoDB metadata needs to be instantiated from scratch:
1) with ha_innobase::open(), to fulfill SQL query requests
2) direct table open for internal operations, such as purge and crash recovery
In other scenarios, such as those used by rollback, the in-memory metadata should already be instantiated and only needs to be located in the hash structures.
The normal procedure to open a table would be:
1) Check if the InnoDB table metadata is already cached; if so, use it.
2) Check if MDL needs to be placed on the table; if so, request MDL on the table.
3) If dict_table_t is not found in memory, fetch information from the MySQL TABLE, TABLE_SHARE or dd::Table objects and instantiate it (this used to be done by reading InnoDB system tables).
The last step is essentially a process of creating one set of in-memory metadata (for InnoDB) from another set of in-memory metadata (from the server). It never parses the new DD system tables directly. Some examples are:
1) InnoDB table row format now comes from TABLE_SHARE::row_type and TABLE_SHARE::real_row_type (ROW_TYPE_DYNAMIC, ROW_TYPE_COMPRESSED etc.)
2) key_block_size now comes from TABLE_SHARE::key_block_size
3) encryption now comes from TABLE::encrypt_type
4) Index info comes from TABLE::key_info
5) InnoDB table ID comes from dd::Table::se_private_id()
6) Whether the table is temporary comes from dd::Table::is_persistent()
7) Column info comes from TABLE::field[]
8) Virtual column info is also extracted from TABLE::field[]
9) Index ID comes from Index::se_private_data (DD_INDEX_ID)
10) Index space ID also comes from Index::se_private_data (DD_SPACE_ID)
11) Index root page also comes from Index::se_private_data (DD_INDEX_ROOT)
We will enumerate these mappings in more detail in the LLD.
Another notable difference from 5.7 is that InnoDB can request the proper MDL when a table is opened directly from InnoDB. Shared MDL is taken by background operations such as purge when they open a table. With the MDL, we no longer need dict_operation_lock to synchronize work between purge and drop table. One exception is the SDI internal tables/indexes: they are InnoDB specific, so they currently still use dict_operation_lock.
This worklog also makes virtual columns/indexes work as expected. The SYS_VIRTUAL table is no longer present, and all virtual column/index related information is parsed directly from the server TABLE structure when the table is opened for the first time.
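The three-step open procedure described above can be sketched as follows. This is only an illustration under stated assumptions: `MockServerTable`, `MockDictTable`, `MockDictCache` and `open_table` are hypothetical stand-ins, not InnoDB or server types, and the MDL request is reduced to a counter.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the server-side objects
// (dd::Table, TABLE, TABLE_SHARE).
struct MockServerTable {
  std::string name;             // e.g. "test/t1"
  std::uint64_t se_private_id;  // becomes the engine table id
  unsigned n_fields;            // becomes the engine column count
};

// Hypothetical stand-in for dict_table_t.
struct MockDictTable {
  std::uint64_t id;
  std::string name;
  unsigned n_cols;
};

// Hypothetical engine-wide cache shared by all sessions.
struct MockDictCache {
  std::unordered_map<std::string, std::shared_ptr<MockDictTable>> by_name;
  unsigned builds = 0;     // number of from-scratch instantiations
  unsigned mdl_taken = 0;  // number of simulated MDL requests
};

std::shared_ptr<MockDictTable> open_table(MockDictCache &cache,
                                          const MockServerTable &server_table,
                                          bool need_mdl) {
  // 1) Check whether the engine metadata is already cached.
  std::shared_ptr<MockDictTable> table;
  auto it = cache.by_name.find(server_table.name);
  if (it != cache.by_name.end()) table = it->second;

  // 2) Request MDL if the caller needs one (only counted in this sketch).
  if (need_mdl) ++cache.mdl_taken;

  // 3) Not cached: instantiate from the server-side in-memory metadata.
  if (!table) {
    table = std::make_shared<MockDictTable>();
    table->id = server_table.se_private_id;
    table->name = server_table.name;
    table->n_cols = server_table.n_fields;
    cache.by_name[server_table.name] = table;
    ++cache.builds;
  }
  return table;
}
```

Opening the same mock table twice returns the same cached object and builds it only once, which mirrors the engine-wide lifetime of dict_table_t described above.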
A major observation on the project is that the server/runtime has kept the TABLE and
TABLE_SHARE objects, and InnoDB has likewise kept its dict_table_t and other dict_*_t
objects. The main reasons are:
1. The project time constraint
2. The aim of minimizing the impact on other modules (such as the optimizer) as much as
possible.
For the server, new DD classes are added to accommodate new information to be stored
in this project, notably the dd::Table, dd::Column etc. classes. In the
future, these objects could be merged with the TABLE and TABLE_SHARE classes.
For this project, the main purpose is to translate the dictionary information
from server objects to InnoDB in-memory objects, so we get information from
dd::Table, TABLE and TABLE_SHARE and then fill in dict_table_t etc.
The following is a simple list, with examples, of how we
extract each in-memory object from the DD/server in-memory classes:
==================================================================
Section 1: Mapping between DD in-memory objects/funcs with InnoDB in-memory
structure members:
1) dict_table_t:
dict_table_t::id <== dd::Table::se_private_id()
dict_table_t::n_cols <== TABLE_SHARE::fields
dict_table_t::cols <== TABLE::field[]
dict_table_t::flags <== TABLE_SHARE::row_type, TABLE_SHARE::real_row_type and
TABLE_SHARE::key_block_size
dict_table_t::flags2
DICT_TF2_TEMPORARY <== dd::Table::is_persistent()
DICT_HAS_DOC_ID <== Look for FTS_DOC_ID with dd_find_column() in
dd::Table, and check if it is hidden
DICT_TF2_FTS <== Look for a fulltext index in dd::Table (dd::Index::type() ==
dd::Index::IT_FULLTEXT)
DICT_TF2_DISCARD <== dd_part->table().options()
DICT_TF2_ENCRYPTION <== dd_part->table().options()
dict_table_t::data_dir_path <== dd::Table::se_private_data (data_directory)
dict_table_t::space <== dd::Tablespace::se_private_data
dict_table_t::autoinc <== dd::Table::se_private_data (autoinc)
2) dict_index_t
dict_index_t::name <== TABLE::key_info::name
dict_index_t::type <== TABLE::key_info::flags
dict_index_t::id <== dd::Index::se_private_data() (DD_INDEX_ID)
dict_index_t::space <== dd::Tablespace::se_private_data() (DD_SPACE_ID)
dict_index_t::root <== dd::Index::se_private_data() (DD_INDEX_ROOT)
dict_index_t::merge_threshold <== TABLE_SHARE::comment::str()
dict_index_t::n_fields <== TABLE::key_info::user_defined_key_parts
dict_field_t::prefix_len <== TABLE::key_info::key_part::length,
TABLE::key_info::key_part::key_part_flag (HA_PART_KEY_SEG)
3) dict_col_t
dict_col_t::prtype <== TABLE::field::type(), TABLE::field::real_maybe_null()
dict_col_t::len <== TABLE::field::pack_length(), TABLE::field::length_bytes
dict_col_t::mtype <== TABLE::field::type(), TABLE::field::real_type(),
TABLE::field::binary(), TABLE::field::flags
dict_v_col_t::base_col <== TABLE::field::gcol_info
4) dict_foreign_t
dict_foreign_t::type <== dd::Foreign_key::update_rule(),
dd::Foreign_key::delete_rule()
dict_foreign_t::n_fields <== dd::Foreign_key::elements().size()
dict_foreign_t::referenced_col_names <==
dd::Foreign_key::elements::referenced_column_name()
dict_foreign_t::foreign_col_names <== dd::Foreign_key::elements::column()
dict_foreign_t::referenced_table_name <==
dd::Foreign_key::referenced_table_schema_name(),
dd::Foreign_key::referenced_table_name()
As shown in these examples, the in-memory metadata can be extracted from the
TABLE, TABLE_SHARE, dd::Table and dd::Index objects.
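A few of the Section 1 mappings can be sketched as a small translation function. Everything here is a hypothetical mock: the `Mock*` types only imitate the shape of dd::Table, dd::Index, TABLE_SHARE and dict_table_t, the flag bit values are invented for the example, and the sketch assumes that the temporary flag is set when the DD table reports that it is not persistent.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical mock of dd::Index::type().
enum class MockIndexType { kPrimary, kSecondary, kFulltext };

struct MockDdIndex {
  std::string name;
  MockIndexType type;
};

// Hypothetical mock of the parts of dd::Table used below.
struct MockDdTable {
  std::uint64_t se_private_id;
  bool persistent;
  std::vector<MockDdIndex> indexes;
};

// Hypothetical mock of TABLE_SHARE.
struct MockTableShare {
  unsigned fields;
};

// Invented bit values; the real DICT_TF2_* constants differ.
constexpr std::uint32_t kMockTf2Temporary = 1u << 0;
constexpr std::uint32_t kMockTf2Fts = 1u << 1;

// Hypothetical mock of dict_table_t.
struct MockDictTable {
  std::uint64_t id;
  unsigned n_cols;
  std::uint32_t flags2;
};

MockDictTable fill_dict_table(const MockDdTable &dd_table,
                              const MockTableShare &share) {
  MockDictTable table{};
  table.id = dd_table.se_private_id;  // id     <== se_private_id()
  table.n_cols = share.fields;        // n_cols <== TABLE_SHARE::fields

  // Assumption: a table that is not persistent is flagged as temporary.
  if (!dd_table.persistent) table.flags2 |= kMockTf2Temporary;

  // The FTS flag is set when any index is a fulltext index.
  for (const MockDdIndex &index : dd_table.indexes) {
    if (index.type == MockIndexType::kFulltext) {
      table.flags2 |= kMockTf2Fts;
      break;
    }
  }
  return table;
}
```

The function reads only in-memory objects, which matches the point above that the translation works from server objects rather than from system table rows.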
==================================================================
Section 2: Steps for dd_table_open_on_[id,name]
As mentioned earlier in the HLS, there are a couple of ways to open a table:
1) ha_innobase::open for SQL queries
2) dd_table_open_on_[id,name] for internal operations, DDL, FTS tables and other
background operations.
For 2), if it is dd_table_open_on_id, we use the following APIs to fetch
the table name and open the table:
a) Get the table name with one of the following, depending on whether it is a
partitioned table:
dd::cache::Dictionary_client::get_table_name_by_se_private_id()
dd::cache::Dictionary_client::get_table_name_by_partition_se_private_id()
b) Once we have the name, call dd_mdl_acquire() -> dd::acquire_shared_table_mdl()
to acquire MDL on the table (note: to acquire a dd::Table, there must be an MDL on
the table).
c) Then use dd::get_dd_client(thd)->acquire() to get the dd::Table by table name.
d) Call dd_table_open_on_dd_obj() with the dd::Table. In this function, TABLE_SHARE
and TABLE are also fetched with the following APIs:
open_table_def() to get TABLE_SHARE
open_table_from_share() to get the TABLE object.
Then, with all three objects available (dd::Table, TABLE, TABLE_SHARE), we can
get all the information needed to fill dict_table_t, dict_index_t etc.
dd_table_open_on_name() skips step a) and goes directly to step b)
and onwards.
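The ordering of steps a) to d) can be illustrated with a toy dictionary. This is a sketch under stated assumptions, not the server API: `MockDictionary`, `open_on_name` and `open_on_id` are hypothetical, MDL is modelled as a set of names, and the lock is never released in this example.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

struct MockDdTable {  // hypothetical stand-in for dd::Table
  std::string name;
  std::uint64_t se_private_id;
};

struct MockDictTable {  // hypothetical stand-in for dict_table_t
  std::uint64_t id;
  std::string name;
};

class MockDictionary {  // hypothetical stand-in for the dictionary client
 public:
  void add(const MockDdTable &table) {
    by_name_[table.name] = table;
    name_by_id_[table.se_private_id] = table.name;
  }
  // Step a): resolve the engine-private id to a table name.
  bool get_table_name_by_id(std::uint64_t id, std::string *name) const {
    auto it = name_by_id_.find(id);
    if (it == name_by_id_.end()) return false;
    *name = it->second;
    return true;
  }
  // Step b): take a shared MDL on the name.
  void acquire_shared_mdl(const std::string &name) { mdl_.insert(name); }
  bool holds_mdl(const std::string &name) const { return mdl_.count(name) != 0; }
  // Step c): the DD object is handed out only while MDL is held.
  const MockDdTable *acquire(const std::string &name) const {
    if (!holds_mdl(name)) return nullptr;
    auto it = by_name_.find(name);
    return it == by_name_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::string, MockDdTable> by_name_;
  std::map<std::uint64_t, std::string> name_by_id_;
  std::set<std::string> mdl_;
};

bool open_on_name(MockDictionary &dict, const std::string &name,
                  MockDictTable *out) {
  dict.acquire_shared_mdl(name);                     // step b)
  const MockDdTable *dd_table = dict.acquire(name);  // step c)
  if (dd_table == nullptr) return false;
  out->id = dd_table->se_private_id;                 // step d): fill engine object
  out->name = dd_table->name;
  return true;
}

bool open_on_id(MockDictionary &dict, std::uint64_t id, MockDictTable *out) {
  std::string name;
  if (!dict.get_table_name_by_id(id, &name)) return false;  // step a)
  return open_on_name(dict, name, out);
}
```

Opening by ID is the open-by-name path with one extra lookup in front, which is why the name-based variant can start at step b).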
==================================================================
Section 3: Mapping from InnoDB System Table to DD System table
The following lists where each InnoDB system table column ends up in the newDD
system tables.
1. InnoDB SYS Tables Mapping
TABLE: SYS_TABLES
1.1 SYS_TABLES::NAME
InnoDB: Stores database name & table name, for example "test/t1"
New DD: mysql.tables.name
stores only the table name ("t1"). The database name is stored in mysql.schemata.
The name is usually passed from the server when opening a table for SQL.
However, during DDL, queries involving FTS etc. and internal operations, we
might also need to open the table by ID. In these cases, the APIs to fetch the
table name by InnoDB object ID (se_private_id) are:
dd::cache::Dictionary_client::get_table_name_by_se_private_id()
dd::cache::Dictionary_client::get_table_name_by_partition_se_private_id()
Conversely, we can obtain a dd::Table from a name using the following API:
dd::cache::Dictionary_client::acquire()
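The difference between the two name formats can be shown with a small helper that splits an InnoDB-style name such as "test/t1" into the schema part and the table part. The function name `split_innodb_name` is hypothetical and the sketch only handles the simple `schema/table` form from the example.

```cpp
#include <string>

// Splits "schema/table" at the first '/'. Returns false when the
// separator is missing, in which case the outputs are left untouched.
bool split_innodb_name(const std::string &full_name, std::string *schema,
                       std::string *table) {
  const std::string::size_type pos = full_name.find('/');
  if (pos == std::string::npos) return false;
  *schema = full_name.substr(0, pos);
  *table = full_name.substr(pos + 1);
  return true;
}
```

For "test/t1" this yields "test", which corresponds to the schema name held in mysql.schemata, and "t1", which corresponds to mysql.tables.name.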
1.2 SYS_TABLES::ID
InnoDB: known as InnoDB table id. Uniquely identifies a table. (Also used in
purge, import, etc)
New DD: mysql.tables.se_private_id
The se_private_id is obtained from dd::Table with following API:
dd::Table::se_private_id()
1.3 SYS_TABLES::N_COLS
InnoDB: Stores the number of columns in a table. Also encodes the virtual columns.
The 31st bit of this field is used to determine ROW_FORMAT: 1 - COMPACT, 0 - REDUNDANT
New DD: No equivalent exists. Instead we have to iterate over mysql.columns
for a given mysql table id.
Obtained from TABLE_SHARE::fields
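The row format bit in the old N_COLS value can be decoded as sketched below. The sketch assumes that "31st bit" means bit index 31 (mask 0x80000000) of a 32-bit value, and it does not attempt to decode the virtual column count. The names `kMockCompactBit`, `n_cols_is_compact` and `n_cols_without_format_bit` are hypothetical.

```cpp
#include <cstdint>

// Assumption: the row format flag is the highest bit of a 32-bit N_COLS.
constexpr std::uint32_t kMockCompactBit = 0x80000000u;

// 1 - COMPACT, 0 - REDUNDANT.
bool n_cols_is_compact(std::uint32_t n_cols) {
  return (n_cols & kMockCompactBit) != 0;
}

// The remaining bits, which still carry the column count information.
std::uint32_t n_cols_without_format_bit(std::uint32_t n_cols) {
  return n_cols & ~kMockCompactBit;
}
```

With the new DD no such packing is needed, because the row format and the column count are read from separate members of TABLE_SHARE.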
1.4 SYS_TABLES::TYPE
InnoDB: Stores the table flags.
Determines: compact or redundant, zip size (compressed page size),
atomic blobs (768 byte prefix in-page or not),
has_data_dir (DATA DIRECTORY remote tablespace), has_shared_space (General
tablespace).
New DD:
a) row_type, compact or redundant : mysql.tables.row_format dd::Table::RF_DYNAMIC,
dd::Table::RF_COMPACT etc.
This is obtained from TABLE_SHARE::row_type
b) Zip size : mysql.tables.options key_block_size
This is obtained from TABLE_SHARE::key_block_size
datadir : the exact data directory is stored in
dd::Table::se_private_data, shown as
mysql.tables.se_private_data.data_directory; for a partitioned table, it is in
dd::Partition::options
has_data_dir : removed
has_shared_space : FIL_TYPE_SHARED or FIL_TYPE_IMPLICIT
1.5 SYS_TABLES::MIX_ID
InnoDB: Unused
1.6 SYS_TABLES::MIX_LEN
InnoDB: Stores DICT_TF2 flags
- temporary, has_doc_id, has_fts_index, use_file_per_table, fts aux format,
is_intrinsic, encryption
New DD:
is_temporary : removed, temporary table metadata will only be in the cache,
not in system tables
has_doc_id : Not yet implemented
use_file_per_table : This can now be found in mysql.tablespaces.name, if it is
innodb_file_per_table.x
fts_aux_format : Not yet implemented
is_intrinsic : removed
encryption : mysql.tables.options encrypt_type,
mysql.tablespaces.se_private_data flags
1.7 SYS_TABLES::CLUSTER_NAME (unused)
1.8 SYS_TABLES::SPACE
InnoDB: table space id. The id of the tablespace where table resides.
New DD: m_tablespace (not the same as InnoDB tablespace id)
==================================================================
2. TABLE: SYS_COLUMNS
2.1 SYS_COLUMNS::TABLE_ID
InnoDB: table_id that uniquely identifies an InnoDB table
New DD: mysql.columns.table_id (but this is not the InnoDB table_id). No exact
equivalent exists, and none is necessary
2.2 SYS_COLUMNS::POS
InnoDB: ordinal position of a column in the table, also encodes vcol
New DD: mysql.columns.ordinal_position
Vcol info: - virtual column number & virtual column sequence (the "nth" virtual
column) ?
2.3 SYS_COLUMNS::NAME
InnoDB: column name
New DD: mysql.columns.name
In-Memory: TABLE::field::name
2.4 SYS_COLUMNS::MTYPE
InnoDB: main type: 1 - varchar, 2 - char, 3 - fixbinary etc.
New DD: mysql.columns.type
2.5 SYS_COLUMNS::PRTYPE
InnoDB: determines MySQL data type, charset, nullability, precision
New DD: combination of mysql.columns.is_nullable, is_zerofill,
is_unsigned, numeric_precision etc.
2.6 SYS_COLUMNS::LEN
InnoDB: column length: 4 - int, 8 - bigint; for multi-byte it includes the charset
length (2*N, 3*N, 4*N etc.)
New DD: mysql.columns.char_length
==================================================================
3. TABLE: SYS_INDEXES
3.1 SYS_INDEXES::TABLE_ID
InnoDB: table_id of the table where index belongs
New DD: mysql.indexes.table_id (this is not innodb table id)
3.2 SYS_INDEXES::ID
InnoDB: index_id that is unique within a tablespace
New DD: mysql.indexes.se_private_data: Ex: id=111;root=3;trx_id=1803;
3.3 SYS_INDEXES::NAME
InnoDB: PRIMARY, GEN_CLUST_INDEX, index name
New DD: mysql.indexes.name
3.4 SYS_INDEXES::N_FIELDS
InnoDB: Number of fields in the index (0 for GEN_CLUST_INDEX)
New DD: counting the number of rows from mysql.index_column_usage with specific
index id
Obtained from TABLE_SHARE::keys
3.5 SYS_INDEXES::TYPE
InnoDB: clustered, secondary, unique, primary, FTS, spatial, vcol
New DD: derive from mysql.indexes.type + mysql.indexes.algorithm
Obtained from TABLE::key_info::flags
3.6 SYS_INDEXES::SPACE
InnoDB: space_id where the index resides
New DD: Can only be found in the se_private_data of the corresponding dd::Tablespace
3.7 SYS_INDEXES::PAGE_NO
InnoDB: Root page number of the index
New DD: Stored as part of se_private_data. id=111;root=3;trx_id=1803
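The `key=value;` layout of the se_private_data examples can be read with a small parser like the one below. This is an independent toy sketch: `parse_private_data` is a hypothetical helper, not the accessor the server itself uses, and it assumes that keys and values contain neither `=` nor `;`.

```cpp
#include <map>
#include <string>

// Parses "k1=v1;k2=v2;" into a map. Entries without '=' or with an
// empty key are skipped.
std::map<std::string, std::string> parse_private_data(const std::string &raw) {
  std::map<std::string, std::string> result;
  std::string::size_type start = 0;
  while (start < raw.size()) {
    std::string::size_type end = raw.find(';', start);
    if (end == std::string::npos) end = raw.size();
    const std::string pair = raw.substr(start, end - start);
    const std::string::size_type eq = pair.find('=');
    if (eq != std::string::npos && eq > 0) {
      result[pair.substr(0, eq)] = pair.substr(eq + 1);
    }
    start = end + 1;
  }
  return result;
}
```

Applied to the example string "id=111;root=3;trx_id=1803;", the map holds three entries, with "id" standing for the index ID and "root" for the root page number.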
3.8 SYS_INDEXES::MERGE_THRESHOLD
InnoDB: threshold for merging pages
New DD: mysql.indexes.comment (ex. MERGE_THRESHOLD=40)
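Extracting the threshold from a comment such as "MERGE_THRESHOLD=40" can be sketched as below. The helper `merge_threshold_from_comment` is hypothetical, the caller supplies the fallback value, and the sketch does no range checking on the parsed number.

```cpp
#include <cctype>
#include <string>

// Returns the number following "MERGE_THRESHOLD=" in the comment, or
// the fallback when the key is absent or not followed by a digit.
unsigned merge_threshold_from_comment(const std::string &comment,
                                      unsigned fallback) {
  static const std::string key = "MERGE_THRESHOLD=";
  const std::string::size_type pos = comment.find(key);
  if (pos == std::string::npos) return fallback;

  std::string::size_type i = pos + key.size();
  if (i >= comment.size() ||
      !std::isdigit(static_cast<unsigned char>(comment[i]))) {
    return fallback;
  }

  unsigned value = 0;
  while (i < comment.size() &&
         std::isdigit(static_cast<unsigned char>(comment[i]))) {
    value = value * 10 + static_cast<unsigned>(comment[i] - '0');
    ++i;
  }
  return value;
}
```

Because the value lives inside free-form comment text, a parser of this kind has to tolerate comments that do not mention the key at all.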
==================================================================
4. TABLE: SYS_FIELDS
4.1 SYS_FIELDS::INDEX_ID
InnoDB: index_id of the index to which this index field belongs
New DD: mysql.index_column_usage.index_id (note, this is not innodb index_id)
4.2 SYS_FIELDS::POS
InnoDB: position of key field in index.
New DD: mysql.index_column_usage.ordinal_position
4.3 SYS_FIELDS::COL_NAME
InnoDB: field name
New DD: mysql.index_column_usage.column_id (which in turn can be used to get the
name from mysql.columns)
Note: mysql.index_column_usage.order stores the ASC/DESC property of index field
==================================================================
5. TABLE: SYS_TABLESPACES
5.1 SYS_TABLESPACES::SPACE
InnoDB: tablespace id (space_id)
New DD: mysql.tablespaces.se_private_data "id" is the space_id. (flags=353;id=5;)
Note: mysql.tablespaces.id is not InnoDB space_id.
5.2 SYS_TABLESPACES::NAME
InnoDB: Tablespace name
New DD: mysql.tablespaces.name. For implicit tablespaces (file-per-table
tablespaces): innodb_file_per_table.6
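Recognizing an implicit tablespace by its name can be sketched with a prefix check. The helper `is_file_per_table_name` is hypothetical and assumes only what the example shows: that such names start with "innodb_file_per_table." followed by a suffix.

```cpp
#include <string>

// True when the tablespace name has the implicit (file-per-table)
// prefix and at least one character after it.
bool is_file_per_table_name(const std::string &name) {
  static const std::string prefix = "innodb_file_per_table.";
  return name.size() > prefix.size() &&
         name.compare(0, prefix.size(), prefix) == 0;
}
```

A check like this is one way to recover what the old use_file_per_table flag used to record, using only mysql.tablespaces.name.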
5.3 SYS_TABLESPACES::FLAGS
InnoDB: tablespace flags
New DD: mysql.tablespaces.se_private_data "flags" (flags=353;id=5;)
==================================================================
6. TABLE: SYS_DATAFILES
6.1 SYS_DATAFILES::SPACE
InnoDB: tablespace id
New DD: mysql.tablespace_files.tablespace_id (the equivalent column, but this is
not the InnoDB space_id)
The InnoDB tablespace id is stored in dd::Tablespace::se_private_data, shown as
mysql.tablespaces.se_private_data.
6.2 SYS_DATAFILES::PATH
InnoDB: path of the tablespace file
New DD: mysql.tablespace_files.file_name
==================================================================
7. TABLE: SYS_VIRTUAL
There is no corresponding system table for SYS_VIRTUAL in DD. Metadata is
parsed from mysql.columns.generation_expression. prebuilt::vcols and
prebuilt::bcols cache the indexed virtual columns and base columns. If properly
done, the relationship between a virtual column and its base columns should be
stored in a system table after parsing, rather than keeping a generation clause.
Copyright (c) 2000, 2025, Oracle Corporation and/or its affiliates. All rights reserved.