WL#9534: InnoDB_New_DD: Instantiate InnoDB in-memory metadata with newDD objects
Affects: Server-8.0
—
Status: Complete
This worklog is to instantiate InnoDB in-memory metadata from the newDD objects. The work is used to be performed by dict_load_* code to read metadata from InnoDB system tables, now with "InnoDB system table" being replaced by newDD system tables, the instantiation work would now be done through reading the newDD metadata objects. There are two scenario such instantiation happens: 1. Requested by ha_innobase::open for SQL operation 2. Requested by InnoDB purge thread, rollback etc. internal operation In short, this is the work mainly serverd as a translation layer that translates DD in-memory metadata to that of InnoDB.
1. Tables/indexes newly created continue to work after server reboot 2. Purge operation continues to work as expected, espcially during crash recovery 3. DML rollback continues to work as expected 4. Proper MDL lock are taken for all DML/DDL and background operation, and DML/DDL works as expected (except FK & partition)
Essentially, this work is to replace dict_table_open_* (on_id/on_name) with their newDD counterpart. It will replaces metadata instantiation (dict_load_table etc.) with newDD counterpart. InnoDB metadata structures (dict_table_t) are engine wide, this means they are used by multiple sessions, and live in memory for a longer period of time than their server counterpart. There are limited scenario the InnoDB metadata needs to be intantiated from scratch: 1) with ha_innobase::open() for fulfilling SQL query requests 2) directly table open for internal operation, such as purge, and crash recovery In other scenario, such as those used by rollback, the in-memory metadata should already be instantiated and only need to locate them from Hash structures. The normal procedure to open a table would be: 1) Check if the InnoDB table metadata is already cached, if so, use it 2) Check if MDL needs to be placed on the table, if so, request MDL on the table. 3) If dict_table_t is not found in-memory , fetch information from MySQL TABLE, TABLE_SHARE or dd::table objects and instantiate it (it used to be done through read InnoDB system tables). The last step is essentially a process of creating a set of in-memory metadata(for InnoDB) from another set of in-memory metadata (from server). It never directly pass from the new DD system tables. Some examples are: 1) InnoDB table row format is now getting from TABLE_SHARE::row_type and TABLE_SHARE::real_row_type (ROW_TYPE_DYNAMIC, ROW_TYPE_COMPRESSED etc.) 2) key_block_size is now getting from TABLE_SHARE::key_block_size 3) encryption is now getting from TABLE::encrypt_type 4) Index info from TABLE::key_info 5) InnoDB table ID from dd::Table::se_private_id() 6) temp table is from dd::Table::is_persistent() 7) Column info is from TABLE::field[] 8) Virtual column info would also extracted from TABLE::field[] 9) Index Id is from Index::se_private_data (DD_INDEX_ID) 10) Index spaceid is also from Index::se_private_data (DD_SPACE_ID) 11) Index root page is also from Index::se_private_data (DD_INDEX_ROOT) we will enumerate these mappings more in detail in LLD. Another notable difference with 5.7 is that InnoDB can request proper MDL lock when the table is opened directly from InnoDB. Shared MDL is taken by background operations such as purge when they opens a table. So with the MDL lock, we will no long need dict_operation_lock to synchronize work in purge and drop table. One exception is the SDI internal tables/indexes, they are InnoDB specific, so they are currently still uses dict_operation_lock. This worklog also makes virtual column/index work as expected. The Sys_virtuals table is no longer present. And all virtual column/index related information are parsed directly from server TABLE structure, when the table is opened in the first time.
A major observation on the project is that server/runtime has kept the TABLE and TABLE_SHARE objects, so have InnoDB kept its dict_table_t and other dict_*_t objects. The reason is mainly: 1. The project time constraint 2. Try to minimize the impact on other modules(such as optimizer) as much as possible. For server, new DD classes are added to accomodate new information to be stored in this project, noteablely, the dd::Table, dd::Column etc. classes. In the future, these objects could be merged with TABLE and TABLE_SHARE classes. For this project, the main purpose is to translate the dictionary information from server objects to InnoDB in-memory objects, so we will get information from dd::Table, TABLE and TABLE_SHARE and then fill the dict_table_t etc. Following is a simple list and examples on how we extract each in memory objects from DD/server in-memory classes: ================================================================== Section 1: Mapping between DD in-memory objects/funcs with InnoDB in-memory structure members: 1) dict_table_t: dict_table_t::id <== dd::Table::se_private_id() dict_table_t::n_cols <== TABLE_SHARE::fields dict_table_t::cols <== TABLE::field[] dict_table_t::flags <== TABLE_SHARE::row_type, TABLE_SHARE::key_block_size and TABLE_SHARE::key_block_size dict_table_t::flags2 DICT_TF2_TEMPORARY <== dd::Table::is_persistent() DICT_HAS_DOC_ID <== Look for FTS_DOC_ID in with dd_find_column() in dd::TABLE, and check if it is hidden DICT_TF2_FTS <== Look for Fulltext index in dd::TABLE (dd::Index::type() == dd::Index::IT_FULLTEXT) DICT_TF2_DISCARD <== dd_part->table().options()dd::Index DICT_TF2_ENCRYPTION <== dd_part->table().options() dict_table_t::data_dir_path <== dd::Table::se_private_data (data_directory) dict_table_t::space <== dd::Tablespace::se_praivate_data dict_table_t::autoinc <== dd::Table::se_private_data (autoinc) 2) dict_index_t dict_index_t::name <== TABLE::key_info::name dict_index_t::type <== TABLE::key_info::flags dict_index_t::id <== dd::Index::se_private_data() (DD_INDEX_ID) dict_index_t::space <== dd::Tablespace::se_private_data() (DD_SPACE_ID) dict_index_t::root <== dd::Index::se_private_data() (DD_INDEX_ROOT) dict_index_t::merge_threshold <== TABLE_SHARE::comment::str() dict_index_t::n_fields <== TABLE::key_info::user_defined_key_parts dict_field_t::prefix_len <== TABLE::key_info::key_part::length, TABLE::key_info::key_part::key_part_flag (HA_PART_KEY_SEG) 3) dict_col_t dict_col_t::prtype <==> TABLE::field::type(), TABLE::field::real_maybe_null() dict_col_t::len <==> TABLE::field::pack_length(), TABLE::field::length_bytes dict_col_t::mtype <==> TABLE::field::type(), TABLE::field::real_type(), TABLE::field::binary(), TABLE::field::flags dict_v_col_t::base_col <=== TABLE::field::gcol_info 4) dict_foreign_t dict_foreign_t::type <== dd::Foreign_key::update_rule(), dd::Foreign_key::delete_rule() dict_foreign_t::n_fields <== dd::Foreign_key::elements().size() dict_foreign_t::referenced_col_names <== dd::Foreign_key::elements::referenced_column_name() dict_foreign_t::foreign_col_names <== dd::Foreign_key::elements::column() dict_foreign_t::referenced_table_name <== dd::Foreign_key::referenced_table_schema_name(), dd::Foreign_key::referenced_table_name() So as shown in those examples, the in-memory metadata info can be extracted from TABLE, TABLE_SHARE or dd::Table, dd:Index objects. ================================================================== Section 2: Steps for dd_table_open_on_[id,name] As mentioned earlier in HLS, there are a couple of ways to open table, 1) ha_innobase::open for SQL queries 2) dd_open_table_on_[id,name] for internal operation, DDL, FTS tables and other bk operations. For 2), if it is dd_open_table_on_id, then we will use following APIs to fetch the table name and open table a) Get the table name by one of following, depending on if it is partition table dd::cache::Dictionary_client::get_table_name_by_se_private_id() dd::cache::Dictionary_client::get_table_name_by_partition_se_private_id() b) Once we have the name, call dd_mdl_acquire()-> dd::acquire_shared_table_mdl() to acquire MDL on the table (note, to acquire dd::Table, tehre must be a MDL on the table)) c) Then use dd::get_dd_client(thd)->acquire() to get dd::Table with table names. d) Call dd_table_open_on_dd_obj() with dd::Table, in this function, TABLE and TABLE_SHARE are also fetched with following APIs: open_table_def() to get TABLE_SHARE open_table_from_share() to get TABLE object. Then with all three objects available (dd::Table, TABLE, TABLE_SHARE), we can now get all information needed to fill dict_table_t and dict_index_t etc. For dd_table_opne_on_name(), it will skip the step a) and directly go to step b) and onwards. ================================================================== Section 3: Mapping from InnoDB System Table to DD System table Following is a list of where each InnoDB system table column end with newDD system tables 1. InnoDB SYS Tables Mapping TABLE: SYS_TABLES 1.1 SYS_TABLES::NAME InnoDB: Stores database name & table name. For example like "test/t1" New DD: mysql.tables.name stores only table name ("t1"). Database name is stored in mysql.schemata. Name is usually passed from server in the case of opening a table for SQL. However, during DDL, queries involving FTS etc. and internal operations, we might also need to open the table with ID. In these cases, APIs to fetch Table Name by InnoDB Object ID (se_private_id): dd::cache::Dictionary_client::get_table_name_by_se_private_id() dd::cache::Dictionary_client::get_table_name_by_partition_se_private_id() And Vice Versa, to we can obtain a dd::Table using following API: dd::cache::Dictionary_client::acquire() 1.2 SYS_TABLES::ID InnoDB: known as InnoDB table id. Uniquely identifies a table. (Also used in purge, import, etc) New DD: mysql.tables.se_private_id The se_private_id is obtained from dd::Table with following API: dd::Table::se_private_id() 1.3 SYS_TABLES::N_COLS InnoDB: Stores the number of cols in a table. Also encodes the virtual columns 31st bit of this field is used to determine ROW_FORMAT, 1 -COMPACT, 0 - REDUNDANT New DD: Equivalent doesn't exist. Instead we have to iterate over mysql.columns for a given mysql table id. Obtained from TABLE_SHARE::fields 1.4 SYS_TABLES::TYPE InnoDB: Stores the table flags. Determines: Compact or redundnant, zip size (compressed page size), atomic blobs (768 byte prefix in-page or not), has_data_dir (DATA-DIRECTORY remote tablespace), has_shared_space (General tablespace). New DD: a) row_type, Compact or redundant : mysql.table.row_format dd::Table::RF_DYNAMIC, dd::Table::RF_COMPACT etc. This is obtained from TABLE_SHARE::row_type b) Zip size : mysql.table.options key_block_size This is obtained from TABLE_SHARE::key_block_size datadir : the exact data directory is stored in dd::Table::se_private_data, shown as mysql.tables.se_private_data.data_directory; if a partitioned table, it's in dd::Partition::options has_data_dir : removed has_shared_space : FIL_TYPE_SHARED or FIL_TYPE_IMPLICIT, 1.5 SYS_TABLES::MIX_ID InnoDB: Unused 1.6 SYS_TABLES::MIX_LEN InnoDB: Stores DICT_TF2 flags - temporary, has_doc_id, has_fts_index, use_file_per_table, fts aux format, is_intrinsic, encryption New DD: is_temporary : removed, temporary table metadata will only be in cache, not on system tables has_doc_id : Not yet implemented use_file_per_table : This now can be found in mysqql.tablespaces.name, if it's innodb_file_per_table.x fts_aux-format : Not yet implimented is_intrinsic : removed encryption : mysql.table.options encrypt_type, mysql.tablespaces.se_private_data flags 1.7 SYS_TABLES::CLUSTER_NAME (unused) 1.8 SYS_TABLES::SPACE InnoDB: table space id. The id of the tablespace where table resides. New DD: m_tablespace (not the same as InnoDB tablespace id) ================================================================== 2. TABLE: SYS_COLUMNS 2.1 SYS_COLUMNS::TABLE_ID InnoDB: table_id that uniquely identifies a InnoDB table New DD: mysql.columns.table_id (but this is not InnoDB table_id). No such equivalent exist and not necessary 2.2 SYS_COLUMNS::POS InnoDB: ordinal position of a column in table, also encodes vcol New DD: mysql.columns.oridinal_position Vcol info: - virtual column number & virtual column sequence (the "nth" virtual column) ? 2.3 SYS_COLUMNS::NAME InnoDB: column name New DD: mysql.columns.name In-Memory: TABLE::field::name 2.4 SYS_COLUMNS::MTYPE InnoDB: main type:1-varchar, 2- char, 3- fixbinary etc New DD: mysql.columns.type 2.5 SYS_COLUMNS::PRTYPE InnoDB: determines mysql data type,charset, nullability, precision New DD: combination of mysql.columns.is_nullable, is_zerofill, is_unsigned,numeric_precision etc 2.6 SYS_COLUMNS::LEN InnoDB: col length, 4-int, 8-bigint, for multi-byte includes the charset len (2*N, 3*N , 4*N etc) New DD: mysql.columns.char_length ================================================================== 3. TABLE: SYS_INDEXES 3.1 SYS_INDEXES::TABLE_ID InnoDB: table_id of the table where index belongs New DD: mysql.indexes.table_id (this is not innodb table id) 3.2 SYS_INDEXES::ID InnoDB: index_id that is unique within a tablespace New DD: mysql.indexes.se_private_data: Ex: id=111;root=3;trx_id=1803; 3.3 SYS_INDEXES::NAME InnoDB: PRIMARY, GEN_CLUST_INDEX, index name New DD: mysql.indexes.name 3.4 SYS_INDEXES::N_FIELDS InnoDB: Number of fields in the index (0 for GEN_CLUST_INDEX) New DD: counting the number of rows from mysql.index_column_usage with specific index id Obtained from TABLE_SHARE::keys 3.5 SYS_INDEXES::TYPE InnoDB: clustered, secondary, unique, primary, FTS, spatial, vcol New DD: derive from mysql.indexes.type + mysql.indexes.algorithm Obtained from TABLE::key_info::flags 3.6 SYS_INDEXES::SPACE InnoDB: space_id where the index resides New DD: Can be only found in the se_private_data of corresponding dd::Tablespace 3.7 SYS_INDEXES::PAGE_NO InnoDB: Root page number of the index New DD: Stored as part of se_private_data. id=111;root=3;trx_id=1803 3.8 SYS_INDEXES::MERGE_THRESHOLD InnoDB: thresold for merging pages New DD: mysql.indexes.comment (ex. MERGE_THRESHOLD=40) ================================================================== 4. TABLE: SYS_FIELDS 4.1 SYS_FIELDS::INDEX_ID InnoDB: index_id of the index where this index field belongs to New DD: mysql.index_column_usage.index_id (note, this is not innodb index_id) 4.2 SYS_FIELDS::POS InnoDB: position of key field in index. New DD: mysql.index_column_usage.ordinal_position 4.3 SYS_FIELDS::COL_NAME InnoDB: field name New DD: mysql.index_column_usage.column_id (which in turn can be used to get the name from mysql.columns) Note: mysql.index_column_usage.order stores the ASC/DESC property of index field ================================================================== 5. TABLE: SYS_TABLESPACES 5.1 SYS_TABLESPACES::SPACE InnoDB: tablespace id (space_id) New DD: mysql.tablespaces.se_private_data "id" is the space_id. (flags=353;id=5;) Note: mysql.tablespaces.id is not InnoDB space_id. 5.2 SYS_TABLESPACES::NAME InnoDB: Tablespace name New DD: mysql.tablespaces.name. For implicit tablespaces (file-per-table tablespaces): innodb_file_per_table.6 5.3 SYS_TABLESPACES::FLAGS InnoDB: tablespace flags New DD: mysql.tablesapces.se_private_data "flags" (flags=353;id=5;) ================================================================== 6. TABLE: SYS_DATAFILES 6.1 SYS_DATAFILES::SPACE InnoDB: tablespace id New DD: mysql.tablespace_files.tablespace_id (equivalent. This is not innodb space_id) The InnoDB tablespace id is stored in dd::Tablespace::se_private_data, shown as mysql.tablespaces.se_private_data. 6.2 SYS_DATAFILES::PATH InnoDB: path of the tablespace file New DD: mysql.tablespace_files.file_name ================================================================== 7. TABLE: SYS_VIRTUAL There is no corresponding system table for SYS_VIRTUAL in DD. Metadata are parsed from mysql.columns.generation_expression. There is prebuilt::vcols and prebuilt::bcols cache the indexed virtual column and base column. If properly done, the relationship between virtual column and base column should be stored in a system table after parsing, rather than keeping a generation clause.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.