WL#3732: Information schema optimization
Affects: Server-5.5
—
Status: Complete
The current implementation of I_S retrieves the contents of several of its tables directly from FRM, MYD, and MYI database files and the file system itself. These tables are: COLUMNS, KEY_COLUMN_USAGE, PARTITIONS, REFERENTIAL_CONSTRAINTS, STATISTICS, TABLES, TABLE_CONSTRAINTS, TRIGGERS and VIEWS. In order to retrieve the contents of any of the tables, the function sql_show.cc:get_all_tables() needs to perform many my_dir() and open_table() calls that require costly disc access. As a result the current implementation results in orders of magnitude slower query times over these I_S tables compared to accessing regular tables. The goal of this task is to improve query times over the tables above by reducing the accesses to FRM, MYD, and MYI files and the file system to the possible minimum. This is needed to help solve BUG#19588.
The information schema tables are internally represented as temporary tables that are filled at query execution time. Those I_S tables whose data is retrieved from FRM, MYD, and MYI files are filled by the function get_all_tables(). The function get_all_tables() is called from 'get_schema_tables_result()' which can be called in two places: JOIN::exec() or create_sort_index(). Here is simplified schema of this function: int get_all_tables(THD *thd, TABLE_LIST *tables, COND *cond) { ... get_list_of_database_names(); while ((db_name= it_db_names++)) { ... get_list_of_table_names_in_the_database(); while ((table_name= it_table_names++)) { .... open_table() store_data_into_I_S_table() } } } Functions get_list_of_database_names() & get_list_of_table_names_in_the_database() use find_files() function which uses my_dir() function. Function 'find_files()' retrieves list of database name from mysql system directory or list of table_names from database directory depending on parameters. There are several problems which make described above I_S tables slow: 1. The process of obtaining of db names & table names list. my_dir() function is used for this process. The function scans mysql system directory or database directory and creates list of database names or file names It's slow because it uses disk access operations. 2. Function open_table() is slow too. This function read and parses FRM & MYD & MYI files. This is a very slow. We need to do following improvements to speed up slow I_S tables: 1. Try to avoid the use of the my_dir() function. We can obtain some data(db name & table name) from 'where' condition(COND *cond). If the query supplies constant(s) for the database and/or table name(s), we directly try to access them, and if no such database(s) and/or table(s) exist, open_table()|open_frm() returns an error, which we map to FALSE for the corresponding predicate. Example1: select table_name from information_schema.tables where table_schema='test'; Existing situation: For example we have 5 databases & 20 tables in each. We have to perform 1(list of db names)+5(list of table names for each db) calls of my_dir(); But we can get db_name 'test' from where condition and we have to perform only one my_dir() call (to get table names from 'test' db); Example2: select table_name from information_schema.tables where table_schema='test' and table_name='t1'; We can get table_schema='test' and table_name='t1' and perform further operations only for 'test/t1.frm' table. Count of my_dir() call is 0 in this case. See 'modified schema of get_all_tables() function' below. 2. Try to avoid the use of open_table() function We don't need to execute open_table() at all for some queries Example: select table_name from information_schema.tables; The values of 'table_name' are already known after my_dir() function call. So we can skip open_table() call. See 'modified schema of get_all_tables() function' below. 3. In some cases it's enough to open FRM file only. Example: select table_name, table_type from information_schema.tables where table_schema='test'; we get 'table_name' during get_list_of_table_names_in_the_database() call and we can get 'table_type' from FRM file. So we don't need to touch MYD and MYI files. See 'modified schema of get_all_tables() function' below. 4. We don't need to open views for I_S tables: KEY_COLUMN_USAGE, PARTITIONS, REFERENTIAL_CONSTRAINTS, STATISTICS, TABLES, TABLE_CONSTRAINTS, TRIGGERS. See 'modified schema of get_all_tables() function' below. 5. We don't need to open tables for I_S table VIEW. See 'modified schema of get_all_tables() function' below. 6. Use LEX_STRING instead of char *db_name, *table_name to avoid strlen(). Here is modified schema of get_all_tables() function: get_all_tables() { /* Calculate lookup values form 'where' cond */ get_lookup_field_values(thd, cond, tables, &lookup_field_vals); /* Check which I_S table fields are used in 'select' and set table_open_method depending on these fields. */ get_table_open_method(TABLE_LIST *tables, ST_SCHEMA_TABLE *schema_table, enum enum_schema_tables schema_table_idx); /* Create db names list. if we have database lookup value then we just add it to list and skip my_dir() function call. */ make_db_list(lookup_field_vals) // see schema of this function below while ((db_name= it_db_names++)) { ... /* Create table names list. if we have table lookup value then we just add it to list and skip find_files() function call. */ make_table_list(lookup_field_vals, db_name); // see schema below while ((table_name= it_table_names++)) { .... /* Open table depending on 'table_open_method' SKIP_OPEN_TABLE - do not need to open table OPEN_FRM_ONLY - open FRM file only OPEN_FULL_TABLE - open FRM, MYD, MYI */ if (SKIP_OPEN_TABLE) store_data_into_I_S_table() else if (OPEN_FRM_ONLY) { open_frm_only(); } else { if (I_S table is one of KEY_COLUMN_USAGE, PARTITIONS, REFERENTIAL_CONSTRAINTS, STATISTICS, TABLES, TABLE_CONSTRAINTS, TRIGGERS) open_table_only(); else if (I_S table is VIEW) open_view_only(); else open_table(); } store_data_into_I_S_table() } } } The schema of make_[db]|[table]_list() function: make_[db]|[table]_list() { if (lookup value) { add lookup value into list; return } else { find_files(); } } The schema of get_lookup_field_values() function: get_lookup_field_values(...) { if (SHOW command) get wild str values for database|table names from LEX else // 'select from I_S table' calculate lookup values from 'WHERE' condition }
table.h 1. add new constants(open_method) #define SKIP_OPEN_TABLE 0 #define OPEN_FRM_ONLY 1 #define OPEN_FULL_TABLE 2 2. Add new field 'open_method' into struct st_field_info; uint open_method; 3. Add new field into ST_SCHEMA_TABLE struct uint i_s_requested_object; /* the object we need to open(TABLE | VIEW) */. 4. Add new field to TABLE_LIST struct. uint i_s_requested_object; This field is set from ST_SCHEMA_TABLE.i_s_requested_object for processed table before opening. 5. Add new field i_s_table_state to TABLE_LIST struct. uint i_s_table_state[3]; this field is used for 'explain select' from I_S tablese sql_show.cc 1. Add initialization of 'open_method' to 'st_field_info' structs. Example: ST_FIELD_INFO tables_fields_info[]= { {"TABLE_CATALOG", FN_REFLEN, MYSQL_TYPE_STRING, 0, 1, 0, SKIP_OPEN_TABLE}, {"TABLE_SCHEMA",NAME_LEN, MYSQL_TYPE_STRING, 0, 0, 0, SKIP_OPEN_TABLE}, {"TABLE_NAME", NAME_LEN, MYSQL_TYPE_STRING, 0, 0, "Name", SKIP_OPEN_TABLE}, {"TABLE_TYPE", NAME_LEN, MYSQL_TYPE_STRING, 0, 0, 0, OPEN_FRM_ONLY}, {"ENGINE", NAME_LEN, MYSQL_TYPE_STRING, 0, 1, "Engine", OPEN_FRM_ONLY}, {"VERSION", 21 , MYSQL_TYPE_LONG, 0, 1, "Version", OPEN_FRM_ONLY}, {"ROW_FORMAT", 10, MYSQL_TYPE_STRING, 0, 1, "Row_format", OPEN_FULL_TABLE}, {"TABLE_ROWS", 21 , MYSQL_TYPE_LONG, 0, 1, "Rows", OPEN_FULL_TABLE}, {"AVG_ROW_LENGTH", 21 , MYSQL_TYPE_LONG, 0, 1, "Avg_row_length", OPEN_FULL_TABLE}, {"DATA_LENGTH", 21 , MYSQL_TYPE_LONG, 0, 1, "Data_length", OPEN_FULL_TABLE}, {"MAX_DATA_LENGTH", 21 , MYSQL_TYPE_LONG, 0, 1, "Max_data_length", OPEN_FULL_TABLE}, {"INDEX_LENGTH", 21 , MYSQL_TYPE_LONG, 0, 1, "Index_length", OPEN_FULL_TABLE}, {"DATA_FREE", 21 , MYSQL_TYPE_LONG, 0, 1, "Data_free", OPEN_FULL_TABLE}, {"AUTO_INCREMENT", 21 , MYSQL_TYPE_LONG, 0, 1, "Auto_increment", OPEN_FULL_TABLE}, {"CREATE_TIME", 0, MYSQL_TYPE_TIMESTAMP, 0, 1, "Create_time", OPEN_FULL_TABLE}, {"UPDATE_TIME", 0, MYSQL_TYPE_TIMESTAMP, 0, 1, "Update_time", OPEN_FULL_TABLE}, {"CHECK_TIME", 0, MYSQL_TYPE_TIMESTAMP, 0, 1, "Check_time", OPEN_FULL_TABLE}, {"TABLE_COLLATION", 64, MYSQL_TYPE_STRING, 0, 1, "Collation", OPEN_FRM_ONLY}, {"CHECKSUM", 21 , MYSQL_TYPE_LONG, 0, 1, "Checksum", OPEN_FULL_TABLE}, {"CREATE_OPTIONS", 255, MYSQL_TYPE_STRING, 0, 1, "Create_options", OPEN_FRM_ONLY}, {"TABLE_COMMENT", 80, MYSQL_TYPE_STRING, 0, 0, "Comment", OPEN_FULL_TABLE}, {0, 0, MYSQL_TYPE_STRING, 0, 0, 0, SKIP_OPEN_TABLE} }; 2. Add initialization of 'i_s_requested_object' to 'ST_SCHEMA_TABLE' structs. Example: ST_SCHEMA_TABLE schema_tables[]= { .... {"VIEWS", view_fields_info, create_schema_table, get_all_tables, 0, get_schema_views_record, 1, 2, 0, OPEN_VIEW_ONLY}, .... } 3. Modify LOOKUP_FIELD_VALUES struct /* values which are extracted from where condition or values from LEX struct for some 'SHOW' commands */ typedef struct st_lookup_field_values { LEX_STRING db_value, table_value; bool wild_db_value, wild_table_value; } LOOKUP_FIELD_VALUES; 4. New function which calculates database name and table name values from 'where' condition if it's possible void get_lookup_field_values(THD *thd, COND *cond, TABLE_LIST *table, LOOKUP_FIELD_VALUES *lookup_field_vals); 5. New function which set table open method setup_table_open_method(TABLE_LIST *tables, ST_SCHEMA_TABLE *schema_table, enum enum_schema_tables schema_table_idx) 6. New function int make_db_list(THD *thd, List*files, LOOKUP_FIELD_VALUES *lookup_field_vals, bool *with_i_schema, bool is_wild_value) see 'schema of make_[db]|[table]_list()' 7. New function int make_table_name_list(THD *thd, List *files, LEX *lex, LOOKUP_FIELD_VALUES *lookup_field_vals, bool with_i_schema, LEX_STRING *db_name) see 'schema of make_[db]|[table]_list()' 8. Modify 'get_all_tables' function according to new schema. sql_base.cc 1. Modify functions which are used during open table process according to table opening method and requested_object. sql_select.cc 1. Add support for I_S tables into select_describe() function
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.