WL#7276: Configuration data base for Optimizer Cost Model
Status: Complete
The current optimizer cost model uses "cost constants" that are hard-coded in the source code. These should be replaced by "cost constants" that can be changed without making changes to the source code and that can be adjusted by the server administrator. This worklog defines the tables that will be used for storing the "cost constants" to be used by the new cost model. These tables will be read and used by the server. The server administrator can adjust the behavior of the cost model by making changes to the configuration data stored in these tables. This worklog will define the tables that will be used for storing configuration data for the cost model and will implement the necessary functionality for creating these tables when configuring a new server. The implementation of how these tables will be read and used will be described in separate worklogs. User Documentation ================== http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-5.html http://dev.mysql.com/doc/refman/5.7/en/cost-model.html
This worklog adds two tables for storing "cost constants" for the optimizer cost model. F-1: After booting the server or upgrading from a previous version, two new table shall be present in the mysql database: server_cost engine_cost F-2: The tables shall be created in the InnoDB storage engine. F-3: The table definitions shall be as specified in this worklog. F-4: The content of the tables shall be as specified in this worklog. F-5: The tables shall have the same access permissions as other tables in the mysql database. F-6: Changes to the tables shall not be replicated. F-7: Changes to the tables shall have no impact on the server or how queries are optimized and executed until WL#7315 is implemented. F-8: The content of the text columns named "cost_name" and "engine_name" shall be case-insensitive. F-9: For the server_cost table it shall not be possible to enter multiple entries for the same cost constant. F-10: For the engine_cost table it shall not be possible to enter multiple entries for the same (cost_name, engine_name, device_type) combination. F-11: The column "last_update" shall be automatically updated to contain the time stamp for when the row last was changed.
REQUIREMENTS: ============= High-level requirements for configuration data for the optimizer cost model: 1. It must be easy to add new "cost contants" to the configuration tables when we extend or do changes to the cost model. 2. It must be possible for the server administrator to adjust the "cost constants". 3. It must be possible for the server administrator to remove adjustments done to "cost constants" and make the server run with default values. 4. Changing "cost constants" must be supported when upgrading to a newer version of the server. For cost model configuration data for storage engines there are some extra requirements: 5. It should be possible to adjust cost constants per storage engine (ie. different storage engines can have a different set of cost constant values). 6. It should be possible to support different storage device types, ie. have different cost constants for eg. SSD and HDD (it is unlikely that this will be supported in 5.7 due to neither storage engines nor dictionary have information about storage device types). 7. It should be possible to add storage engine specific cost constants. TABLES FOR STORING COST CONSTANTS: ================================== There will be two tables for storing values for "cost constants" used by the optimizer cost model. The first table will contain "cost constants" for server operations while the second will contain "cost constants" for storage engines. These tables will be stored in the "mysql" database. Table for storing "cost constants" for server operations: ========================================================= Table name: server_cost Table definition: Column name Data type ---------------------------------------- cost_name VARCHAR(64) NOT NULL cost_value FLOAT DEFAULT NULL last_update TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP comment VARCHAR(1024) DEFAULT NULL Index on this table: PRIMARY KEY (cost_name) Description of the columns: * cost_name: The name of the "cost constant" used in the cost model. The name is case insensitive. This column will be declared to use a case insensitive collation (utf8_general_ci) to prevent that multiple entries that only differ by upper or lower case of the name are inserted. If a cost constant name is found that is not used by the server then a warning will be written to the error log (this will be implemented in WL#7315). * cost_value: If the server administrator wants to change the value of a cost constant, then she should update this field. If this column has a value, this value will be used by the cost model. Otherwise, the default value will be used. The default value will be defined in the source code and documented in the MySQL documentation. * last_update: This will contain the time for when this row was last updated. * comment: This can be used by the DBA to add a comment about why this "cost constant" has changed its value. The reason for having a default value in the source code and a cost_value in the configuration table, is that we will be able to change the default value from one release to another without over-writing any values changed by the server administrator. It will also make it easy for the server administrator to restore to the default value by just clearing the content of the cost_value column. Examples for what this table may contain: Initial version based on the cost constants currently used by the server: cost_name cost_value last_update comment -------------------------------------------------------------------- row_evaluate_cost NULL 2014-01-20 17:26:04 NULL key_compare_cost NULL 2014-01-20 17:26:04 NULL key_compare_cost_sort NULL 2014-01-20 17:26:04 NULL memory_temptable_create_cost NULL 2014-01-20 17:26:04 NULL memory_temptable_row_cost NULL 2014-01-20 17:26:04 NULL disk_temptable_create_cost NULL 2014-01-20 17:26:04 NULL disk_temptable_row_cost NULL 2014-01-20 17:26:04 NULL Table for storing "cost constants" for storage engine operations: ================================================================= Table name: engine_cost Table definition: Column name Data type ---------------------------------------- engine_name VARCHAR(64) NOT NULL device_type INTEGER NOT NULL cost_name VARCHAR(64) NOT NULL cost_value FLOAT DEFAULT NULL last_update TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP comment VARCHAR(1024) DEFAULT NULL Description of the columns: * engine_name: The name of the storage engine that this "cost constant" should be used for. The engine_name must correspond to the internally used name for the storage engine (ie. it must be possible to look it up by using eg. ha_resolve_by_name()). If this column has the special value "default", this will be used for all storage engines that do not have their own entry in this table. * device_type: Specifies the device type that this cost constant should be used for. The purpose of this field is to be able to specify different cost constants for different storage device types, e.g. to have one set of cost constants for HDDs and one set for SSDs. It is unlikely that this will be supported in 5.7 given that there is currently no way to determine what kind of storage device a given table is stored on. One way to add support for this could be to add information about storage device to the data dictionary and extend each table space with information about what device type it is stored on. In 5.7 the only valid value in this field will be 0. * cost_name: see above. * cost_value: see above. * last_update: see above. * comment: see above. Index on this table: PRIMARY KEY (cost_name, engine_name, device_type) Examples for what this table may contain: Example 1: Initial version based on the cost constants currently used by the server (note that last_update and comment columns are not included in order to make each row fit on a single line): engine_name device_type cost_name cost_value ------------------------------------------------------------------------------ default 0 io_block_read_cost NULL Example 2: Different cost constants for different storage engines: engine_name device_type cost_name cost_value ------------------------------------------------------------------------------ default 0 io_block_read_cost NULL InnoDB 0 io_block_read_cost 0.9 MyISAM 0 io_block_read_cost 1.1 Here the cost model will use 0.9 as the cost for reading a disk block when doing cost estimates for InnoDB (since the server administrator has changed the value), 1.1 as the cost when using MyISAM, and the default value (as defined in the source code) for all other storage engines. Example 3: This is an example for how this table might look like when we have implemented the new cost model: engine_name device_type cost_name cost_value -------------------------------------------------------------------------- default 0 random_block_read_cost NULL default 0 sequential_block_read_cost NULL default 0 cpu_read_random_record_cost NULL default 0 cpu_read_next_record_cost NULL default 0 cpu_read_random_index_cost NULL default 0 cpu_read_next_index_cost NULL NDB 0 transfer_mb_cost NULL One thing to note in this example are that a new "cost constant" has been introduced for NDB. This "cost constant" will only be used by the cost estimation functions implemented by NDB. If the DBA has done measurements or otherwise determined that the performance is improved by adjusting the "cost constants" used by InnoDB and NDB, the content of this table might look like this after the DBA has adjusted the "cost constants": engine_name device_type cost_name cost_value -------------------------------------------------------------------------- default 0 random_block_read_cost NULL default 0 sequential_block_read_cost NULL default 0 cpu_read_random_record_cost NULL default 0 cpu_read_next_record_cost NULL default 0 cpu_read_random_index_cost NULL default 0 cpu_read_next_index_cost NULL InnoDB 0 random_block_read_cost 0.0008 InnoDB 0 sequential_block_read_cost 0.0002 InnoDB 0 cpu_read_random_record_cost 0.00001 InnoDB 0 cpu_read_next_record_cost 0.000005 InnoDB 0 cpu_read_random_index_cost 0.000005 InnoDB 0 cpu_read_next_index_cost 0.000001 NDB 0 cpu_read_random_record_cost 0.0006 NDB 0 cpu_read_next_record_cost 0.00005 NDB 0 transfer_mb_cost 0.00003 HANDLING OF DATA IN COST CONSTANT TABLES: ========================================= The following general rules will be used (and implemented by WL#7315) when interpreting the data in the "cost constant" tables: 1. All string values will be treated as case insensitive. If multiple entries are added that only differ by upper or lower case of a string, then one of them will be used. By using a case in-sensitive collation for the string columns and by including the columns in the primary key, this issue should mostly be avoided. 2. If cost constant names or engine names are not recognized by the server, a warning will be written to the error log. 3. If invalid cost values are supplied (e.g. negative costs), a warning will be written to the error log. CREATING THE COST CONSTANT TABLES: ================================== The cost constant tables will be created during bootstrap of the MySQL server. They will also be created during server upgrade if they do not already exist. The creating of the tables will be done by adding code to the mysql_system_tables.sql script. During upgrade: * if an entry does not exists (or have been deleted), the new entry will be added to the table. UPDATES TO THE COST CONSTANT TABLES: ==================================== The cost constant tables should normally only be updated by the server administrator. The new tables will be placed in the "mysql" database and they will have the same access rights as other tables found in this database. ALTERNATIVE: ============ The following alternative data model has been considered but was decided to not be implemented. For the part of the proposal above that concerns the cost of getting data read from the storage engine, all cost constants were related to (a) the storage engine and (b) the storage device type. So all databases and tables stored in a given storage engine on a specific device type will have the same cost constants. An alternative approach is to specify these cost constants per database and/or per table. Table for storing "cost constants" for IO for databases and tables: =================================================================== Table name: cost_config_tables Table definition: Column name Data type --------------------------------- table_schema VARCHAR(64) table_name VARCHAR(64) cost_name VARCHAR(64) NOT NULL value FLOAT NOT NULL Description of the columns: * table_schema: The name of the schema/database this cost constant should be used for. If this is NULL then this cost constant will be used as default value for all schema/databases unless overridden as described below. * table_name: Name of the table that this cost constant should be used for. If this is NULL, then this cost constant will be used for all tables in the specified database. If this is non-NULL then the database_name should also be non-NULL. * cost_name: see above. * value: The value for this cost constant. Index on this table: UNIQUE KEY (cost_name, database_name, table_name) Examples for what this table may contain: Example 1: Different cost constants for tables on SSD or in memory database_name table_name cost_name value ---------------------------------------------------- NULL NULL io_block_read_cost 1.0 db_ssd NULL io_block_read_cost 0.3 db_ssd table_mem io_block_read_cost 0.1 Here the default value for the cost of reading a random disk block is 1.0. The exception is for a database which the administrator has placed on an SSD and given a cost constant of 0.3 for the cost of reading a disk block. The administrator has also observed that one of the tables always are in memory and further reduced the cost to 0.1 for reading blocks from that particular table. Example 2: This is an example for how this table might look like when we have implemented the new cost model. database_name table_name cost_name value -------------------------------------------------------------- NULL NULL random_block_read_cost 0.001 NULL NULL sequential_block_read_cost 0.0001 NULL NULL cpu_read_random_record_cost 0.00001 NULL NULL cpu_read_next_record_cost 0.000005 NULL NULL cpu_read_random_index_cost 0.000005 NULL NULL cpu_read_next_index_cost 0.000001 db_ssd NULL random_block_read_cost 0.0003 db_ssd NULL sequential_block_read_cost 0.00008 db_ssd table_mem random_block_read_cost 0.0001 db_ssd table_mem sequential_block_read_cost 0.00005 In this example, the database "db_ssd" is placed on an SSD. The server administrator has adjusted the two cost constants that relates to reading data from disk accordingly. Similarly, since the table "table_mem" is mostly cached in memory, the server administrator has further reduced the cost constants for this table.
The following will be added to mysql_system_tables.sql in order to create and insert data into the configuration tables during initial server boot and upgrade: -- -- Optimizer Cost Model configuration -- -- Server cost constants CREATE TABLE IF NOT EXISTS server_cost ( cost_name VARCHAR(64) NOT NULL COLLATE utf8_general_ci, cost_value FLOAT DEFAULT NULL, last_update TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, comment VARCHAR(1024) DEFAULT NULL, PRIMARY KEY (cost_name) ) ENGINE=InnoDB CHARACTER SET=utf8 COLLATE=utf8_general_ci STATS_PERSISTENT=0; INSERT IGNORE INTO server_cost VALUES ("row_evaluate_cost", DEFAULT, CURRENT_TIMESTAMP, DEFAULT); INSERT IGNORE INTO server_cost VALUES ("key_compare_cost", DEFAULT, CURRENT_TIMESTAMP, DEFAULT); INSERT IGNORE INTO server_cost VALUES ("key_compare_cost_sort", DEFAULT, CURRENT_TIMESTAMP, DEFAULT); INSERT IGNORE INTO server_cost VALUES ("memory_temptable_create_cost", DEFAULT, CURRENT_TIMESTAMP, DEFAULT); INSERT IGNORE INTO server_cost VALUES ("memory_temptable_row_cost", DEFAULT, CURRENT_TIMESTAMP, DEFAULT); INSERT IGNORE INTO server_cost VALUES ("disk_temptable_create_cost", DEFAULT, CURRENT_TIMESTAMP, DEFAULT); INSERT IGNORE INTO server_cost VALUES ("disk_temptable_row_cost", DEFAULT, CURRENT_TIMESTAMP, DEFAULT); -- Engine cost constants CREATE TABLE IF NOT EXISTS engine_cost ( engine_name VARCHAR(64) NOT NULL COLLATE utf8_general_ci, device_type INTEGER NOT NULL, cost_name VARCHAR(64) NOT NULL COLLATE utf8_general_ci, cost_value FLOAT DEFAULT NULL, last_update TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, comment VARCHAR(1024) DEFAULT NULL, PRIMARY KEY (cost_name, engine_name, device_type) ) ENGINE=InnoDB CHARACTER SET=utf8 COLLATE=utf8_general_ci STATS_PERSISTENT=0; INSERT IGNORE INTO engine_cost VALUES ("default", 0, "io_block_read_cost", DEFAULT, CURRENT_TIMESTAMP, DEFAULT); --------------------------------------------------------------------- See also prototype patch sent to the commit list on January 27 15:53.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.