WL#7182: Optimizer Cost Model API
Status: Complete
This worklog will introduce a cost model API and a cost model module to be used by the optimizer. The main motivation for doing this is: * remove use of hard coded cost constants from the optimizer and handler code. The existing cost constants will after this worklog be moved inside the cost module and only accessible through the new API. * move common parts of cost calculations from the server and handler code into the new cost module. User Documentation ================== No user documentation required.
This worklog should not introduce any functional changes. It is mostly about moving hard coded cost constants and some of the current cost calculations from the optimizer and handler code into a separate module with a new API. Due to the use of floating point arithmetic in cost calculations minor rounding errors might cause changes to query plans in the case where alternative query plans have almost identical cost estimates.
Overview of the new API for cost estimates ========================================== A high level presentation of the cost estimate API is given in WL#6564 together with a figure that shows how this API fits into the overall task of refactoring cost estimation code and make the hard coded cost constants configurable. The new cost estimate API consists of two new classes, Cost_model_server and Cost_model_table. These classes will be used by the optimizer, handler and storage engines to get cost estimates for basic server and storage engine operations. The initial implementation of these classes will mostly provide functions for getting the "cost constants" for the existing basic operations. These functions will replace the use of the currently hard coded cost constants. Some of the current cost calculations are also moved from the optimizer and handler code into these new classes. This will allow us to support multiple valid versions of the cost constants at the same time and allow for changing cost constants on a running server without influence on currently optimized queries. Later re-factoring and extensions of the cost model may move more parts of the cost estimation code from the optimizer source code into these classes. Thus, it should be expected that these classes will be extended with more functionality. 1. Cost estimates for server operations ======================================= The Cost_model_server class will be used for providing cost estimates for operations that are not dependent on a particular table. For each query there will be one object of this class. This object will be created as part of the THD and initialized each time a new query starts. The THD object will provide access to this object by a new function: const Cost_model_server* THD::cost_model() const; The reason for letting each THD have its own object of this class is to let different session run with different cost constants. This allows the server's cost constants to be changed online without influencing on on-going optimizations. The initial version of this class as implemented in this worklog will provide the values for the existing cost constants and cost functions for using temporary tables. Interface: ========== The initial implementation of the Cost_model_server class will have the following public functions: class Cost_model_server: /** Temporary table types that the cost model differentiate between. */ enum enum_tmptable_type { MEMORY_TMPTABLE, DISK_TMPTABLE }; /** Initialize the cost model object for a query. This function must be called before calling any cost estimation functions for a query. It should also be called when starting optimization of a new query in case any cost estimate constants have changed. */ void init(); /** Cost of processing a number of records and evaluating the query condition on the records. @param rows number of rows to evaluate @return Cost of evaluating the records */ double row_evaluate_cost(double rows) const; /** Cost of doing a number of key compare operations. @param keys number of key compare operations @return Cost of comparing the keys */ double key_compare_cost(double keys) const; /** Cost estimate for creating a temporary table. @param tmptable_type storage type for the temporary table @return Cost estimate */ double tmptable_create_cost(enum_tmptable_type tmptable_type) const; /** Cost estimate for inserting and reading records from a temporary table. @param tmptable_type storage type for the temporary table @param write_rows number of rows that will be written to table @param read_rows number of rows that will be read from table @return The estimated cost */ double tmptable_readwrite_cost(enum_tmptable_type tmptable_type, double write_rows, double read_rows) const; 2. Cost estimates for operations on table data ============================================== Some cost estimates will depend which table a particular operation is performed on. This will mainly be cost estimates that depends the table definition (eg. data type for columns or record size) or which storage engine or storage device type the table is stored on. The Cost_model_table class will provide these cost estimates. Each TABLE object will have a corresponding Cost_estimate_table object. This object will be available from the TABLE object by a new function: Cost_model_table* TABLE::cost_model() const; The Cost_model_table object must be initialized with a pointer to the query's Cost_model_server object since it will do calls into the Cost_model_server object for some of the "cost constants". The initial version of this class will provide "cost constants" and cost estimates for IO related functions. It will also provide wrapper functions for the "cost constants" that the Cost_table_server class implements. The reason for this is that in the handler, storage engines and in some of the lower parts of the optimizer, the THD object is not directly available. Interface: ========== The initial implementation of the Cost_model_table class will have the following public functions: class Cost_model_table: /** Initializes the cost model object. This function must be called before calling any cost estimation functions for a query. It should also be called when starting optimization of a new query in case any cost estimate constants have changed. @param cost_model_server the main cost model object for this query */ void init(const Cost_model_server *cost_model_server); /** Cost of processing a number of records and evaluating the query condition on the records. @param rows number of rows to evaluate @return Cost of evaluating the records */ double row_evaluate_cost(double rows) const; /** Cost of doing a number of key compare operations. @param keys number of key compare operations @return Cost of comparing the keys */ double key_compare_cost(double keys) const; /** Cost of reading a random block from disk. */ double io_block_read_cost() const; /** Cost of reading a number of random blocks from a table. @param blocks number of blocks to read @return Cost estimate */ double io_block_read_cost(double blocks) const; /** The fixed part of the cost for doing a sequential seek on disk. For a harddisk, this corresponds to half a rotation (see comment for get_sweep_read_cost() in handler.cc). */ double disk_seek_base_cost() const; /** Cost estimate for a sequential disk seek where a given number of blocks are skipped. @param seek_blocks number of blocks to seek past @return The cost estimate for the seek operation */ double disk_seek_cost(double seek_blocks) const; 3. Cost estimates for Handler operations ======================================== The handler will continue to provide cost estimates for the main operations that the storage engine executes. This is similar to how it is done today. This worklog will not implement any changes in the interface to the existing handler functions for cost estimates. Still, see WL#7209 for proposed changes to this interface. As an alternative, it would be possible to replace calling the handler directly by adding corresponding functions to the Cost_model_table object. These functions would then just forward the call to the handler API. This would require us to store information about the handler inside the Cost_model_table object and add some extra overhead.
Source code for new cost model API and cost module ================================================== This worklog will implement the following two new classes: Cost_model_server Cost_model_table The implementation of the new code will be in two new source code files named: sql/opt_costmodel.h sql/opt_costmodel.cc Creating and initialization of Cost_model_server object: ======================================================== The Cost_model_server object will be added as a private variable to the THD class. The THD class is extended with a new member function that will be used for initializing the Cost_model_server object. /** Initialize the optimizer cost model. This function should be called each time a new query is started. */ void init_cost_model(); The Cost_model_server object will be initialized when starting a new query. This initialization takes place in lex_start(). To get access to the Cost_model_server_object a new member function is added to the THD object: const Cost_model_server* THD::cost_model() const; Creating and initialization of Cost_model_table objects: ======================================================== The Cost_model_table object will be added as a private variable to the TABLE class. To get access to this object the following function is added: Cost_model_table* cost_model(); The Cost_model_table object must be initialized with the query's Cost_model_server object in order to get access to some of the "cost constants". To initialize the Cost_model_table object, the following function is added to the TABLE class: /** Initialize the optimizer cost model. This function should be called each time a new query is started. @param cost_model_server the main cost model object for this query */ void init_cost_model(const Cost_model_server* cost_model_server); This initialization must happen after the query is created but before calling cost functions on the Cost_model_table object. Since not all queries go through the main optimizer, this initialization has to be done on several places: * For most "normal" queries: make_join_statistics() in sql/sql_optimizer.cc * Internally created temporary tables: create_tmp_table() in sql/sql_tmp_table.cc * UPDATE statements: mysql_update() in sql/sql_update.cc * DELETE statements: mysql_delete() in sql/sql_delete.cc * "Queries" that list the help text: prepare_simple_select() Using the new cost model API: ============================= The optimizer and handler code will use the new API for the following cases: * All (with one exception) use of hard coded cost constants are replaced by calls to the new API * Some of the cost estimation for use of temporary tables are moved to the new cost module * Some cost estimation of disk accesses in the handler is moved into the new cost module
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.