WL#7315: Optimizer cost model: main memory management of cost constants
Affects: Parser-5.7
—
Status: Complete
The cost constants stored in the cost constant tables must be read into main memory when the server starts and then be used by the optimizer cost model. This worklog will implement the needed functionality of reading the cost constants from the cost constant tables, maintaining these in a main memory cache and providing the current version of them to the optimizer's cost model. This "cost constant cache" will be shared by all running queries and must support concurrent access and sharing of the cached cost constants. It must also support that the cost constant tables can be updated and re-loaded on a running server, and that after re-loading from the cost tables, the updated version of the cost constants is used for queries on new connections. This worklog is implementing a part of "Replace hard-coded constants with configurable cost constants" User Documentation ================== http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-5.html http://dev.mysql.com/doc/refman/5.7/en/cost-model.html
Functional requirements: F-1: When using the default configuration for cost constants, there shall be no functional changes in how queries are optimized and executed. F-2: When using the default configuration for cost constants, only in cases where alternative query plans have almost identical cost, there may be changes to query plans due to differences in rounding for floating point calculations. UPDATING AND READING COST CONSTANT TABLES ========================================= F-3: Starting the MySQL server shall cause the cost constant tables to be read and the cost constant values from the cost constant tables shall be used. F-4: Reloading the cost constants by using the command shall cause the cost constant tables to be read and the cost constant values from the cost constant tables shall be used. F-5: Dynamically loading a storage engine plugin shall cause the cost constant tables to be read and the cost constant values from the cost constant tables shall be used. F-6: Updates to the two cost constant tables shall not affect the optimizer before one of the activities described in F-3, F-4, or F-5 have been performed. F-7: On-going sessions shall not be affected when updates to the cost constant tables are read into memory. F-8: After updates to the cost constant tables are read into memory, new sessions shall use the updated cost constants. CHANGING VALUES FOR COST CONSTANTS ================================== F-9: Changing cost constants to non-default values may affect cost estimates printed in EXPLAIN JSON and in optimizer trace. F-10: Changing cost constants to non-default values may cause the optimizer to produce different and possibly less optimal query plans. F-11: Changing cost constants to non-default values will influence cost estimates of operations. Representative examples for changes: F-11a: row_evaluate_cost: changing this will influence the cost of evaluating record conditions. Increasing this will cause a query plan that examines many rows to become more expensive compared to a query plan that examines less records. E.g, a table scan will become relatively more expensive compared to a range scan that reads less records. F-11b: key_compare_cost: changing this will influence the cost of comparing record keys. Increasing this will cause a query plan that compares many keys to become more expensive. E.g. a query plan that does file sort will become relatively more expensive compared to a query plan that avoids the sorting by using an index. F-11d: memory_temptable_create_cost and memory_temptable_row_cost: These are used for the cost estimate for internally created temporary tables stored in a memory based storage engine. Increasing these will increase the cost estimate of using internally created tables and make the optimizer prefer query plans where there is less use of internally created temporary tables. F-11e: disk_temptable_create_cost and disk_temptable_row_cost: These are used for the cost estimate for internally created temporary tables stored in a disk-based storage engine. Increasing these will increase the cost estimate of using internally created tables and make the optimizer prefer query plans where there is less use of internally created temporary tables. F-11f: io_block_read_cost: changing this will influence the cost for reading data from disk. Increasing this will cause a query plan that reads many disk blocks to become more expensive compared to a query plan that need to read fewer disk blocks. Eg. a table scan will become relatively more expensive compared to a range scan that reads few blocks. READING COST CONSTANT TABLES: ============================= F-12: Handling of string values from the cost_name and engine_name of the cost constant tables shall be case-insensitive. F-13: If an entry for a cost constant in the server_cost table has a value in the cost_value column that is different from NULL and larger than 0.0, this value shall be used instead of the default value defined in the source code. F-14: If an entry for a cost constant in the engine_cost table has a value in the cost_value column that is different from NULL and larger than 0.0 and the engine_name column has the value "default" then this value shall be used as default cost constant value instead of the default value defined in the source code for all tables that do not have an engine specific cost value (see next item). F-15: If an entry for a cost constant in the engine_cost table has a value in the cost_value column that is different from NULL and larger than 0.0 and the engine_name column contains a recognized storage engine name, then this value shall be used as cost constant value for tables stored in this storage engine. ERROR HANDLING: =============== F-16: If one or both of the cost constant tables are missing or other causes prevent them from being opened, the following warning shall be written to the MySQL error log: [Warning] Failed to open optimizer cost constant tables F-17: If the name of a "cost constant" in the cost_name column of the server_cost table is not recognized, a warning of the following type shall be written to the MySQL error log: [Warning] Unknown cost constant "lunch_cost" in mysql.server_cost table F-18: If the value for a cost constant in the server_cost table is not valid, a warning of the following type shall be written to the MySQL error log: [Warning] Invalid value for cost constant "row_evaluate_cost" in mysql.server_cost table: -1.0 F-19: If the storage engine name in the engine_column of the engine cost table is not recognized, a warning of the following type shall be written to the MySQL error log: [Warning] Unknown storage engine "Falcon" in mysql.engine_cost table F-20: The only allowed value in the device_type column in the engine_cost table is "0". If it contains a different value, a warning of the following type shall be written to the MySQL error log: [Warning] Invalid device type -1 for "InnoDB" storage engine for cost constant "io_block_read_cost" in mysql.engine_cost table F-21: If the value in the cost_value column in the engine_cost table is zero or a negative value, a warning of the following type shall be written to the MySQL error log: [Warning] Invalid value for cost constant "io_block_read_cost" for "default" storage engine and device type 0 in mysql.engine_cost table: 0.0 Non-Functional requirements: NF-1: With the default configuration, there should be no or very small performance impact from the added functionality when there are no changes to the query plan.
REQUIREMENTS: ============= R-1) When the server starts, the "cost constants" stored in the database tables shall be read into memory and used by new connections. R-2) For "cost constants" that do not have values defined in the cost constant tables, values defined in the source code shall be used. R-3) Updating the main memory cost constants after there has been updates to the cost constants tables shall be supported. The functionality for re-reading the cost constant tables shall be implemented by this worklog while the actual user command for doing it will be implemented later. R-4) When a new connection is started it should get the latest version of the cost constants that have been read into memory. R-5) Updating the cost constants in memory by reading updated cost constants from the database tables should not influence on cost constants used by existing sessions. R-6) When a storage engine is loaded as a plugin, cost constants for this shall be read from the cost constant tables and made available for use. R-x It shall be easy for storage engines to add their own "cost constants" and have these made available for use in the storage engine. OVERVIEW: ========= This worklog implements the following main components: 1. A main memory representation of the "set of cost constants". This contains the cost constants read from the cost constant tables stored in the mysql database. For cost constants that do not have entries in the cost constant tables, default values defined in the source code will be used. 2. A cache for "cost constant sets". This is responsible for creating the set of cost constant, giving new sessions access to the latest versions of the cost constants, and for re-reading the cost constant tables in the case where these have been updated. COST CONSTANT SET ================= To store a complete set of all "cost constants", a new class named Cost_model_constants is implemented. This class contains all cost constants for both the server and all storage engines. The cost constants stored in the cost constants tables will be read from the tables and stored in an object of this class. An instance of this class will be used by the Cost model API to get the correct version of the cost constants. An o The following main classes will be implemented: class Server_cost_constants: ============================ This class stores the "cost constants" used for cost calculations of server operations. For each "cost constant", the class implements the following: a) a variable for storing the cost constant value b) a default value for the cost constant that will be used when initializing the object c) a getter function to retrieve the value of the cost constant. The class also contains the following function that will be used for updating a named "cost constant" with a new values after reading entries from the cost tables: bool set(const LEX_CSTRING &name, double value) See the low-level design section (or the source code) for the detailed interface for this class. class SE_cost_constants: ======================== This class stores "cost constants" for a storage engine. For each "cost constant" this class will implement the following: a) a variable for storing the cost constant value b) a default value for the cost constant that will be used when initializing the object c) a boolean variable specifying whether the current value for this cost const is based on a default value or has been overridden by an engine specific value in the cost tables. This will be used to determine if the current value for the cost constant should be updated or not when reading entries from the cost constant tables. d) a getter function to retrieve the value of the cost constant. Note that this class is intended to be used as a base class if a storage engine wants to add more cost constants (see the section about storage engine specific cost constants below). The detailed interface is included in the low-level design section. class Cost_model_constants: =========================== This is the main class for keeping a complete set of all cost constants. This is created by the cost constant cache. This class contains one object of type Server_cost_constants and an array of SE_cost_constants objects. This array has one entry for each storage engine that the server can support. For each storage engine, there will be one SE_cost_constants object for each type of storage device that is supported (in the initial version there will be only one storage device type). The main interface of this class will be: // Get the cost constants for server operations Server_cost_constants *get_server_cost_constants() // Get the cost constants to use for a given table SE_cost_constants *get_se_cost_constants(const TABLE *table) // Update a named server cost constant to a new value bool update_server_cost_constants(const LEX_STRING &name, double value) // Update a named storage engine cost constant to a new value bool update_engine_cost_constant(THD*, const LEX_STRING &se_name, uint storage_category, const LEX_STRING &name, double value) The main data structures for this class are: a) an instance of the Server_cost_constants class b) an array that stores cost constants for storage engines. The size of this array is MAX_HA (from handler.h). Each entry in this array will store MAX_STORAGE_CLASSES number of (pointers to) SE_cost_constants objects. MAX_STORAGE_CLASSES is the number of different types of storage devices that is supported. In the initial version this constant will have the value 1. When an object of this class is created, it will be initialized with default values of the cost constants as defined in the source code. The cost constant cache (see below) will read the cost constant tables and update the cost constant object with any non-default values. The Cost_model_constants object can be shared by multiple sessions/THDs. The first time a new session initializes its Cost_model_server object (see THD::m_cost_model in sql_class.h), it will get a reference to the current cost constant set object from the "cost constant cache". In order to be able to keep track of the number of THDs that uses this cost constant set, the Cost_model_constants object contains a reference counter. The use of this reference counter will be protected by the cost constant cache's mutex (see below for more details). Support engine specific cost constants ====================================== In order to support that a storage engine can add its own engine specific cost constants (ie. new cost constants, not a specific value for an existing cost constant), the handlerton class is extended with a new function: /** Retrieve cost constants to be used for this storage engine. A storage engine that wants to provide its own cost constants to be used in the optimizer cost model should implement this function. The server will call this function to get a cost constant object that will be used for tables stored in this storage engine instead of using the default cost constants. Life cycle for the cost constant object: The storage engine must allocate the cost constant object on the heap. After the function returns, the server takes over the ownership of this object. The server will eventually delete the object by calling delete. @param storage_category the storage type that the cost constants will be used for @return a pointer to the cost constant object, if NULL is returned the default cost constants will be used */ SE_cost_constants *(*get_cost_constants)(uint storage_category); A storage engine that wants to add its own engine specific cost constants, must then do the following: 1. implement a subclass of SE_cost_constants that has storage for these engine specific cost constants. 2. implement the above handlerton->get_cost_constants() function. This function should return a pointer to an object of the subclass of SE_cost_constants. This object must be allocated on the heap since it will be deleted by the server when there is no more use for it. When the server allocates SE_cost_constants objects for the storage engines, it will first check if the storage engine has added a function in the handlerton->get_cost_constants() etnry, and use this if it exits. If not, it will allocate a SE_cost_constant object using the base class for this engine. COST CONSTANT CACHE =================== The main responsibilities for the cost constant cache are: 1. Keep an in-memory copy of the cost constants values defined in the cost constant tables. 2. Provide fast access to the cost constants for a new session when it starts to optimize its first query. 3. Ensure that the version of the cost constants that a connection is using is not changed for the lifetime of the session. 4. Reload and make a new version of the cost constant set after there has been changes to the content of the cost constant tables. The cost constant cache is implemented in the class Cost_constant_cache. This is a singleton class that will be created when the MySQL server starts. The main interface of this class is: // Initializes the cost constant cache and creates an initial cost constant // set based on constant values defined in the source code void init(); // Closes the cost constant cache and releases allocated resources void close(); // Reads the cost constants from the cost constant tables and creates // a new cost constant set void reload(); // Returns a reference to the current version of the cost constants. The // reference counter of the cost constant set is incremented. Cost_model_constants *get_cost_constants(); // Decrement the reference counter on the cost constant set void release_cost_constants(Cost_model_constants *cost_constants); The cost constant cache will keep a copy of the current set of cost constants. Each time a new session initializes its Cost_model_server object (by calling Cost_model_server::init() in lex_start()), the Cost_model_server object will request the cost model cache to give it the current version of the cost constants. This is done by calling Cost_constant_cache::get_cost_constants(). This function will just return a pointer to the current set of cost constants. As time goes, new cost constant sets might be created and added to the cost constant cache. In order to know when a cost constant set can be deleted, reference counting is used. Each time a session asks for the cost constants, the reference counter is incremented. When the session releases the cost constant set by calling ::release_cost_constants(), the reference counter will be decremented. When the reference counter becomes zero, the cost constant set is deleted from the cost constant cache. Concurrency: ============ The cost constant cache is a singleton and cost constant sets can be shared by multiple sessions. To ensure that concurrent operations are safe, the following operations need to be protected by a mutex: a) internal changes to the cost constant cache. The only operation that needs protection is when the cost constant cache changes to use a new cost constant set. b) decrementing and incrementing of the reference counter on the cost constant sets. The Cost_constant_cache will contain a mysql_mutex_t mutex that will be used for the above operations. During normal operations (ie. when not updating the cost constant cache), this mutex will be locked: a) when a session starts optimizing its first query b) when a session ends In these two cases the mutex will be held for the time it takes to read the pointer to the current cost constant set and for incrementing or decrementing the reference counter. Booting and initialization: =========================== The cost constant cache is created when the mysqld process starts. This is done during the startup sequence in mysqld.cc. There are two cases: a) during initial bootstrap: the cost constant cache will be created and it will initialize a cost constant set by only using cost constant values defined in the source code (ie. it will not read the cost constant tables) b) normal startup: the cost constant cache will be created and a new cost constant set will be initialized by reading cost constants from the cost constant tables. Cost constants that do not have an explicit value in the cost constant tables will be given the value defined in the source code. Similarly, the cost constant cache will be deleted when mysqld is stopped. Reading cost constant tables: ============================= When Cost_constant_cache::reload() is called, a Cost_model_constants object is first initialized using cost constant values from the source code. Then each entry from the cost constant tables is read and used for updating this cost constant object. The two tables are read by first using open_and_lock_tables() to open the tables and then using READ_RECORD.read_record() to read each row. This is similarly to how tables in the mysql database are read in several other places in the server. When the content of each row is inserted into the cost constant set object, the following error situations are detected: -unknown name for "cost constant" -unknown storage engine name -invalid storage category number -negative values for the cost constants For each such error a warning is written to the mysqld's error log. The cost constants tables will be read in the following situations: 1. When the server does a normal startup (ie. not during bootstrap) 2. When the administrator requests that the cost constants should be refreshed. The command for doing this will be implemented in WL#7316. 3. When a new storage engine is loaded. INITIALIZING OF COST MODEL ========================== The cost constants shall be used by the Cost Model API. The following changes to the implementation of the cost model API are implemented: -Cost_model_server::init(): Will use the cost constant cache to get the current version of the cost constants. This will increment the reference counter on the cost constant set. -destructor for Cost_model_server: will "return" the cost constant object to the cost constant cache so that the reference counter is decremented. -Cost_model_table::init(): This will be extended to take a pointer to the table as argument. Information about the table is needed in order to look up the correct cost constant object for the storage engine that stores the table. -All cost functions: the use of hard coded cost values will be replaced by calling the getter functions on the cost constant objects. DEFAULT VALUES FOR COST CONSTANTS ================================= This worklog will not change the value for any of the existing cost constants. The following default values will be used: cost_name cost_value --------------------------------------- row_evaluate_cost 0.2 key_compare_cost 0.1 memory_temptable_create_cost 2.0 memory_temptable_row_cost 0.2 disk_temptable_create_cost 40.0 disk_temptable_row_cost 1.0 io_block_read_cost 1.0
COST CONSTANT SET ================= The cost constant set is implemented with the following main classes: Server_cost_constants --------------------- /** Cost constants for operations done by the server */ class Server_cost_constants { public: /** Creates a server cost constants object using the default values defined in this class. */ Server_cost_constants(); /** Cost for evaluating the query condition on a row. */ double row_evaluate_cost() const; /** Cost for comparing two keys. */ double key_compare_cost() const; /** Cost for creating an internal temporary table in memory. */ double memory_temptable_create_cost() const; /** Cost for retrieving or storing a row in an internal temporary table stored in memory. */ double memory_temptable_row_cost() const; /** Cost for creating an internal temporary table in a disk resident storage engine. */ double disk_temptable_create_cost() const; /** Cost for retrieving or storing a row in an internal disk resident temporary table. */ double disk_temptable_row_cost() const; /** Set the value of one of the cost constants. @param name name of cost constant @param value new value @return Status for updating the cost constant */ cost_constant_error set(const LEX_CSTRING &name, const double value); }; For each cost constant that is maintained by this class, the default value and the current value is stored in this class. For instance, the row_evaluate_cost cost constant will have the following added in the private section of this class: /* This section specifies default values for cost constants. */ /// Default cost for evaluation of the query condition for a row. static const double ROW_EVALUATE_COST= 0.2; /* This section specifies cost constants for server operations */ /// Cost for evaluating the query condition on a row double m_row_evaluate_cost; SE_cost_constants ----------------- /** Cost constants for a storage engine. Storage engines that want to add new cost constants should make a subclass of this class. */ class SE_cost_constants { public: SE_cost_constants(); virtual ~SE_cost_constants() {} /** Cost of reading one random block from disk. */ double io_block_read_cost() const; protected: /** Set the value of one of the cost constants. If a storage engine wants to introduce a new cost constant it should provide an implementation of this function. If the cost constant is not recognized by the function in the subclass, then this function should be called to allow the cost constant in the base class to be given the updated value. @param name name of cost constant @param value new value @param default specify whether the new value is a default value or an engine specific value @return Status for updating the cost constant */ virtual cost_constant_error set(const LEX_CSTRING &name, const double value, bool default_value); protected: friend class Cost_model_constants; /** Update the value of a cost constant. @param name name of the cost constant @param value the new value this cost constant should take @return Status for updating the cost constant */ cost_constant_error update(const LEX_CSTRING &name, const double value); /** Update the default value of a cost constant. If this const constant already has been given a non-default value, then calling this will have no effect on the current value for the cost constant. @param name name of the cost constant @param value the new value this cost constant should take @return Status for updating the cost constant */ cost_constant_error update_default(const LEX_CSTRING &name, const double value); private: /* This section specifies default values for cost constants. */ /// Default cost for reading a random disk block static const double IO_BLOCK_READ_COST= 1.0; /* This section specifies cost constants for the table */ /// Cost constant for reading a random disk block. double m_io_block_read_cost; /* This section has boolean variables that is used for knowing whether the above cost variables is using the default value or not. */ /// Whether the io_block_read_cost is a default value or not bool m_io_block_read_cost_default; }; Cost_model_constants -------------------- /** Set of all cost constants used by the server and all storage engines. */ class Cost_model_constants { public: /** Creates a set with cost constants using the default values defined in the source code. */ Cost_model_constants(); /** Destructor. The only reason for making this virtual is to be able to make a sub-class for use in unit testing. */ virtual ~Cost_model_constants(); /** Get the cost constants that should be used for server operations. @return the cost constants for the server */ const Server_cost_constants *get_server_cost_constants() const; /** Return the cost constants that should be used for a given table. @param table the table to find cost constants for @return the cost constants to use for the table */ const SE_cost_constants *get_se_cost_constants(const TABLE *table) const; /** Update the value for one of the server cost constants. @param name name of the cost constant @param value new value @return Status for updating the cost constant */ cost_constant_error update_server_cost_constant(const LEX_CSTRING &name, double value); /** Update the value for one of the storage engine cost constants. @param thd the THD @param se_name name of storage engine @param storage_category storage device type @param name name of cost constant @param value new value @return Status for updating the cost constant */ cost_constant_error update_engine_cost_constant(THD *thd, const LEX_CSTRING &se_name, uint storage_category, const LEX_CSTRING &name, double value); /** Increment the reference counter for this cost constant set */ void inc_ref_count(); /** Decrement the reference counter for this cost constant set When the returned value is zero, there is nobody using this object and it can be deleted by the caller. @return the updated reference count */ unsigned int dec_ref_count(); private: /// Cost constants for server operations Server_cost_constants m_server_constants; /// Cost constants for storage engines Cost_model_se_info m_engines[MAX_HA]; /// Reference counter for this set of cost constants. unsigned int m_ref_counter; }; Implementation notes: Data structures: * The m_engines[] array contains pointers to the storage engine cost constants for each storage engine. This array must have the same size and order the storage engines in the same order as the corresponding handlertons. The Cost_model_constants class depends on other code for doing the following: * Mapping storage engine name to handlerton: When updating the value for a cost constant for a specific storage engine, the storage engine is identified by its name as string in the engine_cost table. To validate that this name is a supported storage engine and locate the handlerton for this storage engine, the following functions are used: - ha_resolve_by_name(): locate the plugin for the storage engine - plugin_data(): locate the handlerton for the storage engine. COST CONSTANT CACHE =================== The cost constant cache is implemented in by the Cost_constant_cache class that has the following interface and main data structures: class Cost_constant_cache { public: /** Creates an empty cost constant cache. To initialize it with default cost constants, ::init() must be called. To use cost constants from the cost constant tables, ::reload() must be called. */ Cost_constant_cache(); /** Destructor for the cost constant cache. Before the cost constant cache is deleted, ::close() must have been called. */ ~Cost_constant_cache(); /** Initialize the cost module. The cost constants will be initialized with the default values found in the source code. To start using the cost constant values found in the configuration tables the ::reload() function must be called. */ void init(); /** Close the cost constant cache. All resources owned by the cost constant cache are released. */ void close(); /** Reload all cost constants from the configuration tables. */ void reload(); /** Get the currently used set of cost constants. This function will just return a pointer to a shared version of the cost constants. For tracking of how many sessions that is using the set and to be able to know when it is safe to delete the cost constant object, reference counting is used. This function will increase the ref count for the returned cost constant object. To ensure that the reference counter is decreased after finishing the use of the cost constants release_cost_constants() must be called. @note To ensure that the reference counter is only incremented once for each session that uses the cost constant set, this function should only be called once per session. @return pointer to the cost constants */ const Cost_model_constants *get_cost_constants(); /** Releases the cost constant set. This will decrement the reference counter on the cost constant set and if nobody is using it, it will be deleted. This function should be called each time a client (a session) no longer has any use for a cost constant set that it has previously gotten from calling ::get_cost_constants() @param cost_constants pointer to the cost constant set */ void release_cost_constants(const Cost_model_constants *cost_constants); private: /** The current set of cost constants that will be used by arriving queries. */ Cost_model_constants *current_cost_constants; /// Mutex protecting the pointer to the current cost constants mysql_mutex_t LOCK_cost_const; };
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.