WL#7315: Optimizer cost model: main memory management of cost constants
Affects: Parser-5.7
—
Status: Complete
The cost constants stored in the cost constant tables must be read into main memory when the server starts and then be used by the optimizer cost model. This worklog will implement the needed functionality of reading the cost constants from the cost constant tables, maintaining these in a main memory cache and providing the current version of them to the optimizer's cost model. This "cost constant cache" will be shared by all running queries and must support concurrent access and sharing of the cached cost constants. It must also support that the cost constant tables can be updated and re-loaded on a running server, and that after re-loading from the cost tables, the updated version of the cost constants is used for queries on new connections. This worklog is implementing a part of "Replace hard-coded constants with configurable cost constants" User Documentation ================== http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-5.html http://dev.mysql.com/doc/refman/5.7/en/cost-model.html
Functional requirements:
F-1: When using the default configuration for cost constants, there shall be no
functional changes in how queries are optimized and executed.
F-2: When using the default configuration for cost constants, only in cases
where alternative query plans have almost identical cost, there may be
changes to query plans due to differences in rounding for floating
point calculations.
UPDATING AND READING COST CONSTANT TABLES
=========================================
F-3: Starting the MySQL server shall cause the cost constant tables to be read
and the cost constant values from the cost constant tables shall be used.
F-4: Reloading the cost constants by using the command shall cause the cost
constant tables to be read and the cost constant
values from the cost constant tables shall be used.
F-5: Dynamically loading a storage engine plugin shall cause the cost constant
tables to be read and the cost constant values from the cost constant
tables shall be used.
F-6: Updates to the two cost constant tables shall not
affect the optimizer before one of the activities described in F-3, F-4,
or F-5 have been performed.
F-7: On-going sessions shall not be affected when updates to the cost constant
tables are read into memory.
F-8: After updates to the cost constant tables are read into memory, new
sessions shall use the updated cost constants.
CHANGING VALUES FOR COST CONSTANTS
==================================
F-9: Changing cost constants to non-default values may affect cost
estimates printed in EXPLAIN JSON and in optimizer trace.
F-10: Changing cost constants to non-default values may cause the optimizer to
produce different and possibly less optimal query plans.
F-11: Changing cost constants to non-default values will influence cost
estimates of operations. Representative examples for changes:
F-11a: row_evaluate_cost: changing this will influence the cost of evaluating
record conditions. Increasing this will cause a query plan that examines
many rows to become more expensive compared to a query plan that
examines less records. E.g, a table scan will become relatively more
expensive compared to a range scan that reads less records.
F-11b: key_compare_cost: changing this will influence the cost of comparing
record keys. Increasing this will cause a query plan that compares many
keys to become more expensive. E.g. a query plan that does file sort
will become relatively more expensive compared to a query plan that
avoids the sorting by using an index.
F-11d: memory_temptable_create_cost and memory_temptable_row_cost: These are
used for the cost estimate for internally created temporary tables
stored in a memory based storage engine. Increasing these will increase
the cost estimate of using internally created tables and make the
optimizer prefer query plans where there is less use of internally
created temporary tables.
F-11e: disk_temptable_create_cost and disk_temptable_row_cost: These are used
for the cost estimate for internally created temporary tables stored in
a disk-based storage engine. Increasing these will
increase the cost estimate of using internally created tables and make
the optimizer prefer query plans where there is less use of internally
created temporary tables.
F-11f: io_block_read_cost: changing this will influence the cost for reading
data from disk. Increasing this will cause a query plan that reads many
disk blocks to become more expensive compared to a query plan that need
to read fewer disk blocks. Eg. a table scan will become relatively more
expensive compared to a range scan that reads few blocks.
READING COST CONSTANT TABLES:
=============================
F-12: Handling of string values from the cost_name and engine_name of the cost
constant tables shall be case-insensitive.
F-13: If an entry for a cost constant in the server_cost table has a value in
the cost_value column that is different from NULL and larger than 0.0,
this value shall be used instead of the default value defined in the
source code.
F-14: If an entry for a cost constant in the engine_cost table has a value in
the cost_value column that is different from NULL and larger than 0.0 and
the engine_name column has the value "default" then this value shall be
used as default cost constant value instead of the default value defined
in the source code for all tables that do not have an engine specific cost
value (see next item).
F-15: If an entry for a cost constant in the engine_cost table has a value in
the cost_value column that is different from NULL and larger than 0.0 and
the engine_name column contains a recognized storage engine name, then
this value shall be used as cost constant value for tables stored in this
storage engine.
ERROR HANDLING:
===============
F-16: If one or both of the cost constant tables are missing or other causes
prevent them from being opened, the following warning shall be written to
the MySQL error log:
[Warning] Failed to open optimizer cost constant tables
F-17: If the name of a "cost constant" in the cost_name column of the
server_cost table is not recognized, a warning of the following type
shall be written to the MySQL error log:
[Warning] Unknown cost constant "lunch_cost" in mysql.server_cost table
F-18: If the value for a cost constant in the server_cost table is not valid,
a warning of the following type shall be written to the MySQL error log:
[Warning] Invalid value for cost constant "row_evaluate_cost" in
mysql.server_cost table: -1.0
F-19: If the storage engine name in the engine_column of the engine cost table
is not recognized, a warning of the following type shall be written
to the MySQL error log:
[Warning] Unknown storage engine "Falcon" in mysql.engine_cost table
F-20: The only allowed value in the device_type column in the engine_cost table
is "0". If it contains a different value, a warning of the following
type shall be written to the MySQL error log:
[Warning] Invalid device type -1 for "InnoDB" storage engine for cost
constant "io_block_read_cost" in mysql.engine_cost table
F-21: If the value in the cost_value column in the engine_cost table is zero or
a negative value, a warning of the following type shall be written to
the MySQL error log:
[Warning] Invalid value for cost constant "io_block_read_cost" for
"default" storage engine and device type 0 in mysql.engine_cost table: 0.0
Non-Functional requirements:
NF-1: With the default configuration, there should be no or very small
performance impact from the added functionality when there are no changes
to the query plan.
REQUIREMENTS:
=============
R-1) When the server starts, the "cost constants" stored in the database tables
shall be read into memory and used by new connections.
R-2) For "cost constants" that do not have values defined in the cost constant
tables, values defined in the source code shall be used.
R-3) Updating the main memory cost constants after there has been updates to
the cost constants tables shall be supported. The functionality for
re-reading the cost constant tables shall be implemented by this worklog
while the actual user command for doing it will be implemented later.
R-4) When a new connection is started it should get the latest version of the
cost constants that have been read into memory.
R-5) Updating the cost constants in memory by reading updated cost constants
from the database tables should not influence on cost constants used by
existing sessions.
R-6) When a storage engine is loaded as a plugin, cost constants for this shall
be read from the cost constant tables and made available for use.
R-x It shall be easy for storage engines to add their own "cost constants" and
have these made available for use in the storage engine.
OVERVIEW:
=========
This worklog implements the following main components:
1. A main memory representation of the "set of cost constants". This contains
the cost constants read from the cost constant tables stored in the mysql
database. For cost constants that do not have entries in the
cost constant tables, default values defined in the source code will be used.
2. A cache for "cost constant sets". This is responsible for creating the set
of cost constant, giving new sessions access to the latest versions of the
cost constants, and for re-reading the cost constant tables in the case
where these have been updated.
COST CONSTANT SET
=================
To store a complete set of all "cost constants", a new class named
Cost_model_constants is implemented. This class contains all cost constants for
both the server and all storage engines. The cost constants stored in the cost
constants tables will be read from the tables and stored in an
object of this class. An instance of this class will be used by the Cost model
API to get the correct version of the cost constants.
An o The following main classes will be implemented:
class Server_cost_constants:
============================
This class stores the "cost constants" used for cost calculations of server
operations. For each "cost constant", the class implements the following:
a) a variable for storing the cost constant value
b) a default value for the cost constant that will be used when initializing
the object
c) a getter function to retrieve the value of the cost constant.
The class also contains the following function that will be used for updating a
named "cost constant" with a new values after reading entries from the cost tables:
bool set(const LEX_CSTRING &name, double value)
See the low-level design section (or the source code) for the detailed interface
for this class.
class SE_cost_constants:
========================
This class stores "cost constants" for a storage engine. For each "cost
constant" this class will implement the following:
a) a variable for storing the cost constant value
b) a default value for the cost constant that will be used when initializing
the object
c) a boolean variable specifying whether the current value for this cost const
is based on a default value or has been overridden by an engine specific
value in the cost tables. This will be used to
determine if the current value for the cost constant should be updated or
not when reading entries from the cost constant tables.
d) a getter function to retrieve the value of the cost constant.
Note that this class is intended to be used as a base class if a storage engine
wants to add more cost constants (see the section about storage engine specific
cost constants below). The detailed interface is included in the low-level
design section.
class Cost_model_constants:
===========================
This is the main class for keeping a complete set of all cost constants. This is
created by the cost constant cache. This class contains one object of type
Server_cost_constants and an array of SE_cost_constants objects. This array has
one entry for each storage engine that the server can support. For each storage
engine, there will be one SE_cost_constants object for each type of storage
device that is supported (in the initial version there will be only one storage
device type). The main interface of this class will be:
// Get the cost constants for server operations
Server_cost_constants *get_server_cost_constants()
// Get the cost constants to use for a given table
SE_cost_constants *get_se_cost_constants(const TABLE *table)
// Update a named server cost constant to a new value
bool update_server_cost_constants(const LEX_STRING &name, double value)
// Update a named storage engine cost constant to a new value
bool update_engine_cost_constant(THD*, const LEX_STRING &se_name,
uint storage_category,
const LEX_STRING &name, double value)
The main data structures for this class are:
a) an instance of the Server_cost_constants class
b) an array that stores cost constants for storage engines. The size of this
array is MAX_HA (from handler.h). Each entry in this array will store
MAX_STORAGE_CLASSES number of (pointers to) SE_cost_constants objects.
MAX_STORAGE_CLASSES is the number of different types of storage devices
that is supported. In the initial version this constant will have the
value 1.
When an object of this class is created, it will be initialized with default
values of the cost constants as defined in the source code. The cost constant
cache (see below) will read the cost constant tables and update the cost
constant object with any non-default values.
The Cost_model_constants object can be shared by multiple sessions/THDs. The
first time a new session initializes its Cost_model_server object (see
THD::m_cost_model in sql_class.h), it will get a reference to the current cost
constant set object from the "cost constant cache". In order to be able to keep
track of the number of THDs that uses this cost constant set, the
Cost_model_constants object contains a reference counter. The use of this
reference counter will be protected by the cost constant cache's mutex (see
below for more details).
Support engine specific cost constants
======================================
In order to support that a storage engine can add its own engine specific cost
constants (ie. new cost constants, not a specific value for an existing cost
constant), the handlerton class is extended with a new function:
/**
Retrieve cost constants to be used for this storage engine.
A storage engine that wants to provide its own cost constants to
be used in the optimizer cost model should implement this function.
The server will call this function to get a cost constant object
that will be used for tables stored in this storage engine instead
of using the default cost constants.
Life cycle for the cost constant object: The storage engine must
allocate the cost constant object on the heap. After the function
returns, the server takes over the ownership of this object.
The server will eventually delete the object by calling delete.
@param storage_category the storage type that the cost constants will
be used for
@return a pointer to the cost constant object, if NULL is returned
the default cost constants will be used
*/
SE_cost_constants *(*get_cost_constants)(uint storage_category);
A storage engine that wants to add its own engine specific cost constants, must
then do the following:
1. implement a subclass of SE_cost_constants that has storage for these engine
specific cost constants.
2. implement the above handlerton->get_cost_constants() function. This
function should return a pointer to an object of the subclass of
SE_cost_constants. This object must be allocated on the heap since it will
be deleted by the server when there is no more use for it.
When the server allocates SE_cost_constants objects for the storage engines, it
will first check if the storage engine has added a function in the
handlerton->get_cost_constants() etnry, and use this if it exits. If not, it
will allocate a SE_cost_constant object using the base class for this engine.
COST CONSTANT CACHE
===================
The main responsibilities for the cost constant cache are:
1. Keep an in-memory copy of the cost constants values defined in the
cost constant tables.
2. Provide fast access to the cost constants for a new session when it starts
to optimize its first query.
3. Ensure that the version of the cost constants that a connection is using is
not changed for the lifetime of the session.
4. Reload and make a new version of the cost constant set after there has been
changes to the content of the cost constant tables.
The cost constant cache is implemented in the class Cost_constant_cache. This is
a singleton class that will be created when the MySQL server starts. The main
interface of this class is:
// Initializes the cost constant cache and creates an initial cost constant
// set based on constant values defined in the source code
void init();
// Closes the cost constant cache and releases allocated resources
void close();
// Reads the cost constants from the cost constant tables and creates
// a new cost constant set
void reload();
// Returns a reference to the current version of the cost constants. The
// reference counter of the cost constant set is incremented.
Cost_model_constants *get_cost_constants();
// Decrement the reference counter on the cost constant set
void release_cost_constants(Cost_model_constants *cost_constants);
The cost constant cache will keep a copy of the current set of cost constants.
Each time a new session initializes its Cost_model_server object (by calling
Cost_model_server::init() in lex_start()), the Cost_model_server object will
request the cost model cache to give it the current version of the cost
constants. This is done by calling Cost_constant_cache::get_cost_constants().
This function will just return a pointer to the current set of cost constants.
As time goes, new cost constant sets might be created and added to the cost
constant cache. In order to know when a cost constant set can be deleted,
reference counting is used. Each time a session asks for the cost constants, the
reference counter is incremented. When the session releases the cost constant
set by calling ::release_cost_constants(), the reference counter will be
decremented. When the reference counter becomes zero, the cost constant set is
deleted from the cost constant cache.
Concurrency:
============
The cost constant cache is a singleton and cost constant sets can be shared by
multiple sessions. To ensure that concurrent operations are safe, the following
operations need to be protected by a mutex:
a) internal changes to the cost constant cache. The only operation that
needs protection is when the cost constant cache changes to use a new cost
constant set.
b) decrementing and incrementing of the reference counter on the cost constant
sets.
The Cost_constant_cache will contain a mysql_mutex_t mutex that will be used for
the above operations. During normal operations (ie. when not updating the cost
constant cache), this mutex will be locked:
a) when a session starts optimizing its first query
b) when a session ends
In these two cases the mutex will be held for the time it takes to read the
pointer to the current cost constant set and for incrementing or decrementing
the reference counter.
Booting and initialization:
===========================
The cost constant cache is created when the mysqld process starts. This is done
during the startup sequence in mysqld.cc. There are two cases:
a) during initial bootstrap: the cost constant cache will be created and it
will initialize a cost constant set by only using cost constant values
defined in the source code (ie. it will not read the cost constant tables)
b) normal startup: the cost constant cache will be created and a new cost
constant set will be initialized by reading cost constants from the cost
constant tables. Cost constants that do not have an explicit value in the
cost constant tables will be given the value defined in the source code.
Similarly, the cost constant cache will be deleted when mysqld is stopped.
Reading cost constant tables:
=============================
When Cost_constant_cache::reload() is called, a Cost_model_constants object is
first initialized using cost constant values from the source code. Then each
entry from the cost constant tables is read and used for updating this cost
constant object.
The two tables are read by first using open_and_lock_tables()
to open the tables and then using READ_RECORD.read_record() to read each row.
This is similarly to how tables in the mysql database are read in several other
places in the server. When the content of each row is inserted into the cost
constant set object, the following error situations are detected:
-unknown name for "cost constant"
-unknown storage engine name
-invalid storage category number
-negative values for the cost constants
For each such error a warning is written to the mysqld's error log.
The cost constants tables will be read in the following situations:
1. When the server does a normal startup (ie. not during bootstrap)
2. When the administrator requests that the cost constants should be
refreshed. The command for doing this will be implemented in WL#7316.
3. When a new storage engine is loaded.
INITIALIZING OF COST MODEL
==========================
The cost constants shall be used by the Cost Model API. The following changes to
the implementation of the cost model API are implemented:
-Cost_model_server::init(): Will use the cost constant cache to get the
current version of the cost constants. This will increment the reference
counter on the cost constant set.
-destructor for Cost_model_server: will "return" the cost constant object to
the cost constant cache so that the reference counter is decremented.
-Cost_model_table::init(): This will be extended to take a pointer to the
table as argument. Information about the table is needed in order to look up
the correct cost constant object for the storage engine that stores the table.
-All cost functions: the use of hard coded cost values will be replaced by
calling the getter functions on the cost constant objects.
DEFAULT VALUES FOR COST CONSTANTS
=================================
This worklog will not change the value for any of the existing cost constants.
The following default values will be used:
cost_name cost_value
---------------------------------------
row_evaluate_cost 0.2
key_compare_cost 0.1
memory_temptable_create_cost 2.0
memory_temptable_row_cost 0.2
disk_temptable_create_cost 40.0
disk_temptable_row_cost 1.0
io_block_read_cost 1.0
COST CONSTANT SET
=================
The cost constant set is implemented with the following main classes:
Server_cost_constants
---------------------
/**
Cost constants for operations done by the server
*/
class Server_cost_constants
{
public:
/**
Creates a server cost constants object using the default values
defined in this class.
*/
Server_cost_constants();
/**
Cost for evaluating the query condition on a row.
*/
double row_evaluate_cost() const;
/**
Cost for comparing two keys.
*/
double key_compare_cost() const;
/**
Cost for creating an internal temporary table in memory.
*/
double memory_temptable_create_cost() const;
/**
Cost for retrieving or storing a row in an internal temporary table
stored in memory.
*/
double memory_temptable_row_cost() const;
/**
Cost for creating an internal temporary table in a disk resident
storage engine.
*/
double disk_temptable_create_cost() const;
/**
Cost for retrieving or storing a row in an internal disk resident
temporary table.
*/
double disk_temptable_row_cost() const;
/**
Set the value of one of the cost constants.
@param name name of cost constant
@param value new value
@return Status for updating the cost constant
*/
cost_constant_error set(const LEX_CSTRING &name, const double value);
};
For each cost constant that is maintained by this class, the default value and
the current value is stored in this class. For instance, the row_evaluate_cost
cost constant will have the following added in the private section of this class:
/*
This section specifies default values for cost constants.
*/
/// Default cost for evaluation of the query condition for a row.
static const double ROW_EVALUATE_COST= 0.2;
/*
This section specifies cost constants for server operations
*/
/// Cost for evaluating the query condition on a row
double m_row_evaluate_cost;
SE_cost_constants
-----------------
/**
Cost constants for a storage engine.
Storage engines that want to add new cost constants should make
a subclass of this class.
*/
class SE_cost_constants
{
public:
SE_cost_constants();
virtual ~SE_cost_constants() {}
/**
Cost of reading one random block from disk.
*/
double io_block_read_cost() const;
protected:
/**
Set the value of one of the cost constants.
If a storage engine wants to introduce a new cost constant it should
provide an implementation of this function. If the cost constant
is not recognized by the function in the subclass, then this function
should be called to allow the cost constant in the base class to be
given the updated value.
@param name name of cost constant
@param value new value
@param default specify whether the new value is a default value or
an engine specific value
@return Status for updating the cost constant
*/
virtual cost_constant_error set(const LEX_CSTRING &name, const double value,
bool default_value);
protected:
friend class Cost_model_constants;
/**
Update the value of a cost constant.
@param name name of the cost constant
@param value the new value this cost constant should take
@return Status for updating the cost constant
*/
cost_constant_error update(const LEX_CSTRING &name, const double value);
/**
Update the default value of a cost constant.
If this const constant already has been given a non-default value,
then calling this will have no effect on the current value for the
cost constant.
@param name name of the cost constant
@param value the new value this cost constant should take
@return Status for updating the cost constant
*/
cost_constant_error update_default(const LEX_CSTRING &name,
const double value);
private:
/*
This section specifies default values for cost constants.
*/
/// Default cost for reading a random disk block
static const double IO_BLOCK_READ_COST= 1.0;
/*
This section specifies cost constants for the table
*/
/// Cost constant for reading a random disk block.
double m_io_block_read_cost;
/*
This section has boolean variables that is used for knowing whether
the above cost variables is using the default value or not.
*/
/// Whether the io_block_read_cost is a default value or not
bool m_io_block_read_cost_default;
};
Cost_model_constants
--------------------
/**
Set of all cost constants used by the server and all storage engines.
*/
class Cost_model_constants
{
public:
/**
Creates a set with cost constants using the default values defined in
the source code.
*/
Cost_model_constants();
/**
Destructor.
The only reason for making this virtual is to be able to make
a sub-class for use in unit testing.
*/
virtual ~Cost_model_constants();
/**
Get the cost constants that should be used for server operations.
@return the cost constants for the server
*/
const Server_cost_constants *get_server_cost_constants() const;
/**
Return the cost constants that should be used for a given table.
@param table the table to find cost constants for
@return the cost constants to use for the table
*/
const SE_cost_constants *get_se_cost_constants(const TABLE *table) const;
/**
Update the value for one of the server cost constants.
@param name name of the cost constant
@param value new value
@return Status for updating the cost constant
*/
cost_constant_error update_server_cost_constant(const LEX_CSTRING &name,
double value);
/**
Update the value for one of the storage engine cost constants.
@param thd the THD
@param se_name name of storage engine
@param storage_category storage device type
@param name name of cost constant
@param value new value
@return Status for updating the cost constant
*/
cost_constant_error update_engine_cost_constant(THD *thd,
const LEX_CSTRING &se_name,
uint storage_category,
const LEX_CSTRING &name,
double value);
/**
Increment the reference counter for this cost constant set
*/
void inc_ref_count();
/**
Decrement the reference counter for this cost constant set
When the returned value is zero, there is nobody using this object
and it can be deleted by the caller.
@return the updated reference count
*/
unsigned int dec_ref_count();
private:
/// Cost constants for server operations
Server_cost_constants m_server_constants;
/// Cost constants for storage engines
Cost_model_se_info m_engines[MAX_HA];
/// Reference counter for this set of cost constants.
unsigned int m_ref_counter;
};
Implementation notes:
Data structures:
* The m_engines[] array contains pointers to the storage engine cost constants
for each storage engine. This array must have the same size and order the
storage engines in the same order as the corresponding handlertons.
The Cost_model_constants class depends on other code for doing the following:
* Mapping storage engine name to handlerton: When updating the value for a cost
constant for a specific storage engine, the storage engine is identified by
its name as string in the engine_cost table. To validate that this name
is a supported storage engine and locate the handlerton for this storage
engine, the following functions are used:
- ha_resolve_by_name(): locate the plugin for the storage engine
- plugin_data(): locate the handlerton for the storage engine.
COST CONSTANT CACHE
===================
The cost constant cache is implemented in by the Cost_constant_cache class that
has the following interface and main data structures:
class Cost_constant_cache
{
public:
/**
Creates an empty cost constant cache. To initialize it with default
cost constants, ::init() must be called. To use cost constants from
the cost constant tables, ::reload() must be called.
*/
Cost_constant_cache();
/**
Destructor for the cost constant cache. Before the cost constant cache
is deleted, ::close() must have been called.
*/
~Cost_constant_cache();
/**
Initialize the cost module.
The cost constants will be initialized with the default values found in
the source code. To start using the cost constant values found in
the configuration tables the ::reload() function must be called.
*/
void init();
/**
Close the cost constant cache.
All resources owned by the cost constant cache are released.
*/
void close();
/**
Reload all cost constants from the configuration tables.
*/
void reload();
/**
Get the currently used set of cost constants.
This function will just return a pointer to a shared version of the
cost constants. For tracking of how many sessions that is using the
set and to be able to know when it is safe to delete the cost constant
object, reference counting is used. This function will increase the
ref count for the returned cost constant object. To ensure that the
reference counter is decreased after finishing the use of the cost
constants release_cost_constants() must be called.
@note To ensure that the reference counter is only incremented once for
each session that uses the cost constant set, this function should only
be called once per session.
@return pointer to the cost constants
*/
const Cost_model_constants *get_cost_constants();
/**
Releases the cost constant set.
This will decrement the reference counter on the cost constant set and
if nobody is using it, it will be deleted. This function should be
called each time a client (a session) no longer has any use for a
cost constant set that it has previously gotten from calling
::get_cost_constants()
@param cost_constants pointer to the cost constant set
*/
void release_cost_constants(const Cost_model_constants *cost_constants);
private:
/**
The current set of cost constants that will be used by arriving
queries.
*/
Cost_model_constants *current_cost_constants;
/// Mutex protecting the pointer to the current cost constants
mysql_mutex_t LOCK_cost_const;
};
Copyright (c) 2000, 2025, Oracle Corporation and/or its affiliates. All rights reserved.