MySQL 9.2.0
Source Code Documentation
All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros Modules Pages Concepts
cost_model.h File Reference

Go to the source code of this file.

Classes

struct  FilterCost
 See EstimateFilterCost. More...
 
struct  BytesPerTableRow
 This struct represents the number of bytes we expect to read for a table row. More...
 

Enumerations

enum class  RangeScanType : char { kMultiRange , kSingleRange }
 Type of range scan operation. More...
 

Functions

void AddCost (THD *thd, const ContainedSubquery &subquery, double num_rows, FilterCost *cost)
 Used internally by EstimateFilterCost() only. More...
 
FilterCost EstimateFilterCost (THD *thd, double num_rows, Item *condition, const Query_block *outer_query_block)
 Estimate the cost of evaluating “condition”, “num_rows” times. More...
 
FilterCost EstimateFilterCost (THD *thd, double num_rows, const Mem_root_array< ContainedSubquery > &contained_subqueries)
 A cheaper overload of EstimateFilterCost() that assumes that all contained subqueries have already been extracted (ie., it skips the walking, which can be fairly expensive). More...
 
void EstimateSortCost (THD *thd, AccessPath *path, double distinct_rows=kUnknownRowCount)
 Estimate costs and output rows for a SORT AccessPath. More...
 
void EstimateMaterializeCost (THD *thd, AccessPath *path)
 Provide row estimates and costs for a MATERIALIZE AccessPath. More...
 
double EstimateDistinctRows (THD *thd, double child_rows, Bounds_checked_array< const Item *const > terms)
 Estimate the number of rows with a distinct combination of values for 'terms'. More...
 
void EstimateAggregateCost (THD *thd, AccessPath *path, const Query_block *query_block)
 Estimate costs and result row count for an aggregate operation. More...
 
void EstimateDeleteRowsCost (AccessPath *path)
 
void EstimateUpdateRowsCost (AccessPath *path)
 
void EstimateStreamCost (THD *thd, AccessPath *path)
 Estimate the costs and row count for a STREAM AccessPath. More...
 
void EstimateLimitOffsetCost (AccessPath *path)
 Estimate the costs and row count for a WINDOW AccessPath. More...
 
void EstimateTemptableAggregateCost (THD *thd, AccessPath *path, const Query_block *query_block)
 Estimate the costs and row count for a Temp table Aggregate AccessPath. More...
 
void EstimateWindowCost (AccessPath *path)
 Estimate the costs and row count for a WINDOW AccessPath. More...
 
double EstimateSemijoinFanOut (THD *thd, double right_rows, const JoinPredicate &edge)
 Estimate the fan out for a left semijoin or a left antijoin. More...
 
double FindOutputRowsForJoin (THD *thd, double left_rows, double right_rows, const JoinPredicate *edge)
 Estimate the number of output rows from joining two relations. More...
 
bool IsClusteredPrimaryKey (const TABLE *table, unsigned key_idx)
 Determines whether a given key on a table is both clustered and primary. More...
 
BytesPerTableRow EstimateBytesPerRowWideTable (const TABLE *table, int64_t max_row_size)
 Estimate the average number of bytes that we need to read from the storage engine when reading a row from 'table'. More...
 
unsigned ClampedBlockSize (const TABLE *table)
 We clamp the block size to lie in the interval between the max and min allowed block size for InnoDB (2^12 to 2^16). More...
 
BytesPerTableRow EstimateBytesPerRowTable (const TABLE *table)
 Estimates the number of bytes that MySQL must process when reading a row from a table, independently of the size of the read set. More...
 
unsigned EstimateBytesPerRowIndex (const TABLE *table, unsigned key_idx)
 Estimates the number of bytes that MySQL must process when reading a row from a secondary index, independently of the size of the read set. More...
 
int IndexHeight (const TABLE *table, unsigned key_idx)
 Estimates the height of a B-tree index. More...
 
double TableAccessIOCost (const TABLE *table, double num_rows, BytesPerTableRow row_size)
 Calculate the IO-cost of reading 'num_rows' rows from 'table'. More...
 
double CoveringIndexAccessIOCost (const TABLE *table, unsigned key_idx, double num_rows)
 Calculate the IO-cost of doing a lookup on index 'key_idx' on 'table' and then read 'num_rows' rows. More...
 
double RowReadCost (double num_rows, double fields_read_per_row, double bytes_per_row)
 Computes the expected cost of reading a number of rows. More...
 
double RowReadCostTable (const TABLE *table, double num_rows)
 Computes the cost of reading a number of rows from a table. More...
 
double RowReadCostIndex (const TABLE *table, unsigned key_idx, double num_rows)
 Computes the cost of reading a number of rows from an index. More...
 
double EstimateTableScanCost (const TABLE *table)
 Estimates the cost of a full table scan. More...
 
double IndexLookupCost (const TABLE *table, unsigned key_idx)
 Estimates the cost of an index lookup. More...
 
double EstimateIndexRangeScanCost (const TABLE *table, unsigned key_idx, RangeScanType scan_type, double num_ranges, double num_output_rows)
 Estimates the cost of an index range scan. More...
 
double EstimateIndexScanCost (const TABLE *table, unsigned key_idx)
 Estimates the cost of an index scan. More...
 
double EstimateRefAccessCost (const TABLE *table, unsigned key_idx, double num_output_rows)
 Estimates the cost of an index lookup (ref access). More...
 

Variables

constexpr size_t kMaxItemLengthEstimate = 4096
 When we make cost estimates, we use this as the maximal length the values we get from evaluating an Item (in bytes). More...
 
constexpr ha_rows kRowEstimateFallback = 1000
 A fallback cardinality estimate that is used in case the storage engine cannot provide one (like for table functions). More...
 
constexpr double kIOStartCost {937.0}
 We model the IO cost for InnoDB tables with the DYNAMIC row format. More...
 
constexpr double kIOByteCost {0.0549}
 The additional cost of reading an extra byte from disk. More...
 
constexpr double kBlockFillFactor {0.75}
 This is the estimated fraction of an (innodb) block that is in use (i.e. More...
 
constexpr unsigned kMinEstimatedBytesPerRow = 8
 The minimum number of bytes to return for row length estimates. More...
 
constexpr unsigned kMaxEstimatedBytesPerRow = 8 * 1024
 The maximum number of bytes to return for row length estimates. More...
 

Enumeration Type Documentation

◆ RangeScanType

enum class RangeScanType : char
strong

Type of range scan operation.

Enumerator
kMultiRange 

Using the MRR optimization.

kSingleRange 

Plain range scan, without the MRR optimization.

Function Documentation

◆ AddCost()

void AddCost ( THD thd,
const ContainedSubquery subquery,
double  num_rows,
FilterCost cost 
)

Used internally by EstimateFilterCost() only.

◆ ClampedBlockSize()

unsigned ClampedBlockSize ( const TABLE table)
inline

We clamp the block size to lie in the interval between the max and min allowed block size for InnoDB (2^12 to 2^16).

Ideally we would have a guarantee that stats.block_size has a reasonable value (across all storage engines, types of tables, state of statistics), but in the absence of such a guarantee we clamp to the values for the InnoDB storage engine since the cost model has been calibrated for these values.

◆ CoveringIndexAccessIOCost()

double CoveringIndexAccessIOCost ( const TABLE table,
unsigned  key_idx,
double  num_rows 
)

Calculate the IO-cost of doing a lookup on index 'key_idx' on 'table' and then read 'num_rows' rows.

◆ EstimateAggregateCost()

void EstimateAggregateCost ( THD thd,
AccessPath path,
const Query_block query_block 
)

Estimate costs and result row count for an aggregate operation.

Parameters
[in,out]thdThe current thread.
[in,out]pathThe AGGREGATE path.
[in]query_blockThe Query_block to which 'path' belongs.

◆ EstimateBytesPerRowIndex()

unsigned EstimateBytesPerRowIndex ( const TABLE table,
unsigned  key_idx 
)
inline

Estimates the number of bytes that MySQL must process when reading a row from a secondary index, independently of the size of the read set.

Parameters
tableThe table to which the index belongs.
key_idxThe position of the key in table->key_info[].
Returns
The estimated number of bytes per row in the index.

◆ EstimateBytesPerRowTable()

BytesPerTableRow EstimateBytesPerRowTable ( const TABLE table)
inline

Estimates the number of bytes that MySQL must process when reading a row from a table, independently of the size of the read set.

Parameters
tableThe table to produce an estimate for.
Returns
The estimated row size.
Note
There are two different relevant concepts of bytes per row:
  1. The bytes per row on the storage engine side.
  2. The bytes per row on the server side.

One the storage engine side, for InnoDB at least, we compute stats.mean_rec_length as the size of the data file divided by the number of rows in the table.

On the server side we are interested in the size of the MySQL representation in bytes. This could be a more accurate statistic when determining the CPU cost of processing a row (i.e., it does not matter very much if InnoDB pages are only half-full). As an approximation to the size of a row in bytes on the server side we use the length of the record buffer for rows that should not exceed the maximal size of an InnoDB B-tree record. (Otherwise, we call EstimateBytesPerRowWideTable() to make the estimate).

Note that when the table fits in a single page stats.mean_rec_length will tend to overestimate the record length since it is computed as stats.data_file_length / stats.records and the data file length is at least a full page which defaults to 16384 bytes (for InnoDB at least). We may then get a better estimate from table->s->rec_buff_length.

◆ EstimateBytesPerRowWideTable()

BytesPerTableRow EstimateBytesPerRowWideTable ( const TABLE table,
int64_t  max_row_size 
)

Estimate the average number of bytes that we need to read from the storage engine when reading a row from 'table'.

This is the size of the (b-tree) record and the overflow pages of any field that is part of the projection. This is similar to what EstimateBytesPerRowTable() does, but this function is intended to do a more accurate but also more expensive calculation for tables with potentially large rows (i.e. tables with BLOBs or large VARCHAR fields).

Note that this function tailored for InnoDB (and also for the DYNAMIC row format of InnoDB). At some point we may want to move this logic to the handler, so that we can customize it for other engines as well.

Parameters
tableThe target table
max_row_sizeThe row size if all variable-sized fields are full.
Returns
The estimated row size.

◆ EstimateDeleteRowsCost()

void EstimateDeleteRowsCost ( AccessPath path)

◆ EstimateDistinctRows()

double EstimateDistinctRows ( THD thd,
double  child_rows,
Bounds_checked_array< const Item *const >  terms 
)

Estimate the number of rows with a distinct combination of values for 'terms'.

See also
EstimateDistinctRowsFromStatistics for additional details.
Parameters
thdThe current thread.
child_rowsThe number of input rows.
termsThe terms for which we estimate the number of unique combinations.
Returns
The estimated number of output rows.

◆ EstimateFilterCost() [1/2]

FilterCost EstimateFilterCost ( THD thd,
double  num_rows,
const Mem_root_array< ContainedSubquery > &  contained_subqueries 
)
inline

A cheaper overload of EstimateFilterCost() that assumes that all contained subqueries have already been extracted (ie., it skips the walking, which can be fairly expensive).

This data is typically computed by FindContainedSubqueries().

◆ EstimateFilterCost() [2/2]

FilterCost EstimateFilterCost ( THD thd,
double  num_rows,
Item condition,
const Query_block outer_query_block 
)

Estimate the cost of evaluating “condition”, “num_rows” times.

This is a fairly rudimentary estimation, but it includes the cost of any subqueries that may be present and that need evaluation.

◆ EstimateIndexRangeScanCost()

double EstimateIndexRangeScanCost ( const TABLE table,
unsigned  key_idx,
RangeScanType  scan_type,
double  num_ranges,
double  num_output_rows 
)

Estimates the cost of an index range scan.

The cost model for index range scans accounts for the index lookup cost as well as the cost of reading rows. Both index scans and ref accesses can be viewed as special cases of index range scans, so the cost functions for those operations call this function under the hood.

Note
In the future this function should be extended to account for IO cost.
Parameters
tableThe table to which the index belongs.
key_idxThe position of the key in table->key_info[].
scan_typeWhether this an MRR or a regular index range scan.
num_rangesThe number of ranges.
num_output_rowsThe estimated expected number of output rows.
Returns
The estimated cost of the index range scan operation.

◆ EstimateIndexScanCost()

double EstimateIndexScanCost ( const TABLE table,
unsigned  key_idx 
)
inline

Estimates the cost of an index scan.

An index scan scans all rows in the table along the supplied index.

Parameters
tableThe table to which the index belongs.
key_idxThe position of the key in table->key_info[].
Returns
The estimated cost of the index scan.

◆ EstimateLimitOffsetCost()

void EstimateLimitOffsetCost ( AccessPath path)

Estimate the costs and row count for a WINDOW AccessPath.

As described in

See also
AccessPath::m_init_cost, the cost to read k out of N rows would be init_cost + (k/N) * (cost - init_cost).

◆ EstimateMaterializeCost()

void EstimateMaterializeCost ( THD thd,
AccessPath path 
)

Provide row estimates and costs for a MATERIALIZE AccessPath.

MATERIALIZE AccessPath is created by both old and new optimizer in many different contexts where temp table needs to be created, both with and without deduplication. E.g. materialization for a derived table, materializing deduplicated input rows for DISTINCT, GROUP BY clause without an aggregation functions, SET operations, etc

Note:

  • SET operations that do deduplication (such as UNION DISTINCT, EXCEPT and INTERSECT) currently do not consider deduplication cost. They should.
  • There is no aggregation involved in this path. Aggregation with temp table uses a different Access path.

◆ EstimateRefAccessCost()

double EstimateRefAccessCost ( const TABLE table,
unsigned  key_idx,
double  num_output_rows 
)
inline

Estimates the cost of an index lookup (ref access).

Parameters
tableThe table to which the index belongs.
key_idxThe position of the key in table->key_info[].
num_output_rowsThe estimated number of output rows.
Returns
The estimated cost of the index scan.

◆ EstimateSemijoinFanOut()

double EstimateSemijoinFanOut ( THD thd,
double  right_rows,
const JoinPredicate edge 
)

Estimate the fan out for a left semijoin or a left antijoin.

The fan out is defined as the number of result rows, divided by the number of input rows from the left hand relation. For a semijoin, J1:

SELECT ... FROM t1 WHERE EXISTS (SELECT ... FROM t2 WHERE predicate)

we know that the fan out of the corresponding inner join J2:

SELECT ... FROM t1, t2 WHERE predicate

is: F(J2) = CARD(t2) * SELECTIVITY(predicate) , where CARD(t2)=right_rows, and SELECTIVITY(predicate)=edge.selectivity. If 'predicate' is a deterministic function of t1 and t2 rows, then J1 is equivalent to an inner join J3:

SELECT ... FROM t1 JOIN (SELECT DISTINCT f1,..fn FROM t2) d ON predicate

where f1,..fn are those fields from t2 that appear in the predicate.

Then F(J1) = F(J3) = F(J2) * CARD(d) / CARD(t2) = CARD(d) * SELECTIVITY(predicate).

This function therefore collects f1..fn and estimates CARD(d). As a special case, 'predicate' may be independent of t2. The query is then equivalent to:

SELECT ... FROM t1 WHERE predicate AND (SELECT COUNT(*) FROM t2) > 0

The fan out is then the selectivity of 'predicate' multiplied by the probability of t2 having at least one row.

Parameters
thdThe current thread.
right_rowsThe number of input rows from the right hand relation.
edgeJoin edge.
Returns
fan out.

◆ EstimateSortCost()

void EstimateSortCost ( THD thd,
AccessPath path,
double  distinct_rows = kUnknownRowCount 
)

Estimate costs and output rows for a SORT AccessPath.

Parameters
thdCurrent thread.
paththe AccessPath.
distinct_rowsAn estimate of the number of distinct rows, if remove_duplicates==true and we have an estimate already.

◆ EstimateStreamCost()

void EstimateStreamCost ( THD thd,
AccessPath path 
)

Estimate the costs and row count for a STREAM AccessPath.

◆ EstimateTableScanCost()

double EstimateTableScanCost ( const TABLE table)
inline

Estimates the cost of a full table scan.

Primarily used to assign a cost to the TABLE_SCAN access path.

Parameters
tableThe table to estimate cost for.
Returns
The cost of scanning the table.

◆ EstimateTemptableAggregateCost()

void EstimateTemptableAggregateCost ( THD thd,
AccessPath path,
const Query_block query_block 
)

Estimate the costs and row count for a Temp table Aggregate AccessPath.

◆ EstimateUpdateRowsCost()

void EstimateUpdateRowsCost ( AccessPath path)

◆ EstimateWindowCost()

void EstimateWindowCost ( AccessPath path)

Estimate the costs and row count for a WINDOW AccessPath.

◆ FindOutputRowsForJoin()

double FindOutputRowsForJoin ( THD thd,
double  left_rows,
double  right_rows,
const JoinPredicate edge 
)
inline

Estimate the number of output rows from joining two relations.

Parameters
thdThe current thread.
left_rowsNumber of rows in the left hand relation.
right_rowsNumber of rows in the right hand relation.
edgeThe join between the two relations.

◆ IndexHeight()

int IndexHeight ( const TABLE table,
unsigned  key_idx 
)
inline

Estimates the height of a B-tree index.

We estimate the height of the index to be the smallest positive integer h such that table_records <= (1 + records_per_page)^h.

This function supports both clustered primary indexes and secondary indexes. Secondary indexes will tend to have more records per page compared to primary clustered indexes and as a consequence they will tend to be shorter.

Parameters
tableThe table to which the index belongs.
key_idxThe position of the key in table->key_info[].
Returns
The estimated height of the index.

◆ IndexLookupCost()

double IndexLookupCost ( const TABLE table,
unsigned  key_idx 
)
inline

Estimates the cost of an index lookup.

Parameters
tableThe table to which the index belongs.
key_idxThe position of the key in table->key_info[].
Returns
The estimated cost of an index lookup.
Note
The model "cost ~ index_height" works well when the Adaptive Hash Index (AHI) is disabled. The AHI essentially works as a dynamic cache for the most frequently accessed index pages that sits on top of the B-tree. With AHI enabled the cost of random lookups does not appear to be predictable using standard explanatory variables such as index height or the logarithm of the number of rows in the index. The performance of AHI will also be dependent on the access pattern, so it is fundamentally difficult to create an accurate model. However, our calibration experiments reveal two things that hold true both with and without AHI enabled:
  1. Index lookups usually take 1-3 microseconds. The height of a B-tree grows very slowly (proportional to log(N)/log(R) for tables with N rows and R records per page), making it near-constant in the common case where tables have many records per page, even without AHI.
  2. Random lookups in large indexes tend to be slower. A random access pattern will cause more cache misses, both for regular hardware caching and AHI. In addition, for a larger B-tree only the top levels fit in cache and we will invariably see a few cache misses with random accesses.

We model the cost of index lookups by interpolating between a model with constant cost and a model that depends entirely on the height of the index. The constants are based on calibration experiments with and without AHI.

Another factor that is important when calibrating the cost of index lookups is whether we are interested in the average cost when performing many lookups such as when performing an index nested loop join or scanning along a secondary non-covering index and lookup into the primary index, or the cost of a single lookup such as a point select that uses an index. From our measurements we see that the average running time of an index lookup can easily be a factor ~5x faster when performing thousands of successive lookups compared to just one. This is likely due to hardware caching effects. Since getting the cost right in the case when we perform many lookups is typically more important, we have opted to calibrate costs based on operations that perform many lookups.

For adding IO costs to this model (future work) we probably want to assume that we only fetch a single page when performing an index lookup, as everything but leaf pages will tend to be cached, at least when performing many index lookups in a query plan, which is exactly the case where it is important to get costs right.

◆ IsClusteredPrimaryKey()

bool IsClusteredPrimaryKey ( const TABLE table,
unsigned  key_idx 
)
inline

Determines whether a given key on a table is both clustered and primary.

Parameters
tableThe table to which the index belongs.
key_idxThe position of the key in table->key_info[].
Returns
True if the key is clustered and primary, false otherwise.

◆ RowReadCost()

double RowReadCost ( double  num_rows,
double  fields_read_per_row,
double  bytes_per_row 
)
inline

Computes the expected cost of reading a number of rows.

The cost model takes into account the number of fields that is being read from the row and the width of the row in bytes. Both RowReadCostTable() and RowReadCostIndex() call this function and thus use the same cost model.

Parameters
num_rowsThe (expected) number of rows to read.
fields_read_per_rowThe number of fields to read per row.
bytes_per_rowThe total length of the row to be processed (including fields that are not read) in bytes.
Returns
The expected cost of reading num_rows.
Note
It is important that this function be robust to fractional row estimates. For example, if we index nested loop join two primary key columns and the inner table is smaller than the outer table, we should see that num_rows for the inner index lookup is less than one. In this case it is important that we return the expected cost of the operation. For example, if we only expect to read 0.1 rows the cost should be 0.1 of the cost of reading one row (we are working with a linear cost model, so we only have to take the expected number of rows into account, and not the complete distribution).

◆ RowReadCostIndex()

double RowReadCostIndex ( const TABLE table,
unsigned  key_idx,
double  num_rows 
)
inline

Computes the cost of reading a number of rows from an index.

See also
ReadRowCost() for further details.
Parameters
tableThe table to which the index belongs.
key_idxThe position of the key in table->key_info[].
num_rowsThe (expected) number of rows to read.
Returns
The cost of reading num_rows.

◆ RowReadCostTable()

double RowReadCostTable ( const TABLE table,
double  num_rows 
)
inline

Computes the cost of reading a number of rows from a table.

See also
ReadRowCost() for further details.
Parameters
tableThe table to read from.
num_rowsThe (expected) number of rows to read.
Returns
The cost of reading num_rows.

◆ TableAccessIOCost()

double TableAccessIOCost ( const TABLE table,
double  num_rows,
BytesPerTableRow  row_size 
)

Calculate the IO-cost of reading 'num_rows' rows from 'table'.

Variable Documentation

◆ kBlockFillFactor

constexpr double kBlockFillFactor {0.75}
constexpr

This is the estimated fraction of an (innodb) block that is in use (i.e.

not free for future inserts).

◆ kIOByteCost

constexpr double kIOByteCost {0.0549}
constexpr

The additional cost of reading an extra byte from disk.

◆ kIOStartCost

constexpr double kIOStartCost {937.0}
constexpr

We model the IO cost for InnoDB tables with the DYNAMIC row format.

For other storage engines the IO cost is currently set to zero. For other InnoDB row formats, the model may not be a good fit.

We only count the cost of accessing the leaf pages of indexes (clustered or unclustered)that are not already in the buffer pool. For tables/indexes that are (estimated to be) fully cached we get an IO cost of zero. Tables that are small relative to the buffer pool are assumed to be fully cached (all pages are in the buffer pool). The only operations for which we count IO cost are random and sequential page reads.

The cost of a disk IO operation is modeled as an affine function:

io_cost = kIOStartCost + no_of_bytes * kIOByteCost

Note that the granularity of no_of_bytes is the storage engine page size. The values used here are derived from measurements in a cloud setting. This means that they may be wrong in another context, such as when running against a local SSD. Cloud measurements may also give inconsistent results due to heterogeneous cloud hardware, and the impact of activity in other cloud VMs. Ideally we should be able to configure these parameters, or even set them dynamically based on observed behavior.

◆ kMaxEstimatedBytesPerRow

constexpr unsigned kMaxEstimatedBytesPerRow = 8 * 1024
constexpr

The maximum number of bytes to return for row length estimates.

The current value of this constant is set to cover a wide range of row sizes and should capture the most common row lengths in bytes. We place an upper limit on the estimates since we have performed our calibration within this range, and we would like to prevent cost estimates from running away in case the underlying statistics are off in some instances. In such cases we prefer capping the resulting estimate. As a reasonable upper limit we use half the default InnoDB page size of 2^14 = 16384 bytes.

◆ kMaxItemLengthEstimate

constexpr size_t kMaxItemLengthEstimate = 4096
constexpr

When we make cost estimates, we use this as the maximal length the values we get from evaluating an Item (in bytes).

Actual values of e.g. blobs may be much longer, but even so we use this as an upper limit when doing cost calculations. (For context,

See also
Item::max_length .)

◆ kMinEstimatedBytesPerRow

constexpr unsigned kMinEstimatedBytesPerRow = 8
constexpr

The minimum number of bytes to return for row length estimates.

This is mostly to guard against returning estimates of zero, which may or may not actually be able to happen in practice.

◆ kRowEstimateFallback

constexpr ha_rows kRowEstimateFallback = 1000
constexpr

A fallback cardinality estimate that is used in case the storage engine cannot provide one (like for table functions).

It's a fairly arbitrary non-zero value.