|
void | AddCost (THD *thd, const ContainedSubquery &subquery, double num_rows, FilterCost *cost) |
| Used internally by EstimateFilterCost() only. More...
|
|
FilterCost | EstimateFilterCost (THD *thd, double num_rows, Item *condition, const Query_block *outer_query_block) |
| Estimate the cost of evaluating “condition”, “num_rows” times. More...
|
|
FilterCost | EstimateFilterCost (THD *thd, double num_rows, const Mem_root_array< ContainedSubquery > &contained_subqueries) |
| A cheaper overload of EstimateFilterCost() that assumes that all contained subqueries have already been extracted (i.e., it skips the walking, which can be fairly expensive). More...
|
|
void | EstimateSortCost (THD *thd, AccessPath *path, double distinct_rows=kUnknownRowCount) |
| Estimate costs and output rows for a SORT AccessPath. More...
|
|
void | EstimateMaterializeCost (THD *thd, AccessPath *path) |
|
double | EstimateDistinctRows (THD *thd, double child_rows, Bounds_checked_array< const Item *const > terms) |
| Estimate the number of rows with a distinct combination of values for 'terms'. More...
|
|
void | EstimateAggregateCost (THD *thd, AccessPath *path, const Query_block *query_block) |
| Estimate costs and result row count for an aggregate operation. More...
|
|
void | EstimateDeleteRowsCost (AccessPath *path) |
|
void | EstimateUpdateRowsCost (AccessPath *path) |
|
void | EstimateStreamCost (AccessPath *path) |
| Estimate the costs and row count for a STREAM AccessPath. More...
|
|
void | EstimateLimitOffsetCost (AccessPath *path) |
| Estimate the costs and row count for a LIMIT_OFFSET AccessPath. More...
|
|
void | EstimateTemptableAggregateCost (THD *thd, AccessPath *path, const Query_block *query_block) |
| Estimate the costs and row count for a TEMPTABLE_AGGREGATE AccessPath. More...
|
|
void | EstimateWindowCost (AccessPath *path) |
| Estimate the costs and row count for a WINDOW AccessPath. More...
|
|
double | EstimateSemijoinFanOut (THD *thd, double right_rows, const JoinPredicate &edge) |
| Estimate the fan out for a left semijoin or a left antijoin. More...
|
|
double | FindOutputRowsForJoin (THD *thd, double left_rows, double right_rows, const JoinPredicate *edge) |
| Estimate the number of output rows from joining two relations. More...
|
|
bool | IsClusteredPrimaryKey (const TABLE *table, unsigned key_idx) |
| Determines whether a given key on a table is both clustered and primary. More...
|
|
unsigned | EstimateBytesPerRowTable (const TABLE *table) |
| Estimates the number of bytes that MySQL must process when reading a row from a table, independently of the size of the read set. More...
|
|
unsigned | EstimateBytesPerRowIndex (const TABLE *table, unsigned key_idx) |
| Estimates the number of bytes that MySQL must process when reading a row from a secondary index, independently of the size of the read set. More...
|
|
int | IndexHeight (const TABLE *table, unsigned key_idx) |
| Estimates the height of a B-tree index. More...
|
|
double | RowReadCost (double num_rows, double fields_read_per_row, double bytes_per_row) |
| Computes the expected cost of reading a number of rows. More...
|
|
double | RowReadCostTable (const TABLE *table, double num_rows) |
| Computes the cost of reading a number of rows from a table. More...
|
|
double | RowReadCostIndex (const TABLE *table, unsigned key_idx, double num_rows) |
| Computes the cost of reading a number of rows from an index. More...
|
|
double | EstimateTableScanCost (const TABLE *table) |
| Estimates the cost of a full table scan. More...
|
|
double | IndexLookupCost (const TABLE *table, unsigned key_idx) |
| Estimates the cost of an index lookup. More...
|
|
double | EstimateIndexRangeScanCost (const TABLE *table, unsigned key_idx, double num_ranges, double num_output_rows) |
| Estimates the cost of an index range scan. More...
|
|
double | EstimateIndexScanCost (const TABLE *table, unsigned key_idx) |
| Estimates the cost of an index scan. More...
|
|
double | EstimateRefAccessCost (const TABLE *table, unsigned key_idx, double num_output_rows) |
| Estimates the cost of an index lookup (ref access). More...
|
|
unsigned EstimateBytesPerRowTable (const TABLE *table)   [inline]
Estimates the number of bytes that MySQL must process when reading a row from a table, independently of the size of the read set.
- Parameters
    table | The table to produce an estimate for.
- Returns
    The estimated number of bytes.
- Note
- There are two different relevant concepts of bytes per row:
- The bytes per row on the storage engine side.
- The bytes per row on the server side.
On the storage engine side, for InnoDB at least, we compute stats.mean_rec_length as the size of the data file divided by the number of rows in the table.
On the server side we are interested in the size of the MySQL representation in bytes. This could be a more accurate statistic when determining the CPU cost of processing a row (i.e., it does not matter very much if InnoDB pages are only half-full). As an approximation to the size of a row in bytes on the server side we use the length of the record buffer. This does not accurately represent the size of some variable length fields as we only store pointers to such fields in the record buffer. So in a sense the record buffer length is an estimate for the bytes per row but with an 8-byte cap on variable length fields – i.e. better than no estimate, and capped to ensure that costs do not explode.
For now, we will use the record buffer length (share->rec_buff_length) since it is more robust compared to the storage engine mean record length (file->stats.mean_rec_length) in the following cases:
- When the table fits in a single page stats.mean_rec_length will tend to overestimate the record length since it is computed as stats.data_file_length / stats.records and the data file length is at least a full page which defaults to 16384 bytes (for InnoDB at least).
- When the table contains records that are stored off-page it would seem that stats->data_file_length includes the overflow pages which are not relevant when estimating the height of the B-tree.
The downside of using this estimate is that we do not accurately account for the presence of variable length fields that are stored in-page.
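For illustration, the choice described above can be condensed into a short sketch. This is not the actual implementation; the fields it reads (share->rec_buff_length via table->s, and file->stats.mean_rec_length) are the ones discussed above, while the function name is hypothetical.
unsigned BytesPerRowTableSketch(const TABLE *table) {
  // Storage-engine estimate: stats.mean_rec_length = data_file_length / records.
  // As argued above, it overestimates for single-page tables and counts
  // overflow pages for rows stored off-page, so it is not used here.
  // Server-side estimate: the record buffer length, i.e. bytes per row with
  // variable-length fields counted only by their (capped) pointer size.
  return table->s->rec_buff_length;
}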
double EstimateSemijoinFanOut (THD *thd, double right_rows, const JoinPredicate &edge)
Estimate the fan out for a left semijoin or a left antijoin.
The fan out is defined as the number of result rows, divided by the number of input rows from the left hand relation. For a semijoin, J1:
SELECT ... FROM t1 WHERE EXISTS (SELECT ... FROM t2 WHERE predicate)
we know that the fan out of the corresponding inner join J2:
SELECT ... FROM t1, t2 WHERE predicate
is F(J2) = CARD(t2) * SELECTIVITY(predicate), where CARD(t2) = right_rows and SELECTIVITY(predicate) = edge.selectivity. If 'predicate' is a deterministic function of t1 and t2 rows, then J1 is equivalent to an inner join J3:
SELECT ... FROM t1 JOIN (SELECT DISTINCT f1,..fn FROM t2) d ON predicate
where f1,..fn are those fields from t2 that appear in the predicate.
Then F(J1) = F(J3) = F(J2) * CARD(d) / CARD(t2) = CARD(d) * SELECTIVITY(predicate).
This function therefore collects f1..fn and estimates CARD(d). As a special case, 'predicate' may be independent of t2. The query is then equivalent to:
SELECT ... FROM t1 WHERE predicate AND (SELECT COUNT(*) FROM t2) > 0
The fan out is then the selectivity of 'predicate' multiplied by the probability of t2 having at least one row.
- Parameters
    thd | The current thread.
    right_rows | The number of input rows from the right hand relation.
    edge | Join edge.
- Returns
    The fan out.
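The derivation above boils down to F(J1) = CARD(d) * SELECTIVITY(predicate). As a minimal sketch (the function name and parameters are hypothetical; in the real code CARD(d) would come from an estimate such as EstimateDistinctRows() over the t2 fields in the predicate, and the selectivity from edge.selectivity):
// Hypothetical sketch of the formula F(J1) = CARD(d) * SELECTIVITY(predicate).
double SemijoinFanOutSketch(double distinct_right_rows, double selectivity) {
  // distinct_right_rows: CARD(d), the number of distinct combinations of the
  //                      t2 fields that occur in the join predicate.
  // selectivity:         SELECTIVITY(predicate).
  return distinct_right_rows * selectivity;
}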
double IndexLookupCost (const TABLE *table, unsigned key_idx)   [inline]
Estimates the cost of an index lookup.
- Parameters
    table | The table to which the index belongs.
    key_idx | The position of the key in table->key_info[].
- Returns
    The estimated cost of an index lookup.
- Note
- The model "cost ~ index_height" works well when the Adaptive Hash Index (AHI) is disabled. The AHI essentially works as a dynamic cache for the most frequently accessed index pages that sits on top of the B-tree. With AHI enabled the cost of random lookups does not appear to be predictable using standard explanatory variables such as index height or the logarithm of the number of rows in the index. The performance of AHI will also be dependent on the access pattern, so it is fundamentally difficult to create an accurate model. However, our calibration experiments reveal two things that hold true both with and without AHI enabled:
- Index lookups usually take 1-3 microseconds. The height of a B-tree grows very slowly (proportional to log(N)/log(R) for tables with N rows and R records per page), making it near-constant in the common case where tables have many records per page, even without AHI.
- Random lookups in large indexes tend to be slower. A random access pattern will cause more cache misses, both for regular hardware caching and AHI. In addition, for a larger B-tree only the top levels fit in cache and we will invariably see a few cache misses with random accesses.
We model the cost of index lookups by interpolating between a model with constant cost and a model that depends entirely on the height of the index. The constants are based on calibration experiments with and without AHI.
Another factor that matters when calibrating the cost of index lookups is whether we are interested in the average cost of performing many lookups (as in an index nested loop join, or when scanning a non-covering secondary index and looking up each row in the primary index) or in the cost of a single lookup, such as a point select that uses an index. Our measurements show that the average running time of an index lookup can easily be ~5x lower when performing thousands of successive lookups rather than just one, likely due to hardware caching effects. Since getting the cost right in the case where we perform many lookups is typically more important, we have opted to calibrate costs based on operations that perform many lookups.
For adding IO costs to this model (future work) we probably want to assume that we only fetch a single page when performing an index lookup, as everything but leaf pages will tend to be cached, at least when performing many index lookups in a query plan, which is exactly the case where it is important to get costs right.
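To make the interpolation concrete, here is a minimal sketch. The weight and cost constants below are placeholders chosen for illustration, not the calibrated values used by the server:
double IndexLookupCostSketch(int index_height) {
  // Model 1: a constant lookup cost (lookups are usually 1-3 microseconds).
  constexpr double kConstantCost = 1.0;
  // Model 2: a cost that grows with the height of the B-tree.
  constexpr double kCostPerLevel = 0.5;
  // Interpolate between the two models; the real weight is calibrated
  // with and without the Adaptive Hash Index enabled.
  constexpr double kWeight = 0.5;
  return kWeight * kConstantCost + (1.0 - kWeight) * kCostPerLevel * index_height;
}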