MySQL 9.0.0
Source Code Documentation
cost_constants.h
Go to the documentation of this file.
1/* Copyright (c) 2023, 2024, Oracle and/or its affiliates.
2
3 This program is free software; you can redistribute it and/or modify
4 it under the terms of the GNU General Public License, version 2.0,
5 as published by the Free Software Foundation.
6
7 This program is designed to work with certain software (including
8 but not limited to OpenSSL) that is licensed under separate terms,
9 as designated in a particular file or component or in included license
10 documentation. The authors of MySQL hereby grant you an additional
11 permission to link the program and your derivative works with the
12 separately licensed software that they have either included with
13 the program or referenced in the documentation.
14
15 This program is distributed in the hope that it will be useful,
16 but WITHOUT ANY WARRANTY; without even the implied warranty of
17 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
18 GNU General Public License, version 2.0, for more details.
19
20 You should have received a copy of the GNU General Public License
21 along with this program; if not, write to the Free Software
22 Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */
23
24#ifndef SQL_JOIN_OPTIMIZER_COST_CONSTANTS_H_
25#define SQL_JOIN_OPTIMIZER_COST_CONSTANTS_H_
26
27/**
28 @file sql/join_optimizer/cost_constants.h Hypergraph optimizer cost constants.
29
30 This file contains cost constants that are used during optimization by the
31 hypergraph optimizer. Ideally all (server) cost constants should be contained
32 in this file, but some code paths might still lead to the old cost model (with
33 constants in sql/opt_costconstants.h).
34
35 As we integrate more storage engines into the cost model we may add
36 engine-specific constants. Eventually we might make some constants (or groups
37 of related constants) user-configurable to provide users with the opportunity
38 to customize the cost model to better reflect their actual costs.
39
40 The cost constants here have generally been calibrated in microseconds using
41 regression analysis on a release build of the server. In order to avoid tying
42 these constants to the execution time on a particular machine we define a cost
43 unit in terms of a fundamental operation in MySQL (reading a row during a
44 table scan, @see kUnitCostInMicroseconds). Cost constants are then defined
45 relative to the unit cost, with the idea that the ratio between running times
46 is less sensitive to changes in hardware.
47
48 For this batch of constants we include a particular measure of the unit cost
49 in terms of microseconds. When adjusting the cost model in the future the
50 following approach should be adopted:
51
52 1. Determine the unit cost c1 in microseconds.
53 2. Determine the cost c2 of the constant of interest in microseonds.
54 3. Set the value of the constant to the ratio c2 / c1.
55*/
56
57/// We define the cost unit for the MySQL hypergraph cost model as follows: A
58/// cost of 1.0 represents the average cost per row of a "SELECT * FROM t" table
59/// scan where t is an InnoDB table with ten integer columns and one million
60/// rows. We assume that the InnoDB table is optimized (pages are full) and
61/// loaded into the buffer pool.
62constexpr double kUnitCostInMicroseconds = 0.434;
63
64/*
65 To compute the server cost of reading a row we use the following three
66 constants: kReadOneRowCost, kReadOneFieldCost, and kReadOneBytecost. For a
67 table scan under InnoDB the cost of reading one row is based on the following
68 model which has been calibrated using linear regression and predicts actual
69 running time well on tables with integer columns:
70
71 cost of reading a single row = kReadOneRowCost + kReadOneFieldCost *
72 num_fields_in_read_set + kReadOneByteCost * length_of_record_buffer_in_bytes.
73
74 In the future the cost model for reading rows may be extended to include
75 storage engine specific costs and IO cost.
76*/
77
78/// Fixed cost of reading a row from the storage engine into the record buffer.
79/// Used in base table access paths such as TABLE_SCAN, INDEX_SCAN,
80/// INDEX_RANGE_SCAN.
82
83/// Cost of per field in the read set. Used to account for the increase in cost
84/// when reading more fields from a row.
86
87/// Overhead per byte when reading a row. With a row-based format we have to
88/// process more data to extract the same number of fields when rows are larger,
89/// as measured by row length in bytes.
90///
91/// Note: This constant has been calibrated on tables with integer columns. We
92/// should therefore be careful about applying this cost to variable-length
93/// fields that are stored off-page. We use the length of the record buffer
94/// (TABLE_SHARE::rec_buff_length).
95constexpr double kReadOneByteCost = 0.001 / kUnitCostInMicroseconds;
96
97/// Cost of evaluating one filter on one row. Calibrated using simple integer
98/// filters, e.g. x < k, so it might be prudent to use a higher number, but then
99/// again, almost everything is calibrated on integers.
100///
101/// From calibration experiments we would prefer a cost model for filtering to
102/// consist of a fixed cost for filtering the row, together with a variable cost
103/// for the number of filter operations:
104///
105/// cost = kFilterOneRowCost + kApplyOneFilterCost * num_filter_evaluations
106///
107/// The expected number of filter evaluations for a row can be estimated. For
108/// example, the condition x < k1 AND x < k2 will require more filter
109/// evaluations if the selectivity of x < k1 is high, as then the second
110/// condition will also have to be evaluated. If we consider x < k1 OR x < k2,
111/// then a low selectivity of the first term will make it likely that the second
112/// term will have to be evaluated as well. Unfortunately the current cost model
113/// only provides partial support for these mechanisms, and does not support
114/// using a fixed filtering cost per row, so the constant has been adjusted to
115/// reflect this, pending a rewrite/refactoring of the filtering cost.
116/// @note See the filtering section in hypergraph_cost_model.test.
118
119// For index lookups the Adaptive Hash Index (AHI) makes it difficult to
120// accurately predict costs. We opt to interpolate between a cost model with and
121// without AHI. See IndexLookupCost() in cost_model.h for further details.
122
123/// The cost per page that is visited when performing an index lookup in an
124/// InnoDB B-tree. When the Adaptive Hash Index (AHI) is disabled the number of
125/// pages visited when performing an index lookup is equal to the height of the
126/// index since we traverse the tree from the root node to a leaf node,
127/// performing a binary search within each page. This constant has been
128/// calibrated with AHI disabled.
130
131/// Fixed cost of an index lookup when AHI is enabled (default).
133
134/// Default cost of an index lookup when we are missing information to compute a
135/// more accurate cost estimate. Used e.g. with the MEMORY engine when computing
136/// the cost of index operations on a secondary non-covering index.
137// TODO(tlchrist): Calibrate this.
139
140/// Fixed overhead per input row when sorting. This represents the cost of
141/// reading a row into the sort buffer. The accuracy of the cost model could be
142/// further improved if we take into account the amount of data that is read
143/// into the sort buffer.
144constexpr double kSortOneRowCost = 0.15 / kUnitCostInMicroseconds;
145
146/// Cost per comparison during sorting. Calibrated using ORDER BY on a single
147/// INT column. The cost is of course higher if we sort on multiple columns, and
148/// of the data type is something more complex, but not so much higher that it
149/// is clear that it would be worth taking this into account in the cost model.
151
152/// Hash join constants.
156
157/// In need of calibration.
163
164#endif // SQL_JOIN_OPTIMIZER_COST_CONSTANTS_H_
constexpr double kHashReturnOneRowCost
Definition: cost_constants.h:155
constexpr double kUnitCostInMicroseconds
We define the cost unit for the MySQL hypergraph cost model as follows: A cost of 1....
Definition: cost_constants.h:62
constexpr double kReadOneFieldCost
Cost of per field in the read set.
Definition: cost_constants.h:85
constexpr double kTempTableAggLookupCost
Definition: cost_constants.h:162
constexpr double kHashBuildOneRowCost
Hash join constants.
Definition: cost_constants.h:153
constexpr double kWindowOneRowCost
Definition: cost_constants.h:161
constexpr double kApplyOneFilterCost
Cost of evaluating one filter on one row.
Definition: cost_constants.h:117
constexpr double kReadOneRowCost
Fixed cost of reading a row from the storage engine into the record buffer.
Definition: cost_constants.h:81
constexpr double kIndexLookupPageCost
The cost per page that is visited when performing an index lookup in an InnoDB B-tree.
Definition: cost_constants.h:129
constexpr double kMaterializeOneRowCost
Definition: cost_constants.h:160
constexpr double kSortComparisonCost
Cost per comparison during sorting.
Definition: cost_constants.h:150
constexpr double kIndexLookupFixedCost
Fixed cost of an index lookup when AHI is enabled (default).
Definition: cost_constants.h:132
constexpr double kHashProbeOneRowCost
Definition: cost_constants.h:154
constexpr double kSortOneRowCost
Fixed overhead per input row when sorting.
Definition: cost_constants.h:144
constexpr double kStreamOneRowCost
Definition: cost_constants.h:159
constexpr double kAggregateOneRowCost
In need of calibration.
Definition: cost_constants.h:158
constexpr double kReadOneByteCost
Overhead per byte when reading a row.
Definition: cost_constants.h:95
constexpr double kIndexLookupDefaultCost
Default cost of an index lookup when we are missing information to compute a more accurate cost estim...
Definition: cost_constants.h:138