MySQL 8.4.2
Source Code Documentation
Classes | |
class | AggregateRowEstimator |
This class finds disjoint sets of aggregation terms that form prefixes of some non-hash index, and makes row estimates for those sets based on index metadata. More... | |
Typedefs | |
using | TermArray = Bounds_checked_array< const Item *const > |
Array of aggregation terms. More... | |
Functions | |
TermArray | GetAggregationTerms (const JOIN &join) |
double | EstimateDistinctRowsFromStatistics (THD *thd, TermArray terms, double child_rows) |
Estimate the number of distinct tuples in the projection defined by 'terms'. More... | |
template<typename FunctionLow , typename FunctionHigh > | |
double | SmoothTransition (FunctionLow function_low, FunctionHigh function_high, double lower_limit, double upper_limit, double argument) |
For a function f(x) such that: f(x) = g(x) for x <= l, and f(x) = h(x) for x > l. More... | |
double | EstimateRollupRowsPrimitively (double aggregate_rows, size_t grouping_expressions) |
Do a cheap rollup row estimate for small result sets. More... | |
double | EstimateRollupRowsAdvanced (THD *thd, double aggregate_rows, TermArray terms) |
Do more precise rollup row estimate for larger result sets. More... | |
double | EstimateAggregateRows (THD *thd, const AccessPath *child, const Query_block *query_block, bool rollup) |
Estimate the row count for an aggregate operation (including ROLLUP rows for GROUP BY ... WITH ROLLUP). More... | |
using anonymous_namespace{cost_model.cc}::TermArray = typedef Bounds_checked_array<const Item *const> |
Array of aggregation terms.
double anonymous_namespace{cost_model.cc}::EstimateAggregateRows | ( | THD * | thd, |
const AccessPath * | child, | ||
const Query_block * | query_block, | ||
bool | rollup | ||
) |
Estimate the row count for an aggregate operation (including ROLLUP rows for GROUP BY ... WITH ROLLUP).
thd | Current thread. |
child | The input to the aggregate path. |
query_block | The query block to which the aggregation belongs. |
rollup | True if we should add rollup rows to the estimate. |
double anonymous_namespace{cost_model.cc}::EstimateDistinctRowsFromStatistics | ( | THD * | thd, |
TermArray | terms, | ||
double | child_rows | ||
) |
Estimate the number of distinct tuples in the projection defined by 'terms'.
We use the following data to make a row estimate, in that priority:
We may need to combine multiple estimates into one. As an example, assume that we aggregate on three fields: f1, f2 and f3. There is an index where f1, f2 form a key prefix, and we have a histogram on f3. Then we could make good estimates for "GROUP BY f1,f2" or "GROUP BY f3". But how do we combine these into an estimate for "GROUP BY f1,f2,f3"? If f3 and f1,f2 are uncorrelated, then we should multiply the individual estimates. But if f3 is functionally dependent on f1,f2 (or vice versa), we should pick the larger of the two estimates.
Since we do not know whether these fields are correlated, we multiply the individual estimates and then multiply the product by a damping factor. The damping factor is a function of the number of estimates (two in the example above). That way, we get a combined estimate that falls between the two extremes of functional dependence and no correlation.
thd | Current thread. |
terms | The terms for which we estimate the number of distinct combinations. |
child_rows | The row estimate for the input path. |
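The combination step described above can be illustrated with a minimal sketch. It is not the code from cost_model.cc: the helper name CombineEstimates, the damping base 0.9, and the clamping to the input row count are all assumptions chosen only to show how a damped product lands between the "largest estimate" and "plain product" extremes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not part of MySQL): combine distinct-row estimates for
// disjoint sets of aggregation terms into a single estimate.
double CombineEstimates(const std::vector<double> &estimates,
                        double child_rows) {
  assert(!estimates.empty());

  double product = 1.0;  // Correct if the term sets are uncorrelated.
  double largest = 0.0;  // Correct if one set functionally determines the rest.
  for (double estimate : estimates) {
    product *= estimate;
    largest = std::max(largest, estimate);
  }

  // Assumed damping factor: a function of the number of estimates only.
  // The base 0.9 is an illustrative constant, not a value from the source.
  const double damping =
      std::pow(0.9, static_cast<double>(estimates.size() - 1));

  // Keep the result between the two extremes, and never above the input size.
  const double damped = std::max(largest, product * damping);
  return std::min(child_rows, damped);
}
```

With estimates of 10 and 20 rows and 1000 input rows, this sketch yields a value between 20 (functional dependence) and 200 (no correlation).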
double anonymous_namespace{cost_model.cc}::EstimateRollupRowsAdvanced | ( | THD * | thd, |
double | aggregate_rows, | ||
TermArray | terms | ||
) |
Do more precise rollup row estimate for larger result sets.
If we have ROLLUP, there will be additional rollup rows. If we group on N terms T1..TN, we assume that the number of rollup rows will be:
1 + CARD(T1) + CARD(T1,T2) + ... + CARD(T1...T(N-1))
where CARD(T1...TX) is a row estimate for aggregating on T1..TX.
thd | Current thread. |
aggregate_rows | Number of rows after aggregation. |
terms | The group-by terms. |
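The sum over term prefixes can be sketched as follows. The names RollupRowsFromPrefixes and PrefixCardinality are hypothetical; the callback stands in for whatever produces CARD(T1...TX), and the real function takes a THD and a TermArray instead.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical stand-in for a distinct-row estimator: given a prefix length x,
// it returns CARD(T1..Tx).
using PrefixCardinality = std::function<double(std::size_t)>;

// Returns 1 + CARD(T1) + CARD(T1,T2) + ... + CARD(T1..T(N-1)).
double RollupRowsFromPrefixes(std::size_t term_count,
                              const PrefixCardinality &card) {
  assert(term_count > 0);
  double rollup_rows = 1.0;  // The grand-total row.
  for (std::size_t prefix = 1; prefix < term_count; ++prefix) {
    rollup_rows += card(prefix);
  }
  return rollup_rows;
}
```

Note that the full grouping T1..TN is not part of the sum: those rows are the ordinary aggregate rows, not rollup rows.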
double anonymous_namespace{cost_model.cc}::EstimateRollupRowsPrimitively | ( | double | aggregate_rows, |
size_t | grouping_expressions | ||
) |
Do a cheap rollup row estimate for small result sets.
If we group on n terms and expect k rows in total (before rollup), we make the simplifying assumption that each term has k^(1/n) distinct values, and that all terms are uncorrelated from each other. Then the number of rollup rows can be expressed as the sum of a finite geometric series:
1 + m + m^2 + m^3 + ... + m^(n-1)
where m = k^(1/n).
aggregate_rows | Number of rows after aggregation. |
grouping_expressions | Number of terms that we aggregated on. |
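As a worked illustration of the geometric series above, the following sketch computes the sum term by term. The function name RollupRowsGeometric is hypothetical, and summing in a loop (rather than using the closed form (m^n - 1)/(m - 1)) is a choice made here to avoid dividing by zero when m = 1; the actual implementation may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical sketch: with k aggregate rows over n terms, assume each term
// has m = k^(1/n) distinct values and return 1 + m + m^2 + ... + m^(n-1).
double RollupRowsGeometric(double aggregate_rows,
                           std::size_t grouping_expressions) {
  assert(grouping_expressions > 0);
  assert(aggregate_rows >= 0.0);

  const double m = std::pow(
      aggregate_rows, 1.0 / static_cast<double>(grouping_expressions));

  double sum = 0.0;
  double term = 1.0;  // m^0
  for (std::size_t i = 0; i < grouping_expressions; ++i) {
    sum += term;
    term *= m;
  }
  return sum;
}
```

For example, with k = 1000 and n = 3 we get m = 10, so the estimate is about 1 + 10 + 100 = 111 rollup rows.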
double anonymous_namespace{cost_model.cc}::SmoothTransition | ( | FunctionLow | function_low, |
FunctionHigh | function_high, | ||
double | lower_limit, | ||
double | upper_limit, | ||
double | argument | ||
) |
For a function f(x) such that: f(x) = g(x) for x <= l, and f(x) = h(x) for x > l,
tweak f(x) so that it is continuous at l even if g(l) != h(l). We obtain this by making a gradual transition from g(x) to h(x) over an interval [l, l+k] for some constant k.
function_low | g(x) |
function_high | h(x) |
lower_limit | l |
upper_limit | l+k |
argument | x (for f(x)) |
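One way to realize the gradual transition described above is linear blending across [l, l+k]. The sketch below assumes that approach; the name SmoothTransitionSketch marks it as an illustration, not the function's actual body.

```cpp
#include <cassert>

// Illustrative sketch: use g below the interval, h above it, and a linear
// blend of the two inside [lower_limit, upper_limit].
template <typename FunctionLow, typename FunctionHigh>
double SmoothTransitionSketch(FunctionLow function_low,
                              FunctionHigh function_high, double lower_limit,
                              double upper_limit, double argument) {
  assert(upper_limit > lower_limit);
  if (argument <= lower_limit) {
    return function_low(argument);
  }
  if (argument >= upper_limit) {
    return function_high(argument);
  }
  // Weight of h(x): 0 at lower_limit, 1 at upper_limit.
  const double high_fraction =
      (argument - lower_limit) / (upper_limit - lower_limit);
  return function_low(argument) * (1.0 - high_fraction) +
         function_high(argument) * high_fraction;
}
```

Because the weight of h(x) goes from 0 to 1 across the interval, the blended function agrees with g at l and with h at l+k, so no jump remains at either end.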