We use the following data to make a row estimate, in this order of priority:
- (Non-hash) indexes where the aggregation terms form some prefix of the index key. The handler can give good estimates for these.
- Histograms for aggregation terms that are fields. The histograms give an estimate of the number of unique values.
- The table size (in rows) for terms that are fields without histograms. (If we have "SELECT ... FROM t1 JOIN t2 GROUP BY t2.f1", there cannot be more result rows than there are rows in t2.) We also make the pragmatic assumption that field values are not unique, and therefore make a row estimate somewhat lower than the table row count.
- In the remaining cases we make an estimate based on the input row estimate. This is based on two assumptions: a) There will be fewer output rows than input rows, as one rarely aggregates on a set of terms that are unique for each row, b) The more terms there are, the more output rows one can expect.
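The priority order above can be read as a decision made per aggregation term. The following is a minimal sketch of that reading, not the optimizer's actual code: `TermInfo`, `EstimateSource` and `ChooseSource` are hypothetical names, and the index case is simplified, since in the description it concerns a set of terms forming a key prefix rather than a single term.

```cpp
// Hypothetical types for illustration only; not the optimizer's real API.
enum class EstimateSource { kIndexPrefix, kHistogram, kTableSize, kInputRows };

struct TermInfo {
  bool covered_by_index_prefix;  // Term is part of a non-hash index key prefix.
  bool is_field;                 // Term is a plain field, not an expression.
  bool has_histogram;            // Only meaningful when is_field is true.
};

// Returns the highest-priority data source available for one term.
constexpr EstimateSource ChooseSource(const TermInfo &term) {
  if (term.covered_by_index_prefix) return EstimateSource::kIndexPrefix;
  if (term.is_field && term.has_histogram) return EstimateSource::kHistogram;
  if (term.is_field) return EstimateSource::kTableSize;
  // Remaining cases: fall back to the input row estimate.
  return EstimateSource::kInputRows;
}
```

Under these assumptions, a field that has a histogram and is also covered by an index prefix resolves to `kIndexPrefix`, because the index is checked first.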
We may need to combine multiple estimates into one. As an example, assume that we aggregate on three fields: f1, f2 and f3. There is an index where f1,f2 form a key prefix, and we have a histogram on f3. Then we could make good estimates for "GROUP BY f1,f2" or "GROUP BY f3". But how do we combine these into an estimate for "GROUP BY f1,f2,f3"? If f3 and f1,f2 are uncorrelated, then we should multiply the individual estimates. But if f3 is functionally dependent on f1,f2 (or vice versa), we should pick the larger of the two estimates.
Since we do not know whether these fields are correlated, we multiply the individual estimates and then multiply the product by a damping factor. The damping factor is a function of the number of estimates (two in the example above). That way, we get a combined estimate that falls between the two extremes of functional dependence and no correlation.
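As an illustration of the combination step, here is a minimal sketch. The text does not give the actual damping function, so both the base `kDampingBase = 0.9` and the form `base^(n-1)` are assumptions, as is the final clamp that keeps the result from falling below the largest individual estimate. `CombineEstimates` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Assumed value; the real damping function is not specified in this text.
constexpr double kDampingBase = 0.9;

// Combines per-group estimates of distinct-value counts into one estimate.
double CombineEstimates(const std::vector<double> &estimates) {
  assert(!estimates.empty());
  double product = 1.0;  // The "no correlation" extreme.
  double largest = 0.0;  // The "functional dependence" extreme.
  for (double estimate : estimates) {
    product *= estimate;
    largest = std::max(largest, estimate);
  }
  // Damping depends only on the number of estimates; one estimate is undamped.
  const double damping =
      std::pow(kDampingBase, static_cast<double>(estimates.size() - 1));
  // Assumed clamp: never go below the functional-dependence extreme.
  return std::max(largest, product * damping);
}
```

With the example from the text, an index estimate of 100 for f1,f2 and a histogram estimate of 10 for f3 would give a result between 100 (functional dependence) and 1000 (no correlation) under these assumptions.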
- Parameters
  - terms: The aggregation terms.
  - child_rows: The row estimate for the input path.
  - trace: Append optimizer trace text to this if non-null.
- Returns
  - The row estimate for the aggregate operation.