WL#7339: Use improved records per key estimate interface in optimizer
Affects: Server-Prototype Only — Status: Complete
WL#7338 implemented a new interface for providing "records per key" values using float values (instead of integers). Using this new interface instead of using the currently used rec_per_key values will give more correct record estimates for ref access against an indexed column. This worklog will change the optimizer to use this new interface. User Documentation ================== http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-5.html
At the time of completing this worklog, InnoDB will be the only storage engine that provides "records per key" estimates using float values (see WL#6068). As a result, this worklog shall initially only have impact on queries against tables stored in InnoDB and in partitioned InnoDB tables . WL#6068 implements two main changes to how "records per key" estimates are computed: a) records per key estimates are now computed as a floating point value instead of an integer value (where a double value previously was rounded down to the nearest integer value). b) the "division by two" of the actual records per key estimate has been removed. As a consequence, the "records per key" estimates for indexes in InnoDB that this worklog will start using, will be about two times larger than previous, and for values that earlier were 1, the new "records per key" estimate can be up to four times bigger. Non-functional changes/consequences: ==================================== NC-1: Since this worklog will change the index estimates for ref-access and cause plan changes, the performance of the queries where the query plan is changed may be different. Some queries may run faster, other may run slower. NC-2: The heuristics know as "eq-ref extention" that reduces the time the join optimizer spends on optimize a many-table join may be used less. Before this worklog, indexes with actual records per key values up to 4, would be (wrongly) candidates for "eq-ref extiontion". After this worklog, only indexes where the records per key values is actually 1 will be considered by the "eq-ref extention" heuristics. This should result in more optimal plans but may in some cause increase the time to optimize the query. Functional requirements: ======================== F-1: The join optimizer shall use more accurate "records per key" estimates when computing the number of records to read for ref-access (fan-out). This may cause the join order to change. F-2: Better index selection: Due to using floating point values for the record per key estimates, the choice between indexes with almost the same record per key estimate shall be better (before this worklog and WL#6068, indexes with actual records per key estimates between 1 and 4 would all have a records per key estimate of 1). F-3: The rows and filter estimates as printed by explain shall be more correct in cases where they are based on "records per key" estimates. F-4: Cardinality numbers printed by SHOW INDEX/INDEXES/KEYS shall be much more accurate. F-5: Cardinality numbers available in the STATISTICS table in information schema shall be much more accurate. Non-functional requirements: ============================ NF-1: The performance of queries on tables stored in other storage engines than InnoDB shall not be affected by this worklog (at least not until the storage engine start providing records per key estimates using the new interface).  It is likely that other storage engines will provide "records per key" estimates using the new interface at a later time.
The new interface for getting "records per key" values is specified and implemented in WL#7338. This worklog will change the optimizer to use this new interface. The main changes that will be implemented are: 1. All places where the optimizer does direct lookup into the KEY::rec_per_key array for getting rec_per_key information will be replaced by using the new "records per key" interface that is implemented in the KEY struct. 2. All use of integer values for records per key information and corresponding calculations will be changed into using either float or double values. Requirements that needs to be supported: R1. The join optimizer heuristics "eq ref extension" is based on using records per key estimates and cost estimates to determine whether the next table should be directly added to the join plan as an "eq ref" table. When switching from integer to float based "records per key" estimates it is possible that even for tables with unique indexes, the "record per key" estimate from the storage engine might not be exactly 1.0 and thus fail to be recognized as an "eq_ref" table. It must be verified that with the implementation of this worklog, "eq ref extention" still works for tables where the index is unique.
One potential issue found during implementation: 1. Speed of join optimizer: The optimization in the join optimizer to reduce optimization time refered to as "eq ref extention" will likely be used less. The current criteria for adding a table to the join order is *not* testing for "eq ref" but instead testing for that the next table will read the same estimated number of rows (in addition to a few other criteria, see code in sql_planner.cc). In most cases this boils down to using "eq ref extention" if the rec_per_key value is 1. In the old rec_per_key implementation actual "record per key" estimates that where lower than 3-4 would have a rec_per_key value of 1 (in InnoDB) and thus be candidate for "eq ref extension". With more correct floating point values for the rec_per_key estimate, this will cause the "eq ref extension" to likely be used less. This can result in better join orders but the drawback is increased time to compute the plan. One possible solution for this last issue is: a) check that the index used for joining actually is unique and only then use the "eq ref extention". b) possibly add a heuristics that if the "rec_per_key" value for the index is less than e.g 1.4, then it can be used for "eq ref extension".
Copyright (c) 2000, 2022, Oracle Corporation and/or its affiliates. All rights reserved.