WL#6068: Remove InnoDB rec_per_keys hack
Status: Complete
The InnoDB code contains the following hack: /* Since MySQL seems to favor table scans too much over index searches, we pretend index selectivity is 2 times better than our estimate: */ rec_per_key = rec_per_key / 2; The validity of this hack should be evaluated. If it is determined that this, in general, leads to better plans, it should be determined how the Optimizer could be changed to avoid this hack. (Note that several cases have been observed where this hack actually leads to sub-optimal query plans.) Another issue is that with the new InnoDB persistent statistics, this leads to a mismatch between the statistics stored in the statistics tables and the numbers reported by SHOW INDEX since the latter is based on rec_per_key. It should also be considered whether it is better that InnoDB reports the estimated cardinality of column values directly instead of using rec_per_key. Some undesired effects have been seen for small values of rec_per_key that could have been avoided if the Optimizer had access to the original cardinality numbers. It was decided to still use "number of records per distinct key (rec_per_key)" because the optimizer code uses that metric internally, but provide that as a floating point number and without the /2 hack.
Since this WL will fill in a new value added by "WL#7338 Interface for improved records per key estimates", but that value is going to be unused, this WL will not cause any visible changes.
In "WL#7338 Interface for improved records per key estimates" a new interface has been added to ship a floating point rec_per_key in addition to the existent one. In this WL InnoDB will supply the new floating point rec_per_key, without the /2 hack. Internally InnoDB derives "number of distinct records (n_diff)" and calculates the to-be-shipped value like rec_per_key = total_row_in_the_table / n_diff for the integer rec_per_key an integer division is used. The new floating point rec_per_key will be calculated in the same way, but using a floating point division.
Calculate the rec_per_key as a floating point number (rec_per_key_t) inside InnoDB (function innodb_rec_per_key()) and ship it to MySQL using the recently added KEY::set_records_per_key() method. To preserve the current behavior (until the Optimizer starts using the new floating point values) - truncate the rec_per_key to an integer and provide it via the old mechanism.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.