WL#7339: Use improved records per key estimate interface in optimizer

Affects: Server-Prototype Only — Status: Complete

Description
Requirements
Dependent Tasks
High Level Architecture
Low Level Design

WL#7338 implemented a new interface for providing "records per key" values using
float values (instead of integers). Using this new interface instead of using the
currently used rec_per_key values will give more correct record estimates for
ref access against an indexed column. This worklog will change the optimizer to
use this new interface.

User Documentation
==================

http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-5.html

At the time of completing this worklog, InnoDB will be the only
storage engine that provides "records per key" estimates using
float values (see WL#6068). As a result, this worklog shall initially
only have impact on queries against tables stored in InnoDB and in
partitioned InnoDB tables [1].

WL#6068 implements two main changes to how "records per key" estimates
are computed:

 a) records per key estimates are now computed as a floating point value
    instead of an integer value (where a double value previously was rounded
    down to the nearest integer value).
 b) the "division by two" of the actual records per key estimate has been
    removed.

As a consequence, the "records per key" estimates for indexes in InnoDB that
this worklog will start using, will be about two times larger than previous, and
for values that earlier were 1, the new "records per key" estimate can be up to
four times bigger.

Non-functional changes/consequences:
====================================

NC-1: Since this worklog will change the index estimates for ref-access and
      cause plan changes, the performance of the queries where the query plan
      is changed may be different. Some queries may run faster, other may
      run slower.

NC-2: The heuristics know as "eq-ref extention" that reduces the time the
      join optimizer spends on optimize a many-table join may be
      used less. Before this worklog, indexes with actual records per key
      values up to 4, would be (wrongly) candidates for "eq-ref extiontion".
      After this worklog, only indexes where the records per key values is
      actually 1 will be considered by the "eq-ref extention" heuristics. This 
      should result in more optimal plans but may in some cause increase the
      time to optimize the query.

Functional requirements:
========================

F-1: The join optimizer shall use more accurate "records per key" estimates when
     computing the number of records to read for ref-access (fan-out). This may
     cause the join order to change.

F-2: Better index selection: Due to using floating point values for the record
     per key estimates, the choice between indexes with almost the same record
     per key estimate shall be better (before this worklog and WL#6068, indexes
     with actual records per key estimates between 1 and 4 would all have a
     records per key estimate of 1).

F-3: The rows and filter estimates as printed by explain shall be more correct
     in cases where they are based on "records per key" estimates.

F-4: Cardinality numbers printed by SHOW INDEX/INDEXES/KEYS shall be much more
     accurate.

F-5: Cardinality numbers available in the STATISTICS table in information
     schema shall be much more accurate.

Non-functional requirements:
============================

NF-1: The performance of queries on tables stored in other storage engines
      than InnoDB shall not be affected by this worklog (at least not until
      the storage engine start providing records per key estimates using the
      new interface).

[1] It is likely that other storage engines will provide "records per key"
    estimates using the new interface at a later time.

WL#6068: Remove InnoDB rec_per_keys hack
WL#7338: Interface for improved records per key estimates

The new interface for getting "records per key" values is specified and
implemented in WL#7338. This worklog will change the optimizer to use this new
interface. The main changes that will be implemented are:

1. All places where the optimizer does direct lookup into the KEY::rec_per_key[]
   array for getting rec_per_key information will be replaced by using the new
   "records per key" interface that is implemented in the KEY struct.

2. All use of integer values for records per key information and corresponding
   calculations will be changed into using either float or double values.

Requirements that needs to be supported:

R1. The join optimizer heuristics "eq ref extension" is based on using records
    per key estimates and cost estimates to determine whether the next table
    should be directly added to the join plan as an "eq ref" table. When
    switching from integer to float based "records per key" estimates it is
    possible that even for tables with unique indexes, the "record per key"
    estimate from the storage engine might not be exactly 1.0 and thus fail to
    be recognized as an "eq_ref" table. It must be verified that with the
    implementation of this worklog, "eq ref extention" still works for tables
    where the index is unique.

One potential issue found during implementation:

1. Speed of join optimizer: The optimization in the join optimizer to reduce
optimization time refered to as "eq ref extention" will likely be used less. The
current criteria for adding a table to the join order is *not* testing for "eq
ref" but instead testing for that the next table will read the same estimated
number of rows (in addition to a few other criteria, see code in
sql_planner.cc). In most cases this boils down to using "eq ref extention" if
the rec_per_key value is 1. In the old rec_per_key implementation actual "record
per key" estimates that where lower than 3-4 would have a rec_per_key value of 1
(in InnoDB) and thus be candidate for "eq ref extension". With more correct
floating point values for the rec_per_key estimate, this will cause the "eq ref
extension" to likely be used less. This can result in better join orders but the
drawback is increased time to compute the plan.

One possible solution for this last issue is:

  a) check that the index used for joining actually is unique and only then use
the "eq ref extention".
  b) possibly add a heuristics that if the "rec_per_key" value for the index is
less than e.g 1.4, then it can be used for "eq ref extension".