WL#13929: Introduce new optimizer switch to disable limit optimization

Affects: Server-8.0   —   Status: Complete

The optimization to switch from a non-ordering index to an
ordering index for group by and order by when there is a
limit clause goes very wrong for certain queries. Some of
the problems with the algorithm w.r.t this aspect are:

1.Algorithm assumes that data distribution is uniform.
For ex: If a query qualifies 10 rows out of 100 rows,
it presumes that every 10th row will qualify. However
if all the 10 qualifying rows are not distributed uniformly
and if the algorithm picks a ordering index over ref/range
index, then server ends up reading way too many rows than
its estimation. This results in a bad plan compared to the
original selection made by the optimizer.

2.Algorithm presumes that non-ordering index and the
ordering index being compared are not-correlated.

3.Algorithm is not aware of additional predicates that
could be pushed later using index condition pushdown for
a range/ref access method. This would further result in
reduction of qualifying rows for non-ordering index.
So the estimations for the final rows to be read for an
ordering index is not right resulting in picking ordering
index while the non-ordering index would have performed
much better.

4.The cost model used in the algorithm is flawed in arriving
at index scan time for the ordering index.Also, its
preference to pick covering index even when the ordering index
scan time is no-better and preferring to switch to ordering
index when group by is present without taking into consideration
other parameters results in bad plans for certain queries.

There are also other cases where it does not make the right
decision of choosing ordering index. W.r.t this aspect, there
are problems too.

1.The cost model used in the algorithm is flawed. It does not
take into consideration the external sort cost for a non-ordering
index.

2.Picking of ordering index is done very late. Ideally it should
be part of the join order decision making which could take
advantage of the requested order and also the limit clause.

As seen in the many bugs reported over years, this is an
optimization that can pick good plans but also very bad plans too.

The optimizer switch which is introduced as part of this patch
enables users to disable this optimization if the switch to
ordering index results in bad plan. However the other problem of
not switching to ordering index needs a change in the algorithm
itself and cannot be addressed with the introduction of this switch.

We have tested and analyzed most of the bugs reported in this area
and identified the above problems with the algorithm. We will not
be solving these problems with this change, but giving user an
option to not use the optimization.
Functional Requirements:

F1 - The new optimizer switch "prefer_ordering_index" will be similar to other 
switchable optimizations in its usability. Please refer 
https://dev.mysql.com/doc/refman/8.0/en/switchable-optimizations.html for the 
details of existing switchable optimizations.

F2 - Hint for "order by/group by" (force index for orderby/group by) will override 
the value for switch prefer_ordering_index i.e if an index is forced for order by, 
even if prefer_ordering_index is "off", optimizer will use the limit optimization 
to pick an ordering index

Non-functional requirements:

NF1 - There should not be any change in plans for any queries when using the 
default value (on) for the switch.

NF2 - When the switch prefer_ordering_index is set to "off", "limit optimization" 
will not be used. So all the queries which were switching to ordering index 
because of this optimization should not switch to ordering index.

A new optimizer switch "prefer_ordering_index" is introduced to enable/disable 
switching to an ordering index over the already chosen access plan because of 
limit clause. By default the switch is on which means optimizer will switch to 
ordering index over a non-ordering one if found better by the limit optimization. 
Users can disable if the switch to the ordering index results in bad performance.

The new optimizer switch can be used in similar lines to existing switchable 
optimizations.
SET [GLOBAL|SESSION] optimizer_switch='command[,command]...';

Please refer https://dev.mysql.com/doc/refman/8.0/en/switchable-optimizations.html
for details of existing switchable optimizations and how they can be used.

Example:

SET optimizer_switch = "prefer_ordering_index=on";
EXPLAIN SELECT non_covered_column FROM t WHERE other_id > 3 ORDER BY id ASC LIMIT 
2;
id      select_type     table   partitions      type    possible_keys   key     
key_len ref     rows    filtered        Extra
1       SIMPLE  t       NULL    index   index_other_id_covered_column   PRIMARY 8       
NULL    2       70.00   Using where

SET optimizer_switch = "prefer_ordering_index=off";
EXPLAIN SELECT non_covered_column FROM t WHERE other_id > 3 ORDER BY id ASC LIMIT 
2;
id      select_type     table   partitions      type    possible_keys   key     
key_len ref     rows    filtered        Extra
1       SIMPLE  t       NULL    range   index_other_id_covered_column   
index_other_id_covered_column   8       NULL    7       100.00  Using index 
condition; Using filesort

With prefer_ordering_index being "off", ordering index 
"index_other_id_covered_column" is not picked. Hence the need for external sort.