WL#6526: FORCE INDEX to avoid index dives when possible

Affects: Server-8.0 — Status: Complete

Description
Requirements
High Level Architecture
Low Level Design

Currently when user specifies FORCE INDEX, optimizer still always calculates
cost in order to pick access method and join order using index dives. Under some
circumstances it's possible to avoid doing this and this could speed up
execution by 40%.

The goal of this WL is to detect if access methods for tables and join order
could be derived from the query and produce query plan without doing index
dives in such cases.

F1) For a single table query, index dives are skipped during execution if:
   a) FORCE INDEX applies to a single index.  
   b) No subquery is present.
   c) Fulltext Index is not involved.
   d) No GROUP-BY or DISTINCT clause.
   e) No ORDER-BY clause.

F2) Not applicable to multi-table query.

F3) i) EXPLAIN format=traditional FOR CONNECTION will see the following changes:
       a) No. of "rows" will change to NULL for the table where index dive is
          skipped. 
       b) The value for "filtered" will be NULL.

    ii) EXPLAIN format=JSON FOR CONNECTION will see the following changes:
        a) "rows_examined_per_scan" and "rows_produced_per_join" will not be 
           mentioned when  index dive is skipped. 
        b) '"skip_index_dive_due_to_force": true' for the table where index dive
           is skipped. 
        c) Cost calculations will not be accurate since index dive was skipped.

    iii) No change in output of EXPLAIN format=traditional and 
         EXPLAIN format=json 

NF1) Queries that qualify the restrictions should see a reduction in execution
     time since an index dive will be skipped.
NF2) No change for queries that don't qualify.

As explained in 
http://jorgenloland.blogspot.in/2012/04/on-queries-with-many-values-in-in.html
"For as long as there have been a range access method in MySQL, the number of 
rows in a range has been estimated by diving down the index to find the start 
and end of the range and use these to count the number of rows between them. 
This technique is accurate, and is therefore a good basis to make the best 
possible execution plan (QEP) for the query. Unfortunately, it involves an 
index dive for each range. "

When a query contains FORCE INDEX hint, it implicitly indicates that the user 
knows the best index for this query. If the decision over which index will be 
used is a foregone conclusion, there is no point in doing index dives for 
cost calcuation. 

So optimizer should detect such scenarios and avoid performing the dive.

Index dives are mainly called from "handler::multi_range_read_info_const". 
So the check over whether an index dive is needed can be done here. 

Changes:

1) TABLE related checks are performed in :
bool TABLE::check_skip_dive(THD *thd)
{
  skipped_index_dive= (force_index &&
                       pos_in_table_list->index_hints->elements == 1 &&
                       !thd->lex->describe &&
                       thd->lex->current_select()->qualifies_skip_dive());
  return skipped_index_dive;
}

2) Query related checks are performed in :
bool SELECT_LEX::qualifies_skip_dive()
{
  return (parent_lex->is_single_level_stmt() &&
          single_table_or_straight_join() &&
          !(has_ft_funcs() ||
            is_grouped() ||
            is_distinct() ||
            is_ordered()));
}

3) The flag TABLE->skipped_index_dive is used to keep track of whether the 
index dive was skipped. This will help in providing optimizer trace and 
explain output.