WL#8244: Hints for subquery strategies

Status: Complete   —   Priority: Medium

Using the new syntax and infrastructure for hints (see WL#8016 and WL#8017), add
hints to control which subquery execution strategies should be used.  This
includes whether or not to use semi-join, which semi-join strategy to use, and,
in case semi-join is not used, whether to use subquery materialization or
in-to-exists transformation.

Note that this work also should make it possible to prevent any of the semi-join
strategy, also Duplicate Weed-out, which is not possible to turn off with
optimizer_switch.

User Documentation
==================

* http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-8.html
* http://dev.mysql.com/doc/refman/5.7/en/optimizer-hints.html
F-1: The hints specified in this work log shall conform with the syntax for hints
     as specified in the HLS.
F-2: If SEMIJOIN hint is not listing any strategies, semi-join shall be used if 
     possible.  A strategy for which the corresponding optimizer_switch is on 
     will be used if possible.  If none of these strategies are applicable, the 
     Duplicate Weed-out strategy shall be used.
F-3: If SEMIJOIN hint is given with a specification of possible strategies,
     one of the listed strategies shall be used if possible.
F-4: If the listed strategies of a SEMIJOIN hint are not applicable for the 
     given query, the Duplicate Weed-out strategy shall be used.    
F-5: If NO_SEMIJOIN hint is given without listing any strategies, semi-join
     shall not be used.
F-6: If NO_SEMIJOIN hint is given with a specification of strategies not to use,
     none of the listed strategies shall be used unless the list include all
     strategies that are applicable for the query.
F-7: If the listed strategies of a NO_SEMIJOIN hint include all applicable 
     strategies for the given query, the Duplicate Weed-out strategy shall 
     be used.
F-9: If a subquery is nested within another subquery and both subqueries are    
     merged into a semi-join of an outer query, any specification of semi-join
     strategies for the inner-most query will be ignored.  SEMIJOIN/NO_SEMIJOIN 
     hints can still be used force/prevent semi-join transformation for such 
     nested subqueries.
F-10: If a SUBQUERY hint is given, the given strategy shall be used for subquery
      execution if possible.
F-11: A SUBQUERY(MATERIALIZATION) hint may be ignored when the query cannot 
      be executed with subquery materialization. 
F-12: If two or more of the SEMIJOIN, NO_SEMIJON, and SUBQUERY hints are 
      specified (of same or different type), the first hint will have effect 
      while subsequent hints will be ignored.  If this happens a warning will be
      given.
F-13: If new optimizer_switch, duplicateweedout, is off, Duplicate Weed-out
      strategy shall not be used unless all other strategies that are 
      applicable for the given query, is also turned off. 
The hints to be added this work log are all query block level hints as specified
in WL#8017.  The hints have the following syntax:

  HINT([@QB_NAME] [arguments])

If QB_NAME is specified, this hint affects the query block with give the given
query block name, otherwise it belongs to current query block.  For subquery
hints this means that the hint should normally either be placed in the relevant
subquery block, or in the top query block with a specification of which query
block it should apply to.

The following optimizer hints will be implemented as part of this worklog:
   
1. Specify which semi-join strategies can be used

   Syntax:
	/*+ SEMIJOIN( [strategy [,strategy] ... ] ) */
   	strategy ::= MATERIALIZATION|FIRSTMATCH|LOOSESCAN|DUPSWEEDOUT

   If no strategy is specified, all strategies, for which the corresponding 
   optimizer_switch is on, are allowed.  

   If semi-join transformation is possible, but none of the specified strategies
   are applicable for the query, the list of strategies will be ignored.  That 
   is, another strategy may be used instead. 

2. Specify which semi-join strategies should NOT be used
   
   Syntax:
        /*+ NO_SEMIJOIN( [strategy [,strategy] ... ] ) */
   	strategy ::= MATERIALIZATION|FIRSTMATCH|LOOSESCAN|DUPSWEEDOUT
   
   If no strategy is specified, semi-join transformation will not be used.
   
   If both Duplicate Weed-out and FirstMatch is turned off by a NO_SEMIJOIN   
   hint, one these strategies may still be selected if the two other possible 
   strategies are not applicable.  A warning that the hint was ignored will
   then be given.

  
3. Specify whether Subquery Materialiation or In-to-exists should be used.
   
   Syntax: 
        /*+ SUBQUERY(strategy) */
        strategy ::= MATERIALIZATION|INTOEXISTS
    
   Specifying this hint will also turn off semijoin transformation for the
   given query block.

   If MATERIALIZATION is specified when subquery materialization is not 
   possible, the hint will be ignored.

Conflicting hints:

Only one of the above hints may be specified for each sub-query block.  If two
or more hints are specified (of same or different type), the first hint will
have effect while subsequent hints will be ignored.

Warnings:

A warning will be given if multiple of above hints are given for the same query
block.  In all other cases, a warning will NOT be given when a hint is ignored.

Duplicate Weed-out strategy:
A new optimizer_switch, duplicateweedout, will be added to make it  possible
to turn off Duplicate Weed-out.  Existing tests will be updated to make use 
of this optimizer_switch when testing specific strategies.  As for hints, 
if both firstmatch and duplicateweedout is set to off, duplicateweedout may
still be selected if remaining strategies are not applicable to the query.

If Duplicate Weed-out is turned off (either by optimizer_switch or hints), it
has been observed that one sometimes will get a plan that is far from optimal.
This is due to the heuristic pruning during greedy search, and one can avoid
this behavior by setting optimizer_prune_level=0.  
Basic Approach
---------------

When checking whether to transform a subquery to semi-join, SEMIJOIN/NO_SEMIJOIN
hints will be checked and decision based on rules described above.  Likewise,
SUBQUERY hint will be checked when deciding which subquery execution strategy to
choose.

Which semi-join strategies that are available for use will be stored in a bit
mask in the nested_join object of the corresponding semi-join nest when subquery
is converted to semi-join.  This bit mask will take into account both hints and
optimizer_switch settings, and will be checked where optimizer_switch settings
are currently checked.  For an IN-subquery that is nested within another IN-
subquery where both are merged into the same semi-join nest, hints will be ignored. 


Handling Duplicate Weed-out
---------------------------

Currently, it is not possible to turn-off Duplicate Weed-out strategy with
optimizer_switch.  The main reason for this is that greedy search assumes that
for any join order, at least one semi-join strategy can be applied.  In order to
still satisfy this assumption, it will during greedy search still be possible to
select Duplicate Weed-out strategy even when it is turned off.  However, if one
later in greedy search find another join order where a different strategy
applies, one will switch to this plan even if it is more costly than the earlier
found plan with Duplicate Weed-out.  This approach also requires changes to how
greedy search does plan pruning: If Duplicate Weed-out is turned off, plans
should not be pruned as long as the only plan found, use Duplicate Weed-out. 
Otherwise, we risk that we never evaluate join orders that permit alternative
strategies.

In more detail, this approach requires the following changes:

1. A new member in Optimize_table_order, found_plan_without_dupsweedout,
   that indicates whether plans without Duplicate Weed-out have been found 
   so far during greedy search.  (Note that only semi-join nest for which
   Duplicate Weed-out is turned off will matter.  Duplicate Weed-out may still
   be used for other semi-join nests.)

2. Change Optimize_table_order::consider_plan() to switch to a more
   costly plan if the best plan so far use Duplicate Weed-out for semi-join
   nests where it is turned off, while the current plan does not.

3. Do not prune plans if found_plan_without_dupsweedout is false.  There are
   two types of pruning that will be changed.  The pruning of partial plans 
   that are more costly than the best complete plan found so far, and the 
   heuristic pruning that prunes less promising partial plans.

4. In Optimize_table_order::advance_sj_state(), if Duplicate Weed-out strategy
   is disabled, do not select it when there are other applicable strategies for 
   this join order.


Alternatives that have been considered
--------------------------------------

The original idea was to give plans with Duplicate Weed-out a very high cost so
that other strategies are selected if possible.  However, this has some issues:

1. We would no longer be able to distinguish cost of different plans
   with Duplicate Weed-out.  Hence, we may end up with a bad plan with
   Duplicate Weed-out instead of a good plan with Duplicate Weed-out.

2. The cost of a Duplicate Weed-out plans as presented by EXPLAIN
   FORMAT=JSON would be misleading