WL#9158: Join Order Hints

Affects: Server-8.0   —   Status: Complete   —   Priority: Medium

This task is about hints what allows to control table order for the join
execution. Currently MySQL has limited possibility to do this. STRAIGHT_JOIN
hint can be used to force the optimizer to join the tables is the order in which
they are listed in the FROM clause. The goal of this task is to add more join
order hints using new hint infrastructure.

Proposed hints are:

 JOIN_FIXED_ORDER  similar to existing STRAIGHT_JOIN hint, will be used as
                   replacement of the old hint later.
 JOIN_ORDER   instructs the optimizer to use the specified table order.
 JOIN_PREFIX  instructs the optimizer to use the specified table order for the
              first tables of the join execution plan.
 JOIN_SUFFIX  instructs the optimizer to use the specified table order for the
              last tables of the join execution plan.

User Documentation

Functional requirements:

F-1: Hints must be enclosed into /*+ */ comment.

F-2: Hints must be specified after SELECT|INSERT|REPLACE|UPDATE|DELETE key words.

F-3: EXPLAIN must understand hints.

F-4: Active hints must be printed in EXPLAIN warning.

F-5: Subsequent conflicting/duplicating hints are ignored with warning.

F-6: Unresolved hints cause warning.

Non-Functional requirements:

NONE

1. JOIN_FIXED_ORDER

   JOIN_FIXED_ORDER([@QB_NAME])

   Hint forces the optimizer to join the tables is the order in which they are
   listed in the FROM clause.

2. JOIN_ORDER hint.

   Instructs the optimizer to use the specified table order.
   The hint can be applied to a set of tables. Tables that
   are not specified may be placed anywhere in the join order,
   even in between specified tables. 

   Specification:

   JOIN_ORDER([@QB_NAME] table[, table]...)


3. JOIN_PREFIX hint.

   Instructs the optimizer to use the specified table order for the
   first tables of the join execution plan. The hint can be applied
   to a set of tables. All other join tables will come after the
   given tables.

   Specification:

   JOIN_PREFIX([@QB_NAME] table[, table]...)


4. JOIN_SUFFIX hint.

   Instructs the optimizer to use the specified table order for the
   last tables of the join execution plan. The hint can be applied
   to a set of tables. All other join tables will come before the
   given tables.

   Specification:

   JOIN_SUFFIX([@QB_NAME] table[, table]...)

Note that hints should be able to control behavior of the semi-join tables
which are merged to the outer query block.

Example:

SELECT
/*+ JOIN_PREFIX(t2, t5@subq2, t4@subq1)
    JOIN_ORDER(t4@subq1, t3) 
    JOIN_SUFFIX(t1) */
count(*) FROM t1 JOIN t2 JOIN t3
           WHERE t1.f1 IN (SELECT /*+ QB_NAME(subq1) */ f1 FROM t4)
             AND t2.f1 IN (SELECT /*+ QB_NAME(subq2) */ f1 FROM t5);

If sub-queries subq1, subq2 are converted into semi-join, tables t4@subq1,
t5@subq2 are merged to the outer query block. In this case, the hint specified
in the outer query block should be able to control behavior of
t4@subq1, t5@subq2 tables.


Resolving conflicting hints.

 1. Conflicting hints
  
  In some cases hints can conflict. For example if JOIN_ORDER and JOIN_PREFIX
  have table orders what are impossible to apply at the same time:

  SELECT /*+ JOIN_ORDER(t1, t2) JOIN_PREFIX(t2, t1) */ ... FROM t1,t2;

  In this case, the first specified hint is applied and next conflicting hints
  are ignored without warning. If it's impossible to apply valid hint, it will
  be silently ignored without any warning.

 2. Only one JOIN_PREFIX and JOIN_SUFFIX hint of each type will be applied.
  Second specified hint of the same type is ignored with warning.
  JOIN_ORDER can be specified several times.

  For example,

  /*+ JOIN_PREFIX(t1) JOIN_PREFIX(t2) */

  Second JOIN_PREFIX is ignored with warning.

  /*+ JOIN_PREFIX(t1) JOIN_SUFFIX(t2) */

  Both hints are applicable, no warning in this case.

  /*+ JOIN_ORDER(t1, t2) JOIN_ORDER(t2, t3) */

  Both hints are applicable, no warning in this case.

 3. Ignored hints.

  Hint is ignored if a table specified in the hint has a circular dependency.

  For example:

  /*+ JOIN_ORDER(t1, t2) JOIN_PREFIX(t2, t1) */  
  
  JOIN_ORDER hint sets table 't2' dependent on t1. JOIN_PREFIX is ignored since
  it table 't1' can not be dependent on 't2'. Ignored hints are not printed in
  in EXPLAIN message.

 4. Interaction with const tables.

  MySQL optimizer places const tables first in the table order, and the
  position of a const table can not be affected by hints.  References to const
  tables in join order hints will be ignored. Hint is still applicable.

  For example:
  JOIN_ORDER(t1, CONST_T, t2) is equivalent to JOIN_ORDER(t1, t2).

  Note that the accepted hints shown in the warning message of EXPLAIN, will
  include const tables as they were specified.

 5. Interaction with the types of JOINs.

  MySQL supports several type of joins, LEFT|RIGHT|INNER|CROSS|STRAIGHT_JOIN
  joins. If the hint conflicts with the specified type of JOIN, it is ignored
  without warning.

  For example:

  SELECT /*+ JOIN_PREFIX(t1, t2) */FROM t2 LEFT JOIN t1; 

  Here is conflict between requested table order in the hint and JOIN
  order(LEFT JOIN).

Setting of the necessary table order is carried out by changing the table dependencies. Optimizer selects requested order using updated dependencies.
Table dependencies are stored in JOIN_TAB::dependent field.

See sql_select.h

  /**
    The set of tables that this table depends on. Used for outer join and
    straight join dependencies.
  */
  table_map     dependent;

Dependencies are set in JOIN::make_join_plan() method, 
see JOIN::init_planner_arrays(), JOIN::propagate_dependencies() methods.

How it works at the current moment. Consider we have following statement:

SELECT * FROM t1
  LEFT JOIN t2 ON t1.f1 = t2.f1
    LEFT JOIN (t3 JOIN t4 ON t3.f1 = t4.f1) ON t1.f1 = t3.f1;

Dependencies are set according to join type:

1. t1 has no dependencies
2. t2 is dependent on t1,
   JOIN_TAB(t2)::dependent is set to 1(first table in the join).
3. t3 is dependent on t2 and indirectly on t1(since t2 dependent on t1)
   JOIN_TAB(t3)::dependent is set to 3(first and second tables).
4. t4 is dependent on t2 and indirectly on t1
   JOIN_TAB(t4)::dependent is set to 3(first and second tables).

Note that JOIN_TAB::dependent value has sum of directly and indirectly
dependent tables.

JOIN_PREFIX modifies table dependencies(if possible) in such a way that
tables specified in the hint are set first in the join order.

Consider we have following statement:
 SELECT /*+ JOIN_PREFIX(t3, t2)*/ * FROM t1 JOIN t2 JOIN t3;

Before applying of the hint tables, t1, t2, t3 have no dependencies.
After applying:

1. t3 has no dependencies.
2. t2 is dependent on t3,
   JOIN_TAB(t2)::dependent is set to 4(third table in the join)
3. t1 should be last, so JOIN_TAB(t1)::dependent is set to 6
   (second and third tables in the join).

Note that we should check that hint tables have no dependencies
on other tables, otherwise they can not be set first and the hint is
ignored.

JOIN_SUFFIX modifies table dependencies(if possible) in such a way that
tables specified in the hint are set last in the join order.

Consider we have following statement:
 SELECT /*+ JOIN_SUFFIX(t3, t2)*/ * FROM t1 JOIN t2 JOIN t3;

Before applying of the hint tables, t1, t2, t3 have no dependencies.
After applying:

1. t1 has no dependencies.
2. t3 is dependent on t1,
   JOIN_TAB(t3)::dependent is set to 1(first table in the join)
3. t2 should be last, so JOIN_TAB(t2)::dependent is set to 5
   (first and third tables in the join).

Note that we should check that tables not specified in the hint are not
dependent on hint tables. In this case hint tables can not be set last
and the hint is ignored.

JOIN_ORDER modifies table dependencies(if possible) in such a way that
tables specified in the hint are set in the specified order.

Consider we have following statement:
 SELECT /*+ JOIN_ORDER(t3, t2)*/ * FROM t1 JOIN t2 JOIN t3;

Before applying of the hint tables, t1, t2, t3 have no dependencies.
After applying:

1. t1 has no dependencies.
2. t3 has no dependencies.
3. t2 is dependent on t3,
   JOIN_TAB(t2)::dependent is set to 4(third table in the join)

Note that we should check that previous hint table are not dependent
on subsequent, otherwise requested order can not be set and the hint
is ignored.

If table specified in the hint belongs to nested join, we need
to update dependencies of all tables of the nested join with the same 
dependency as for the hint table. It's necessary since interleaving of
nested join tables should satisfy condition checked in the
Optimize_table_order::check_interleaving_with_nj() method(see comment for
this method for more detailed explanation).

Procedure of applying hints is called from JOIN::make_join_plan() after
methods which update dependency information.

Code changes:

sql/lex.h

Added to 'static const SYMBOL symbols'

  { SYM_H("JOIN_PREFIX",            JOIN_PREFIX_HINT)},
  { SYM_H("JOIN_SUFFIX",            JOIN_SUFFIX_HINT)},
  { SYM_H("JOIN_ORDER",             JOIN_ORDER_HINT)},
  { SYM_H("JOIN_FIXED_ORDER",       JOIN_FIXED_ORDER_HINT)},

sql/sql_lex_hints.cc

Added to void Hint_scanner::add_hint_token_digest() following cases:
   case JOIN_PREFIX_HINT:
   case JOIN_SUFFIX_HINT:
   case JOIN_ORDER_HINT:
   case JOIN_FIXED_ORDER_HINT:

sql/sql_hints.yy

Added grammar for JOIN_PREFIX, JOIN_SUFFIX, JOIN_ORDER, JOIN_FIXED_ORDER hints.

sql/sql_optimizer.cc

Added Opt_hints_qb::apply_join_order_hints(JOIN *join) method call.
This function resolves hint table names, checks and updates table dependencies
according to the hint requirements.

Method JOIN::propagate_dependencies() is split on two methods:
JOIN::propagate_dependencies() - propagate table dependencies and check
                                 circular dependencies.
JOIN::init_key_dependencies() - Initialize key dependencies for join tables.


sql/sql_planner.cc:

Optimize_table_order::advance_sj_state():
added a change to prevent that inner tables of different semijoin nests are
interleaved for MatScan.

sql/parse_tree_hints.h

PT_qb_level_hint class:

  Added field:
  Hint_param_table_list table_list; // hint tables

  Added constructor:
  PT_qb_level_hint(const LEX_CSTRING qb_name_arg, bool switch_state_arg,
                   enum opt_hints_enum hint_type_arg,
                   const Hint_param_table_list &table_list_arg)
    : PT_hint(hint_type_arg, switch_state_arg),
    qb_name(qb_name_arg), args(0), table_list(table_list_arg)
  {}

  Added function:
  virtual Hint_param_table_list *get_table_list()
  {
    return &table_list;
  }

sql/parse_tree_hints.cc:

  Modified:
  PT_qb_level_hint::contextualize(Parse_context *pc)
  Added JOIN_PREFIX_HINT_ENUM, JOIN_SUFFIX_HINT_ENUM, JOIN_ORDER_HINT_ENUM,
  JOIN_FIXED_ORDER_HINT_ENUM cases.

  Modified:
  PT_qb_level_hint::append_args()
  Added processing of JOIN_PREFIX_HINT_ENUM, JOIN_SUFFIX_HINT_ENUM, 
  JOIN_ORDER_HINT_ENUM.

sql/opt_hints.h

  enum opt_hints_enum:
  Added JOIN_PREFIX_HINT_ENUM, JOIN_SUFFIX_HINT_ENUM, JOIN_ORDER_HINT_ENUM,
  JOIN_FIXED_ORDER_HINT_ENUM.

  struct st_opt_hint_info:
  Added new field
  bool irregular_hint;    ///< true if hint requires some special handling.

  Class Opt_hints:
  Added method

  /**
     Function prints hints which are non-standard and don't
     fit into existing hint infrastructure.

     @param thd pointer to THD object.
     @param str pointer to result String object.
  */
  virtual void print_irregular_hints(THD *thd, String *str)


Main problems is that join order hints have several non-standard dependencies:

1. Hint is depending on the place where it was specified,
   i.e. second specified hint is depending on the first one:

/*+  JOIN_SUFFIX(t1, t2) JOIN_ORDER(t3, t4) */

   in this case, JOIN_ORDER is applicable only if tables t3, t4 are not
   dependent on t2, t1. Join order hints should be processed in the specified
   order. So we need to place all specified hints into array and process them
   in a sequence.

2. Order of specified tables is important too since we should set table
   dependencies according to this order. We can not put tables in child
   (table level) array of existing hint infrastructure. We loose the order
   in this case.

So join order hints(PT_qb_level_hint object) are stored in
Opt_hints_qb::join_order_hints array and  tables of the hint
are stored in PT_qb_level_hint::table_list array.


  Class Opt_hints_qb:
  Added new fields
  // Array of join order hints
  Mem_root_array join_order_hints;
  ulonglong join_order_hints_ignored;  // ignored join order hints

  /**
    Function checks if join order hints are applicable and
    records table dependencies if possible.    

    @param join JOIN object
  */
  void apply_join_order_hints(JOIN *join);


  Simplified algorithm is following:

  Save original table dependencies;
  For each join_order_hints element
  {
    For each hint table
    {
      Find table in JOIN::tables;
      if  (table is found)
      {
        if (table is const)
           continue;  // ignore const table

        if (possible to set necessary dependencies)
        {
          // Note that we check dependencies only for hint tables
          // and skip dependency check for other tables.
          set dependencies;
          set dependencies for nested join tables;
        }
        else
        {
          hint is conflicting;
          restore original table dependencies;
        }
      }
    }

    Add dependencies that are related to non-hint tables;
   
    // Check if not specified tables have correct dependencies;
    if (JOIN::propagate_dependencies())
      restore original table dependencies;
  }

  void register_join_order_hint(PT_qb_level_hint* hint_arg);

  Added new function:
  /**
    Append table name.

    @param thd   pointer to THD object
    @param str   pointer to String object
  */
  virtual void append_table_name(THD *thd, String *str);

sql/opt_hints.cc

  struct st_opt_hint_info opt_hint_info[]
  Added
  {"JOIN_PREFIX", false, false, true},
  {"JOIN_SUFFIX", false, false, true},
  {"JOIN_ORDER", false, false, true},
  {"JOIN_FIXED_ORDER", false, true, false},

  Opt_hints::print(THD *thd, String *str, enum_query_type query_type);
  Function changed to allow print non standard hints.

  Added new function:

/**
  Function compares hint table name and TABLE_LIST table name.

  @param hint_table         hint table name
  @param table              Pointer to TABLE_LIST object

  @return true if table names are equal, false otherwise.
*/

  static bool compare_table_name(const Hint_param_table *hint_table,
                                 const TABLE_LIST* table)


  Added new functions:

/**
  Function updates dependencies for nested joins. If table
  specified in the hint belongs to nested join, we need
  to update dependencies of all tables of the nested join
  with the same dependency as for the hint table.

  @param JOIN             pointer to JOIN object
  @param table            pointer to TABLE_LIST object
  @param hint_tab_map     map of the tables, specified in the hint
*/

static void update_nested_join_deps(JOIN *join, const TABLE_LIST* table,
                                    table_map hint_tab_map)


/**
  Function resolves hint tables, checks and sets table dependencies
  according to the hint. If the hint is ignored due to circular table
  dependencies, original dependencies are restored.

  @param JOIN             pointer to JOIN object
  @param hint_table_list  hint table list
  @param type             hint type

  @return false if hint is applied, true otherwise.
*/

static bool set_join_hint_deps(JOIN *join,
                               const Hint_param_table_list *hint_table_list,
                               opt_hints_enum type)


/**
  Function returns dependencies used for updating table dependencies
  depending on hint type.

  @param type          hint type
  @param hint_tab_map  hint table map
  @param table_map     table map

  @return table dependencies.
*/

static table_map get_other_dep(opt_hints_enum type,
                               table_map hint_tab_map,
                               uint table_map);

Added new class:

/**
  Auxiliary class is used to save/restore table dependencies.
*/

class Join_order_hint_handler : public Sql_alloc