WL#3996: Add more hints

Affects: Server-5.7   —   Status: Complete

A hint is a substring within an SQL statement (usually a
DML statement) which the user intends as a "suggestion"
or "for your information" for the DBMS (usually the
server). The DBMS may disregard a hint, but by documenting
a hint we are telling users that we don't always disregard.
A subclass of hints is the "if/endif" comment, which tells
the DBMS to parse a clause only if some condition is true.

Typical hint classes are "optimizer" and "portability".

MySQL has used a variety of things that might be hints:

"Index Hint Syntax" e.g. "FORCE INDEX (x)"
http://dev.mysql.com/doc/refman/5.0/en/index-hints.html

"View algorithm" e.g. "CREATE VIEW ALGORITHM=TEMPTABLE"
http://dev.mysql.com/doc/refman/5.0/en/create-view.html

"Comments" e.g. "/*! STRAIGHT_JOIN */" or "/*!32302 TEMPORARY */"
http://dev.mysql.com/doc/refman/5.0/en/comments.html

There was discussion of using hints for subqueries in 5.2,
but the optimizer team decided to use a server variable instead.

So the overall "hints" job (deciding a policy, changing the
parser, deprecating old ways, and implementing) is something
that probably won't come up again until after version 6.

Decisions
---------

I'll just list the things that require a decision someday.

1. Most hints should be /* Comment */ hints.
   Or some hints should be OPTION (option list) hints.

2. Hints may appear anywhere in any statement, and have
   effect throughout the statement. Or hints in a particular
   clause have effect only for that clause. Or hints in a
   particular subquery have effect only for that subquery.
   Or hints must be in a fixed place, at statement start/end.

3. Bad syntax in a hint causes the hint to be ignored.
   Or bad syntax causes an error.
   Or bad syntax causes a warning.
   Or there's a session variable that decides whether
   hints are ignored, or cause an error, or cause a warning.

4. Hints, even /* Comment */ hints, appear in SHOW CREATE
   and mysqldump. Or they don't.

5. Hints in main statements override hints in views.
   Or hints in views are ignored totally.

6. All existing hint-like things should be moved to the new
   style. Or they don't need to be.

References
----------

"Using Optimizer Hints" (Oracle)
http://download.oracle.com/docs/cd/B28359_01/server.111/b28274/hintsref.htm

"Giving optimization hints to DB2"
http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db29.doc.pmtg/p9li375.htm
"DB2 SELECTIVITY clause"
http://www.ibm.com/developerworks/db2/library/tips/dm-0312yip/index.html

"hints for DML queries" (SQL Server)
http://blogs.msdn.com/queryoptteam/archive/2006/07/14/666196.aspx

"Hints (Transact-SQL)" (SQL Server)
http://msdn2.microsoft.com/en-us/library/ms187713.aspx

"Using a SELECTIVITY clause to influence the optimizer" (DB2)
http://www.ibm.com/developerworks/db2/library/tips/dm-0312yip/index.html

"query enable" (PostgreSQL)
http://www.postgresql.org/docs/8.1/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-ENABLE

BUG#21300 Index hints for update statements
BUG#28371 Add SQL 'hints' for fine grained control of engine_condition_pushdown

"MySQL Forums :: Partitioning :: Are partition hints needed?"
http://forums.mysql.com/read.php?106,154391,154391

Syntax:
-------

1. Comment syntax will be used for new hints.
   The syntax will be /*+ */ 

   Reasoning: MySQL already supports the /*! */ syntax, but that is 
   	      interpreted as part of the ordinary SQL syntax.
              Oracle uses /*+ */ syntax for hints.

2. Multiple hints may be specified in the same comment. E.g., 

      /*+ HINT1(...) HINT2 HINT3(...) */

3  A statement block can have only one comment containing hints, and
   that comment must follow the SELECT, UPDATE, INSERT, REPLACE, or
   DELETE keyword.  It will have effect for the statement block where
   it is placed and other statement blocks nested within.  In other
   words, hints in the outer-most query block of a statement will
   have effect on all sub-query blocks.

   A hint in a subquery block will override conflicting hints specified at 
   outer query blocks.

   Conflicting hints for same query block, e.g.
   /*+ SEMIJOIN() NO_SEMIJOIN() */, will give a warning.

4. Bad syntax in a hint will cause a warning.  

5. Object references (e.g., table names, index names) to objects that
   are not part of the query will give a warning.
      
6. Hints in views are not supported.

7. Over time we will deprecate old hint syntax.


Optimizer decisions that should be possible to override with hints:
-------------------------------------------------------------------

1. Join order

2. Access method choice

3. Semi-Join/subquery strategy

4. Use of join buffering and  BKA

5. Use of MRR

6. Index condition push-down

7. Use of index_merge

8. Strategy for filesort
   - Priority queue for LIMIT queries
   - Inclusion of additional fields in sort buffer

Items 1 and 2 above is covered by existing hints; STRAIGHT_JOIN and
USE/FORCE/IGNORE INDEX, respectively.

Items 3 - 7 is currently possible to override by optimizer_switch.
There are several drawbacks with with using optimizer_switch:
   - Requires extra round-trips to server to set/reset session variables. 
   - Can be set only for whole statement, not for specific query block
     or table. E.g., it may be useful to turn on BKA for a specific
     index access, but not for whole table, or to turn off a specific
     semi-join strategy for one query block, but not for another. 
   - Turning off an optimizer_switch flag is like IGNORE INDEX; it
     restricts available choices.  We miss the equivalent of FORCE
     (with the exception of a few flags which turns off cost
     optimization) or USE.

For Item 8 there is currently no way to override the decision of the
optimizer.

Setting session variables through hints
---------------------------------------

In addition to overriding optimizer decision, it would be useful to be
able to override the current setting of session variables that affect
query execution (e.g., sort_buffer_size, join_buffer_size).  One
option is to provide this through hints, another option is to support
a specific syntax for this purpose.  For example:
   SELECT ... USING VARIABLES join_buffer_size=64*1024*1024;

One problem with using hints for this purpose is that it may not be
desirable to execute different query blocks within a statement with
different values for session variables.  One option could be to only
allow setting of session variables in top-level query block.

Hints that should be implemented
--------------------------------

1. Semi-Join strategy
   
   a. Specify which semi-join strategies can be used
      Syntax:
	/*+ SEMIJOIN( [strategy [strategy] ... ] ) */
   	strategy ::= MATERIALIZATION|FIRST_MATCH|LOOSE_SCAN|DUPS_WEEDOUT

      If no strategy is specified, all strategies are allowed

   b. Specify which semi-join strategies should NOT be used
      Syntax:
         /*+ NO_SEMIJOIN( [strategy [strategy] ... ] ) */
   	strategy ::= MATERIALIZATION|FIRST_MATCH|LOOSE_SCAN|DUPS_WEEDOUT
   
      If no strategy is specified, semi-join will not be used.

   TBD: How do we force a particular join order for semi-join?

2. Subquery strategy

   Specify whether Subquery Materialiation or In-to-exists should be
   used.
   Syntax: 
     /*+ SUBQUERY(strategy) */
     strategy ::= MATERIALIZATION|IN_TO_EXISTS

3. Join buffering (Block nested loop)

   a. Specify that join buffering (BNL) should be used for the
      specified tables.
      Syntax:
        /*+ BNL( table [table ...] ) */
      
      BNL will be used for the specified table(s) when appropriate.
      That is, the table must be an inner table of a join operation
      and ref access is not used.  For other tables, the hint is
      ignored without any warning.

   b. Specify that join buffering (BNL) should NOT be used for the
      specified tables.
   
      Syntax:
        /*+ NO_BNL( table [table ...] ) */

4. Batched Key Access (BKA)

   a. Specify that batched key access (BKA) should be used for the
      specified tables.
      Syntax:
        /*+ BKA( table [table ...] ) */
      
      BKA will be used for the specified table(s) when appropriate.
      That is, the table must be an inner table of a join operation
      and ref access is used.  For other tables, the hint is
      ignored without any warning.

   b. Specify that batched key access (BNL) should NOT be used for the
      specified tables.
   
      Syntax:
        /*+ NO_BKA( table [table ...] ) */

5. Multi-range-read (MRR)

  a. Specify that multi-range read (MRR) should be used for the
     specified index(es).
     Syntax:
       /*+ MRR( table [index [index] ...] ) */
     
     If no indexes are specified, the hint will apply to all indexes
     of the given table.

     This hint will not impact which access method is chosen for a
     given table, but given that a specified index is chosen for range
     (or ref?) access, MRR will be used.

  b. Specify that multi-range read (MRR) should NOT be used for the
     specified index(es).
     Syntax:
       /*+ NO_MRR( table [index [index] ...] ) */
     
     If no indexes are specified, the hint will apply to all indexes
     of the given table.


6. Index Condition Push-Down (ICP)

  a. Specify that index conditioni push-down (ICP) should be used for the
     specified index(es).
     Syntax:
       /*+ ICP( table [index [index] ...] ) */
     
     If no indexes are specified, the hint will apply to all indexes
     of the given table.

     This hint will not impact which access method is chosen for a
     given table, but given that a specified index is chosen for range
     or ref access, ICP will be used.

  b. Specify that index condition push-down (ICP) should NOT be used for the
     specified index(es).
     Syntax:
       /*+ NO_ICP( table [index [index] ...] ) */
     
     If no indexes are specified, the hint will apply to all indexes
     of the given table.

7. Use of Index Merge

  a. Specify that index merge should be used for the specified
     table/indexes
     Syntax:
       /*+ INDEX_MERGE( table [index [index] ...] ) */
  
     If no indexes are specified the optimizer will choose whatever
     indexes makes sense to merge for the particular query.

     Note: There is no specific hints for the different kinds of
           index merge.  It is assumed that it will be obvious for a
           given query which type of index merge that applies.

  b. Specify that index merge should NOT be used for the specified
     table.
     Syntax:
       /*+ NO_INDEX_MERGE( table ) */

     Note: It is not possible to specify indexes for this hint.  This
           means that a user can not specify that index merge is
           generally allowed, except for one particular combination of
           indexes.  In such cases, the user will have to use force
           one particular index merge combination using the
           INDEX_MERGE hint.

8. Strategies for filesort

   a. Use priority queue for LIMIT queries
     Syntax:
       /*+ ORDER_BY_LIMIT_OPTIMIZATION */

   b. Do NOT use priority queue for LIMIT queries
     Syntax:
       /*+ NO_ORDER_BY_LIMIT_OPTIMIZATION */

   c. Include additional fields in sort buffer
     Syntax:
       /*+ ORDER_BY_ADD_FIELDS */

   d. Do not include additional fields in sort buffer
     Syntax:
       /*+ NO_ORDER_BY_ADD_FIELDS */

9. Statement timeout
   Specify statement specific timeout.
   Syntax:
     /*+ MAX_EXECUTION_TIME(milliseconds) */    

   
Additional hints that could be implemented
-----------------------------------------

1. Name a query block
   Syntax:
      /* QB_NAME( name ) */

   This name will replace the automatic name (e.g., select#1) used in
   EXPLAIN output etc.

   TBD:  Maybe we should allow this hint also in views?


2. Specify that a hint should be applied to a specific query block
   Syntax:
      /* HINT(@name ...) */

   Specify that a hint is to be applied to a specific (sub)-query
   block.  This could be used for at least two purposes:
     - Collect all hints at top-level
     - Specify hints for views

   The name may either be explicitly assigned by a QB_NAME hint or
   implicitly assigned by the optimizer (e.g., select#1).

3. Join Order

   a. Specify join order for (subset of) tables
      Syntax:
        /*+ STRAIGHT_JOIN(table ... ) */

      MySQL will join the specified tables in the order specified in
      the argument list.

      If no table arguments are given, MySQL will join tables in the
      order in which they appear in the FROM clause.

      Not all tables of the join needs to be specified.  In that case,
      MySQL will be free to places non-specified tables wherever in
      the join sequence, it sees fit.  

   b. Specify leading join order for join
      Syntax: 
        /*+ JOIN_PREFIX(table ...) */

      Instructs the optimizer to use the specified set of tables as
      the prefix in the execution plan.

   Note: In order to be able to specify the join order when semi-join
   	 is involved, including tables of sub-queries should be
   	 allowed for the above two hints.

	 Specifying a join order that conflict with dependencies given
	 by outer join will cause a warning.

4. Index usage

   a. Use one of the specified indexes for range and ref access
      Syntax:
        /*+ INDEX(table [index [index] ...] ) */ 
  
      Specifying no index arguments, will have same effect as
      specifying all indexes.

      TBD: Should specifying an INDEX hint prohibit table scan?  Or
      	   should only INDEX hints with no index arguments do so?

   b. Use none of the specified indexes for range and ref access
      Syntax:
        /*+ NO_INDEX(table [index [index] ...] ) */ 
  
      Specifying no index arguments, will have same effect as
      specifying all indexes.  That is, table scan than will be used.

   c. Use one of the specified indexes for ref access in nested loops join.
      Syntax:
        /*+ JOIN_INDEX(table [index [index] ...] ) */

      Specifying no index arguments, will have same effect as specifying
      all indexes.

   d. Do not invoke range optimizer on specified indexes.  
      Purpose is to avoid range optimization when it requires much resources.
      Syntax:
        /*+ NO_RANGE_OPTIMIZATION(table [index [index] ...] ) */

      Specifying no index arguments, will have same effect as specifying
      all indexes.