WL#9307: Enabling merging a derived table or view through a optimizer hint

Affects: Server-8.0   —   Status: Complete

Problem
=======

In 5.6 and earlier, derived tables were always materialized. Views were merged
if technically possible.
In 5.7, we have augmented the number of cases were merging is technically
possible, both for derived tables and for views.
This can give faster queries, for example if the outer query's WHERE contains a
selective predicate which can be pushed down to the view's inner tables.

However, some queries with derived tables benefit from better
performance when the subqueries are materialized. One such possible
case is a derived table with dependent subqueries in the select list,
used as the inner table of a left outer join:
select * from t1 left join (select (subq) from t2) as dt on ...;
Such tables will always
be read as many times as there are qualifying rows in the outer table,
and the select list subqueries are evaluated for each row combination.
In 5.6 this case was materialized, in 5.7 it is merged.
Hence, applications may suffer from worse performance due to this merging - a
regression.

This was filed as
BUG#79294 QUERY WITH DOUBLE NESTED SUBQUERY MUCH SLOWER IN 5.7
and fixed in 5.7.11: the fix has disabled merging of derived tables and views
containing dependent subqueries in the SELECT list, so they are materialized,
like in 5.6.

However, implementation of SHOW commands over new data dictionary would have
faster execution if merging happened. 

Fix
===

Add a hint so that the user can force a merge; 

F1 - hints 'merge(X)' and 'no_merge(X)' should be accepted by the parser X should be: empty, or a table name (possibly qualified with a query block name), or a list of those.

F2 - There are three heuristics, which, in trunk, force materialization of a derived table or view:

 * if the derived table's definition contains assignments to user variables
 * if the derived table's definition contains dependent subqueries
   in its SELECT list
 * if the derived table is referenced by a first-level subquery of UPDATE/DELETE (*)

In the WL those heuristics are "downgraded" to merely "suggest materialization".

F3 - the MERGE hint overrides heuristics of F2.

F4 - the hints override optimizer_switch flag 'derived_merge=on|off'.

F5 - the hints are overriden by the clause "algorithm=merge/temptable" of CREATE VIEW.

F6 - if the hints are not specified, there should be no behaviour change, with only the following exception: as a consequence of F2, the clause "algorithm=merge" of CREATE VIEW overrides heuristics of F2.

(*) example: UPDATE t1 SET d= NULL WHERE id IN (

            SELECT *  # <- it's a 1st-level subq of UPDATE
            FROM
              (SELECT id FROM t1) AS dt # <- derived, materialized
           );

see BUG#79333 for more examples.

   Implements a table-level hint:
    select /*+ merge(dt) */ * from (select * from t1) as dt;
    select /*+ no_merge(dt) */ * from (select * from t1) as dt;
    create view v1 as (select * from t1);
    select /*+ merge(v1) */ * from v1;
    select /*+ no_merge(v1) */ * from v1;
   Hint is of the form
   select /*+ [no_]merge([dt[@qb]]) */ ...
   and means "in query block 'qb', [don't] merge the derived table 'dt'".
   If 'qb' isn't specified it means the query block where the hint is.
   If '()' is empty it applies to all derived tables of the query block.
   For example:
   select /*+ no_merge() */ ...
   means "don't merge any derived table referenced by this query block".
   'dt@qb' can also be written as '@qb dt', and
   multiple tables can be listed in the hint, as explained in
   http://dev.mysql.com/doc/refman/5.7/en/optimizer-hints.html
   
   The hint:
   - overrides any heuristic which isn't a real technical constraint
   - overrides optimizer_switch flag 'derived_merge'
   - is overriden by "algorithm=merge/temptable" clause of CREATE VIEW.
   
   Specifying the hint in the query which references the derived table
   makes sense for views, as the view's definer doesn't always know
   all queries which will use his view.

   Changes:
   - hint functions which used a TABLE* argument now use a TABLE_LIST*
   (derived table has no TABLE at this point); for most cases it is
   more natural (less ->pos_in_table_list hops).
   - is_mergeable() is split in two (technical constraints and heuristics)
   - sql_resolver.cc: need to process hints before "if(table==NULL)continue;"
   as derived tables have table==NULL.
   
   derived.result shows the hint overriding the heuristic of
   BUG#79294, so the user can have total control.