WL#1110: Subquery optimization: Materialization

Affects: Server-6.0   —   Status: Complete

Speed up subquery execution by materialising the subquery as a
temporary (normally in memory) table. The materialized temporary table
must be indexed so that the subquery predicate can be computed
efficiently via this index.

This task is a complement to WL#2980 semi-join, as it covers most of
the cases not covered by semi-join.
========================================================================
HLS for WL#1110 Subquery optimization: Materialization
========================================================================

CONTENTS
========================================================================
1. Requirements
   1.1 External requirements (from management)
   1.2 Internal requirements (from the developers)
   1.3 Semantics of [NOT] IN
2. Introduction
3. Applicability
4. Query processing
5. Explain
6. Related work
7. Future work


1. Requirements
========================================================================

1.1 External requirements (from management)
--------------------------------------------
- The task should be designed so it can be done short time frame so
  it can be implemented in 5.2.

- Monty required that we add the new optimizations incrementally
  without first removing the current (often wrong) subquery
  transformations, and without first simplifying the current
  code.


1.2 Internal requirements (from the developers)
-----------------------------------------------

- Limit the scope to only IN subqueries. We do this because the support
  of NOT IN in the general case when there are NULL tuple components
  is rather complex, and is a well-defined separate task.

- Limit the scope to only top-level predicates, where NULL <==> FALSE.
  This requirement results from the same problem as above - handling
  of the general case with NULLs on either side of [NOT] IN is a separate
  relatively complex task.


1.3 Semantics of IN
-------------------

Let's denote with:
- "*_i"           - any value at position "i",
- "NULL_i"        - NULL at position "i",
- "complete match"- means that all components match, and there are
                    no NULL components,
- "partial match" - means that two tuples match except in those
                    components that contain NULL (that is, NULL
                    acts as wildcard),
- "A | T"         - means ANY expression in query | TOP-LEVEL expression
- "T"             - TRUE
- "F"             - FALSE
- "N"             - NULL (UNKNOWN)

Then the truth table in the general case of [NOT] IN is:

Table 1:
================================================================================
Case    | Left operand          | Right operand                 | IN    | NOT IN
================================================================================
        | ,   |                               | C1|   |   |
        | Exists i, v_i == NULL | Any or Top-level expression ->| A | T | A | T
--------------------------------------------------------------------------------
A.1     |                       |                    | F | F | T | T
--------------------------------------------------------------------------------
A.2     |                       | exists complete match         | impossible
--------------------------------------------------------------------------------
A.3     |                       | (...  ...)    | N | F | N | F
        |                       | exists partial match          |   |   |   |
--------------------------------------------------------------------------------
A.4     |                       |                | F | F | T | T
        |                       | no partial match              |   |   |   |
-------------------------------------------------------------------------------
        | ,      |
        | v_i != NULL, i = 1..n |
-------------------------------------------------------------------------------
B.1     |                       |                    | F | F | T | T
        |                       |                               |   |   |   |
-------------------------------------------------------------------------------
B.2     |                       | (...  ...)     | T | T | F | F
        |                       | exists complete match         |   |   |   |
-------------------------------------------------------------------------------
B.3     |                       | (...  ...) | N | F | N | F
        |                       | exists partial match, but no  |   |   |   |
        |                       | complete match                |   |   |   |
-------------------------------------------------------------------------------
B.4     |                       | , neither      | F | F | T | T
        |                       | complete, nor partial match   |   |   |   |
-------------------------------------------------------------------------------


From the requirements in Sect. 1.2 it follows that Table 1 can be reduced
to the much simpler Table 2 for top-level IN:

Table 2
===============================================================
Left operand          | Right operand                 | IN    |
---------------------------------------------------------------
,   |                               |       |
Exists i, v_i == NULL |                               |       |
---------------------------------------------------------------
TL.1A                 | Anything                      |   F   |
---------------------------------------------------------------
,      |
v_i != NULL, i = 1..n |
---------------------------------------------------------------
TL.2A                 | (...  ...)     |   T   |
                      | exists complete match         |       |
                      -----------------------------------------
TL.3A                 | All other cases               |   F   |
                      |                               |       |
---------------------------------------------------------------


This WL will implement IN according to Table 2.


2. Introduction
========================================================================

The SQL subquery predicates (IN, ANY, EXISTS, ALL) are variants of the
*logical* semi-join operation, where the left operand is a tuple from
an outer query (or queries), and the right operand is the result of the
subquery under the predicate.

There are various *physical* semi-join algorithms that can compute a
semi-join, all of them being variants of well known join algorithms.
In MySQL currently we consider the semi-join variants of [block-]
[index-] nested loops join, and single-pass hash join.

Some of these are applicable in all cases, while others require some
specific properties of the queries, and/or the schema. Depending on
syntactic properties of the query and its subqueires, on data
statistics, and index availability, one or another physical semi-join
can be the cheapest for a particular query and table instances.

The goal of this task is to implement an efficient physical semi-join
implementation for the cases when a subquery predicate cannot be
"flattened" to an nested-loops semi-join due to operation(s) in the
subquery operand that require its materialization.

The general idea is to materialize the result of the subquery into a
table, and to create a hash index on all materialized fields so that
the subquery predicate can be computed via hash lookups. Effectively,
this results in a single-pass hash semi-join specialized for
subqueries.

Since the result of semi-join consists only of the tuples of the left
operand that pass the join condition, the right operand of the
semi-join should not contain duplicates so that each left tuple is
tested exactly once against each unique right tuple. This guarantees
that each of the left operands of a subquery predicate is present in
the result at most once.

Thus we define "subquery materialization" as an operation that
materializes all distinct result tuples of a subquery when it is the
right operand of a semi-join. When we use the term "subquery
materialization" we will assume that it includes duplicate elimination
as well.


3. Applicability
========================================================================

For this task, the decision how to execute a subquery predicate will be
made in two steps.

We will first decide whether to execute a subquery via a logical
semi-join operation based on syntactic properties of the query. If
semi-join was chosen, then we will further decide whether to execute
the semi-join through subquery materialization (that is hash semi-join,
the topic of this WL), or through index-nested loops semi-join
(described in WL#2980).

The decision about logical semi-join will be made during the PREPARE
phase of the outer query, just before the IN => EXISTS transformation,
so that it is possible to disable the transformation if semi-join is
chosen.

The decision about the physical semi-join method has to be made in the
beginning of the OPTIMIZE phase of the outer query, because we need to
be able to analyze the available indexes in all subqueries, but the
join method must be chosen before equality propagation, and join
simplification, because they depend on the chosen physical semi-join.

Below we describe these rules in more detail. Let the subquery Q_i
is a direct child of an outer query Q_[i-1].

3.1 During the PREPARE phase of Q_[i-1] we determine that:
----------------------------------------------------------------------

The subquery predicate SQP_i(outer_ref, Q_i) should be executed via
logical semi-join if:

3.1.1 SQP_i is IN (or "= ANY").

3.1.2 SQP_i is a top-level predicate, that is:
      - referenced in an expression in the WHERE or ON clauses,
      - not an argument of an expression
      - not under NOT
      so that for the subquery predicate, FALSE <=> NULL.

3.1.3 Q_i is not a UNION, and is non-correlated.

TODO:
In addition, 3.1.3 can be extented so that Q_i is either of:
- correlated to Q_[i-1] but
  - has none of GROUP BY, ORDER BY [LIMIT], aggregate functions,
    and
  - is not under NOT IN
    (such that we can execute Q_i without materialization),
or
- correlated to queries that contain Q_[i-1], thus the
  left IN operand is constant inside Q_[i-1], which
  allows us to materialize Q_i.


3.2 In the beginnig of the OPTIMIZE phase, we determine that:
----------------------------------------------------------------------

If in step 3.1 above the query preprocessor chose to execute SQP_i via
logical semi-join, then the optimizer applies the following rules
to choose the physical semi-join algorithm:

3.2.1 SQP_i is executed via subquery materialization (that is,
      via single-pass hash semi-join) if Q_i contains at least one of:
      - GROUP BY
      - ORDER BY ... LIMIT
      or
      - there is no index for the SELECT fields of the subquery

      TODO:
      Check why do we have the limitation:
      mysql> select * from t1 where a1 in (select b1 from t2 order by b1 limit 1);
      ERROR 1235 (42000):
      This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'

3.2.2 Otherwise SQP_i is executed via index-nested loops semi-join, or
      IN=>EXISTS transformation, whichever is applicable as described
      in WL#2980 "Subquery optimization: Semijoin".

The above rules will be applied during the OPTIMIZE phase of the outer
query Q_[i-1], before before equality propagation, and simplify_join.

QUESTION:
- ANY (SOME), ALL are handled by MIN/MAX rewrite.
  - Are there any cases when it is better to materialize?
  - What about "= ALL (...)" ? It should be handled as MIN = MAX.
    Is that so?
  - Check if it is always the case that ALL is rewritten into MIN/MAX.
    If that is the case, it should be always cheaper to execute as
    MIN/MAX without materialization, no matter if there is a suitable
    index or not.


4. Query processing
========================================================================

Since the subject of this WL is semi-join execution via subquery
materialization, this section provides an outline of the processing of
IN subqueries with materialization. The implementation details are
given in the HLS.

For an outer query Q_[i-1], having a top-level IN predicate SQP_i in an
ON or WHERE clause, with a subquery Q_i:

4.1 The query preprocessor:
---------------------------

- Decides whether to use logical semi-join or not for the predicate,
  as described in Sect. 3.1 above.
- If it chose semi-join, it blocks the transformation of
  non-correlated IN into correlated EXISTS.
- Prepares the subquery Q_i.

4.2 The query optimizer:
------------------------

- Chooses the physical semi-join method for the subquery predicate
  as described in Sect. 3.2.

- If the chosen execution method is index-nested loops semi-join
  then
    proceed as described in WL#2980 "Subquery optimization: Semijoin".
  else
    /* subquery materialization */
    - Set up subquery execution-related structures.
      - Creates a temporary table for the materialized result of the
        subquery.
      - Creates a unique hash index on the SELECT list of the subquery
        so that we skip duplicate subquery results.
      - Sets up the execution plan so that when it executes subquery
        for the first time, its result is stored in the temoprary table.

4.3 The execution engine:
-------------------------

- At the first execution of SQP_i (Item_in_subselect::val_int):
  - Optimize the subquery Q_i.
  - Execute the subquery through the JOIN object created for the
    subquery during PREPARE and materialize its result in the temporary
    table created during the prepare phase. While materializing, skip
    all duplicates through the unique hash-index on the temporary table.
  - Use the existing uniquesubquery engine to do index lookup into the
    temporary table's unique hash index.
  - Perform a lookup with the left IN operand into the temporary table,
    and compute the result for the IN predicate.
  - Cache the left IN operand, and the predicate result.
- At each subsequent execution of the IN predicate:
  - If the left IN operand is the same as the cached one,
    - return the corresponding cached IN result
  - else
    - Perform a lookup with the left IN operand into the temporary
      table, and compute the result for the IN predicate.
    - Cache the left IN operand, and the predicate result.


5. Explain
========================================================================

The output of EXPLAIN for a subquery must be changed so that it displays
"materialized" for the subquery.

TODO: check the approach for WL#2980, and make it similar.


6. Related work
========================================================================
- PostGresSQL uses hash indexes on disk with multi-pass hashing which
  in general will give them better performance.

- We have materialization via hash tables only if in the table with
  index fits in main memory, while otherwise MySQL swaps the table
  to disk and indexes it with a Btree index.

- Didn't check it, but it is very likely the other DB vendors also
  use similar technique to PG.

=>
Therefore to be competitive we need multi-pass hashing as described
in Sect. 7.5 below.


7. Future work
========================================================================
7.0 Implement correct NULL semantics according to Table 1 in Sect. 1.3.

7.1 Choose the subquery execution method on a cost-based basis during
    the cost-based optimization phase.

7.2 Consider correlated (dependent) subqueries for which it is possible
    to isolate a selective non-correlated fragment (invariant), and
    materialize this fragment.
    See the paper (implemented in Sybase):
    "Reusing invariants: A new strategy for correlated queries"
    http://portal.acm.org/citation.cfm?doid=276304.276309

7.3 Consider correlated subqueries in the SELECT clause which can't be
    transformed to joins and for which there is no index that can be
    used to find the rows in the subquery tables, e.g.:
    SELECT A IN (SELECT no_key FROM t2 WHERE key=10 AND t1.a=t2.b)
    FROM t1;

7.4 If there is no aggregate function, GROUP BY <=> DISTINCT, then it
    should be possible to use semi-join instead of materialization

7.5 The current implementation is limited by the current implementation
    of memory hash indexes. Once the table and index don't fit into main
    memory, the table is written to disk, and the index is changed to
    B-tree. This will result in much worse performance compared to
    competitive products. To remedy the situation, we need an
    implementation of multi-pass hashing as described in WL#2106.

7.6 Detect if the subquery produces 1 or 0 rows, and use this
    information to avoid the overhead of temporary table creation.
    This can be detected based on various properties:
    - subquery syntax (e.g. LIMIT 1)
    - conditon analysis of the subquery
      (e.g. "WHERE col > 1 and col < 1")
    - based on index analysis (e.g. "WHERE pk_col = const")
    - first execution

7.7 If a subquery is executed as a regular query, and it needs a temp
    table, e.g. to compute GROUP/ORDER BY, then we will create two temp
    tables - one to compute the operation, and one into which we will
    materialize the final result. In some cases it is possible to reuse
    the first table as the materialized table. Figure out a way to
    reuse the first table as a materialized table to avoid double
    materialization.

7.8 Improve the performance of lookups for partial matches.

    The current sequential scan method (LLD, Sect. 2.2.2) to perform
    partial matches for tuples with NULL components may be quite
    inefficient for tables with large numbers of NULLs. This approach
    can be improved in the following ways.

    7.8.1 Nested loose scans

    This is a modification of LLD.2.2.2.1, where we use a B-tree to
    store tuples with NULLs instead of sequential storage. Then we can
    use the loose index scan approach as follows.

    Depending on the lenght of an index keypart prefix, we can look
    at all keys with this prefix as a sub-grouping of all keys.
    Then, for each key part from left to right, we:
    - check if the index contains a NULL for this keypart,
      - if so, set the current keypart of the lookup key to NULL,
        move to the next key part, and move to the next key part for
        a new nested loose scan.
      - otherwise look for the first index key that has the same
        value in the current keypart as the lookup key.
        - if one is found, then set the current keypart of the lookup
          key to this value, and move to the next key part for a
          new nested loose scan.
        - otherwise return "not found"

    The above algorithm tries to find a partial match for each keypart,
    and if not found, tries to find a complete match for the keypart,
    and then to expand it with a partial match.

    7.8.2 IO_CACHE with offsets pointing to next groups

    The idea in 7.8.1 can be applied also to a sequential sorted
    storage of keys. We can consider such storage as a linear
    representation of an ordered index. Then for each key that is the
    beginning of a new group with some common prefix, we can add a
    tuple  that tells us what is the offset of the
    first tuple in the next group in the same subgroup.

    This would allow us to "jump" from one group to another inside an
    outer group, in the same way as we would jump with the nested loose
    scans approach above.

    This can be implemented by modifying the key comparison function
    used by the Unique class, so that it stores its data in reverse
    order. Then we could fill the IO_CACHE buffer from the end,
    and set all offsets in one pass.

    WARNING:
    According to Guilhem, it is not clear whether IO_CACHE can be used
    for seek-ing if there is no pre-created file for the IO_CACHE.
    There is a confusing comment re seeks: "** Should be used when no
    seeks are done (only reinit_io_buff)".

    7.8.3 Role inversion of IN operands (block hash semi-join)

    If the left IN operand has no NULL components (or has few tuples
    with NULLs), then we could invert the roles of the left and right
    operands, and instead perform lookups from the right into the left
    operand. One way of doing that would be to accumulate the tuples
    of the left operand into an in-memory hash table, and once this
    table is filled, test all its tuples by scanning the materialized
    right operand, and doing lookups into the partially materialized
    left operand. This in effect would implement a block hash
    semi-join.
========================================================================
LLD for WL#1110 Subquery optimization: Materialization
========================================================================


CONTENTS
=========================================
1. Query execution
   1.1 Subquery engine for hash semi-joins
   1.2 Naive IN execution with top-level predicate semantics
   1.3 Item_in_subselect shortcut evaluation
2. Query compilation
   2.1 Applicability test
   2.2 Disabling IN => EXISTS transformations
   2.3 Optimization
   2.4 Code generation
3. Explain
4. Re-engineering
5. Schedule
   5.1 Phases and time estimates
   5.2 Plan with dates


1. Query execution
===================

We first describe how do we plan to execute IN predicates via hash
semi-join. Then in Sect. 2 we describe the changes to the optimizer
needed to generate such query plans.

Hash semi-join execution of IN predicates follows the same approach as
currently - it based on execution via a subclass of subselect_engine.

1.1 Subquery engine for hash semi-joins
---------------------------------------

All execution details of subquery materialization, and subsequent index
lookups are encapsulated in the class subselect_hash_semi_join_engine.
This class behaves as subselect_uniquesubquery_engine in the sense that
during query execution it performs index lookups in the same way.
However, this new class differs from subselect_uniquesubquery_engine in
two ways:

- Before the subquery predicate is evaluated for the first time, it has
  to first materialize the subquery into a temporary table.
- When performing lookups it must take care of subquery result tuples
  with a NULL component. Currenlty if the temporary table contains such
  tuples, a lookup will return no result, while subquery predicate
  semantics requres the lookup to return NULL as its result. That is,
  in general:
  <... val ...> [NOT] IN <... NULL ...> => NULL, instead of "no match".
  For top-level queries, NULL is mapped to FALSE, and there is no
  problem with the IN predicate. Still, there is a problem with
  top-level NOT IN predicates, because if a NULL result of IN is
  mapped to FALSE, then the result of NOT IN will be
  NOT(FALSE) = TRUE, instead of FALSE (since NOT(NULL) = NULL, which
  is mapped to FALSE).
  NOTE:
  This issue will be solved by a separate task:
  WL#3830: Subquery optimization: Materialization:
           Partial matching of tuples with NULL components

Based on these similarities and differences, we design
subselect_hash_semi_join_engine as a subclass of
subselect_uniquesubquery_engine.

class subselect_hash_sj_engine: public subselect_uniquesubquery_engine
{
protected:
  /* TRUE if the subquery was materialized into a temp table. */
  bool is_materialized;
  /*
    The old engine already chosen at parse time and stored in permanent memory.
    Through this member we can re-create and re-prepare materialize_join for
    each execution of a prepared statement. We also resuse the functionality
    of subselect_single_select_engine::[prepare | cols].
  */
  subselect_single_select_engine *materialize_engine;
  /*
    QEP to execute the subquery and materialize its result into a
    temporary table. Created during the first call to exec().
  */
  JOIN                           *materialize_join;

public:
  subselect_hash_sj_engine(THD *thd, Item_subselect *in_predicate,
                               subselect_single_select_engine *old_engine)
    :subselect_uniquesubquery_engine(thd, NULL, in_predicate, NULL),
    is_materialized(FALSE), materialize_engine(old_engine),
    materialize_join(NULL)
  {}
  ~subselect_hash_sj_engine();

  bool init_permanent(List *tmp_columns);
  bool init_runtime();
  void cleanup();
  int prepare() { return 0; }
  int exec();
  void print (String *str);
  uint cols()
  {
    return materialize_engine->cols();
  }
  virtual enum_engine_type engine_type() { return HASH_SJ_ENGINE; }
};

In this section we will describe only the methods related to execution.


1.2 Naive IN execution with top-level predicate semantics
---------------------------------------------------------

The execution of a [NOT] IN predicate via hash semi-join is
encapsulated in the exec() method below.

The algorithm below implements the IN truth Table 2 from HLS.Sect.1.3,
but it doesn't implement correct NULL semantics as described in Table 1
in the HLS. The reason is that lookups with keys that do not contain
NULLs into a hash table that contains keys with NULL components will
return FALSE (no match) instead of NULL.

int subselect_hash_semi_join_engine::exec()
{
  /*
    Optimize and materialize the subquery during the first execution
    of the subquery predicate.
  */
  if (!is_materialized)
  {
    ASSERT(subquery_plan is not optimized);
    subquery_plan->optimize();
    subquery_plan->exec();
    is_materialized = TRUE;
    if (the subquery returned no rows)
    {
      There is no need to perform lookups for empty subqueries,
      mark IN as false, and mark that subquery result is empty;
      return FALSE;
    }      
  }

  /*
    Perform a hash-index lookup if the left predicate operand changed.
    Notice that the exec() method below updates
    item::value, and item::null_value, thus if we don't set these,
    the next call to item::val_int() will return whatever result
    was computed by its previous call.
  */
  if (test_if_l_operand_changed())
    return subselect_uniquesubquery_engine::exec();

  return FALSE;
}


/*
  Check if any of the components of the left operand cache changed,
  and update the cached values of each one.

  TODO: this method does the same as test_if_group_changed() -
        needs refactoring.
*/

bool subselect_hash_semi_join_engine::test_if_l_operand_changed()
{
  for each l_val in l_operand_cache
    if l_val->cmp()
      return TRUE;
  return FALSE;
}


1.3 Item_in_subselect shortcut evaluation
-----------------------------------------

TODO: THIS SECTION IS NOT IMPLEMENTED YET!!!

From the [NOT] IN truth table in the HLS, Sect. 1.3, it follows that
when [NOT] IN is a top-level predicate, we may compute the result of
the predicate without performing lookups into the materialized
subquery. We distinguish two cases:

A) The left operand contains a NULL component, and the predicate is a
   top-level IN.
   - In this case the result of IN is always FALSE.
   - This case is currenlty (incorrectly in the case of NOT IN)
     handled by Item_in_optimzer, covering the following cases of the
     IN truth table (Sect. 1.3 in the HLS):
     A.1, A.3, A.4, column 2.

As noted in Sect. 4.3 we will remove Item_in_optimzer and will merge it
with Item_optimizer at a later implementation phase.

B) The materialized subquery has an empty result.
   - In this case the result of IN is always FALSE, and for NOT IN
     is always TRUE, and there is no need to invoke the subquery engine
     once we find this out.
   - In the IN truth table in the HLS, these are cases: A.1, B.1.

For case (B) we override Item_subselect::exec(), and add a new member:

class Item_in_subselect
{
  bool empty_subquery; /* Set to FALSE initially. */

  bool exec()
  {
    if (empty_subquery)
    {
      /*
        If the subquery engine detected that the result is empty, it
        set correctly the Item's value as FALSE, so no need to
        recompute it.
      */
      return FALSE;
    }
    else
      return Item_subselect::exec();
  }
}


2. Query optimization
=====================

2.1 Applicability test
----------------------

The decision whether to execute an IN predicate via hash semi-join is
made during the JOIN::prepare phase of the outer query. This approach
uses the fact that for each Item_in_subselect there is a default
subselect_engine that contains a JOIN object for the subquery of the
IN predicate. Item_in_subselect::fix_fields() invokes the creation
of the JOIN object for the corresponding subquery, and subsequently
invokes JOIN::prepare for the subquery. This results in a recursive
invocation of the prepare phase for all subqueries.

The applicability test is positioned after the recursive call, thus we
decide whether to use a hash semi-join in a post-order manner. The
result of the test is recorded in a new member of Item_in_subselect:
  Item_in_subselect::use_hash_sj

The test is implemented inside JOIN::prepare, as follows:

JOIN::prepare
{
  ............

  /* recursive call to Item::fix_fields()
  setup_without_group()

  ............

  if (!thd->lex->view_prepare_mode)
  {
    Item_subselect *subselect;

    /* If we are in a subquery predicate. */
    if ((subselect= select_lex->master_unit()->item))
    {
      /*
         Determine an execution method for the subquery predicate
        'subselect'.

        Let SP_i is the subquery predicate (called 'subselect' in the code,
        and Q_i is its right operand (subquery).
        Then P_i is executed via a hash semi-join under the following
        conditions:
     */

      if (
          1. SP_i is an IN predicate &&
          2. Q_i  is not a UNION &&
          3. Q_i  is not a table-less query &&
          4. the outermost SQL statement is a SELECT &&
          5. SP_i is a top-level item &&
          6. Q_i is not correlated)
      {
        /* In this case we record this decision: */
        SP_i->use_hash_sj = TRUE;
      }

      transform the subquery predicate and its subquery
    }
  }

  .........
}


The result of the test above is used to block the current transformation from
non-correlated IN => correlated EXISTS, as described in the next section. 


2.2 Disabling IN => EXISTS transformations
------------------------------------------

To make it possible to execute an IN predicate via semi-join, we have
to disable the current transformation of non-correlated IN predicates
into correlated EXISTS, so that we can materialize the unmodified
subquery.

We modify Item_in_subselect::[single | row]_value_transformer() so that
it still wraps Item_in_subselect in an Item_in_optimizer, but doesn't
invoke the IN => EXISTS transformation in the case of semi-join as
follows:

Item_in_subselect::[single | row]_value_transformer()
{
  Wrap 'this' into a new Item_in_optimizer as it is done now.

  if (use_hash_sj)
    return OK

  Transform IN => EXISTS as it is done now.
}

TODO:
The previous version of the WL said that it isn't necessary to change
Item_in_subselect::single_value_transformer() because this WL
will not change the execution of single-row subqueries.

I don't understand why this was decided. Is it known to be generally
more efficient to use IN=>EXISTS for IN with one column?


2.3 Subquery optimization
-------------------------

Currently no cost-based optimization will be performed for subquery
execution. Once the applicability test in 1.1 decides in a rule-based
manner to use hash semi-join to execute an IN predicate, the decision
is irreversible.

The subqueries themselves are optimized in a lazy manner, during the
first execution of the IN predicate. For details see Sect. TODO.


2.4 Code generation
-------------------

We add a new phase that creates and initializes all necessary
structures for hash semi-join execution of IN close to the end of
JOIN::optimize, after all optimizations completed, particularly after
substitute_for_best_equal_field(), so that we get all field references
correct. The general outline of the change in JOIN::optimize is:

bool JOIN::optimize()
{

  .......

  substitute_for_best_equal_field()

  .......

  make_join_readinfo()

  ........


  /* Create all structures needed for materialized subquery execution. */

  if (this is the outer-most query)
  {
    For each subquery SQ_i at any level in the global subquery list
    {
      subquery_predicate = predicate that SQ_i is an operand of
      if (subquery_predicate is IN && subquery_predicate->use_hash_sj)
        if (in_subs->setup_hash_sj_engine()
          return 1; // error
    }
  }

  Create other types of subselect_engines.

  .......................
}


The new method Item_in_subselect::setup_hash_sj_engine is the entry
point for all code generation needed for hash semi-join execution.
The outline of this method is:

Item_in_subselect::setup_hash_sj_engine()
{
  if (the current engine is a single_select_engine)
  {
    set memory root to permanent memory
    current engine = new hash_sj_engine
    perform all permanent initialization of current engine
    restore old memory
  }
  perform all per-execution initialization of the current engine
}

With the method above:
- all code generation is in one place
- there is a clear distinction of what data structures and initialization
  is done in permanent memory, and what is repeated with each subquery
  (re-)execution.


2.4.1 Permanent initialization

All initialization that should live across prepared statements
(re)execution is done in the method:

bool subselect_hash_sj_engine::init_permanent(List of subselect fields)
{
  1. Create/initialize materialization related objects.

    Create and initialize a select result interceptor that stores the
    result stream in a temporary table. The temporary table itself is
    managed (created/filled/etc) internally by the interceptor.

  2. Create/initialize execution related objects.

    - Create the JOIN_TAB to be used by unique_subquery_engine.
    - Set its table to be the temp table with the subquery result.
    - Set its TABLE_REF (JOIN_TAB::ref) to represent index lookup
      into the temp table.
}


2.4.1 Runtime initialization

Due to the fact that the current design is such that the default
subquery engine subselect_single_select_engine recreates and
reoptimizes its JOIN object before each execution, and since we
reuse this object to materialize the subquery, some parts of the
initialization of our class need to be repeated for each execution.

These are seprated in the method:

bool subselect_hash_sj_engine::init_runtime()
{
  /*
    Create and optimize the JOIN that will be used to materialize
    the subquery if not yet created.
  */
  materialize_engine->prepare();

  /* Let our engine reuse this query plan for materialization. */
  materialize_join= materialize_engine->join;
  materialize_join->change_result(result);

  /* Initialize the cache of the left predicate operand. */
  return parent_item->init_left_expr_cache();
}

Above we do the initialization of the left operand cache here, because
the way Cached_item is architected, it needs to be reallocated before
each execution.


3. Explain
==========

TODO

Currently the only change related to explain is the new method:
subselect_hash_sj_engine::print(). One specific thing with it is
that it may be called by DBUG_PRINT before the engine is completely
initialized.


4. Re-engineering
=================

* Must do for this task:

4.1 Move the choice and creation of a subquery execution engine from
    the parsing phase (Item_subselect::init is called at parse time)
    to the prepare phase of the outer select. Most likely this decision
    can be done in Item_subselect::fix_fields and/or
    Item_subselect::init_hash_semi_join().

4.2 Rename (and eventually re-engineer) the 'select_union' result sink
    to a generic 'select_materialize' class that stores all results in
    temporary table.


* Good to do, but not crucial:

4.3 Merge the functionality of Item_in_optimizer with Item_in_subselect,
    and remove Item_in_optimizer.

4.4 Move the optimization of independent subqueries outside of the
    execution phase to either the prepare or optimize phase of the
    outer query.


5. Schedule
===========

5.1 Phases and time estimates

The general approach is to get as quickly as possible to the execution
code in order to reduce the project risk, and then to go back and
refine the conditions under which we choose to use hash semi-join.

Phase 1: Prototyping - unconditional hash semi-join execution
----------------------------------------------------------------------
- Unconditionally set hash semi-join as a subquery execution method -
  via a stub for Item_in_subselect::is_semi_join_applicable().
  completed: 100%
- Unconditionally disable the IN => EXISTS transformation (Sect. 2.2).
  completed: 100%
- Unconditionally choose hash semi-join via a simplified version of
  Item_in_subselect::best_exec_method()
  completed: 50%
- Hook Item_in_subselect::best_exec_method() to the optimizer.
  completed: 100%
  time: 24 h = 3 days

Implement:
- class subselect_hash_semi_join_engine, and methods:
  - subselect_hash_semi_join_engine::init()
    The main risk here is temporary table creation.
    time: 32 h = 4 days
  - subselect_hash_semi_join_engine::exec ()
    The risk here is filling the temporary table.
    time: 24 h = 3 days
- Item_in_subselect::exec() according to LLD.2.0.
  time: 1 h ~= 0
time: 56 h = 7 days

- Add regression tests for IN queries, fix all discovered problems.
  time: 16 h = 2 days

TOTAL: 96 h = 12 days

M1: Query materialization and execution code implemented according to
    the current design, and it is possible to execute typical (simple)
    subqueries with top-level IN. Since hash semi-join is chosen
    unconditionally, other queries will fail (or crash).

=> Submit the code for Review 1.


Phase 2: Design & schedule re-evaluation and code review
----------------------------------------------------------------------
- Update the LLD with more detailed description of temporary table
  creation.
  - 8 hours = 1 day
- Analyze the implementation of Phase 1 for design flaws.
- Update the LLD with more detailed description of other critical parts.
- Update the WL schedule according to lessons from Phase 1.
  - 8 hours = 1 days
- Overall design (LLD) review
  - 8 hours = 1 day
- Post-review 1 and post-re-design code changes.
  time: 16 h = 2 days
TOTAL: 40 hours = 5 days

M2: Updated and approved design & schedule.

Phase 3: applicability conditions and conditional hash semi-join
----------------------------------------------------------------------
- Implement the applicability test per Sect. 2.1
  except for the branch for nested-loops semi-join.
  time: 8 hours = 1 day

M3.a: At this point it should be possible to execute any query with
      subqueries, however I expect to encounter many unforceen problems
      with other old code.

- Add new test cases that test the selection of the new execution
  method. Fix all encountered problems.
  time: 16 hours = 2 days
- Run the regression test suite, investigate the failed test cases, and
  fix problems related to the new code.
  time: 24 hours = 3 days
TOTAL: 48 hours = 6 days

M3: Working implementation of the WL.

=> Submit the code for Review 2.

Phase 4: EXPLAIN support and second code review
----------------------------------------------------------------------
- Add EXPLAIN support, and test cases with the new EXPLAIN output that
  make sure materialization is applied when it had to.
  time: 16 hours = 2 days
- Post-review 2 changes
  time: 40 h = 5 days
TOTAL: 56 hours = 7 days

M4: Selected plan visible in EXPLAIN, all tests pass, code review passed.

Phase 5: additional tests & documentation
----------------------------------------------------------------------
- Add tests for various uses of subqueries inside viewes, update
  statements, etc.
  time: 8 hours
- Write documentation for User's manual.
  time: 8 hours
TOTAL: 16 hours = 2 days

M5: Task complete.

Total time for the whole task:
----------------------------------------------------------------------
optimistic: 256 hours (based on the estimates above)           = 32 days
pessimistic: add 80 hours for unpredicted problems = 328 hours = 42 days


5.2 Plan with dates
-------------------

TODO