WL#7870: Create JOIN object after query preparation

Status: Complete   —   Priority: Medium

The main goal of this worklog is to make it possible to prepare a query without
having a JOIN object available. The JOIN object is hereafter created just before
a query is optimized, and can thus be initialized directly from a prepared
SELECT_LEX object.

An important part of this WL is also to refactor the functions used to prepare,
optimize, execute and explain queries. The reason for this refactoring is that
it turned out to be complicated to carry out the main goal with the inherent
complexity of these functions.

Resolving is refactored so that we now resolve GROUP BY, then HAVING, then ORDER
BY, while previously ORDER BY was resolved before HAVING. This might cause some
queries with aggregation only in HAVING to appear non-aggregated when processing
ORDER BY.

The refactoring also fixes the following bug:

BUG#70553: EXPLAIN UPDATE SHOWS "USING JOIN BUFFER" WHILE IT IS NOT USED
This is a refactoring worklog. A small performance increase is expected because
we avoid some double work (some preparation code was earlier applied both to the
SELECT_LEX object and to the JOIN object), however, it is probably not possible
to recognize. We also eliminate creation of a JOIN object when semi-join and
derived table query blocks are merged into the outer query block.

There are no functional requirements.
Function handle_query()
-----------------------

This function replaces mysql_select(), mysql_union() and handle_select(). It
walks through all phases of query execution for SELECT statements and
multi-table UPDATE and DELETE statements:

 - prepare
 - lock tables
 - optimize
 - execute or explain
 - cleanup

The function takes as argument a THD pointer, a LEX object (representing the
query), a query_result object and a set of query option modifiers.

See LLD for details.

Eliminate existing functions for preparation and optimization.
--------------------------------------------------------------

The following functions are called in various parts of query processing:

- mysql_prepare_select()
- mysql_prepare_and_optimize_select()
- mysql_optimize_prepared_inner_units()
- mysql_union_prepare_and_optimize()

It is possible to simplify processing by removing these functions, adding one
new function and reuse some existing functions for query preparation and
optimization.

The new function is SELECT_LEX::optimize() that optimizes the query block, and
then all inner query expressions not yet optimized.

In addition, a few extra interfaces are added that change the state of the query
objects to prepared, optimized and executed, and interfaces to check these states.

Processing of select_result
---------------------------

Currently a select_result object is created during query preparation and
assigned to the JOIN object. When JOIN is created at a later stage, this is no
longer possible. Instead, the select_result object is attached to the
corresponding SELECT_LEX object with the new interface
SELECT_LEX::set_query_result(). Another interface query_result() is used to
retrieve the current select_result object for a given SELECT_LEX.

Processing of query options
---------------------------

Query options are copied from each SELECT_LEX object, modified with options from
session variables and some operation-specific options, and assigned to the
corresponding JOIN object and later retrieved from here.

We can easily use options directly from the SELECT_LEX object instead of from
the JOIN object. We will first describe the lifecycle of option handling, which
motivates the change, then the new handling will be described.

The lifecycle of option handling:

- When a query block is created, options are set to an empty set.

- After parsing, defined options are attached to designated query blocks.
  It is also checked that statement options are defined only in the
  outer-most, left-most query block.

- When processing a query, JOIN::select_options is formed by taking the union
  of respective SELECT_LEX::options, thd->variables.option_bits (most
  occurrences) and some operation-specific options, and sometimes also
  excluding some base options.

- Queries and subqueries containing UNION are processed further:
   . NO_UNLOCK is added for each query block and the fake when preparing.
   . SETUP_TABLES_DONE is cleared for second and subsequent query blocks in
     INSERT ... SELECT .. UNION queries.
   . UNIT::found_rows_for_union is calculated from FOUND_ROWS.
   . FOUND_ROWS is cleared for those query blocks that have no limit
     or have braces, during optimize()!!!
   . fake select_lex is modified according to UNIT::found_rows_for_union.

- For non-UNION subqueries, NO_UNLOCK is added.

- During analysis of a query block, SELECT_DISTINCT may be removed.

The new options handling:

Options will be separated into base_options and active_options.

base_options are created from options specified in the query, but may be
modified as follows:
- SELECT ... INSERT adds OPTION_BUFFER_RESULT to the outer-most query block.
- mysql_schema_table() adds OPTION_SCHEMA_TABLE to the local query block.
- mysql_make_view() propagates OPTION_TO_QUERY_CACHE to main query.

The interfaces add_base_options() and remove_base_options() are introduced
to manipulate base options.

active_options are created for each execution of the query based on
base_options, statement and session variables.

handle_query() sets active_options for the outer-most query block by
adding base_options together with the options provided by the session
(thd->variables.option_bits) and a statement-specific set of options:
- SELECT ... CREATE: adds SELECT_NO_UNLOCK.
- SELECT ... INSERT: adds OPTION_SETUP_TABLES_DONE and SELECT_NO_UNLOCK.
- MULTI DELETE: adds SELECT_NO_JOIN_CACHE, SELECT_NO_UNLOCK and
                     OPTION_SETUP_TABLES_DONE

In addition MULTI DELETE removes OPTION_BUFFER_RESULT.

SELECT_DESCRIBE is also added for all statements that are explained.

For UNION queries, the active_options are formed in SELECT_LEX_UNIT::prepare()
by adding and removing options passed to the calling handle_query().

The options for fake_select_lex is made from the base_options, possibly
OPTION_FOUND_ROWS and OPTION_NO_CONST_TABLES.

For scalar subqueries, active_options is set by adding SELECT_NO_UNLOCK.

active_options also needs to be adjusted in one case: OPTION_FOUND_ROWS
may be dynamically added or removed as part of UNION processing.

The interface make_active_options() is used to form the active options from the
base options, plus modifications. In addition, add_active_options() and
remove_active_options() can later be used to change the set of active options.

The code never accesses base_options, only active_options. The options are
accessed through two interfaces: SELECT_LEX::active_options() and
JOIN::options(). JOIN::options() simply forward to the corresponding
SELECT_LEX::active_options(), so there is no confusion over which option set
is used.

Processing of item_list
-----------------------

The third object that is assigned to the JOIN object during query preparation is
the item list, which is assigned as JOIN::fields_list. It is supplied as follows:

SELECT: select_lex->item_list
INSERT: select_lex->item_list
multi-UPDATE: empty list (total_list)
multi-DELETE: select_lex->item_list (emptied)

These objects are set when a query is prepared.

The ref array
-------------

Slice zero of the ref array is populated during query preparation, whereas
slices 1 through 4 are populated during query optimization. Hence, the ref array
needs to be available both from SELECT_LEX and JOIN, in particular slice zero.

The other slices need only be available from the JOIN object.

Explaining queries
------------------

After this refactoring, explain queries for own connection are handled
completely by handle_query(), ie this function will prepare, optimize and
explain complete queries.

Explain queries for other connections are assumed to be prepared and possibly
optimized in the other connection, before being explained by
mysql_explain_other(), which will eventually call explain_query().
Overall query processing
------------------------

execute_command() no longer calls mysql_select(), handle_select() and
explain_query(). All these calls are replaced with calls to handle_query().
EXPLAIN of a query is indicated by setting the EXPLAIN bit of the options flag.

As indicated in HLS, handle_query() goes through five phases for each query:
preparation, table locking, optimization, execution/explain and cleanup.

For efficiency, we perform general processing for a UNION query (in the
preparation, optimization and execution stages), and a specialized and more
efficient processing path for non-UNION queries.

Thus, we call SELECT_LEX_UNIT::prepare() to prepare a UNION query and
SELECT_LEX::prepare() to prepare a non-UNION query. Similar shortcuts are done
for optimization and execution as well.

SELECT_LEX_UNIT changes
-----------------------

These interfaces have been added:

- void set_prepared()
- bool is_prepared()
- void set_optimized()
- bool is_optimized()
- void set_executed()
- void reset_executed()
- bool is_executed()

The functions are used to set, reset and check the states of query expressions
(SELECT_LEX_UNITs). We enforce strict phase sequencing, so the optimize()
function will assert that a unit has been prepared, and the execute() function
will assert that the unit has been optimized. An exception exists within
subqueries, which still apply on-demand optimization for DO and SET commands.

reset_executed() is used after one execution of a query block, to prepare for
the next. It is necessary when execution a subquery block multiple times during
an overall query execution.

These interfaces have been deleted:

first_select_prepared(). This function is replaced with call to
SELECT_LEX_UNIT::is_prepared().

Changed interfaces:

bool prepare(thd, result, added_options, removed_options)

added_options and removed_options are used to modify the base_options to form
active_options.

The function no longer need to create JOIN objects before preparing SELECT_LEX
objects.

Preparation of fake_select_lex has been simplified: It is now handled by
prepare_fake_select_lex() instead of being spread over
init_prepare_fake_select_lex() and fake_select_lex::prepare().

bool optimize(thd)

For symmetry, the function now takes a THD pointer as argument.

As optimization creates a JOIN object, we no longer need to reset a JOIN object
before optimizing the unit.

bool execute(thd)

For symmetry, the function now takes a THD pointer as argument.

bool prepare_fake_select_lex(thd)

This call wraps the preparation call for the fake_select_lex object. It replaces
the old init_prepare_fake_select_lex() function.

void set_query_result(query_result)

Set the query result object do be used for this query expression.

select_result *query_result()

Return the query result object for the query expression. If this is NULL, assume
that the query expression consists of a single query specification, meaning that
the query result object can be fetched from the first (and only) underlying
SELECT_LEX object. See also SELECT_LEX::query_result() for more information.

SELECT_LEX changes
------------------

These interfaces have been added:

- void set_query_result(query_result)
- select_result *query_result()
- bool change_query_result(new_result, old_result)

These interfaces are used to set and retrieve the "result" object for a query
block. Generally, the result object is determined by the operation (SELECT,
UPDATE, DELETE and INSERT), and is also specific for subqueries. The
operation-specific result objects are assigned through the handle_query()
interface, whereas result objects for subqueries are assigned as part of
subquery preparation.

- empty_order_list() is a new member function of SELECT_LEX that completely
  removes the ORDER BY list of a query.

The SELECT_LEX::prepare() function has been refactored significantly. This was a
natural process as all the JOIN references have been removed from this function.

- return value is converted from int to bool.
- All JOIN initializations are moved to SELECT_LEX::optimize() and
  JOIN::optimize().
- setup_without_group() is inlined to avoid parameter passing.
- Resolve order is changed: GROUP BY, then HAVING, then ORDER BY.
- Local state needed for cleanup is eliminated when appropriate.
  (If an error is reported, we need not restore e.g allow_sum_func, as the
   object is discarded after the error.)

Interfaces used to process query options:

- void add_base_options(options)
- void remove_base_options(options)
- void make_active_options(added_options, removed_options)
- void add_active_options(options)
- void remove_active_options(options)
- ulonglong active_options()

These functions are used to set and modify the base options and active options
of a query block, and fetch the currently active options.

Functions that have been converted to SELECT_LEX member functions:

- change_group_ref()
- setup_group()
- remove_redundant_subquery_clauses()

Changes needed to assign JOIN during optimization
-------------------------------------------------

- JOIN constructor takes THD and SELECT_LEX pointers.
- Access all_fields from SELECT_LEX, not JOIN.
- subselect_engine no longer has a JOIN pointer.
- transformer functions take SELECT_LEX as argument instead of JOIN.
- SELECT_LEX contains ref_ptr, which is duplicated in JOIN.
- count_field_types() only need to be called from optimizer.
- JOIN::primary_tables can be replaced with SELECT_LEX::leaf_table_count in
  the resolver phase.
- ROLLUP processing is split into preparation part and optimization part.
- JOIN::change_result() is replaced by SELECT_LEX::change_query_result().

Changes to EXPLAIN
------------------

- explain_query() changes:
  . Pass in result always, except when explaining for "other" thread
  . Implement need_explain_interceptor()
  . Call to mysql_union_prepare_and_optimize() is no longer needed
    (when explaining own query, it is always in optimized state)
  . management of result object is self-contained

- mysql_explain_unit()

Result objects are always attached up-front, so there is no need to pass result
object to this function.

Changes in subquery resolving and transformation
------------------------------------------------

These interfaces are changed so that they no longer access a JOIN object.
Instead, they now take a SELECT_LEX pointer as argument, specifying the query
block that the subquery belongs to:

- Item_subselect::select_transformer(select)
- Item_singlerow_subselect::select_transformer(select)
- Item_in_subselect::select_transformer(select)
- Item_in_subselect::select_in_like_transformer(select)
- Item_in_subselect::single_value_transformer(select)
- Item_in_subselect::single_value_in_to_exists_transformer(select)
- Item_in_subselect::row_value_transformer(select)
- Item_in_subselect::row_value_in_to_exists_transformer(select)
- Item_allany_subselect::select_transformer(select)

Miscellaneous changes
---------------------

- Consistency in names and arguments:
  . exec() is renamed to execute()
  . All optimize() and execute() functions take THD pointer as argument

- mysql_insert: Call explain_query() after locking tables and pruning partitions.