WL#7870: Create JOIN object after query preparation
Status: Complete — Priority: Medium
The main goal of this worklog is to make it possible to prepare a query without having a JOIN object available. The JOIN object is hereafter created just before a query is optimized, and can thus be initialized directly from a prepared SELECT_LEX object. An important part of this WL is also to refactor the functions used to prepare, optimize, execute and explain queries. The reason for this refactoring is that it turned out to be complicated to carry out the main goal with the inherent complexity of these functions. Resolving is refactored so that we now resolve GROUP BY, then HAVING, then ORDER BY, while previously ORDER BY was resolved before HAVING. This might cause some queries with aggregation only in HAVING to appear non-aggregated when processing ORDER BY. The refactoring also fixes the following bug: BUG#70553: EXPLAIN UPDATE SHOWS "USING JOIN BUFFER" WHILE IT IS NOT USED
This is a refactoring worklog. A small performance increase is expected because we avoid some double work (some preparation code was earlier applied both to the SELECT_LEX object and to the JOIN object), however, it is probably not possible to recognize. We also eliminate creation of a JOIN object when semi-join and derived table query blocks are merged into the outer query block. There are no functional requirements.
Function handle_query() ----------------------- This function replaces mysql_select(), mysql_union() and handle_select(). It walks through all phases of query execution for SELECT statements and multi-table UPDATE and DELETE statements: - prepare - lock tables - optimize - execute or explain - cleanup The function takes as argument a THD pointer, a LEX object (representing the query), a query_result object and a set of query option modifiers. See LLD for details. Eliminate existing functions for preparation and optimization. -------------------------------------------------------------- The following functions are called in various parts of query processing: - mysql_prepare_select() - mysql_prepare_and_optimize_select() - mysql_optimize_prepared_inner_units() - mysql_union_prepare_and_optimize() It is possible to simplify processing by removing these functions, adding one new function and reuse some existing functions for query preparation and optimization. The new function is SELECT_LEX::optimize() that optimizes the query block, and then all inner query expressions not yet optimized. In addition, a few extra interfaces are added that change the state of the query objects to prepared, optimized and executed, and interfaces to check these states. Processing of select_result --------------------------- Currently a select_result object is created during query preparation and assigned to the JOIN object. When JOIN is created at a later stage, this is no longer possible. Instead, the select_result object is attached to the corresponding SELECT_LEX object with the new interface SELECT_LEX::set_query_result(). Another interface query_result() is used to retrieve the current select_result object for a given SELECT_LEX. Processing of query options --------------------------- Query options are copied from each SELECT_LEX object, modified with options from session variables and some operation-specific options, and assigned to the corresponding JOIN object and later retrieved from here. We can easily use options directly from the SELECT_LEX object instead of from the JOIN object. We will first describe the lifecycle of option handling, which motivates the change, then the new handling will be described. The lifecycle of option handling: - When a query block is created, options are set to an empty set. - After parsing, defined options are attached to designated query blocks. It is also checked that statement options are defined only in the outer-most, left-most query block. - When processing a query, JOIN::select_options is formed by taking the union of respective SELECT_LEX::options, thd->variables.option_bits (most occurrences) and some operation-specific options, and sometimes also excluding some base options. - Queries and subqueries containing UNION are processed further: . NO_UNLOCK is added for each query block and the fake when preparing. . SETUP_TABLES_DONE is cleared for second and subsequent query blocks in INSERT ... SELECT .. UNION queries. . UNIT::found_rows_for_union is calculated from FOUND_ROWS. . FOUND_ROWS is cleared for those query blocks that have no limit or have braces, during optimize()!!! . fake select_lex is modified according to UNIT::found_rows_for_union. - For non-UNION subqueries, NO_UNLOCK is added. - During analysis of a query block, SELECT_DISTINCT may be removed. The new options handling: Options will be separated into base_options and active_options. base_options are created from options specified in the query, but may be modified as follows: - SELECT ... INSERT adds OPTION_BUFFER_RESULT to the outer-most query block. - mysql_schema_table() adds OPTION_SCHEMA_TABLE to the local query block. - mysql_make_view() propagates OPTION_TO_QUERY_CACHE to main query. The interfaces add_base_options() and remove_base_options() are introduced to manipulate base options. active_options are created for each execution of the query based on base_options, statement and session variables. handle_query() sets active_options for the outer-most query block by adding base_options together with the options provided by the session (thd->variables.option_bits) and a statement-specific set of options: - SELECT ... CREATE: adds SELECT_NO_UNLOCK. - SELECT ... INSERT: adds OPTION_SETUP_TABLES_DONE and SELECT_NO_UNLOCK. - MULTI DELETE: adds SELECT_NO_JOIN_CACHE, SELECT_NO_UNLOCK and OPTION_SETUP_TABLES_DONE In addition MULTI DELETE removes OPTION_BUFFER_RESULT. SELECT_DESCRIBE is also added for all statements that are explained. For UNION queries, the active_options are formed in SELECT_LEX_UNIT::prepare() by adding and removing options passed to the calling handle_query(). The options for fake_select_lex is made from the base_options, possibly OPTION_FOUND_ROWS and OPTION_NO_CONST_TABLES. For scalar subqueries, active_options is set by adding SELECT_NO_UNLOCK. active_options also needs to be adjusted in one case: OPTION_FOUND_ROWS may be dynamically added or removed as part of UNION processing. The interface make_active_options() is used to form the active options from the base options, plus modifications. In addition, add_active_options() and remove_active_options() can later be used to change the set of active options. The code never accesses base_options, only active_options. The options are accessed through two interfaces: SELECT_LEX::active_options() and JOIN::options(). JOIN::options() simply forward to the corresponding SELECT_LEX::active_options(), so there is no confusion over which option set is used. Processing of item_list ----------------------- The third object that is assigned to the JOIN object during query preparation is the item list, which is assigned as JOIN::fields_list. It is supplied as follows: SELECT: select_lex->item_list INSERT: select_lex->item_list multi-UPDATE: empty list (total_list) multi-DELETE: select_lex->item_list (emptied) These objects are set when a query is prepared. The ref array ------------- Slice zero of the ref array is populated during query preparation, whereas slices 1 through 4 are populated during query optimization. Hence, the ref array needs to be available both from SELECT_LEX and JOIN, in particular slice zero. The other slices need only be available from the JOIN object. Explaining queries ------------------ After this refactoring, explain queries for own connection are handled completely by handle_query(), ie this function will prepare, optimize and explain complete queries. Explain queries for other connections are assumed to be prepared and possibly optimized in the other connection, before being explained by mysql_explain_other(), which will eventually call explain_query().
Overall query processing ------------------------ execute_command() no longer calls mysql_select(), handle_select() and explain_query(). All these calls are replaced with calls to handle_query(). EXPLAIN of a query is indicated by setting the EXPLAIN bit of the options flag. As indicated in HLS, handle_query() goes through five phases for each query: preparation, table locking, optimization, execution/explain and cleanup. For efficiency, we perform general processing for a UNION query (in the preparation, optimization and execution stages), and a specialized and more efficient processing path for non-UNION queries. Thus, we call SELECT_LEX_UNIT::prepare() to prepare a UNION query and SELECT_LEX::prepare() to prepare a non-UNION query. Similar shortcuts are done for optimization and execution as well. SELECT_LEX_UNIT changes ----------------------- These interfaces have been added: - void set_prepared() - bool is_prepared() - void set_optimized() - bool is_optimized() - void set_executed() - void reset_executed() - bool is_executed() The functions are used to set, reset and check the states of query expressions (SELECT_LEX_UNITs). We enforce strict phase sequencing, so the optimize() function will assert that a unit has been prepared, and the execute() function will assert that the unit has been optimized. An exception exists within subqueries, which still apply on-demand optimization for DO and SET commands. reset_executed() is used after one execution of a query block, to prepare for the next. It is necessary when execution a subquery block multiple times during an overall query execution. These interfaces have been deleted: first_select_prepared(). This function is replaced with call to SELECT_LEX_UNIT::is_prepared(). Changed interfaces: bool prepare(thd, result, added_options, removed_options) added_options and removed_options are used to modify the base_options to form active_options. The function no longer need to create JOIN objects before preparing SELECT_LEX objects. Preparation of fake_select_lex has been simplified: It is now handled by prepare_fake_select_lex() instead of being spread over init_prepare_fake_select_lex() and fake_select_lex::prepare(). bool optimize(thd) For symmetry, the function now takes a THD pointer as argument. As optimization creates a JOIN object, we no longer need to reset a JOIN object before optimizing the unit. bool execute(thd) For symmetry, the function now takes a THD pointer as argument. bool prepare_fake_select_lex(thd) This call wraps the preparation call for the fake_select_lex object. It replaces the old init_prepare_fake_select_lex() function. void set_query_result(query_result) Set the query result object do be used for this query expression. select_result *query_result() Return the query result object for the query expression. If this is NULL, assume that the query expression consists of a single query specification, meaning that the query result object can be fetched from the first (and only) underlying SELECT_LEX object. See also SELECT_LEX::query_result() for more information. SELECT_LEX changes ------------------ These interfaces have been added: - void set_query_result(query_result) - select_result *query_result() - bool change_query_result(new_result, old_result) These interfaces are used to set and retrieve the "result" object for a query block. Generally, the result object is determined by the operation (SELECT, UPDATE, DELETE and INSERT), and is also specific for subqueries. The operation-specific result objects are assigned through the handle_query() interface, whereas result objects for subqueries are assigned as part of subquery preparation. - empty_order_list() is a new member function of SELECT_LEX that completely removes the ORDER BY list of a query. The SELECT_LEX::prepare() function has been refactored significantly. This was a natural process as all the JOIN references have been removed from this function. - return value is converted from int to bool. - All JOIN initializations are moved to SELECT_LEX::optimize() and JOIN::optimize(). - setup_without_group() is inlined to avoid parameter passing. - Resolve order is changed: GROUP BY, then HAVING, then ORDER BY. - Local state needed for cleanup is eliminated when appropriate. (If an error is reported, we need not restore e.g allow_sum_func, as the object is discarded after the error.) Interfaces used to process query options: - void add_base_options(options) - void remove_base_options(options) - void make_active_options(added_options, removed_options) - void add_active_options(options) - void remove_active_options(options) - ulonglong active_options() These functions are used to set and modify the base options and active options of a query block, and fetch the currently active options. Functions that have been converted to SELECT_LEX member functions: - change_group_ref() - setup_group() - remove_redundant_subquery_clauses() Changes needed to assign JOIN during optimization ------------------------------------------------- - JOIN constructor takes THD and SELECT_LEX pointers. - Access all_fields from SELECT_LEX, not JOIN. - subselect_engine no longer has a JOIN pointer. - transformer functions take SELECT_LEX as argument instead of JOIN. - SELECT_LEX contains ref_ptr, which is duplicated in JOIN. - count_field_types() only need to be called from optimizer. - JOIN::primary_tables can be replaced with SELECT_LEX::leaf_table_count in the resolver phase. - ROLLUP processing is split into preparation part and optimization part. - JOIN::change_result() is replaced by SELECT_LEX::change_query_result(). Changes to EXPLAIN ------------------ - explain_query() changes: . Pass in result always, except when explaining for "other" thread . Implement need_explain_interceptor() . Call to mysql_union_prepare_and_optimize() is no longer needed (when explaining own query, it is always in optimized state) . management of result object is self-contained - mysql_explain_unit() Result objects are always attached up-front, so there is no need to pass result object to this function. Changes in subquery resolving and transformation ------------------------------------------------ These interfaces are changed so that they no longer access a JOIN object. Instead, they now take a SELECT_LEX pointer as argument, specifying the query block that the subquery belongs to: - Item_subselect::select_transformer(select) - Item_singlerow_subselect::select_transformer(select) - Item_in_subselect::select_transformer(select) - Item_in_subselect::select_in_like_transformer(select) - Item_in_subselect::single_value_transformer(select) - Item_in_subselect::single_value_in_to_exists_transformer(select) - Item_in_subselect::row_value_transformer(select) - Item_in_subselect::row_value_in_to_exists_transformer(select) - Item_allany_subselect::select_transformer(select) Miscellaneous changes --------------------- - Consistency in names and arguments: . exec() is renamed to execute() . All optimize() and execute() functions take THD pointer as argument - mysql_insert: Call explain_query() after locking tables and pruning partitions.
Copyright (c) 2000, 2018, Oracle Corporation and/or its affiliates. All rights reserved.