WL#9384: Prepare each DML statement once
This is part of Refactor DML statement preparation. It depends on the refactoring done in WL#5094: Create SQL command classes for DML statements.
WL#5094 created a uniform interface to all DML statements (except SET and CALL), which implemented a prepare() and an execute() interface for all such statements. Non-prepared statements are executed using the execute() interface, which internally calls the corresponding prepare() function before executing the statement.
However, prepared statements are still prepared once for every execution. The goal of this WL is that prepared statements are prepared only once: when the PREPARE command is performed on the statement. Likewise, statements inside stored procedures will also be prepared once, on their first execution.
The benefits of this work are:
- performance enhancement: avoid costly preparation per execution.
- code simplification: avoid tedious rollback of preparation structures, a source of several bugs.
As part of the work, the way dynamic parameters used in prepared statements are resolved will be changed. This will necessarily cause some behavioral changes in some prepared statement use cases, but we try to keep such changes at a minimum by performing dynamic repreparation of statements when actual dynamic parameter values are not in compliance with the resolved data types. The implementation is however tuned to avoid unnecessary repreparations, since they will affect performance negatively.
FR1 When a preparable statement is prepared, all dynamic parameters will have a derived data type depending on statement context and tables involved in the statement. These rules are carefully selected to be SQL standard compliant and cause as few repreparations as necessary (see FR2).
FR1.1 A dynamic parameter that is an operand of a binary arithmetic operator gets the same data type as the second operand of that operator.
FR1.2 If both operands of a binary arithmetic operator are parameters, the type of the parameters is decided by the type-determining context of the operator.
FR1.3 If a dynamic parameter is an operand of a unary arithmetic operator, the type of the parameter is decided by the type-determining context of the operator.
FR1.4 If an arithmetic operator has no type-determining context (e.g. it is a top-level node in a SELECT list or it is part of a comparison operator), the derived type for the involved parameters are DOUBLE PRECISION.
FR1.5 A dynamic parameter that is an operand of a character string operator gets the same derived type as the aggregated type of the other operands of the operator. If all operands of the operator are parameters, the derived type is VARCHAR with connection collation.
FR1.6 A dynamic parameter that is an operand of a temporal operator gets type DATETIME if the operator returns a DATETIME, TIME if the operator returns a TIME, and DATE if the operator returns a DATE.
FR1.7 A dynamic parameter that is an operand of a binary comparison operator gets the same derived type as the other operand of the comparison.
FR1.8 A dynamic parameter that is an operand of a ternary comparison operator (e.g. BETWEEN), the operand is given the same derived type as the aggregated type of the other operands.
FR1.9 If all operands of a comparison operator are dynamic parameters, the derived type for all parameters is VARCHAR with connection collation.
FR1.10 A dynamic parameter that is an output operand of a CASE-derived operator has the same derived type as the aggregated type of the other output operands of the operator. CASE-derived operators are CASE, COALESCE, IF, IFNULL, NULLIF.
FR1.11 If all output operands of a CASE-derived operator are dynamic parameters, or the NULL value, the type of the parameter is decided by the type-determining context of the operator.
FR1.12 If a CASE-derived operator has no type-determining context, the derived type for the involved parameters are VARCHAR with connection collation.
FR1.13 A dynamic parameter that is the operand of a CAST node is given the same derived type as the CAST specification.
FR1.13 If a dynamic parameter is an immediate member of a SELECT list that is not part of an INSERT statement, the derived type of the parameter is VARCHAR with connection collation.
FR1.14 If a dynamic parameter is an immediate member of a SELECT list that is part of an INSERT statement, the derived type of the parameter is the type of the corresponding column the parameter is inserted into.
FR1.15 If a dynamic parameter is used as source for an assignment in a SET clause of an UPDATE statement or in the ON DUPLICATE KEY UPDATE clause of an INSERT statement, the derived type of the parameter is the type of the corresponding column that is updated.
FR1.16 If a dynamic parameter is an argument of a function, the derived type is depending on the type of the function. There are too many functions to list in the functional requirements section, see actual code and test cases for details.
FR2 When a prepared statement is executed, all dynamic parameters will have an actual type. For most actual types, the statement can be executed immediately, however for some combinations of actual type and derived type, an automatic re-preparation of the statement is triggered, to ensure closer compatibility with previous versions of MySQL.
FR2.1 If a NULL value is used as actual parameter, no repreparation is performed.
FR2.2 If a dynamic parameter is an operand of a CAST node, no repreparation occurs, regardless of type. Instead, a cast into the derived type is attempted, and an exception is raised in case of failure.
FR2.3 If an actual dynamic parameter is a character string, no repreparation is performed. Instead, a cast operation is performed as if CAST(? AS <derived type>) was specified.
FR2.4 If the derived type is INTEGER, the actual type is INTEGER and the sign of the derived type is the same as the sign of the actual type, no repreparation occurs.
FR2.5 If the derived type is DECIMAL and the actual type is either INTEGER or DECIMAL, no repreparation occurs.
FR2.6 If the derived type is DOUBLE and the actual type is any numeric type, no repreparation occurs.
FR2.7 If the derived type is character string and the actual type is character string, no repreparation occurs.
FR2.8 If the derived type is temporal and the actual type is temporal (unless the derived type is TIME and the actual type is not a TIME, or the derived type is DATE and the actual type is not a DATE), no repreparation occurs.
FR2.9 if the derived type is temporal and the actual type is numeric, no prepreparation occurs.
FR2.10 If none of the FRs 2.1 to 2.9 hold, the statement is reprepared and the actual parameter types are used instead of the derived parameter types. One exception is if the parameter is a so-called pinned type, in which case an exception is raised. Pinned parameters are:- LIMIT and OFFSET in query specifications. - ROWS PRECEDING and ROWS FOLLOWING in window functions.
FR3 Proper metadata about dynamic parameters can be retrieved over the client/server protocol. Earlier all dynamc variables were reported as being of type character string.
FR4 Metadata for system variables of type character string have changed: They are now defined as VARCHAR(21845) (maximum size of a VARCHAR with UTF8MB3 character set). When used in CREATE TABLE statements, columns defined by selecting a system variable becomes type TEXT.
FR5 Invalid parameter values (range errors, truncation errors, character set conversion errors, temporal component errors) shall cause exceptions, regardless of strict SQL mode, except that NO_ZERO_IN_DATE, NO_ZERO_DATE and INVALID_DATES are adhered to.
NF1 Simple prepared statements should see significant speedup when executed multiple times. In particular statements that access max 1 row per table.
NF2 Regular (non-prepared) statements might see a marginal increase in execution time, limited to 2%.
1. State management of Sql_cmd objects
We distinguish between three classes of SQL statement, depending on how they are prepared and executed.
- regular statements, these are prepared and immediately executed and deleted.
- prepared statements, these are explicitly prepared, may be executed multiple times, and explicitly deleted. Notice also that if metadata of underlying tables, views or functions change, the statement is transparently reprepared. They may take dynamic parameters, whose values are taken from client programs (values sent over the client/server protocol), or from user variables.
- statements within stored routines: these are managed a lot like prepared statements: They are prepared on first execution. They may then be executed repeatedly, perhaps several times per execution of the routine they are contained within. They are deleted when the containing statement is deleted. Like prepared statements, transparent repreparation on metadata changes may occur. Dynamic parameter values may be taken from local variables and user variables.
Basically, Sql_cmd_dml::execute() is enhanced so that it can execute a statement that is already prepared. (This was not necessary before, because a preparation was an implicit part of execution).
See documentation of class Sql_cmd and Sql_cmd_dml for detailed changes.
2. Change lifecycle management of Item objects
Item objects will always be resolved with Item::fix_fields() before they are used (evaluated). There are some rare exceptions for e.g. constant items that are resolved by the constructor.
During resolving, no evaluation of data that may change between executions must take place. This is accomplished by checking Item::const_item(): If this function returns true, the value of the Item will be constant throughout its lifetime and may be evaluated and cached during resolving.
After an Item has been resolved, it can be used in execution. An Item's evaluation functions may be called repeatedly during an execution, and the code must ensure that values from one evaluation do not interfere with the next evaluation.
After an execution is complete, Item::cleanup() must be called for each Item to clean up any resources that may have been allocated during execution. Item::cleanup() will not change the resolved state of the Item (the fixed flag). After this cleanup, the Item is ready for another execution (but notice that some Items, such as Item_field, need some dynamically bound data before re-execution may be done). Notice that Item::cleanup() must preserve data set up by preparation, e.g Item_field::cleanup() will in most cases preserve the table_ref field, since the associated TABLE_LIST object is permanently part of the statement after resolving. On the other hand, the field pointer is in most cases set to NULL, since it points into a Field object belonging to a TABLE object that is detached after execution.
When a statement is deleted, the destructor must be called for all Items within the statement.
For a regular (non-prepared) statement, all Items are handled like this:
- The constructor creates an Item. - Item::fix_fields() is called to resolve the Item. - The evaluation functions for the Item are called zero or more times. - Item::cleanup is called after execution. - Query_arena::free_items() is called, which again calls the destructor for all Items in the list. (the statement is deleted immediately after execution).
For a prepared statement, Items are handled as follows during preparation:
- The constructor creates an Item. - Item::fix_fields() is called when the statement is prepared. - Item::cleanup is called immediately after preparation.
For each execution of a prepared statement, the following calls are made:
- Item::bind_fields() is called to bind with dynamically allocated resources, such as TABLE objects. - The evaluation functions for the Item are called zero or more times. - Item::cleanup() is called to free any allocated resources.
When the prepared statement is deleted, Query_arena::free_items() is called, which calls all Item destructors.
The lifecycle of Items inside stored routine statements is practically identical to the Items in a prepared statement.
We support transformations on Item trees by adding or substituting new Item nodes into the Item tree. These transformations can be permanent (valid for the lifetime of a statement) or transient (valid for one optimization and execution, and must be rolled back at some point in time).
We have traditionally supported transient and permanent transformations on statement preparation, but transient transformations are no longer needed when we prepare each statement only once.
However, we will still need to support transient transformations during optimization, since they have to be redone for every optimization. Notice though that the memory problems that traditionally have ridden this feature should be much smaller, since all optimization transformations are applied to the runtime mem_root only.
3. Resolve Item_param with constant metadata
Item_param objects are used to represent dynamic parameters in prepared statements. Traditionally, an Item_param object has not been resolved during statement preparation, but rather in the prepare() call from within execution. Metadata has been determined from the actual data provided from a connector or from user variables, and might even change during execution.
Since we only call Item_param::fix_fields() once, and we would generally want metadata to be constant during the lifetime of a statement, we will enforce constant metadata in fix_fields(). As default, metadata types will be propagated based on expression context, wherever such context exists. If there is no context, a default type will be provided. This is a process that is very close to how the SQL standard mandates that parameter type resolution should occur, except that the SQL standard would issue an error when type cannot be inferred from the context.
See section 4 for details on how type propagation is performed.
When a statement is executed, we walk through all Item_param objects and provide actual data values. The type of these data values are determined and provided as the actual values metadata. The actual metadata are then checked against the resolved metadata to see if the actual data can be used with the parameters. E.g, if an integer value is expected and the decimal value 3.14 is specified, the value would be truncated. If an unsigned integer value is expected and a negative number is specified, a range violation occurs. In this case, the actual data cannot be used without inducing result errors.
Traditionally, MySQL has been very relaxed about what actual data types can be used as dynamic parameters, and we want to continue this practice as much as possible. Thus, if truncations, range violations or other errors occur when assigning actual parameter values, we enforce an automatic reprepare of the statement and override the resolved metadata with the types of the actual metadata. Thus, we can optimize the statement with compliant metadata.
Note however that in some circumstances, we do not reprepare the statement. Some syntactical clauses require a specific data type, e.g LIMIT and OFFSET clauses must be integers, and the values must be non-negative. We call these data types "pinned", and we do no automatic reprepare when actual data do not match these data types. Instead, we enforce strict value and range checking at execution time for these values. Currently, we support pinning for INTEGER data types, both signed and unsigned ones. Actual parameters must be of type integer (signed or unsigned), and the actual value must fit the range of the declared type. An error is reported otherwise.
Some metadata are also classified as "inherited". They are typically used with CAST operations, where a parameter argument is marked with the same type as the type of the CAST node. On execution, all data types are possible for the dynamic parameter, and a dynamic check is made to see if the conversion from the actual data to the CAST data type is really possible. If not, an exception is raised. This may be considered a dynamic vs a static cast: A regular CAST from e.g. an integer expression into a character string is a static cast, since both the source and target data types are determined when the statement is resolved. But a CAST of a dynamic parameter may be considered a dynamic cast, since only the target data type is determined when the statement is resolved, the source data type is determined by the type of the actual parameter value at execution time.
As a consequence, Item_param::basic_const_item() no longer returns true, and Item_param::used_tables() returns INNER_TABLE_BIT (ie., const_for_execution() is true for Item_param). Item_param::type() returns PARAM_ITEM always, and hence Item_param::item_type is removed.
Item_param::result_type() is derived from Item_param::data_type().
Parameters are always nullable.
See sections 5 and 6 for more details about the process of deriving the type of dynamic parameters.
4. Propagate type from contextual item(s) into parameter items
- Set the initial data type of Item_param as MYSQL_TYPE_INVALID if no value is present (this is applicable for the initial prepare of an Item_param).
- When the type of a parameter item is resolved as MYSQL_TYPE_INVALID (aka "undefined", we must look into the context, or use a default type, to determine the resolved data type. Apply rules below until the parameter's type is defined.
* The operator itself may impose a type on the argument. For a CAST operator, the argument is given the same type as the CAST type. The metadata is also marked as "inherited". For any other operator or function, this type may be decided based on the type of other arguments with defined type (as in "?=3.6", the parameter will be DECIMAL), or based on how the argument will be used (in "COS(?)", COS always evaluates the parameter as DOUBLE, so the type is defined to be DOUBLE; in CONCAT(?), the parameter gets type VARCHAR), or the type may be a default. * Some operators are not capable of picking a default. For example in "?+?", the "+" operator may evaluate its arguments as INT, DOUBLE or DECIMAL, and that influences the output type of "+"; so the types of the parameters are not definable by the operator; the full list of such operators is given at the end of this paragraph. In that case we look at the operator's outer context, because how the operator's output is used suggests what type is intended to be used inside the operator. * For a unary operator on a parameter, type resolving is dependent on the operator's outer context. * For a binary operator where one argument has a deterministic type, apply the same type to the parameter argument. * For a binary operator where no arguments have deterministic types, type resolving is dependent on the operator's outer context. * If the operator is included in a parent operator or function, the parent provides the needed context. * If there is no parent operator or function, an outer context is provided like this: - If the operator is arithmetic (such as '+'), the context is DOUBLE. - If the operator is string (such as ||), the context is VARCHAR with connection character set. - If there is no operator, ie when simply referencing a parameter or an expression containing a parameter in the SELECT or GROUP BY or ORDER BY list, the data type is VARCHAR with connection character set. * For a comparison predicate (special case of binary operator with boolean type) where all arguments are parameters, the data type is VARCHAR with connection character set. * For a BETWEEN operator, resolve type based on common type of non-parameter arguments, then apply same type to parameter arguments. If all arguments are parameters, the data type for each is VARCHAR with connection character set. * For a CASE operator or similar, determine the type based on the non-parameter argument(s) and apply the same type to the parameter arguments. If all arguments are parameters, we are dependent on the outer context of the CASE operator. If there is no outer context, the data type for each parameter is VARCHAR with connection character set. * For functions that have deterministic argument types (e.g. CHAR_LENGTH, where the argument is a character string), set the data type of the parameter to be this type. * For functions that do not have deterministic argument types (e.g. ABS), type resolving is dependent on the outer context of the function. If there is no outer context, set type based on the type of function (e.g. DOUBLE for arithmetic, VARCHAR for string, etc). * Any predicates for which all arguments are undefined will have the type of all arguments set as VARCHAR with connection character set (the predicate node itself has a boolean type). * For Item trees where the root Item object has undefined data type after the above procedure is performed, all underlying parameter objects are given the same type: If the root node is an arithmetic node, the type given is double precision. If the root node is a string function or operator, the type given is character string. If the root node is a temporal function, the type given is the temporal type. If the root node is a simple parameter, the type given is character string when used in the select list and boolean (integer) when used in a search condition.- When type propagation is dependent on the context, processing occurs as follows for a given operator node:
* After regular type resolution, all arguments have undefined types, so Item::resolve_type() leaves without resolving the type for the Item representing the operator. * Some outer Item will determine the type and call Item::propagate_parameter_type() for this node. * The node will propagate this type to its arguments, so that all of them gets the same type derivation. * Finally, resolve the type for this operator node. * If there is no outer context that can provide a type, the operator node will provide itself with a default type. E.g, arithmetic nodes will have DOUBLE as the default type in this case. * Current resolver actions are collected in Item::resolve_type_inner(), which can be called both from Item::resolve_type() (when type is determined internally) or from Item::propagate_parameter_type() (when type is determined from context).- The following nodes may have their type determined by the context:
* Item_func_numhybrid (including all unary and binary arithmetic operators). * Item_func_min, Item_func_max * Item_func_coalesce, Item_func_ifnull, Item_func_if, Item_func_nullif, Item_func_case- The following operator nodes have special rules:
* Item_func_bit: Default type is binary string; if integer arguments are supplied, statement is reprepared. * Item_func_int_div: All arguments resolve as integer.
5. Convert parameter value from original representation to derived representation
We have considered two refactoring strategies for this task, a simple and an extensive one. The extensive one is considered better, as it will separate actual data from parameter items, but since it is more work, we have decided to carry out the simpler approach below:- The type of the actual data values are stored in members of Item_param named m_data_type_actual, m_unsigned_actual and m_collation_actual. - The actual data are stored in the member variables value, str_value_ptr and decimal_value as before. For character data, the three character set members character_set_client, character_set_of_placeholder and final_character_set_of_str_value are replaced with m_collation_actual (the collation of the original string) and m_collation_stored (the collation of the string as stored in the item, after possible conversion). - If m_collation_actual and m_collation_stored are different, the provided character string is converted from m_collation_actual to m_collation_stored. If this conversion cause an error, e.g. because the string contains characters that cannot be represented in m_collation_stored, execution is aborted and an exception is raised. - Privatize Item_param::state and rename it to m_param_state.
(This is an informational note for later consideration.
diff only: refactor-parameter-value.diff in mysql/wl/6570.- Refactoring Item_param object:
state - is removed from object, since we want no "state" in Item objects str_value_ptr - moved into stmt->parameter_values decimal_value - moved into stmt->parameter_values value - moved into stmt->parameter_values item_result_type - refactored in task 2.4 (Refactor result_type() as a non-virtual function) item_type - moved into stmt->parameter_values param_type - refactored in task 2.3 (Refactor field_type() to a non-virtual function) set_param_func - moved into stmt->parameter_values (or deleted...)
Create a new class named Parameter_value with the following members:
user_type - Data type of parameter. This is set initially in Prepared_statement::setup_parameter_values() based on parameter types derived from the resolved parameter markers of the prepared statement. When statement is executed, this is compared with actual parameters (user variables or parameters depending on statement type). If the result type is different or a different character set is provided, all parameter data types are re-resolved. This will also trigger a mandatory re-optimization.
decimals Decimals of parameter (necessary?)
unsigned_flag Unsigned flag of parameter (necessary if non-user variable?)
null_value True if the variable is true.
collation Collation of parameter.
value_buffer Value, or pointer to value. See internal functions.
length Length of string if value_buffer points to value.
Allocation strategies for values: TBD.- Implement new field conversion function
6. Reprepare statement when actual parameter types change
When executing a prepared statement with actual parameter types that differ from the resolved parameter types, the following procedure is carried out:
a. Resolver process ensures that all parameters have a defined data type, based on place in query and expression context (see sections 3, 4 and 5).
b. When a prepared statement is executed, actual parameter values are first copied to the Item_param objects (this might become a separate parameter value array, but currently we use the Item_param objects for this).
c. Prepared_statement::check_parameter_types() is then invoked to check that actual parameter types are compliant with the resolved parameter types. Compliance is determined as follows:
- NULL value is compliant with all resolved types. - Character strings are compliant with all resolved types. It is assumed that the character string value can be cast into the resolved type. (Note however that when the resolved type is itself character string but has a collation that is different from the provided string, the conversion may already have caused execution to fail.) - If the resolved type is integer, actual integer values with same signed-ness are compliant. Other numeric types are not compliant due to possible decimal truncation and range errors. Signedness difference causes non-compliance due to range mismatch. - If the resolved type is decimal, all integer and decimal actual values are compliant. - If the resolved type is floating point, all integer, decimal and floating point actual values are compliant. - If the resolved type is a string type, actual types that are string types are compliant (except for the collation issue mentioned). - If the resolved type is a temporal type, actual types that are temporal types are compliant, except that DATE and TIME values are not compliant with each other. In addition, numeric values on the form YYYYMMDD for date, HHMMSS[.ffffff] for time and YYYYMMDDHHMMSS[.ffffff] for datetime, are accepted as actual values. - If the resolved type is YEAR, actual integer values, both signed and unsigned, are accepted. - If the resolved type is BIT, actual unsigned integer values are accepted.
d. If any of the actual values are not compliant, and the resolved type of at least one of the corresponding parameters is pinned, an exception is raised. Currently, only parameters used in LIMIT and OFFSET clauses, and a few uses in window functions, are pinned. Pinned parameters are limited to INTEGER types.
e. If any of the actual values are not compliant and no other mismatch is detected, the statement is reprepared:
- A new clone of the statement is created. - All parameter items are resolved using the types of the actual provided parameter values. Thus, no context analysis is needed. An exception is made for parameters with pinned and inherited types: these are kept as the pinned/inherited metadata types. - The statement clone, including the mem_root and the parameter array, are swapped with the original statement. - Destroy the cloned statement (which now contains the original contents of the original statement). - Notice that, different from a reprepare caused by changed metadata, a reprepare caused by changed actual data is only allowed once. This is to catch failures in reprepare logic at an early stage.
f. Execute the query using the actual parameter values.
7. Resolve user variables similar to how parameters are resolved
Retrieval of a user-defined variable, implemented by class Item_func_get_user_var, is resolved as follows:- Set the type as MYSQL_TYPE_INVALID, regardless of whether the variable exists or not. This makes the type be propagated from context, just like for dynamic parameters. - System variables are resolved as before, except that string variables are given data type VARCHAR (as before) but max_length is set to 65535. Earlier, the max size was the same as the current size of variable string, which was obviously wrong, since the string might vary in size later. Another consequence of this is that these variables generate fields of type TEXT when selected in a CREATE TABLE ... SELECT statement, since the field becomes greater than CONVERT_IF_BIGGER_TO_BLOB.
8. Re-implement IN functions to have a single data representation- We would like to resolve the left-hand side (LHS) of the IN function and the values of the values list to have the same type. All values that fit in this scheme would then be put into a single ordered array, which would be used for fast lookup during execution. Values that are not const-for-execution would be used to generate equality predicates with the LHS, before they are ORed with the IN function. Example:
... WHERE int-col IN (3, '444', ?, x.col); - The resolved type is the type of int-col, e.g MYSQL_TYPE_LONG - The literal 3 is of same type, so it is added to the array. - The literal '444' is cast to an integer and then added to the array. - The dynamic parameter is type-determined as integer and added to the array. If the parameter is NULL, it is not added to the array, but the Item is marked as having NULL values in value list. - x.col (probably an outer reference) is used to form an equality int-col=x.col which is ORed with array lookup mechanism. - The array is sorted so that it can be used for binary search.
This work is postponed and currently not scheduled for inclusion in worklog.
9. Proper lifecycle management for procedures and functions
There are two important issues with procedures and functions (commonly known as routines):
1. All expressions and statements within a routine are prepared for every execution. This will be converted to preparing each object only once.
2. All tables, columns and functions accessed within a routine must be checked for privileges before each execution. Privilege checking is currently performed in the prepare stage for all SQL statements and in sp_lex_instr::reset_lex_and_exec_core() for expressions. Since preparation is done less often after this work, we cannot rely on the above mentioned privilege check for statements anymore. However, checking privileges for expressions is safe.
Since all expressions can be prepared only once, we must check whether an expression is prepared before calling fix_fields() on it. We must also ensure that the resolving is recorded in the mem_root associated with the expression, so we set up a Prepared_stmt_arena_holder before resolving. This has to be done in sp_prepare_func_item() for expressions within functions and sp_head::setup_trigger_fields() and Item_trigger_field::set_value() for management of trigger fields.
In addition, all tables and fields need to be rebound against dynamic table information. This is done by calling bind_tables() and bind_fields() on the complete Item list in sp_lex_instr::reset_lex_and_exec_core() for stored procedure instruction items.
Notice also that function objects (class sp_head) used to be rebound dynamically for each time Item_func_sp::fix_fields() was called. Since we call this function only once, the rebinding is now moved to Item_func_sp::bind_fields(), which calls sp_setup_routine() to assign a pointer to member m_sp. Accordingly, m_sp is set to NULL in Item_func_sp::cleanup().
We also mark statements as belonging to a stored routine by calling Sql_cmd::set_as_part_of_sp(). This marks that the statement is preparable, but still not exactly the same as a normal prepared statement (see section 1).
The added calls to Sql_cmd_dml::check_privileges() that are made by Sql_cmd_dml::execute() ensure that tables and columns referenced by statements, and functions being called, are checked for proper privileges.
Lifecycle for a call to a procedure containing a simple DML statement:- Sql_cmd_dml::execute() :
call Sql_cmd_dml::prepare() call lock_tables() call Sql_cmd_call::execute_inner() call unit->cleanup()- Sql_cmd_call::execute_inner() :
Lookup SP which has its own mem_root call sp_head::execute_procedure() my_ok()- sp_head::execute_procedure() :
Setup parent_sp_runtime_ctx Setup proc_runtime_ctx Setup parameters (with full cleanup) call sp_head::execute() Restore thd->sp_runtime_ctx->callers_arena Post-process OUT parameters delete parent_sp_runtime_ctx delete proc_runtime_ctx restore thd->sp_runtime_ctx- sp_head::execute() :
Create execute_mem_root for SP execution Setup thd->sp_runtime_ctx with arenas Repeat process instructions: Set arena from instruction call sp_instr_stmt::execute() call cleanup_items(i->item_list()) call thd->cleanup_after_query() Release memory allocated during execution thd->restore_active_arena execute_arena -> original thd->sp_runtime_ctx->pop_all_cursors() thd->pop_reprepare_observer() arena::state: INITIALIZED -> STMT_EXECUTED- sp_instr_stmt::execute():
call sp_lex_instr::validate_lex_and_execute_core()- sp_lex_instr::validate_lex_and_execute_core() :
thd->push_reprepare_observer() call sp_lex_instr::reset_lex_and_exec_core() clear sp_lex_instr::m_first_execution- sp_lex_instr::reset_lex_and_exec_core() :
call reinit_stmt_before_use() call sp_instr_stmt::exec_core() call thd->rollback_item_tree_changes() set thd->stmt_arena->state= INITIALIZED -> STMT_EXECUTEDreinit_stmt_before_use():
unit->clear_execution() tables->reinit_before_use() (table=NULL, check broken LEX)- sp_instr_stmt::exec_core() :
Setup thd->lex->sphead call mysql_execute_command() Reset sp_current_parsing_ctx()
9.1 Cursors within stored procedures
A cursor is created from Query_result_materialize::prepare(), which is called from Sql_cmd_dml::execute_inner(). It creates a temporary table and a Materialized_cursor. The temporary table is created on a dedicated mem_root, and the Materialized_cursor is created on the same mem_root.
The temporary table is instantiated by Query_result_materialize::start_execution().
The temporary table contents is deleted when the cursor is closed, ie from Materialized_cursor::close().
The temporary table is deleted from the destructor of Materialized_cursor, which is attached to the SQL statement object.
See also section 11.
9.2 UDF functions
In order to preserve compatibility with previous 8.0 versions, we trigger a mandatory reprepare operation for all executions of prepared statements containing UDF function references. In addition, we ignore the fix_fields() call for UDF function references for the first prepare() call.
UDF functions are implemented using three user-written functions:
1. an "init" function that is called during preparation which may allocate memory for the function and return metadata about the function result and argument. 2. A "deinit" function that deallocates any memory allocated by the "init" function. 3. An execution function that is called to actually evaluate the function result, based on a set of argument values supplied to it.
This means that we may regard const_for_execution parameters the same as purely const data, since they will not change between calling the "init" function and the "deinit" function. It also means that parameter values are always available when fix_fields() is called.
10. Proper lifecycle management for triggers
Triggers behave more or less like stored procedures. An important difference is that they are tied to a TABLE object, which may be associated with different sessions during its lifetime, whereas stored procedures exist in a session-local procedure cache and are always executed within the same session.
We must ensure that memory objects allocated during an execution are deattached before the TABLE (and trigger) is deattached from the session, to avoid that a subsequent session references memory that was allocated by this session.
On the other hand, Item_trigger_field objects, which are contained within triggers and refers to fields in the referencing TABLE, need no action in Item_trigger_field::bind_fields(), since they point into persistent TABLE objects.
Privileges for triggers are also checked the same way as for stored procedures.
11. Proper lifecycle management for cursors
Some cursors are materialized into a temporary table. The cursor object, as well as this temporary table, now has the same lifetime as the statement the cursor is created for, because it is created in the statement's mem_root. Cursors can be created for prepared statements (class Materialized_cursor) and for statements in stored procedures (class sp_cursor). Cursors can never be created for regular (non-preparable) statements.
The following objects are involved when dealing with a cursor:
- An Sql_cmd_select object that represents the statement being executed.
- A Prepared_statement object, when cursor is used through a prepared statement.
- An sp_cursor object, when cursor is used from a stored procedure.
- A protocol-specific Query_result object, used when sending data back to the client (Query_fetch_protocol_binary or Query_result_send), or to stored procedure variables (Query_fetch_into_spvars).
- A second Query_result object of subclass Query_result_materialize object that is used when populating the temporary table used to hold the query result and used when reading rows from the cursor.
- A Materialized_cursor object that represents the cursor object. This object references the temporary table described below.
- A TABLE object and a TABLE_SHARE object representing the materialized results of the cursor in a temporary table.
As a consequence, we must now handle three different scenarios for the state of a cursor when an object for materialization of query results is to be created or reused in function mysql_open_cursor():
1. If this is a preparable un-prepared statement (may happen for statements that are part of stored procedures), create the query result object in statement mem_root.
2. If this is a prepared statement but no query result object for materialization exists, create query result object in statement mem_root. Since the statement is already prepared, explicitly prepare the query result object, which includes creating the temporary table.
3. If this is a prepared statement for which a query result object for materialization exists, reuse this object. Only refresh the query result object for use by the cursor.
Cursor code is changed so that the contents of the materialized temporary table is instantiated per execution (in Query_result_materialize::start_execution()) and deleted in Materialized_cursor::close(). The temporary table metadata are deleted in the destructor of Materialized_cursor.
The lifetime management of temporary table handler objects is also changed: Since the temporary table is created in one execution and read in a subsequent execution, the handler object cannot be created on the execution mem_root. It is also problematic to create the object on the statement mem_root, since the mem_root size may thus increase without control. The solution chosen for this is to create a mem_root object in the base class Server_side_cursor. This object is initialized when Server_side_cursor is constructed and deleted when it is destructed. It is used for creation of transient objects associated with the temporary table, including Handler objects and copies of Items representing the retrieved columns. The mem_root object is erased when a Materialized_cursor is closed, thus it is reused for the next instantiation of the cursor.
(When a prepared SELECT statement which creates a cursor is executed, a temporary table is created with the result of that statement. In subsequent executions, we may FETCH rows from that temporary table. After retrieval, the cursor is closed which causes the temporary table contents to be deleted. Thus, the temporary table contents need to survive between executions).
In addition, the query result object created by Prepared_statement::execute() when a cursor is requested is now created only once, and not every time the statement is executed. Since this object is created on the statement mem_root, repeated executions might otherwise cause excessive memory use.
12. Proper lifecycle management for temporary tables
The main reason for this change is the ability to set up a derived table or a UNION materialized table during preparation, and then instantiate a new temporary table for every execution of the statement.
For temporary tables generated by the optimizer, these changes are not required, since these tables are always created and deleted as part of the execution. But with caching of plans, the change will become relevant here too.
Here are the main changes:
- Metadata handling and instantiation of temporary tables is now clearly separated. create_tmp_table() will create the metadata for a temporary table, and optionally instantiate it (as before). However, the old free_tmp_table() function is now split: close_tmp_table() deletes the temporary table contents and releases the file handler when TABLE_SHARE::tmp_handler_count is decremented to zero and free_tmp_table() deletes all metadata of the temporary table when TABLE_SHARE::ref_count is decremented to zero. instantiate_tmp_table() can be used to re-create a temporary table for which the contents have been deleted earlier.
- The TABLE, TABLE_SHARE and associated objects (e.g Field objects) are still created in a dedicated mem_root. The mem_root object is deleted together with the table metadata by free_tmp_table(). This is a viable strategy also when plans become shared, because we will then need to create TABLE and TABLE_SHARE per instance of a plan.
- Objects that are associated with the plan, like TABLE_LIST, copy_func and copy_field, are created in the current mem_root.
- tmp_handler_count counts the number of open TABLE handles to the temporary table. It is incremented by create_tmp_table() and open_table_from_share(), and it is decremented in close_tmp_table(). When tmp_handler_count is decremented to zero, the table contents is deleted and the file handler is deleted.
- is_created() is now a sub-state of has_storage_handler(): A table that has an associated storage handler may or may not be in the "created" state. close_tmp_table() exits if table has no storage handler, because in that case the contents cannot exist and tmp_handler_count is zero. If the table is created, the connection to it is dropped if there are more handlers (ha_close()), and the table is dropped if there are no more handlers (ha_drop_table()). The plugin is also unlocked when tmp_handler_count decrements to zero.
- ref_count is still related to number of associated TABLE objects. It is set to one when the temporary table is created and incremented by Common_table_expr::clone_tmp_table(). It is decremented by free_tmp_table(). The final free_tmp_table() call deletes the mem_root. free_tmp_table() ensures that a table must be closed before it can be freed.
- open_tmp_table() opens a table and sets state to "created" (no change).
- instantiate_tmp_table() sets the file handler using set_file_handler(), then creates the table and opens it.
- set_file_handler() is a new function that locks the proper engine plugin, allocates a new handler object and increments tmp_handler_cnt.
- There are similar changes within create_ondisk_from_heap().
- To really delete a temporary table, two calls are needed now: close_tmp_table() followed by free_tmp_table(). And if there are multiple TABLE objects to a temporary table, a close and a free is required for each TABLE.
- The table name for a temporary table is created by hex-coding the pointer to the TABLE_SHARE object. This ensures the name is unique within the running instance, without having to lock any data structures. The temporary table name is used in handler communication when opening a table.
Informally, a temporary table can go through a set of states: non-existing, created, instantiated and opened. It starts in the non-existing state. The create action brings it to the created state. Clone operations does not change the state, but increments ref_count. Instantiate brings it from created to the instantiated state. Open brings it from the instantiated to the opened state. Subsequent open actions does not change the state, but increments tmp_handler_count. Close action decrements tmp_handler_count. If it goes to zero, the state goes to created state. The free action decrements ref_count. When ref_count goes to zero, all metadata are deleted and the table goes to the non-existing state.
created -> instantiated -> opened ->
13. Lifecycle management for generated columns
* When TABLE objects are constructed, generated column expressions are parsed and resolved (to detect errors). The resolving is against a TABLE but no TABLE_LIST object, since the generated column is attached permanently to a TABLE. Hence, the field pointer can be attached permanently to a Field of that TABLE as well.
* In TABLE::init() (when binding TABLE and TABLE_LIST), call refix_gc_items() to assign relevant members according to the TABLE, including table_name and field_name.
* When finishing tables (in close_thread_table()/retire tmp table), call cleanup_gc_items(), which will walk through all Items of the GC and release resources allocated during execution.
14. Lifecycle management for partition expressions
There are two important aspects for partition expressions: use of tables and table creation.
First we describe use of tables containing partitioning expressions:
* When a table is opened, and there are no cached TABLE objects for the table in the server memory, the partitioning expression including the partitioning columns is created in TABLE object mem_root by calling mysql_unpack_partition(). * In fix_partition_func(), the expressions are fixed against the current TABLE object. The table_ref pointer for field items are NULL for such items, however the field pointer is permanently set to point to a field within the TABLE object. (This is contrary to regular fields, where table_ref points to a TABLE_LIST within the statement, but field is rebound for every new execution.) * The partitioning expressions are used by the table operation (INSERT, UPDATE, SELECT, ...). * No special action is taken when the TABLE object is released. Item_field::cleanup() is NOT called for items that are part of partitioning expressions during execution, so we must trust that they never allocate memory that needs cleanup (this is not a new restriction in the worklog). * On repeated execution, a free TABLE object may be assigned to the executing statement. The existing part_info object which is already resolved against the correct TABLE object will be reused. * Item_field::cleanup() is called for the partition expression when deleting the TABLE object it is associated with. * Notice also that evaluation functions on partition expression items must NOT allocate any memory (again, this is not a new restriction).
Lifecycle management is a bit more complicated when creating a table:
* When a table is created, the partitioning expression is first created in the THD::work_part_info object. * As part of creation, mysql_unpack_partition() is called. It will create the partitioning expressions in the TABLE mem_root as described for a regular table use operation. However, immediately afterwards this part_info object is freed, and we go back to using the THD::work_part_info struct. * In fix_partition_func(), the expressions are now fixed. * Just after creation, the TABLE object is deleted. The associated partitioning expressions are also deleted. * As part of statement cleanup, the partitioning expressions created in THD::work_part_info are also deleted.
15. Make sure TABLE::alloc_tmp_keys() is safe for re-execution
This is required because a temporary table that is used to materialize a derived table is now allocated for the duration of a statement. When the statement is optimized, new key definitions are added to the table. These keys are used during query execution, but are not deleted at the end of execution.
With the new temporary table implementation, the temporary table objects are allocated in a dedicated mem_root. Currently, the keys are allocated for each optimization, meaning that the mem_root will leak memory for long-lasting statements.
The solution is to introduce a new field alloced_keys which is never decremented, and reuse the memory for allocated keys from one optimization to the next.
- Reset alloced_keys in close_tmp_table(), besides TABLE::set_deleted(). - Never set key_info to nullptr, even when keys is zero.
16. Proper implementation for IODKU (INSERT with ON DUPLICATE KEY)
INSERT with ON DUPLICATE KEY uses a special row value buffer for the table to be inserted into, in connection with Item_insert_value objects. Before this worklog, the row value buffer was created for each prepare/execution, followed by a fix_fields() calls on the Item_insert_value object. When fix_fields() is called only once, this is no longer sufficient as the TABLE pointer and the row value buffer may vary between executions. The revised implementation creates a "cloned" Field object for the Item_insert_value in fix_fields() and also saves the current pointer to the row value buffer. The row value buffer is rebound for every execution in Item_insert_value::bind_fields(), and the field pointer inside the Field object is updated accordingly.
17. Re-check column privileges during execution
During preparation, we first call Sql_cmd_dml::precheck() for a basic privilege check on tables and views, then we check detailed view and column privileges during resolving. For regular statements, this is all privilege checking that is required.
During execution of a prepared statement, a complete privilege check will be performed by calling Sql_cmd_dml::check_privileges(). Since all expressions are completely resolved at this stage, this function can perform the same work as Sql_cmd_dml::precheck(), as well as the subsequent view and column privilege checks. This is implemented by traversing all resolved expressions, checking privileges for all columns that are encountered.
There may be a shortcut for this: if table and view privilege checking detects that existing table privileges are sufficient to cover all accessed columns, there is no need for the column privilege check. E.g, if we have a SELECT statement and we have SELECT privileges to all involved tables, checking column privileges will be redundant. Similar shortcuts can be added for INSERT, UPDATE and DELETE statements as well. However, this is an optimization that will not be carried out in this worklog.
18. Partition pruning of prepared statements
Partition pruning is currently performed in two phases, before and after locking of tables. The pruning performed before locking is used to reduce the amount of locking required. This pruning is based on constant values, so it can be performed completely during statement preparation. The pruning performed after locking is based on const_for_execution values and is performed during optimization.
This is how partition pruning will be performed in this worklog:
- A TABLE_LIST object possibly contains a partition list, as supplied in the SQL statement.
- A TABLE object contains lock_partitions, the set of partitions to be locked and read_partitions, the set of partitions to be read during a retrieval operation. read_partitions is used by SELECT, UPDATE and DELETE statements and in subqueries.
- open_table() sets lock_partitions and read_partitions as follows: If a partition list is supplied for a referenced table, this set, otherwise the full set of partitions.
- During query preparation, lock_partitions and read_partitions may be reduced further based on predicates against constant values. If no partitions qualify, all_partitions_pruned_away is set. If the prune condition does not contain a 'const_for_execution' component for the table, is_pruning_completed is set.
- Locking of tables take into account the lock_partitions set. However, after WL#6035 (native partitioning), apparently no actual locking of partitions is performed, so it seems lock_partitions is currently mostly used to initialize the read_partitions set.
- During optimization, a further partition pruning stage is performed based on const_for_execution data, ie values for dynamic parameters and session variables. Only read_partitions is updated during this stage, lock_partitions is not affected.
- Execute statement with current values for lock_partitions and read_partitions.
- If a statement is executed multiple times, the value for lock_partitions is saved in the TABLE_LIST object. When a new TABLE object is bound, its lock_partitions and read_partitions fields are initialized from the saved lock_partitions. Then, another partition pruning operation may be performed during optimization.
- INSERT ... VALUES statements are handled slightly different: A special pruning stage is performed during preparation, and read_partitions and lock_partitions are set to the same values.
In short, this means that pruning that updates lock_partitions is performed during preparation. Pruning that updates read_partitions is performed during optimization.
For statements with dynamic parameters, fewer partitions may be pruned away for locking than in previous versions: Earlier, dynamic parameter values were available during preparation and were taken into account during partition pruning. But when preparation is performed only once, such parameter values are no longer available for pruning during preparation. However, this is not a real problem because lock_partitions is no longer used for time-consuming operations. And during optimization, read_partitions is updated based on dynamic parameter values, meaning the semantics is unchanged.
However, for INSERT ... VALUES statements with ON DUPLICATE, since we only perform partition pruning during preparation, we cannot take advantage of dynamic parameter values when calculating read_partitions when reading the row to be updated. This is a regression in behavior.
19. Restore used tables information after const table optimization
During optimization, we may identify constant tables and read one row from each of these tables, and use corresponding column values in further optimization. The benefit of this is that we can evaluate more expressions in order to detect more "always false" conditions, and that we can remove tables from the greedy-search join optimization, which has exponential cost by number of tables.
When const tables are read, these tables have their used tables information removed, so that expressions referring columns from these tables will also have changed used tables information. Some of these expressions are specific to the JOIN object used in optimization, but some expressions are also part of the SELECT_LEX object.
The const table optimizations must be rolled back after execution of the optimized plan. As part of this, we must also ensure that used table information is restored for all expressions inside the SELECT_LEX.
We implement this in JOIN::destroy(): If we have detected any const tables (const_tables > 0), call SELECT_LEX::update_used_tables(), which will go through all item trees attached to the SELECT_LEX and recalculate used tables information.
20. Refactoring of LIMIT and OFFSET handling
It was noticed that the current resolving and evaluation functions for LIMIT and OFFSET expressions were imprecise: Function prepare_limit() was implemented for class SELECT_LEX_UNIT, even though the LIMIT and OFFSET expressions were attached to class SELECT_LEX. Thus, it seemed natural to replace SELECT_LEX_UNIT::prepare_limit() with SELECT_LEX::resolve_limits().
SELECT_LEX_UNIT::set_limit() is unchanged, since the LIMIT and OFFSET are actually applied against a query expression.
21. Performance Notes
Since we have fewer real const items now, some impact is expected on caching operations. After this worklog is implemented, the principle will be that- Caching of const expressions happen in the resolver with substitution of
Items with simpler expressions.- Caching of const-for-execution expressions happen during optimization and
execution, by adding Item_cache objects, or by other, more specialized, caching mechanisms.
Some minor regressions are expected in this area. We intend to re-evaluate possible regressions after this work is done. Specific areas to look into are:- Item_func_conv_charset (caches real consts, but we have less real consts now,
e.g. user vars are const-for-one-execution now) (testcase: packet.test, query DELETE FROM t1 WHERE c12 <=> REPEAT('ab', @max_allowed_packet) has three warnings after this WL, instead of one).- get_datetime_value (has no more caching at all)
Detailed class documentation
Function replace_rollback_place() is no longer needed and has been deleted.
functions change_item_tree() and rollback_item_tree_changes() are no longer needed during resolving, since we no longer roll back data structures created in this phase. However, they are still needed to roll back data structures added during optimization, to prepare for a new optimization. In addition we had to add new calls to change_item_tree() for Items generated during optimization.
Function Prepared_statement::copy_parameter_types() copies parameter metadata for a list of parameters into the Prepared_statement.
Function Prepared_statement::prepare_query() prepares the current statement. It is renamed from check_prepared_statement().
Function Prepared_statement::check_parameter_types() checks the resolved and actual metadata of all associated parameters for compatibility. Parameter metadata are compatible if we can convert actual data according to resolved metadata without loss of precision or range. E.g, we cannot convert the decimal number 3.14 to an integer value without loss of precision. Neither can we convert -3456 to an unsigned integer, since it is outside its range. The function is used to check whether the prepared statement has to be automatically reprepared based on metadata of actual parameters. Parameter metadata that are pinned or inherited will never cause a reprepare. The function is called from Prepared_statement::execute_loop().
Function Prepared_statement::insert_params() copies parameter values from protocol buffers in Item_param objects, according to m_data_type_actual, m_unsigned_actual and m_collation_stored (m_collation_actual is the collation of the protocol string, m_collation_stored is the expected collation in further processing). This function has been aligned with insert_params_from_vars() regarding how the logged query is generated.
Function Prepared_statement::insert_params_from_vars() has been rewritten to deal with "actual" data type, cf. Prepared_statement::insert_params().
Prepared_statement::execute_loop() must check parameter types and trigger a repreparation, in case actual parameter values do not comply to parameter types.
Function LEX::destroy() is used to destroy all resources associated with a LEX object and its attached query expression objects, including all temporary tables with lifetime = statement.
After a destroy call, the LEX object is ready for reuse.
Function LEX::bind_tables() is used to copy some persistent attributes from class TABLE_LIST to class TABLE, after having opened a table (which attached a free TABLE object to the TABLE_LIST). See description of TABLE_LIST::set_bitmaps() for a list of attributes that are copied. In addition, a default Record_buffer is created for the table. The function walks through all tables associated with all query expression objects of the statement.
Function LEX::bind_insert_tables() is similar to LEX::bind_tables(), but applies to the table being inserted into in INSERT and REPLACE statements.
Implemented SELECT_LEX_UNIT::destroy() which destroys resources belonging to a query expression object and all underlying query blocks.
The work was formerly done in SELECT_LEX_UNIT::cleanup(), but this function is now only responsible for cleaning out resources after one execution.
In function SELECT_LEX_UNIT::exclude_tree(), ensure to cleanup and destroy underlying query blocks.
Function SELECT_LEX_UNIT::unclean() is no longer needed and is removed.
Function SELECT_LEX_UNIT::reinit_exec_mechanism() is no longer needed and is removed. It basically walked through all items and cleared the fixed member.
Function SELECT_LEX_UNIT::prepare_limit() is removed due to refactoring of LIMIT and OFFSET handling.
Function SELECT_LEX_UNIT::bind_tables() does a bind operation for all tables associated with the current query expression object and all underlying query blocks. It copies bitmaps and assigns default record buffers for each table
Member materialized_derived_table_count now also count number of materialized tables from table functions, thus several places we only need to test materialized_derived_table_count > 0 instead of both table_func_count and materialized_derived_table_count.
Option OPTION_SETUP_TABLES_DONE is deleted. It was already obsolete after refactoring of derived tables, and is now completely removed.
Implemented SELECT_LEX::destroy() which destroys resources belonging to this query block and all underlying query expression objects. In particular, it destroys resources associated with window functions and permanent data structures of internal temporary tables.
The work was formerly done in SELECT_LEX::cleanup(), but this function is now only responsible for cleaning out resources after one execution.
Function SELECT_LEX::check_column_privileges() checks SELECT privileges for all columns referenced in a query block.
Function SELECT_LEX::check_privileges_for_subqueries() applies column privilege checks to all subqueries below a specific query block.
Function SELECT_LEX::update_used_tables() updates used tables information for all expressions referenced by a query block. It is used to update information after a query block has been optimized and used tables information is modified due to const table elimination. This is needed to ready the query block for a new optimization.
Function SELECT_LEX::bind_tables() is a helper function for SELECT_LEX_UNIT::bind_tables().
Function SELECT_LEX::resolve_limits() replaces the old SELECT_LEX_UNIT::prepare_limit(), since it was noticed that the "provider" argument could be eliminated. Function has also been enhanced with type propagation for dynamic parameters (which must be integer values).
Function SELECT_LEX::remove_hidden_items() removes items from a query block that were added to the all_fields list during optimization. It is called after execution of a query to re-initialize it before a new optimization.
Function SELECT_LEX::resolve_derived() could be simplified, since it will no longer be called multiple times per statement.
Member hidden_items_from_optimization was added to keep track of hidden items that were created during optimization. Such items are created in the execution mem_root and must be deleted before the statement is re-optimized.
Implemented type MYSQL_TYPE_INVALID (default type for Item_param) Implemented type MYSQL_TYPE_BOOL (unfinished, preparing for a true BOOL type).
Moved fulltext_searched from TABLE, renamed to m_fulltext_searched. Implemented new accessor functions:
Function is_updated() returns true if table is potentially updated in this statement (UPDATE statement).
Function set_updated() sets table as potentially being updated. Table should be a base table, and property is applied to all referencing views.
Function is_inserted() returns true if table is potentially inserted into in this statement (either INSERT or REPLACE statement).
Function set_inserted() sets table as potentially being inserted into. Table should be a base table, and property is applied to all referencing views.
Function is_deleted() returns true if table is potentially deleted from in this statement (DELETE statement).
Function set_deleted() sets table as potentially being deleted from. Table should be a base table, and property is applied to all referencing views.
Function save_bitmaps() saves bitmap information (read_set and write_set) and a few other properties generated during preparation from a TABLE object to the corresponding TABLE_LIST object. The purpose of this is to restore the same information into a TABLE object that is attached to the TABLE_LIST at start of each execution. save_bitmaps() is not called for regular statements.
The properties that are saved currently are the following: - read_set - write_set - covering_keys - merge_keys - keys_in_use_for_query - keys_in_use_for_group_by - keys_in_use_for_order_by - nullable - force_index - force_index_order - force_index_group - read_partitions - lock_partitions
Most of these properties really belong in TABLE_LIST and will be physically moved in followup work.
Function set_bitmaps() restores bitmap information previously saved using save_bitmaps(), called from LEX::bind_tables().
Function TABLE::has_storage_handler() returns true if table has assigned a storage handler object (member file is non-NULL). It can be false for temporary tables between executions.
Function TABLE::set_storage_handler() assigns a storage handler object for a temporary table.
Implemented function Handler::set_ha_table() that binds a TABLE object with the current Handler.
This is a new class that carries meta data properties for an Item or a Field. It is especially used for propagating type information to dynamic parameters and user defined variables.
const enum_field_types type; const bool unsigned_flag; const uint32 max_length; const DTCollation collation;
constructor - there are constructors for integer types, numeric types, string types, and derived from a resolved item.
Deleted member runtime_item, since all rollback of prepared items is eliminated. Default data_type for an Item is now MYSQL_TYPE_INVALID.
Function Item::result_to_type() translates a result type into a default data type (e.g INT_RESULT has data type MYSQL_TYPE_LONGLONG, REAL_RESULT has data type MYSQL_TYPE_DOUBLE).
Function Item::type_to_result() translates a data type to a default result type, the inverse of result_to_type().
Function Item::cleanup() is called at the end of execution of a statement. Since items are prepared once, there is no longer need to restore item_name from orig_name.
Function Item::propagate_parameter_type() is a new virtual function that is used for propagation of data type into parameter objects and user variable objects, based on expression context.
Function Item::set_data_type_inherited() is a new virtual function that has a non-empty implementation only for class Item_param.
Function Item::pin_data_type() is a new virtual function that has a non-empty implementation only for class Item_param.
Function Item::bind_fields() is called for every Item object before executing a statement in order to rebind the Item against the dynamically allocated TABLE objects. It has implementations for Item_field and related classes. Item::cleanup() is responsible for resetting the members that bind_fields() will assign.
Function Item::substitutional_item() is removed, it is replaced by its owner object, since the runtime_item member has been removed.
Function Item::set_runtime_created() is removed since the runtime_item member is gone.
Function Item::transform() has a changed semantic: It should handle all necessary transformations during resolving. As a consequence, all transformations handled by this function are now permanent and cannot be rolled back.
Function Item::compile() has a changed semantic: It should handle all necessary transformations during query compilation (optimization). Thus, it has to record added pointers so that they can be rolled back at end of execution.
Function Item::visit_all_analyzer() is a helper function to be used together with Item::compile() when all items in the tree should be visited.
Function Item::set_my_param_type() is used to propagate type information to parameters and user variable retrievals, by providing a data type, a pin and an inherit property. All properties may be overriden by defaults.
Function Item_sp_variable::used_tables() now returns INNER_TABLE_BIT, since such variables are constant for one execution.
Item_ident::cleanup() has been deleted since db_name, table_name, field_name no longer need to be restored since we only prepare once.
Member Item_ident::cached_field_index is renamed to Item_field::field_index. The field is not used for Item_ref objects, only for actual fields. Note that this is a constant value, as long as table's metadata is the same.
Likewise, the constant NO_CACHED_FIELD_INDEX is renamed to NO_FIELD_INDEX.
Function Item_field::set_field() sets table_name pointer from table_ref if it exists, otherwise from underlying field. Notice that field is short-lived (one execution), so it has to be reassigned in Item_field::bind_field().
Function Item_field::bind_fields() is called when a query plan is rebound with new TABLE object. The following logic is changed:- field is set to point at the Field object in the new TABLE object, as
indicated by field_index.- if table_name was nullptr (meaning it used to point to non-permanent memory),
make it point to the bound field's table_name member.- if field_name was nullptr (meaning it used to point to non-permanent memory),
make it point to the bound field's field_name member.
Function Item_field::cleanup() is changed as follows:- When TABLE represents a base table (ie not associated for the lifetime
of the statement), field is reset to NULL after each execution.- Ensure that table_name and field_name point to valid strings for each
execution. When table_ref is nullptr, table_name must be assigned to a per-execution string. When table is a schema table, the TABLE and field objects are created per execution and must be reassigned.- If Item_field::result_field is different from Item_field::field, it was
changed during optimization and is reset back to original value.
Function Item_field::reset_field() is a new temporary helper function to make prepared CREATE TABLE ... SELECT statements work.
function Item_insert_value::fix_fields() - sets "arg" to NULL when expression always returns NULL, to avoid interaction
with Item_insert_value::bind_fields().creates a clone of the Field object in statement mem_root to be used to
point into the value buffer.
function Item_insert_value::bind_fields() - Reuses the Field object created by fix_fields(). - Updates field offset to point into the insert_values buffer that is valid
for one execution.
member original_privilege is removed, want_privilege is always the desired privilege when executing a trigger procedure.
Function Item_trigger_field::used_tables() is reimplemented to indicate that parameter values are constant per execution.
Function Item_trigger_field::set_value() ensures that we check privileges whenever this is called.
Function Item_trigger_field::check_column_privileges() is a new function, required in order to check privileges for each execution.
Function Item_trigger_field::cleanup() sets table_ref= nullptr, since a trigger is bound to a TABLE, so the TABLE_LIST may vary between executions.
Function Item_trigger_field::bind_fields() is empty: Triggers are tied to a TABLE, so fields will never relocate.
Function Item_trigger_field::set_required_privilege() is deleted.
Function Item_default_value::fix_fields() - saves the row value buffer offset so it can be adjusted during subsequent executions.
Function Item_default_value::bind_fields() - Reuses the Field object created by fix_fields(). - Updates field offset to point into the insert_values buffer that is valid
for one execution.
member chop_ref is removed since it was only used when unpreparing a prepared statement. As a consequence, the "ref" pointer is now fixed for the lifetime of the statement.
Function Item_ref::cleanup() has no longer a need to check chop_ref and reset ref.
Class Item_param is modified so that it keeps two sets of metadata: the resolved metadata and the actual metadata. The resolved metadata are the same as for any other Item object. ie. data_type(), decimals, unsigned_flag, max_length and collation, and does not change after resolving.
The actual metadata are m_data_type_actual, m_unsigned_actual, m_collation_actual and m_collation_stored, and reflect the type of parameter data retrieved from protocol buffer or from local variables.
m_type_inherited - true if type of parameter is inherited from parent object (like a typecast). Reprepare of statement will not change this type.
m_type_pinned - true if type of parameter has been pinned, attempt to re-prepare statement will not change this type, and value is subject to range check.
m_data_type_actual - type of actual parameter (from user variable or protocol buffer).
m_unsigned_actual - unsigned indicator of actual parameter.
m_result_type - replaces item_result_type.
m_param_state - replaces state.
m_collation_actual - collation of string from user variable or protocol buffer.
m_collation_stored - collation of string as "stored" in the Item_param object, possibly after conversion from m_collation_actual.
state is replaced with m_param_state.
item_result_type is replaced with m_result_type.
item_type is obsolete, since an Item_param object is always of type PARAM_ITEM now.
limit_clause_param is replaced with the m_type_pinned property.
Function Item_param::fix_fields() is changed: if param_state() is NO_VALUE, keep data type = MYSQL_TYPE_INVALID. Otherwise, set data type according to type of actual data value. (This ensures an automatic reprepare based on actual values will work.)
Function Item_param::propagate_parameter_type() propagate parameter type based on expression context. See Functional Requirements for rules.
Functions set_null(), set_int(), set_double(), set_decimal(), set_time(),
set_str(), set_from_user_var(), set_value:
updates m_param_state but no longer modifies resolved metadata.
Function Item_param::set_data_type_inherited() sets the "inherited" property for the parameter object.
Function Item_param::is_data_type_inherited() retrieves whether data type is inherited.
Function Item_param::pin_data_type() sets the "pinned" property for the parameter object.
Function Item_param::is_type_pinned() retrieves whether data type is pinned.
Function Item_param::set_param_state() is a new function used to set the state of the parameter (i.e whether the parameter has a value, and in which case, what type it is).
Function Item_param::param_state() is used to retrieve the current parameter state.
Function Item_param::set_type_actual() sets the actual type and signedness of parameter value, used when setting value.
Function Item_param::data_type_actual() is used to retrieve the data type of current value associated with the parameter.
Function Item_param::is_unsigned_actual() retrieves whether parameter value is actually unsigned (if type is an integer type).
Function Item_param::copy_param_actual_type() copies type of actual parameter value from one parameter to another. This was renamed from copy_param_metadata.
Function Item_param::used_tables() is reimplemented to indicate that parameter values are constant per execution.
Function Item_param::basic_const_item() is reimplemented to indicate that parameter value is not a true constant value.
Function Item_param::set_from_user_var() is changed: limit_clause_param is replaced with "pinned" handling. It no longer changes resolved metadata.
Function Item_param::reset() is changed, it no longer changes resolved metadata.
Function Item_splocal::used_tables() returns INNER_TABLE_BIT so that we know the values are constant per execution.
Function Item_cache::cleanup() is changed: Structures set up when resolving a statement are permanent and need not be deleted after statement execution.
This is a new helper class that is inherited by class Item_func and Item_sum which is used to handle resolving of arguments for these classes.
Function Func_args_handle::param_type_is_default() is used to set metadata for a range of arguments.
Function Func_args_handle::param_type_is_rejected() is used to check for invalid use of dynamic parameters as function arguments.
Function Func_args_handle::param_type_uses_non_param() is used to propagate the metadata of non-parameter arguments into the remaining arguments that are dynamic parameters.
Function Item_func::propagate_parameter_type() is a new function, it calls propagate_parameter_type() on all arguments and then resolves the function's type by calling virtual Item_func::resolve_type_inner(). This function is only called if the function was unable to resolve it's own metadata during regular Item_func::resolve_call().
Function Item_func::resolve_type() may be split up, for all function classes they possibly may not determine their own metadata (e.g. when all arguments are dynamic parameters). In this case, the actual resolving is performed by the virtual function Item_func::resolve_type_inner(). If Item_func::resolve_type() was able to determine its own metadata, it calls Item_func::resolve_type_inner(), otherwise it relies upon Item_func::propagate_parameter_type() to call the inner function. This split has been performed for all arithmetic functions (add, subtract, etc), GREATEST, LEAST, as well as CASE and all CASE-derived functions like COALESCE.
Function Item_func_numhybrid::resolve_type() is changed: when an argument is a dynamic parameter, the type of the other parameter is propagated to the parameter. The function's main contents is split into resolve_type_inner() which performs the actual resolving.
Function Item_func_numhybrid::resolve_type_inner() is new, see above.
Function Item_func_set_user_var::set_entry() is changed: delayed_non_constness is removed, as we now always have a deterministic constness: We assume that Query_result_dumpvar can only output a single row before updating a variable, hence that variable can be considered to be const during the evaluation of that SELECT query.
Function Item_func_set_user_var::update() is changed as follows:- On first call during an execution, bind with a pointer to a user variable
entry by calling set_entry().
Function Item_func_get_user_var::resolve_type() is changed as follows:- Replace call to get_var_with_binlog() with get_variable(), since no
binlogging occurs during preparation.If variable exists, assign type of Item_func_set_user_var object from
type of variable entry.Otherwise, keep type as MYSQL_TYPE_INVALID and set a derived type with
Item_func_get_user_var::propagate_parameter_type(), based on the context of the function reference.
Function Item_func_get_user_var::propagate_parameter_type() propagates parameter type based on expression context. Different types are set based on whether the context has derived the type to be integer, decimal, float or string. This is only called when a user variable does not exist.
Item_func_get_user_var::cleanup() sets var_entry to nullptr so that a new variable entry is assigned on next execution.
Function Item_func_get_system_var::resolve_type() is changed: Resolving uses simpler logic than previous version.
Since preparation must last for a whole statement, the fulltext_searched member of class TABLE is moved to class TABLE_LIST.
Item_func_match::cleanup() sets ft_handler, concat_ws to NULL, join_key to false.
Item_func_match::~Item_func_match() is changed: If necessary, call close_search() and destroy() on hints.
Allocation of "dummy_table" is moved from Item_func_sp::itemize() to Item_func_sp::fix_fields().
Function Item_func_sp::cleanup() is changed: sp_result_field and sp_result_field are now permanent objects and will not be cleaned away.
Function Item_func_sp::bind_fields() calls sp_setup_routine() to bind a routine instance (obtained from the routine cache) with the Item object.
Function udf_handler::start_execution() is new: It alls the UDF init function and sets m_started.
Function udf_handler::cleanup() is changed:- When m_started is true, call the deinit function. - Do not call free_udf() and do not delete buffers allocated during resolving.
udf_handler::destroy() calls the free_udf().
Function udf_handler::call_init_func() is new: It prepares for calling the UDF init function and then calls it.
Function Item_func_rand::fix_fields() no longer initializes the random seed.
Function Item_func_rand::val_real() initializes the random seed on first call during execution (when first_eval is true).
Function Item_func_rand::cleanup() sets first_eval to false, so that first invocation in next execution will initialize a new seed value.
Function Item_func_case::cleanup() does not remove compare descriptors allocated during preparation.
Destructor does remove compare descriptors.
Function Item_func_in::populate_bisection() is new - it populates the bisection array with values. If all values are constant, it is called from Item_func_in::fix_fields(), otherwise it is called from the first val_xxx() function called during execution.
Function Item_func_in::cleanup_arrays() may now be called both from the destructor and from Item_func_in::cleanup().
Member variable is_const_during_preparation is true if all IN list values are constant.
Handle const ESCAPE items like this: - member escape_is_const is true if ESCAPE clause is const - When escape_is_const, evaluate ESCAPE clause during preparation. - member escape_evaluated is always true when escape_is_const is true,
otherwise it is set true on first execution and set false in Item_func_like::cleanup().
Call to my_regfree() is moved from function cleanup() to destructor.
get_normalized_field_type() is changed: For a PARAM_ITEM, the actual type of the parameter value is returned.
json_value() is changed: Do not call val_json() for a PARAM_ITEM.
get_json_atom_wrapper() is changed: Do not "give up" for a PARAM_ITEM.
Item_json_typecast::val_json() is changed: Do not call json_value() for a PARAM_ITEM.
Also includes classes Item_func_current_user and Item_func_current_role.
The value used to be evaluated during preparation. However, as the value may vary between executions, it is now evaluated on first invocation of val_string() and a member variable (m_evaluated) is set. m_evaluated is reset in Item_func_user::cleanup().
Because of this, function fix_fields() can be deleted for all these classes.
member saved_result is new, it is used to save the default Query_result for a subselect_hash_sj_engine execution, and restore it after execution, in subselect_hash_sj_engine::cleanup().
Now inherits from class Func_args_handle, which handles aggregate function arguments. (Note this would not have been necessary if Item_sum inherited from Item_func).
In Item_sum::used_tables(), return 0 when forced_const is true. Note that this strictly should have been const-for-execution, but since we only do this during optimization, we get away with setting it as "const".
The function check_wf_semantics() implemented for window functions is generally split in two: check_wf_semantics1() which is called during preparation and check_wf_semantics2() which is called just before execution. The reason for this is that we need to distinguish const data from const-for-execution data.
Function Item_sum_hybrid::fix_fields(): Type resolution is simplified (calls set_data_type_from_item()).
Function Item_sum_hybrid::cleanup() is changed: the cmp object should not be destroyed after each execution.
Function Item_func_group_concat::setup() is changed: a statement repreparation may be triggered if the variable group_concat_max_len has changed from last execution, since the metadata of the group concat function depends on this value.
Function Item_first_last_value::resolve_type() is changed: type resolution has been simplified (uses set_data_type_from_item()).
Function Item_nth_value::resolve_type() is changed: type resolution has been simplified (uses set_data_type_from_item()).
Function Item_lead_lag::resolve_type() is changed: type resolution has been simplified (uses set_data_type_from_item()).
const_item() is no longer true, but const_for_execution() is.
This class used to calculate the current date during resolving. Now, this logic has been moved to the evaluation functions (val_date_temporal(), get_date() and val_str()).
const_item() is no longer true, but const_for_execution() is.
This class used to calculate the current time during resolving. Now, this logic has been moved to the evaluation functions (val_date_temporal(), get_time() and val_str()).
const_item() is no longer true, but const_for_execution() is.
This class used to calculate the current time during resolving. Now, this logic has been moved to the evaluation functions (val_date_temporal(), get_date() and val_str()).
Function Item_func_str_to_date::val_datetime() now sends warning on invalid datetime value. Code cleanup.
Added member nodeset_func_permanent that tells if nodeset_func is assigned during resolving. If it isn`t, it's pointer is set to NULL in Item_xml_str_func::cleanup(), so that Item_xml_str_func::parse_xpath() will pick it up.
Added member m_part_of_sp so that we can distinguish between three classes of statements: regular, prepared and part of stored routines. Regular statements are prepared and executed immediately. The other two are prepared once and possibly executed multiple times, and explicitly deleted. They are mostly handled identically, but in some cases we need to distinguish between them.
Function Sql_cmd::needs_explicit_preparation() is implemented to distingush regular and prepared DML statements in Sql_cmd_dml::execute().
Function Sql_cmd::is_regular() returns true when statement is regular, false when it is prepared or part of stored routines.
Function Sql_cmd::set_owner() marks a statement as owned by a Prepared_statement, and implicitly as a prepared statement.
Function Sql_cmd::set_as_part_of_sp() marks a statement as part of stored routine.
Function Sql_cmd::unprepare() is no longer needed and is deleted.
Function Sql_cmd_dml::check_privileges() is declared as a virtual function. It must be implemented for all subclasses and is used to check privileges every time a prepared statement or statement that is part of a function is executed. It is functionally equivalent to calling Sql_cmd::precheck(), followed by subsequent column and view privilege checking during resolving, but the code became clearer by separating the calls out in dedicated functions.
Note that we do not have to call this function for regular (non-prepared) statements, as the above mentioned precheck() is sufficient.
Function Sql_cmd_dml::check_all_table_privileges() checks all tables and views contained in a DML statement for proper privileges, based on how the tables are used (selected from, inserted into, updated or deleted from). This is part of the implementation for Sql_cmd::check_privileges(), but in addition specific column privileges have to be checked.
Sql_cmd_dml::may_use_cursor() is a virtual function that is false by default and must be re-implemented to return true for all statements that may be used to return a cursor, e.g. SELECT statements.
Function Sql_cmd_dml::query_result() is a function that inspects the outer-most query expression object of the DML statement to retrieve the attached Query_result object.
Function Sql_cmd_dml::set_query_result() sets the Query_result object for the DML statement based on whether the expression is simple or not. If the expression is simple, the object is attached to the first query block of the statement, otherwise it is attached to the "fake" query block.
Function Sql_cmd_dml::set_lazy_result() is a helper function that is used in calling Query_result::prepare() in proper order for Query_result objects assigned to cursors.
Function Sql_cmd_dml::execute() is modified so that it can execute a previously prepared statement. It does this by checking the type and state of the statement, and only calling the necessary action functions.
All allocations performed inside query preparation should be persistent, so we set up a Prepared_stmt_arena_holder object before calling Sql_cmd_dml::prepare_inner().
Function Sql_cmd_select::check_privileges() checks privileges for a prepared SELECT statement.
Function Sql_cmd_call::check_privileges() is implemented.
Function Sql_cmd_call::prepare_inner() is changed: A call to SELECT_LEX::apply_local_transforms() had to be added to handle the possible case of subqueries in CALL arguments. Type propagation must be performed for dynamic parameters used as arguments to a function.
Handling of INSERT statements is slightly refactored. Because an INSERT statement may have parameter markers in the VALUES list or in the SELECT part, we may want to propagate type information from the column types of the table to be inserted into, to those parameter items. This is basically accomplished by extending function setup_fields() with an optional argument typed_items, a list of items representing type information to use for dynamic parameters. In Sql_cmd_insert_base::prepare_inner(), we prepare an Item list for the inserted columns, regardless of whether there is an explicit column list for INSERT or not, and this list is provided to setup_fields(). (Otherwise, we would have had to propagate type information also from a list of Field objects, which would have complicated interfaces quite a bit).
Similar type propagation is also applied for UPDATE part for INSERT ... ON DUPLICATE KEY ... statements.
This also simplifies other aspects of INSERT processing, since we unify processing of store assignments for INSERT: Now, only one implementations of function fill_record() is required for INSERT.
bulk_insert_started is set in Query_result_insert::start_execution() and reset in Query_result_insert::send_eof()/abort_result_set().
Function Sql_cmd_insert_base::check_privileges() is implemented to check all tables, views and columns used by a INSERT statement.
Function Sql_cmd_delete::check_privileges() is implemented to check all tables, views and columns used by a DELETE statement.
Function Sql_cmd_update::check_privileges() is implemented to check all tables, views and columns used by an UPDATE statement.
Type propagation for parameters in SET clause is performed corresponding to what is explained for class Sql_cmd_insert.
Extended function Query_result_insert::start_execution() by picking up dynamic TABLE pointer, setting up default value handling and resetting counters in "info" object. Some of these calls where earlier made in Query_result_insert::prepare().
Function Query_result_insert::cleanup() clears table pointer after execution.
Member empty_field_list_on_rset has been removed since it was only used in connection with repreparation of prepared statements.
Extended function Query_result_create::start_execution() by moving some logic from Query_result_create::prepare().
Member mem_root is a memory allocator for allocation of Handler objects for the temporary table. The prepared statement's mem_root cannot be used, since the handler may be created and deleted several times. The execution mem_root cannot be used since creation of and retrieval from a cursor are in different executions.
The mem_root is initialized in the constructor and freed in the destructor.
Function Materialized_cursor::close() only closes the temporary table, it no longer frees it. It clears the mem_root so that it is ready for reuse (creation of temporary table contents).
The destructor frees the temporary table.
Function Query_result_materialize::prepare() creates a Materialized_cursor, then creates the cursor temporary table.
Function Query_result_materialize::start_execution() instantiates contents for the cursor temporary table.
Function mysql_open_cursor() binds a Query_result_materialize object with the prepared statement and executes the statement (which also fills the cursor temporary table).
Function Table_function::destroy() calls destructor for this object.
Destructor calls destructors for all objects in m_all_columns list.
Function Table_function_json::do_cleanup() no longer clears m_all_columns and m_vt_list.
Function Window::check_window_functions() is split in Window::check_window_functions1() and Window::check_window_functions2(). The former is doing resolve-time checks, the latter is doing checks before execution.
Window::check_border_sanity() is split in Window::check_border_sanity1() and Window::check_border_sanity2(), similar to the above.
Window::setup_windows() is split in Window::setup_windows1() and Window::setup_windows2(), similar to the above.
Window::reset_execution_state() no longer empties m_partition_items and m_order_by_items.
Window::cleanup() no longer deletes cached items.
Function Window::destroy() calls destructors for all objects in m_order_by_items and m_partition_items.
SET statements are processed using handle_query(), also when executed as prepared statements. handle_query() had to be enhanced to accept statements that are already prepared, and performing opening of tables and binding of table and field objects in that case.
This will reset the most recent execution by calling SELECT_LEX_UNIT->clear_execution() and prepare for a new one by setting the current THD pointer.
No need to ready SELECT_LEX and Item objects for re-preparation.
No need to call cleanup() on condition objects and other items.
No need to call unit->types.empty().
Still needs to reset all ORDER objects.
function run_query_stmt() of mysqltest
Logic that was used to clean duplicate warnings from preparation and execution has been deleted, since we no longer do preparation as part of execution (and hence cannot have duplicate warnings because preparation is performed twice).
This is a new inline function in field.h that checks if a given data type represents a string.