WL#5094: Create SQL command classes for DML statements

Affects: Server-8.0 — Status: Complete

Create classes derived from Sql_cmd that represent all supported data manipulation (DML) statements.

It is also part of our work to refactor prepared statement handling. In order to do that, we need a consistent interface to all DML statements, and we need interfaces to prepare and unprepare such statements.

The key benefits of this worklog are:

Consolidation of all DML statements into a simple class hierarchy.
Clear state transitions for all DML statements (unprepared - prepared - executed).
Preparation code and execution code clearly separated
Lifecycle processing of Query_result objects prepared for single preparation.
Consolidation of preparation code - an example is INSERT that used to contain three different preparation codes depending on whether the preparation was for INSERT VALUES, INSERT SELECT, or for a prepared statement.

One side-effect of the worklog is that prepared statements will always go through a full preparation, before being unprepared and then re-prepared and executed. Later the redundant preparation step will be removed. However, in some cases, we will execute slightly more code in preparation steps than before, since some preparation code has been moved from the execution stage. It is not expected that this will cause a noticeable performance degradation for prepared statements.

User Documentation

No significant user-visible effects. No user documentation required.

Classes used to represent DML commands
DML statements not handled by these classes
Classes used for Query result handling
Refactoring of LIMIT and OFFSET handling
Behavioral changes
- Using UNIONed queries in INSERT ... ON DUPLICATE
- Class Reprepare_observer used for all DML queries
Minor incomplete parts
Possible future enhancements
- is_empty_query()

Classes used to represent DML commands

All DML statements will be implemented by classes that are inherited from Sql_cmd_dml, which is again inherited from class Sql_cmd.

Class Sql_cmd is extended with interfaces for preparation of a statement, and for associating an Sql_cmd object with a Prepared_statement.

Class Sql_cmd_dml gets a common implementation of prepare() and execute(). Thus, it is easy to enforce that all DML statements are treated similarly.

Class Sql_cmd_dml

Sql_cmd_dml::prepare() walks through these common steps:

precheck() - performs a coarse-grained authorization check of the statement.

open_tables_for_query() - opens the tables and views mentioned in the statement. Views are expanded so that underlying views and tables are opened too.

resolve_var_assignments() - resolves variable assignments in the statement.

prepare_inner() - performs statement-specific preparation of the statement and is implemented for every subclass of Sql_cmd_dml.

Sql_cmd_dml::execute() walks through these common steps:

set_statement_timer() - is called if a time limit is applicable to the statement.

prepare() is called if the statement is a regular (not preparable) statement.

If prepare() is not called, precheck() and open_tables_for_query() are still called since these actions are required also when executing already prepared statements.

run_before_dml_hook() is called if the statement is a data change statement, in order to prepare replication actions for the statement.

An IGNORE or STRICT mode error handler is set up if applicable. It will be active for the duration of the execution of the statement.

lock_tables() is called, unless the statement affects no rows or produces no rows in any tables.

Query_cache::store_query() is called to register the statement in the query cache, if applicable.

execute_inner() - performs statement-specific optimization and execution of the statement. Sql_cmd_dml::execute_inner() is an implementation for all SELECT statements, all INSERT statements that are based on a SELECT and all multi-table UPDATE and DELETE statements (ie all statements that are implemented using a JOIN object). For all other types of DML statements, a separate implementation for execute_inner() exists.

Class Sql_cmd_select

This is a new class used to implement SELECT statements.

It has an implementation of prepare_inner() to prepare SELECT statements.

It uses Sql_cmd_dml::execute_inner() to execute SELECT statements.

Class Sql_cmd_insert_base

This is a base class for implementation of all INSERT and REPLACE statements. It is an abstract class and has the derived classes Sql_cmd_insert_values and Sql_cmd_insert_select.

The class implements the interfaces precheck() and prepare_inner(), which are common for the derived classes. Notice that preparation is now common for all INSERT and REPLACE statements.

Class Sql_cmd_insert_values

This class implements INSERT and REPLACE statements that are directly based on a VALUES clause. It contains an implementation of execute_inner().

Class Sql_cmd_insert_select

This class implements INSERT and REPLACE statements that are directly based on a SELECT. This class has no significant implemented functions, as it uses Sql_cmd_dml_insert_base::prepare_inner() for preparation and Sql_cmd_dml::execute_inner() for execution.

Class Sql_cmd_update

The functions precheck(), prepare_inner() and execute_inner() are added to the class. prepare_inner() is now a consolidated function for all types of UPDATE statements.

When faced with an update to a multi-table view, transformation from a single-table update to a multi-table update is performed inside this function. The design is also changed so that it will later be easier to convert other single-table UPDATE statements into multi-table statements. One possible example of this is UPDATE statement that implements a semi-join transformation.

The execute_inner() function will directly call Sql_cmd_dml::execute_inner() for a multi-table UPDATE. It will call Sql_cmd_update::update_single_table() for a single-table UPDATE.

The function is_single_table_plan() is used to distinguish between single-table and multi-table UPDATE statements.

Class Sql_cmd_delete

The functions precheck(), prepare_inner() and execute_inner() are added to the class. prepare_inner() is now a consolidated function for all types of DELETE statements.

DELETE statements cannot currently be transformed from single-table operations to multi-table operations, but like UPDATE statements, the preparation code is now easier to change into allowing this, e.g. for DELETE with a semi-join transformation.

The execute_inner() function will directly call Sql_cmd_dml::execute_inner() for a multi-table DELETE. It will call Sql_cmd_delete::delete_from_single_table() for a single-table DELETE.

The function is_single_table_plan() is used to distinguish between single-table and multi-table DELETE statements.

Class Sql_cmd_do

This class is used to implement DO statements. It extends from Sql_cmd_select, and the only difference between DO and SELECT is that it assigns a Query_result object of class Query_result_do to the statement, which does not send any data to the client.

Class Sql_cmd_call

This class is used to implement CALL statements. It contains implementations of the functions precheck(), prepare_inner() and execute_inner().

DML statements not handled by these classes

The following statements are currently not handled by this class:

TRUNCATE

LOAD

LOCK TABLE, LOCK TABLES

UNLOCK TABLE, UNLOCK TABLES

HANDLER ... OPEN

HANDLER ... CLOSE

HANDLER ... READ

Classes used for Query result handling

A small refactoring of class Query_result is also needed: Currently, Query_result objects are often generated during preparation of queries and deleted after execution is completed. But notice also that some Query_result objects are created during parsing and have the same lifetime as the statement. Because of the requirement to prepare a statement only once, Query_result objects should become an integral part of statements after preparation, and deletion should be delayed until the statement is deleted.

A consequence of this is that some actions that were earlier done in Query_result::prepare() must be postponed until Query_result::send_result_set_metadata() or ::prepare2(), because they must be performed for every execution of the statement. Likewise, actions earlier performed in the destructor must instead be done in Query_result::cleanup(), and cleanup must be called for every execution, since the destructor is now called only when the statement is destructed.

Refactoring of LIMIT and OFFSET handling

The preparation and assignment of values for LIMIT and OFFSET has been quite ad-hoc. This WL refactors the handling by implementing a prepare_limit() function that is called during preparation of a statement, and a set_limit() function that is called before optimization of a statement and before every execution. Notice that these calls must be performed for every query block of the statement.

Behavioral changes

Some changes are necessary because the current behavior is buggy and hard to maintain.

Using UNIONed queries in INSERT ... ON DUPLICATE

In the INSERT ... ON DUPLICATE statement, it is currently possible to update the current row (the row that was attempted to be inserted but turned out to be a duplicate) based on column values from the query expression and from the current row itself. There are however restrictions on the query expression. If this is a grouped query, it is not possible to use columns from the query expression.

However, it is possible to use a query expression that is a UNION. But if we analyze this further, we see that values are taken only from the first part of the UNION, so this feature is clearly not working.

If we compare this statement to the SQL standard MERGE statement, we see that MERGE is using a single table as update source. Our implementation of INSERT can be considered to be an extension to this, since we allow joined tables and UNIONed queries. Notice though that the standard is not severely limiting, since it is possible to use a derived table as the single table.

It turns out that we can quite easily continue to use the extension of supporting a joined table. However, supporting UNIONed tables is incompatible with how we resolve column names and will no longer be supported. Thus we implement the following rules for what query expressions to allow to accept columns from:

a single table (which may be a derived table)
a join over multiple tables
DISTINCT queries

If one needs to take values from grouped queries or UNIONed tables, it will be necessary to specify this as a derived table and use a simple selection from that derived table.

There is an implementation restriction for this: The grouped queries use a temporary table to evaluate their input, but resolving the columns used in the ON DUPLICATE part is against the base tables. Likewise, the UNION queries would resolve against only the first query block specified in the UNION, and nit against any subsequent ones. Thus, when updating the rows, updated column values would be taken from some arbitrary data in the input table buffers. It is possible to fix this, but we think it is better to encourage rewriting the query using a derived table instead.

The behavioral change is thus that column values from a UNION cannot be used to specify update operations in INSERT ... ON DUPLICATE. The suggested replacement is to wrap the UNION query expression inside a derived table.

Class Reprepare_observer used for all DML queries

This class was previously used only for SELECT statements, multi-table statements and subqueries. In order to make all DML operations behave consistently, the class will now be enabled for single-table operations as well.

There is one important change of behavior due to this: When opening a view that contains references to a permanent table, the open_type for the TABLE_LIST representing that table is set to OT_BASE_ONLY, meaning that the next time the view is opened, it is "tied" to the same permanent table, and not a temporary table with the same name that may have been created later. One particular test (BUG#47313 in sp.test) is utilizing this property to avoid inserting into a temporary table. However, when the Reprepare_observer class is used also for INSERT statements, effectively this binding is lost because preparing the query causes an error (Insert into a non-insertable table), which causes a subsequent re-parse of the query and the affinity against the permanent table is lost.

Hence, when this WL is implemented, repreparing the query causes the newly created temporary table to be picked up.

Notice that this has always been unreliable, for the following two reasons:

If some underlying tables of a query/SP is changed, then a full reprepare is triggered for the query. This means that we lose the affinity to the view t1, and the next table binding for table t1 will pick up the temporary table. Hence, a change of view t1 would totally blow up this query.
The Reprepare_observer class was used for all SELECT statements, all multi-table UPDATE and DELETE statements and all subqueries of other DML statements. The observer was triggered whenever a query preparation or optimization fails, and instead of implementing a tedious rollback of data structures in an error situation, a full repreparation of the query was forced. The consequence was the same as above; the affinity to the view was lost. This happened for any such statement that detected an error in preparation.

Minor incomplete parts

Some parts of the code are still not in perfect condition and must be further cleaned up later. Here is a list of the known issues:

Asserts in Item::transform() have been commented out. This is a temporary action due to the need to prepare a query every time it is executed. This will be cleaned up later.
Some tests are failing with --ps-protocol due to repeated warnings. These warnings will be eliminated later.
Sql_cmd_update::prepare_inner() cleanup is incomplete. Look at covering_keys, check_view_privileges(), setup_fields(), prune_partitions().
COPY_INFO use for UPDATE and DELETE statements: Avoid having separate objects for preparation and execution.
Eliminate need for the field LEX::result.

Possible future enhancements

is_empty_query()

A new property function is_empty_query() has been implemented on preparation of a SELECT_LEX. It is currently set when static partition pruning ends up with no actual partitions, and only for single-table DELETE and UPDATE statements.

An equivalent property is also implemented for class Sql_cmd_dml, which is set based on SELECT_LEX::is_empty_query().

The property can be used to notice several other properties too:

Check empty query also for multitable UPDATE and DELETE.
Check empty query also for INSERT and SELECT.
Set empty query if WHERE clause is always false.
SELECT: statement is empty if all outer-most query blocks have is_empty_query() property and are not implicitly grouped.

Class Sql_cmd and derived classes

Class Sql_cmd

class Sql_cmd : public Sql_alloc { public:

 virtual enum_sql_command sql_command_code() const = 0;

 bool is_prepared() const { return m_prepared; }

 virtual bool prepare(THD *thd);

 virtual bool execute(THD *thd) = 0;

 virtual void cleanup(THD *thd) {}

 void set_owner(Prepared_statement *stmt) { m_owner= stmt; }

 virtual bool is_dml() const { return false; }

 virtual bool is_single_table_plan();

 virtual void unprepare(THD *thd);

protected:

 bool is_preparable() const { return prepare_only; }

 void set_prepared() { m_prepared= true; }

private:

 Prepared_statement *m_owner; /// Owning prepared statement, NULL if non-prep.
 bool m_prepared; /// True when statement has been prepared

};

The is_preparable() and is_prepared() interfaces are new in this class, and so is the possibility to attach the class to a Prepared_statement using set_owner().

is_preparable() is used to check whether a statement can be prepared separately. It returns false if statement is a regular type (ie it is prepared and executed with the function Sql_cmd::execute()).

is_prepared() is used to check whether a function has been prepared already.

unprepare() is a temporary function that will be removed later. Because we still preserve the lifecycle management that execution of a prepared statement causes both a call to prepare() and a call to execute(), we need to "unprepare" a DML statement after every preparation and execution, so that the following execution starts with a call to prepare().

is_dml() is a function that returns true if the SQL statement is a DML statement (see class Sql_cmd_dml for details).

is_single_table_plan() is a function that returns true if the plan is not based on a outer level JOIN object. It is used by EXPLAIN code to choose whether to inspect a JOIN object or not.

Class Sql_cmd_dml

class Sql_cmd_dml: public Sql_cmd { public:

 virtual bool is_data_change_stmt() const { return true; }

public:

 bool prepare(THD *thd)
 {
   if (precheck(thd))
     return true;

   if (open_tables_for_query(thd, lex->query_tables,
                            is_preparable() ? MYSQL_OPEN_FORCE_SHARED_MDL : 0))
     return true;

   if (lex->set_var_list.elements && resolve_var_assignments(thd, lex))
     return true;

   if (prepare_inner(thd))
     return true;

   set_prepared();

   return false;
 }

 bool execute(THD *thd, parameters)
 {
   start_stmt_dtrace(const_cast<char *>(thd->query().str));

   if (!is_prepared())
   {
     if (prepare(thd))
       return true;
   }
   else
   {
     if (precheck(thd))
       return true;

     if (open_tables_for_query(thd, tables, 0))
       return true;
   }

   THD_STAGE_INFO(thd, stage_init);

   if (is_data_change_stmt())
   {
     if (run_before_dml_hook(thd))
       return true;

     // Push ignore / strict error handler
     if (lex->is_ignore())
     {
       thd->push_internal_handler(&ignore_handler);
       error_handler_active= true;
     }
     else if (thd->is_strict_mode())
     {
       thd->push_internal_handler(&strict_handler);
       error_handler_active= true;
     }
   }

   if (!is_empty_query())
   {
     if (lock_tables(thd, tables, table_count, 0))
       return true;

     query_cache.store_query(thd, tables);
   }

   if (execute_inner(thd))
     return true;

   if (error_handler_active)
     thd->pop_error_handler();

   THD_STAGE_INFO(thd, stage_end);

   if (unit->cleanup())
     return true;

   return false;
 }

 virtual bool is_dml() const { return true; }

 virtual bool is_single_table_plan() const { return false; }

 virtual bool precheck(THD *thd);

protected:

 bool is_empty_query();

 void set_empty_query() { m_empty_query= true; }

 virtual bool precheck(THD *thd)= 0;

 virtual bool prepare_inner(THD *thd);

 virtual bool execute_inner(THD *thd);

 virtual void start_stmt_dtrace(char *query) = 0;
 virtual void end_stmt_dtrace(int status, ulonglong rows, ulonglong chg) = 0;

 LEX *lex;
 Query_result *result;
 bool m_empty_query; // True if query will produce no rows

};

The precheck() function is a new function here that is used to perform a coarse authorization check against a statement. If no column-level privileges are required and no views are present in the statement, this will be a sufficient authorization check, otherwise more specific checks must be performed during preparation of the statement.

is_empty_query() and set_empty_query() are used to single out statements that are identified as not affecting any rows during preparation. For UPDATE and DELETE statements, the WHERE clause must qualify no rows. For SELECT statements and INSERT statements based on a SELECT, all query blocks must qualify no rows and cannot be implicitly grouped (the latter requirement is not yet implemented).

Class Sql_cmd_select

class Sql_cmd_select: public Sql_cmd_dml { public:

 virtual bool is_data_change_stmt() const { return false; }

protected:

 virtual bool precheck(THD *thd);

 virtual bool prepare_inner(THD *thd);

 virtual void start_stmt_dtrace(char *query)
 {
   MYSQL_SELECT_START(query);
 }
 virtual void end_stmt_dtrace(int status, ulonglong rows, ulonglong changed)
 {
   MYSQL_SELECT_DONE(status, rows);
 }

};

The precheck() function ensures that we have some SELECT privilege to all involved tables of the statement.

The prepare_inner() function prepares the query expression of the SELECT statement, and recursively all contained expressions within the outer-most expressions(s).

The statement is executed using Sql_cmd_dml::execute_inner(), so this function is not implemented for this class.

Class Sql_cmd_insert_base and derived classes

class Sql_cmd_insert_base: public Sql_cmd_dml { protected:

 virtual bool precheck(THD *thd);

 virtual bool prepare_inner(THD *thd);

};

class Sql_cmd_insert_values : public Sql_cmd_insert_base { public:

 virtual bool is_single_table_plan() const { return true; }

protected:

 virtual bool execute_inner(THD *thd);

};

class Sql_cmd_insert_select : public Sql_cmd_insert_base { };

The INSERT/REPLACE statements are the only ones that are implemented by several classes: Class Sql_cmd_insert_base provides the common authorization check and preparations. Class Sql_cmd_insert_values implements optimization and execution of INSERT/REPLACE using a VALUES clause. Class Sql_cmd_insert_select implements optimization and execution of INSERT/REPLACE using a general query expression (ie a SELECT clause, possibly also containing a UNION).

Class Sql_cmd_insert_values contains an implementation of execute_inner(). Class Sql_cmd_insert_select uses the general Sql_cmd_dml::execute_inner().

Class Sql_cmd_update

class Sql_cmd_update: public Sql_cmd_dml { public:

virtual bool is_single_table_plan() const { return !multitable; }

protected:

 virtual bool precheck(THD *thd);

 virtual bool prepare_inner(THD *thd);

 virtual bool execute_inner(THD *thd);

private:

 bool update_single_table(THD *thd);

 bool multitable;
 List<Item> *update_value_list;

};

The class is common for both single-table and multi-table UPDATE statements. The benefit of this is that we can easily convert a single-table operation into a multi-table one if that is reasonable. It happens today when attempting an UPDATE against a multi-table view. It might also be done for UPDATE operations containing subqueries that may be transformed to semi-join.

The function is_single_table_plan() is used to check whether the statement is implemented as a single-table or multi-table operation.

Class Sql_cmd_delete

class Sql_cmd_delete: public Sql_cmd_dml { public:

virtual bool is_single_table_plan() const { return !multitable; }

protected:

 virtual bool precheck(THD *thd);

 virtual bool prepare_inner(THD *thd);

 virtual bool execute_inner(THD *thd);

private:

 bool delete_from_single_table(THD *thd);

 bool multitable;
 SQL_I_List<TABLE_LIST> *delete_tables;

};

The class is common for both single-table and multi-table DELETE statements. The benefit of this is that we can easily convert a single-table operation into a multi-table one if that is reasonable. It might be done when attempting a DELETE against a multi-table view. It might also be done for DELETE operations containing subqueries that may be transformed to semi-join.

The function is_single_table_plan() is used to check whether the statement is implemented as a single-table or multi-table operation.

Class Sql_cmd_do

class Sql_cmd_do: public Sql_cmd_select { };

This class is mostly similar to Sql_cmd_select.

Class Sql_cmd_call

class Sql_cmd_call: public Sql_cmd_dml { public:

 virtual enum_sql_command sql_command_code() const
 { return SQLCOM_CALL; }

 virtual bool is_data_change_stmt() const { return false; }

protected:

 virtual bool precheck(THD *thd);

 virtual bool prepare_inner(THD *thd);

 virtual bool execute_inner(THD *thd);

private:

 sp_name *proc_name;
 List<Item> *proc_args;

};

The class is used to implement CALL statements.

The precheck() function validates that the user has EXECUTE privileges for the procedure to be called and SELECT privileges for any subqueries that are evaluated as part of the function arguments.

The prepare_inner() function validates that the procedure exists and that arguments to the procedure are well-formed. Notice that sp_find_routine() only retrieves a procedure description, it no longer sets up the procedure for execution.

The execute_inner() function sets up the procedure for execution (using the new sp_setup_routine() function), then calls sp_head::execute_procedure() to execute the procedure.

Class Query_result and derived classes

There are no changes to the Query_result base class.

Class Query_result_to_file

Closing the file was previously done in destructor but is now done in cleanup().

Class Query_result_export

Opening the file was previously done in ::prepare() but is now done in prepare2().

Closing the file was previously done in destructor but is now done in cleanup().

Class Query_result_dump

Opening the file was previously done in ::prepare() but is now done in prepare2().

Closing the file was previously done in destructor but is now done in cleanup().

Class Query_result_insert

A lot of code has been moved from ::prepare() to Sql_cmd_insert::prepare_inner(). This also avoids quite a lot of code duplication.

Cleanup code is moved from the destructor to cleanup().

Class Query_result_delete

prepare() has been shrunk: Only code related to preparation remains.

Cleanup code is moved from the destructor to ::cleanup().

Class Query_result_update

prepare() has been shrunk: Only code related to preparation remains.

Cleanup code is moved from the destructor to ::cleanup().

Class Parse_tree_node and derived classes

The following changes have been done to classes derived from Parse_tree_node:

Class PT_into_destination

Added properties for destination (query result object and exchange).

Class PT_select_stmt

Added make_cmd() interface.

Class PT_delete

Added delete_tables list, removed corresponding list from class LEX.

Added is_multitable() interface.

Class PT_call

This is a new class for contextualization of CALL statements.

It contains a contextualize() function for processing the procedure name and procedure arguments.

It also contains a make_cmd() function for creation of an Sql_cmd_call object.