WL#4897: Add EXPLAIN INSERT/UPDATE/DELETE

Status: Complete   —   Priority: Medium

Currently MySQL supports EXPLAIN SELECT, but it doesn't support EXPLAIN for
INSERT, REPLACE, UPDATE and DELETE.

An execution plan for the complex (and probably multi-table)
INSERT/REPLACE/UPDATE/DELETE may not coincide with a plan of respective SELECT
query (see BUG#36569 etc), so even EXPLAIN SELECT may not help to optimize
database/debug queries.

We should implement support for EXPLAIN INSERT/REPLACE/UPDATE/DELETE. 

Note: also there was a similar WL#706 (cancelled as as duplicate).

Also see WL#6033: Add transformed query to EXPLAIN EXTENDED INSERT/UPDATE/DELETE

Other DBMSs
-----------

DB2 supports EXPLAIN for SELECT, INSERT, UPDATE, DELETE, SELECT INTO,
and VALUES [INTO] (VALUES is similar to our SELECT with no FROM clause).

Oracle supports EXPLAIN PLAN for SELECT, INSERT, UPDATE, DELETE, CREATE TABLE,
CREATE INDEX, and ALTER INDEX ... REBUILD.

EXPLAIN for the rest of MySQL statements may be implemented in separated WLs by 
request.
There is no intent to support EXPLAIN for table-maintenance,
data-description-language, transaction-control, backup-restore, utility,
or non-preparable statements.
New Syntax
----------

  EXPLAIN [ EXTENDED | PARTITIONS ] `statement'

where `statement' is any valid MySQL INSERT, REPLACE, UPDATE or DELETE
statement.

Note: similar to EXPLAIN SELECT "You cannot use the PARTITIONS and EXTENDED 
keywords together in the same EXPLAIN ... SELECT statement. Attempting to do so 
produces a syntax error".

Examples:

  EXPLAIN INSERT INTO t1 SELECT * FROM t2;
  EXPLAIN EXTENDED UPDATE t1 SET a = 10 WHERE b = 20;
  EXPLAIN DELETE FROM t1 WHERE a > 10;


Semantics
---------

The result is comparable to what MySQL does for EXPLAIN SELECT,
as described in the MySQL Reference Manual section
"Optimizing Queries with EXPLAIN"
http://dev.mysql.com/doc/refman/5.5/en/using-explain.html

While INSERT/REPLACE/UPDATE/DELETE statements may update or delete table rows
and change user variable values, EXPLAIN INSERT/REPLACE/UPDATE/DELETE doesn't
evaluate INSERT/REPLACE/UPDATE/DELETE statement, so the query doesn't modify
table data as well as it doesn't invoke triggers and data/statement
replication (depends on WL#5274)..


Privileges
----------

The user must have the same privileges as would be required
to actually execute INSERT/REPLACE/UPDATE/DELETE respectively.
Additionally SHOW VIEW privilege is needed on any used view.


Error messages
--------------

None: the new statements may return same error messages (parse errors etc),
like underlying INSERT/REPLACE/UPDATE/DELETE statements do on parse,
preparation and optimization phases.

Also the new statements may return notes (warnings) in free form.
Details:

* New EXPLAIN DELETE-specific message: "Deleting all rows"

  Some engines (for example MyISAM) support handler::delete_all_rows()
  method that removes all table rows in a simple and fast way.
  If the query uses this optimization, we output "Deleting all rows"
  extra message.
  We use this optimization is the user wants to delete all rows and
  deletion doesn't have any side-effects (because of triggers).
  We can use delete_all_rows() if and only if:
   - We allow new functions (not using option --skip-new), and are not
     in safe mode (not using option --safe-mode)
   - There is no limit clause
   - The condition is constant
   - If there is a condition, then it it produces a non-zero value
   - If the current command is DELETE FROM with no where clause, then:
     - We should not be binlogging this statement in row-based, and
     - there should be no delete triggers associated with the table.

* New EXPLAIN DELETE and UPDATE-specific message: "No matching rows
  after partition pruning"

  Obviously, this message means that there is nothing to DELETE/UPDATE
  after partition pruning (kind of "Impossible WHERE"). 

* EXPLAIN INSERT/REPLACE-specific use of old message: "No tables used"

  This message is not new, but current EXPLAIN INSERT/REPLACE output it
  in a bit unusual way:
  currently EXPLAIN INSERT/REPLACE output result set describes _only_
  SELECT clause execution plan.
  Thus, SELECT-less EXPLAIN INSERT/REPLACE
  (such as "EXPLAIN INSERT INTO t VALUES (1),(2),(3)") outputs this message.
  Motivation: "EXPLAIN INSERT INTO t VALUES (10)" has the same output
  as an equivalent "EXPLAIN INSERT INTO t SELECT 10 FROM DUAL".


Motivation/Workarounds
----------------------

There is no acceptable way to replace the new statements with existent tools:

While EXPLAIN SELECT statement provides us with detailed QEP prior to SELECT
query execution, INSERT/REPLACE/UPDATE/DELETE statements may be analyzed only
post factum -- usually with the help of "Handler%" and "Sort%" server status
variable values.

Such an INSERT/REPLACE/UPDATE/DELETE analysis has significant drawbacks:
  a) it has side effects (table data and in-query user variable modification),
  b) it is complicated for INSERT/REPLACE/UPDATE/DELETE with nested queries and
  c) it is insufficient: for ex. used index name is not available this way.

OTOH, use of the "equivalent" EXPLAIN SELECT output as a QEP for
INSERT/REPLACE/UPDATE/DELETE is not acceptable in the most common case too:
  a) not every INSERT/REPLACE/UPDATE/DELETE may be converted into an obvious
     equivalent SELECT query, and
  b) even very similar SELECT statement and a query part of
     INSERT/REPLACE/UPDATE/DELETE may have different QEP since some access
     methods are not applicable to INSERT/UPDATE/DELETE queries.
     For example: UPDATE/DELETE doesn't use covering key scan for the
     updating table.
Subtasks
--------

UPDATE and DELETE statements have two syntax forms: single- and multi-table.

Multiple-table statement implementation deals with a top-level JOIN structure
like SELECT statements do, so the same approach may be reused with a minimal
effort -- this is the 1st subtask of the WL.

However, single-table statements are implemented in a very different way: they
don't construct top-level JOIN, so direct reuse of EXPLAIN SELECT code is not
applicable and a special improvement of server code is required -- this is the
2nd subtask.

The 3rd subtask is to send a note (warning) string of transformed statement like
EXPLAIN EXTENDED SELECT does. The task is TBD later. (The note was almost never 
documented before, see WL#5144 for further documentation.)

The 4th subtask is am implementation of privilege checks for new commands.
It is achieved already by having EXPLAIN simply going through the same code
paths as INSERT/REPLACE/etc. The only exclusions are checks for views:
we need the SHOW VIEW privilege for every explained view.


INSERT and REPLACE also have two syntax forms/implementations: with or without
SELECT. The first form implementation is similar to multitable UPDATE/DELETE,
so EXPLAIN works the same way. The second form doesn't access tables like
SELECT, so EXPLAIN is trivial there: QEP consists of "No tables used" line.


Multiple-table UPDATE/DELETE and INSERT/REPLACE ... SELECT
----------------------------------------------------------

The main idea is to reuse mysql_select() infrastructure to output QEP of
INSERT/REPLACE/UPDATE/DELETE like EXPLAIN SELECT does.

Pseudocode for EXPLAIN SELECT execution:

1  if (lex->describe)
2  {
3    select_send explain_result_data_set;
4    thd->send_explain_fields(&explain_result_data_set);
5    mysql_select(...&explain_result_data_set);
6  }

The only differences from the normal SELECT evaluation are:

1  if (lex->describe)

(The server also uses the lex->describe flag inside mysql_select() to recognize
EXPLAIN mode) and

4    thd->send_explain_fields(&explain_result_data_set);

where the server substitutes output column metadata with QEP table column info.

Unfortunately, we can't use the same code for EXPLAIN
INSERT/REPLACE/UPDATE/DELETE even if it produce some probable QEP: actually
that QEP is not a plan of INSERT/REPLACE/... but a different plan of the
similar SELECT query since modifying queries (INSERT/REPLACE/...) have
different requirements to involved tables.

Pseudocode for multi-table UPDATE evaluation (other statements are same as for
SELECT):

1  mylti_update update_result;
2  mysql_select(...&update_result);

This update_result:
  a) controls the preparation of "equivalent" SELECT to make the QEP
     acceptable for further table data modification (prepare() and prepare2()
     methods of multi_update/multi_delete classes),
  b) it intercepts "equivalent" SELECT data to make decisions what table data
     to change and updates/deletes this selected data (send_data() methods).

I.e.:

 1  multi_update update_result;
 2  mysql_select(...&update_result)
 3  {
 4    JOIN::prepare()
 5    {
 6      update_result.prepare() || update_result::prepare2();
 7    }
 8    JOIN::optimize(); // there we have actual QEP
 9    JOIN::exec()
10    { // data modification, replication etc
11      update_result.send_data();
12    }
13    if (ok)
14      update_result.send_ok()
15  }
16  if (!ok)
17    update_result.abort_result_set()

It is good to have the QEP from the listing 1..8 above, but also we need to
avoid side effects of line 11..17. As well we have to output the QEP like
EXPLAIN SELECT does.

The solution is a new explain_send class that initializes tables like
INSERT/REPLACE...SELECT and multi-table UPDATE/DELETE do but doesn't
modify tables.
This class extends select_send class (from EXPLAIN SELECT implementation)
with prepare(), prepare2() and initialize_tables() method functionality
from select_insert/multi_update/multi_delete interceptor classes:

 1  class explain_send : public select_send {
 2  protected:
 3    select_result_interceptor *interceptor;
 4  public:
 5    explain_send(select_result_interceptor *interceptor_arg) 
 6    : interceptor(interceptor_arg) {}
 7  protected:
 8    virtual int prepare(List<Item> &list, SELECT_LEX_UNIT *u)
 9    {
10      return select_send::prepare(list, u) || interceptor->prepare(list, u);
11    }
12    virtual int prepare2(void)
13    {
14      return select_send::prepare2() || interceptor->prepare2();
15    }
16    virtual bool initialize_tables(JOIN *join)
17    {
18    return select_send::initialize_tables(join) ||
19           interceptor->initialize_tables(join);
20    }
21    ...

where select_result_interceptor is the common parent class for multi_update,
multi_delete and select_insert data interceptor classes.

So we can modify UPDATE implementation and add EXPLAIN UPDATE as:

1  multi_update update_result;
2  select_send *result= lex->describe ? &update_result 
3                                     : new explain_result(&update_result);
4  mysql_select(...result...);
  

Single-table UPDATE/DELETE
--------------------------

Unlike multiple-table UPDATE/DELETE statements, the single-table forms have
particular implementations (see mysql_update()/mysql_delete()) where we directly
prepare/optimize condition item tree, analyze ORDER BY list and choose access
methods.

Fortunately, the parser constructs SELECT_LEX_UNIT tree for nested subqueries in
the query condition (if any), then we fix the condition item tree running a
subquery preparation/optimization and filling related SELECT_LEX/JOIN
structures.

So, the actual UPDATE/DELETE QEP is available in mysql_update() and
mysql_delete() methods, *but* excluding the 1st QEP line.

I.e. to implement EXPLAIN mode in mysql_update()/mysql_delete():
  1) we need to construct select_send object for QEP output and
     send_explain_fields() there,
  2) gather table/access method information and
  3) send it in QEP form to select_send object too,
  4) traverse SELECT_LEX_UNIT list (like select_describe() does) and call
     mysql_explain_union() on each unit to send QEP parts,
  5) avoid side effects: data/user variable modification, trigger invocation
     and replication procedures.

The select_describe() function implements #3 an #4 for EXPLAIN SELECT but
there is no direct way to reuse it for other "single-table" EXPLAINs, since
select_describe() required top-level JOIN object.
In the context of this WL that function has been improved to serve single-
table EXPLAINs and refactored into a few front-end functions:
  a) select_describe() -- old EXPLAIN SELECT functionality;
  b) table_describe() -- new EXPLAIN single-table UPDATE/DELETE;
  c) msg_describe() -- widely used trivial part of old select_describe().

These functions has been implemented with the help of a few new classes:
  - the base class Explain guarantee the correct ordering of columns in the
    EXPLAIN result set,
  - the Explain_table class implements new single-table EXPLAINs,
  - Explain_join, Explain_union and Explain_msg classes are results of a
    decomposition of the old complicated select_describe() function.

These new classes share common parts of decomposed select_describe() function
by method overloading when applicable.