WL#4644: Online Backup: Fix the DDL blocker

Affects: Server-6.0   —   Status: Complete

RATIONALE
---------
To fix BUG#32702.

DETAILS
-------
DDL blocker is used inside backup system to prevent any metadata changes during
backup/restore process. It blocks DDL statements which could change the
metadata. DDL blocker is implemented but it is incomplete - it does not block
changes for all types of objects being backed-up (for example it does not block
DROP VIEW and many others). BUG#32702 contains a list of what is and what is not
blocked by the current implementation.

ALTERNATIVES
------------
There are 2 alternative understandings of what a DDL blocker should do:

A) Block all known DDL statements.
B) Block these (and only these) SQL statements which need to be blocked to
ensure BACKUP/RESTORE consistency.

Ad A) Note that it is not completely clear what a DDL statement is. There are
some statements which could be objected as being DDLs, such as TRUNCATE TABLE,
OPTIMIZE/REPAIR TABLE, DROP/CREATE INDEX. Nevertheless, they must be blocked for
BACKUP to work correctly. If DDL blocker does not block them, it becomes useless
for the backup subsystem.

Ad B) For the analysis what needs to be blocked, see HLS. If this semantics is
implemented then a better name for the service could be picked. For example
"Backup Metadata Lock", or BML for short.

This worklog will implement alternative B.

NOTE: The "DDL blocker" service is considered a temporary solution. In future
versions it is planned to replace it by more appropriate server mechanisms such
as fine-grained metadata locks.
WHAT STATEMENTS NEED TO BE BLOCKED
----------------------------------
Here we identify a *minimal* set of statements which should be blocked to ensure
correct functioning of BACKUP/RESTORE commands.

Backup image stores metadata for the following types of objects:

- databases
- tablespaces
- privileges
- tables
- views
- stored functions
- stored procedures
- events
- triggers

Hence, the following metadata changes should be frozen during backup operation:

1. Databases being backed-up should not disappear or be changed.
2. In case of "BACKUP DATABASE * ..." new databases should not appear.
3. List of objects inside each database should not change.
4. The objects in the databases should not change (their metadata).
5. The set of privileges for each database should not change.
6. Users for which privileges are stored should not disappear or change.
7. Tablespaces used by tables being backed-up should not disappear or change.

The above can be achieved by blocking the following statements (they will block
more than required, but I believe it is a minimal set for ensuring 1-7).

DROP/ALTER DATABASE               (for 1)
CREATE DATABASE                   (for 2 [b])
CREATE/DROP per-db-object         (for 3)
ALTER/RENAME per-db-object        (for 4)
CREATE/DROP INDEX                 (for 4 [a])
GRANT/REVOKE                      (for 5)
DROP/RENAME USER                  (for 6)
DROP/ALTER TABLESPACE             (for 7)

per-db-object is one of: TABLE, VIEW, FUNCTION, PROCEDURE, EVENT, TRIGGER.

[a] This is because CREATE/DROP INDEX alters the definition of the table.
[b] This could be made optional, to be used only when "BACKUP DATABASE * ..." is
    executed. That can greatly improve user experience since in the typical case
    when specific database list is given for BACKUP, users will be allowed to
    create new databases.

Additionally, these statements must be blocked for technical reasons:

TRUNCATE TABLE  - because it does DROP TABLE internally.
OPTIMIZE TABLE  - because it does ALTER TABLE internally.
REPAIR TABLE - because it can remove/rename files storing physical table data 
              (e.g. for MyISAM tables).

The complete list of statements which need to be blocked by BML is as follows:

  DROP   DATABASE/TABLE/VIEW/FUNCTION/PROCEDURE/EVENT/TRIGGER/INDEX
  DROP   USER/TABLESPACE
  CREATE DATABASE/TABLE/VIEW/FUNCTION/PROCEDURE/EVENT/TRIGGER/INDEX
  ALTER  DATABASE/TABLE/VIEW/FUNCTION/PROCEDURE/EVENT/TABLESPACE
  RENAME TABLE/USER
  GRANT/REVOKE
  TRUNCATE/OPTIMIZE/REPAIR TABLE

DDL blocker rename
==================
To avoid confused objections of the type "why DDL blocker is not blocking this
or that DDL statement" the service will be renamed to "Backup Metadata Lock",
BML in short.

Here is a list of proposed renames (only public members mentioned - for private
members, names along the same lines will be chosen).

ddl_blocker.{h,cc}  -> bml.{h,cc}

DDL_blocker_class -> BML_class
DDL_blocker_class::check_DDL_blocer() -> BML_class::bml_enter()
DDL_blocker_class::end_DDL()          -> BML_class::bml_leave()
DDL_blocker_class::block_DDL()        -> BML_class::bml_get()
DDL_blocker_class::unblock_DDL()      -> BML_class::bml_release()

In si_objects.h:

obs::ddl_blocker_enable()  -> obs::bml_get()
obs::ddl_blocker_disable() -> obs::bml_release()
obs::ddl_blocker_exception_on()  -> obs::bml_exception_on()
obs::ddl_blocker_exception_off() -> obs::bml_exception_off()

Changes in sql_parse.cc
=======================

The infrastructure ensuring blocking correct statements in sql_parse.cc will be 
changed. 

Currently methods  bml_enter() and bml_leave() are called inside the big switch 
in the cases corresponding to the statements which should be blocked.

Instead of this, the statements to be blocked will be marked with a new 
CF_BLOCKED_BY_BML flag:

+
+  /*
+    Mark commands to be blocked by Backup Metadata Lock. See bml.cc.
+  */ 
+  sql_command_flags[SQLCOM_DROP_USER]|=         CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_DROP_DB]|=           CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_DROP_TABLE]|=        CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_DROP_VIEW]|=         CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_DROP_FUNCTION]|=     CF_BLOCKED_BY_BML;  
+  sql_command_flags[SQLCOM_DROP_PROCEDURE]|=    CF_BLOCKED_BY_BML;  
+  sql_command_flags[SQLCOM_DROP_EVENT]|=        CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_DROP_TRIGGER]|=      CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_DROP_INDEX]|=        CF_BLOCKED_BY_BML;
+
+  sql_command_flags[SQLCOM_CREATE_DB]|=         CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_CREATE_TABLE]|=      CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_CREATE_VIEW]|=       CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_CREATE_SPFUNCTION]|= CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_CREATE_PROCEDURE]|=  CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_CREATE_EVENT]|=      CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_CREATE_TRIGGER]|=    CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_CREATE_INDEX]|=      CF_BLOCKED_BY_BML;
+
+  sql_command_flags[SQLCOM_ALTER_TABLESPACE]|=  CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_ALTER_DB]|=          CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_ALTER_DB_UPGRADE]|=  CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_ALTER_TABLE]|=       CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_ALTER_FUNCTION]|=    CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_ALTER_PROCEDURE]|=   CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_ALTER_EVENT]|=       CF_BLOCKED_BY_BML;
+
+  sql_command_flags[SQLCOM_RENAME_USER]|=       CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_RENAME_TABLE]|=      CF_BLOCKED_BY_BML;
+
+  sql_command_flags[SQLCOM_GRANT]|=             CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_REVOKE]|=            CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_REVOKE_ALL]|=        CF_BLOCKED_BY_BML;
+
+  sql_command_flags[SQLCOM_REPAIR]|=            CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_OPTIMIZE]|=          CF_BLOCKED_BY_BML;
+  sql_command_flags[SQLCOM_TRUNCATE]|=          CF_BLOCKED_BY_BML;


Then the calls to bml_enter() and bml_leave() for the marked statements will be 
done inside mysql_execute_command() before and after the big switch, 
respectively.

@@ -1993,6 +2042,15 @@ mysql_execute_command(THD *thd)
   if (opt_implicit_commit(thd, CF_IMPLICT_COMMIT_BEGIN))
     goto error;
 
+  if (is_bml_blocked(lex->sql_command) && 
+      !(bml_registered= BML_instance->bml_enter(thd)))
+    goto error;
+
   switch (lex->sql_command) {
 
   case SQLCOM_SHOW_EVENTS:
...

@@ -4670,6 +4675,10 @@ create_sp_error:
     thd->row_count_func= -1;
 
 finish:
+
+  if (bml_registered)
+    BML_instance->bml_leave();
+
   if (need_start_waiting)
...


Testing strategy
================

After fixing, BML will block 38 statements. To test them all, the current
backup_ddl_blocker test must be refactored as straightforward extension would
make it unmaintainable.

The test runs different DDL statements in parallel with BACKUP/RESTORE command
and checks that the required blocking happens. The main part of the test, which
is executing and synchronizing the statements in several parallel threads will
be moved into a "subroutine" in the form of an include file to be sourced by the
main test. The main test will setup a context for the "subroutine" and source it
several times to test all affected statements.

Few test will be added to check that statements not handled by BML will not be
blocked.