WL#3303: RBR: Engine-controlled logging format

Affects: Server-5.1   —   Status: Complete   —   Priority: Medium

SUMMARY
-------

Add flags that let each storage engine control the low-level binary log format.
Related to BUG#23051.

REQUIREMENT
-----------

The following requirements are necessary for the implementation to be
considered a success:

- It **shall** be possible for a storage engine to allow or forbid the use of
  row-based replication in replicating a table.
- It **shall** be possible for a storage engine to allow or forbid the use of
  statement-based replication in replicating a table.

Definitions
-----------

- A *to-write table* is a table on which ``F_WRLCK`` or a similar write lock is taken.

The *logging format* is the format used to represent the effects of a statement.
The only available logging formats are *row* and *statement*. In statement
logging format, a statement is written to the binary log. It is usually the
statement that was issued, but in some cases the statement is rewritten. In row
logging format, the rows that are written, updated, or deleted are written to
the binary log.

The *logging mode* is the policy the server uses to decide which logging format
represents a statement. Observe that the logging mode results in a logging
format decision for a statement that is either "row", "statement", or a mixture
of the two (e.g., for the CREATE-SELECT statement).

BACKGROUND
----------

In an ideal world, the logging format for a statement could be decided
by several factors:

1. The user can control the logging format at different times.
2. The SQL statement developer can specify that certain types of statements
   should be replicated in some particular way.
3. The engine developer can decide what his engine can support.
4. The DBA can decide that a certain table is better logged
   in some particular way.

This WL is only for item 3.


RATIONALE
---------

Several storage engines are limited in the way they replicate:

- Federated cannot be replicated by row.
- Blackhole cannot be replicated by row.
- NDB Cluster cannot be replicated by statement.

In the current system, the logging format is decided on a per-thread basis, with
a configuration option that sets the default. This means that in order to
compile the server with NDB (which can handle only row-based replication), it
is necessary to make row-based replication the default for the *entire* server.
Similarly, adding a storage engine that can handle only statement-based
replication would force the default for the entire server to be
statement-based.

This clash of interests between storage engines (which one should win?), the
fact that we want MIXED to be the default, and the oddity of letting the
default depend on which storage engines are compiled in, mean that each storage
engine needs a way to declare the formats it *can* handle (not the formats it
prefers to handle, which is a different story).


OPEN ISSUES
-----------

None currently.


IMPLEMENTATION
--------------

1. Add two flags for logging:
   ``HA_BINLOG_STMT_CAPABLE`` and ``HA_BINLOG_ROW_CAPABLE``.

2. Let each engine define the flags as it sees fit.


SETTINGS FOR CURRENT ENGINES
-----------------------------

=============  =====================  =========================
Engine         Row-logging capable    Statement-logging capable
=============  =====================  =========================
ha_ndbcluster  HA_BINLOG_ROW_CAPABLE
ha_archive     HA_BINLOG_ROW_CAPABLE  HA_BINLOG_STMT_CAPABLE
ha_blackhole                          HA_BINLOG_STMT_CAPABLE
ha_tina        HA_BINLOG_ROW_CAPABLE  HA_BINLOG_STMT_CAPABLE
ha_example     HA_BINLOG_ROW_CAPABLE
ha_federated   HA_BINLOG_ROW_CAPABLE  HA_BINLOG_STMT_CAPABLE
ha_heap        HA_BINLOG_ROW_CAPABLE  HA_BINLOG_STMT_CAPABLE
ha_myisam      HA_BINLOG_ROW_CAPABLE  HA_BINLOG_STMT_CAPABLE
ha_myisammrg   HA_BINLOG_ROW_CAPABLE  HA_BINLOG_STMT_CAPABLE
=============  =====================  =========================
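
The table above can be read as each engine's answer from its ``table_flags()``
hook. The C++ below is a minimal, simplified sketch, not the server's code: the
``Table_flags`` typedef, the bit positions of the two flags, the cut-down
``handler`` class, and the ``is_row_capable()``/``is_stmt_capable()`` helpers are
assumptions, and every table flag other than the two binlog flags is omitted.
One engine is shown for each combination of capabilities in the table.

```cpp
#include <cassert>  // used by the checks that exercise this sketch
#include <cstdint>

// Assumed representation: table flags as a 64-bit bit mask.
// The bit positions are placeholders chosen for this sketch.
typedef std::uint64_t Table_flags;
const Table_flags HA_BINLOG_ROW_CAPABLE = Table_flags(1) << 34;
const Table_flags HA_BINLOG_STMT_CAPABLE = Table_flags(1) << 35;

// Cut-down handler: only the capability hook is modelled.
class handler {
 public:
  virtual ~handler() {}
  virtual Table_flags table_flags() const = 0;
};

// Row-only, as ha_ndbcluster in the table.
class ha_ndbcluster : public handler {
 public:
  Table_flags table_flags() const override { return HA_BINLOG_ROW_CAPABLE; }
};

// Statement-only, as ha_blackhole in the table.
class ha_blackhole : public handler {
 public:
  Table_flags table_flags() const override { return HA_BINLOG_STMT_CAPABLE; }
};

// Both formats, as ha_myisam in the table.
class ha_myisam : public handler {
 public:
  Table_flags table_flags() const override {
    return HA_BINLOG_ROW_CAPABLE | HA_BINLOG_STMT_CAPABLE;
  }
};

// Hypothetical helpers for querying a single table's capabilities.
bool is_row_capable(const handler &h) {
  return (h.table_flags() & HA_BINLOG_ROW_CAPABLE) != 0;
}

bool is_stmt_capable(const handler &h) {
  return (h.table_flags() & HA_BINLOG_STMT_CAPABLE) != 0;
}
```

Since callers only test the bits, an engine that later gains a capability
changes nothing outside its own ``table_flags()``.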


Mats notes:

    It involves adding the flags to the handler and then implementing the
    logic that decides which format to use. Part of the information is
    available in the parser (safe/unsafe) and part in the handler (the
    flags), so the business logic should be placed in parallel with the
    logic for XA.

Definitions
===========


Safe and unsafe statement
-------------------------

A statement is *safe* if it is deterministic; otherwise it is *unsafe*. Deciding
which statements are safe or unsafe is by nature done by ad-hoc methods, and is
hence not exact (neither sufficient nor complete in the logical sense).
Henceforth, the outcome of this decision is what categorizes a statement as
safe or unsafe.

Some statements are considered intrinsically unsafe and should therefore be
logged row-based. Examples of statements considered unsafe include
[incomplete list /Matz]:

- Statements including UDFs
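
As an illustration of how the parser could record unsafety, the following C++
sketch marks a statement unsafe as soon as it sees a UDF call. The
``Function_call`` and ``Parsed_statement`` types and the
``mark_unsafe_if_needed()`` function are hypothetical; they mirror only the
single rule listed above.

```cpp
#include <cassert>  // used by the checks that exercise this sketch
#include <string>
#include <vector>

// Hypothetical record of one function call seen by the parser.
struct Function_call {
  std::string name;
  bool is_udf;  // true if the name resolved to a user-defined function
};

// Hypothetical, heavily simplified parsed statement.
struct Parsed_statement {
  std::vector<Function_call> calls;
  bool unsafe = false;  // set here, read later by the logging decision
};

// Ad-hoc rule: a statement that calls any UDF is unsafe. Further rules
// would be added here as the (incomplete) list above grows.
void mark_unsafe_if_needed(Parsed_statement &stmt) {
  for (const Function_call &call : stmt.calls) {
    if (call.is_udf) {
      stmt.unsafe = true;
      return;
    }
  }
}
```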


Engine capabilities and restrictions
------------------------------------

We say that a set of tables is *row logging capable* (or *statement logging
capable*) if all of them can be replicated using row-based replication (or
statement-based replication, respectively). We use RLC to denote "row logging
capable" and SLC to denote "statement logging capable".

We say that a set of tables is *row logging restricted* (RLR) if the tables
are row logging capable but not statement logging capable. We define *statement
logging restricted* (SLR) similarly.

Intuitively, if a set of tables is row logging restricted, the statement can
only be logged using row-based logging.
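
Because "capable" is defined over *all* tables in the set, the set-level
properties reduce to a bitwise AND of the per-table flag words. The C++ sketch
below assumes the ``HA_BINLOG_ROW_CAPABLE`` and ``HA_BINLOG_STMT_CAPABLE`` flags
from the implementation section, with made-up bit positions; the
``Capabilities`` type and ``capabilities_of()`` are hypothetical names.

```cpp
#include <cassert>  // used by the checks that exercise this sketch
#include <cstdint>
#include <vector>

typedef std::uint64_t Table_flags;
const Table_flags HA_BINLOG_ROW_CAPABLE = Table_flags(1) << 34;   // assumed bit
const Table_flags HA_BINLOG_STMT_CAPABLE = Table_flags(1) << 35;  // assumed bit

// Set-level capabilities and the derived restrictions.
struct Capabilities {
  bool rlc;  // every table is row logging capable
  bool slc;  // every table is statement logging capable
  bool rlr() const { return rlc && !slc; }  // row logging restricted
  bool slr() const { return slc && !rlc; }  // statement logging restricted
};

// "All tables have the flag" is the AND of all flag words: start with every
// bit set and clear whatever any table lacks. An empty set is therefore
// vacuously both RLC and SLC.
Capabilities capabilities_of(const std::vector<Table_flags> &tables) {
  Table_flags common = ~Table_flags(0);
  for (Table_flags flags : tables)
    common &= flags;
  return Capabilities{(common & HA_BINLOG_ROW_CAPABLE) != 0,
                      (common & HA_BINLOG_STMT_CAPABLE) != 0};
}
```

For example, mixing an NDB table (row only) with a MyISAM table (both) yields
an RLR set, while mixing NDB with Blackhole (statement only) yields a set that
is neither RLC nor SLC.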


Errors and warnings
-------------------

The following only holds when the binary log is enabled, i.e., when
``SQL_LOG_BIN`` is true.

If a combination of engines entirely rules out the possibility of logging the
statement, an error is given and the statement is not executed. If a user wants
to execute the statement despite this, the binary log has to be disabled by
setting ``SQL_LOG_BIN`` to false.

If a combination of engines allows logging the statement in an unsafe manner
(which always means as a statement), a warning is given and the statement is
executed and logged to the binary log.

Changes to structures/classes
=============================

This section describes changes to classes and structures.


class handler
-------------

The handler class needs to be extended with two new table flags:

- ``HA_BINLOG_ROW_CAPABLE``
- ``HA_BINLOG_STMT_CAPABLE``

The flags are placed in the handler instead of the handlerton since the
capabilities can be decided at run time and are not a static feature of the
storage engine. A typical case is the partition engine, whose capabilities
depend on which storage engine is used as the underlying engine.
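
To illustrate why the flags belong in the handler, here is a C++ sketch of a
partition-like handler whose binlog capabilities are computed at run time from
the handlers of its partitions. ``Fixed_handler`` and ``Partition_like_handler``
are hypothetical classes, the flag bit positions are assumed, and taking the
intersection over partitions is an illustrative choice: with a single
underlying engine it simply reports that engine's capabilities.

```cpp
#include <cassert>  // used by the checks that exercise this sketch
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

typedef std::uint64_t Table_flags;
const Table_flags HA_BINLOG_ROW_CAPABLE = Table_flags(1) << 34;   // assumed bit
const Table_flags HA_BINLOG_STMT_CAPABLE = Table_flags(1) << 35;  // assumed bit
const Table_flags BINLOG_CAPABILITY_FLAGS =
    HA_BINLOG_ROW_CAPABLE | HA_BINLOG_STMT_CAPABLE;

class handler {
 public:
  virtual ~handler() {}
  virtual Table_flags table_flags() const = 0;
};

// A leaf engine whose capabilities are fixed when it is constructed.
class Fixed_handler : public handler {
 public:
  explicit Fixed_handler(Table_flags flags) : flags_(flags) {}
  Table_flags table_flags() const override { return flags_; }

 private:
  Table_flags flags_;
};

// The answer depends on the partitions attached when the table is opened,
// so it cannot be a static property of the engine (the handlerton).
class Partition_like_handler : public handler {
 public:
  void add_partition(std::unique_ptr<handler> part) {
    parts_.push_back(std::move(part));
  }
  Table_flags table_flags() const override {
    Table_flags binlog = BINLOG_CAPABILITY_FLAGS;
    for (const std::unique_ptr<handler> &part : parts_)
      binlog &= part->table_flags();
    return binlog;  // non-binlog table flags are omitted in this sketch
  }

 private:
  std::vector<std::unique_ptr<handler>> parts_;
};
```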


Changes to functions
====================

This section describes changes to individual functions.

We say that a set of tables is *row logging capable* (or *statement
logging capable*) if all of them have the ``HA_BINLOG_ROW_CAPABLE`` flag
set (or the ``HA_BINLOG_STMT_CAPABLE`` flag set, respectively). We use
RLC to denote "row logging capable" and SLC to denote "statement
logging capable".

We say that a set of tables is *row logging restricted* if the
tables are row logging capable but not statement logging capable. We
define *statement logging restricted* similarly.

If a set of tables is row logging restricted, the statement can only be
logged using row-based logging.

mysql_lock_tables()
-------------------

This function shall be extended to also check that all locked tables
are compatible with respect to the flags above.

- When tables are locked, ``THD::current_stmt_binlog_row_based``
  shall be set to select row-based or statement-based logging.

- If the tables being locked are not compatible with respect to the
  logging format, an error is generated.


.. table:: Decision table 

    =========== ============= === ==== ====================== ======
          Condition                                 Action
    ---------------------------------- -----------------------------
    Safe/unsafe BINLOG_FORMAT RLC SLC  Error/Warning          Log as
    =========== ============= === ==== ====================== ======
    Safe        STATEMENT      N   N   Error: not loggable
    Safe        STATEMENT      N   Y                          STMT
    Safe        STATEMENT      Y   N   Error: not loggable
    Safe        STATEMENT      Y   Y                          STMT
    Safe        MIXED          N   N   Error: not loggable
    Safe        MIXED          N   Y                          STMT
    Safe        MIXED          Y   N                          ROW
    Safe        MIXED          Y   Y                          STMT
    Safe        ROW            N   N   Error: not loggable
    Safe        ROW            N   Y   Error: not loggable
    Safe        ROW            Y   N                          ROW
    Safe        ROW            Y   Y                          ROW
    Unsafe      STATEMENT      N   N   Error: not loggable
    Unsafe      STATEMENT      N   Y   Warning: unsafe        STMT
    Unsafe      STATEMENT      Y   N   Error: not loggable
    Unsafe      STATEMENT      Y   Y   Warning: unsafe        STMT
    Unsafe      MIXED          N   N   Error: not loggable
    Unsafe      MIXED          N   Y   Error: not loggable
    Unsafe      MIXED          Y   N                          ROW
    Unsafe      MIXED          Y   Y                          ROW
    Unsafe      ROW            N   N   Error: not loggable
    Unsafe      ROW            N   Y   Error: not loggable
    Unsafe      ROW            Y   N                          ROW
    Unsafe      ROW            Y   Y                          ROW
    =========== ============= === ==== ====================== ======
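
The decision table can be encoded directly as a function. The C++ below is a
sketch under stated assumptions: ``Binlog_format``, ``Outcome``, and
``decide_logging_format()`` are hypothetical names, RLC and SLC are assumed to
have been computed over the to-write tables, and the ``sql_log_bin`` parameter
models the rule from the errors and warnings section that no check is made when
the binary log is disabled.

```cpp
#include <cassert>  // used by the checks that exercise this sketch

// Value of the BINLOG_FORMAT setting (the logging mode).
enum class Binlog_format { STATEMENT, MIXED, ROW };

// Result of the decision for one statement.
enum class Outcome {
  NOT_LOGGED,             // binary log disabled: no check, nothing logged
  LOG_STMT,               // log as statement
  LOG_STMT_WITH_WARNING,  // log as statement, warn that it is unsafe
  LOG_ROW,                // log as rows
  ERROR_NOT_LOGGABLE      // refuse to execute the statement
};

// unsafe: the parser classified the statement as unsafe.
// rlc/slc: the to-write tables are row/statement logging capable.
Outcome decide_logging_format(bool sql_log_bin, bool unsafe,
                              Binlog_format format, bool rlc, bool slc) {
  if (!sql_log_bin)
    return Outcome::NOT_LOGGED;
  switch (format) {
    case Binlog_format::STATEMENT:
      // Statement logging is forced; only SLC matters.
      if (!slc)
        return Outcome::ERROR_NOT_LOGGABLE;
      return unsafe ? Outcome::LOG_STMT_WITH_WARNING : Outcome::LOG_STMT;
    case Binlog_format::MIXED:
      // Prefer statement for safe statements when the engines allow it,
      // otherwise fall back to row if every table supports it.
      if (!unsafe && slc)
        return Outcome::LOG_STMT;
      if (rlc)
        return Outcome::LOG_ROW;
      return Outcome::ERROR_NOT_LOGGABLE;
    case Binlog_format::ROW:
      // Row logging is forced; only RLC matters.
      return rlc ? Outcome::LOG_ROW : Outcome::ERROR_NOT_LOGGABLE;
  }
  return Outcome::ERROR_NOT_LOGGABLE;  // not reached for valid formats
}
```

At lock time, ``THD::current_stmt_binlog_row_based`` would then be set from
``outcome == Outcome::LOG_ROW``, and an ``ERROR_NOT_LOGGABLE`` outcome would stop
the statement before it is executed.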

Regarding warnings, see WL#3339.