WL#4832: Improve scalability of binary logging

Affects: Server-5.6   —   Status: Complete   —   Priority: Medium

GOAL
====

In order to ease the development of WL#5223, regular events (e.g.
Rows_log_event, Query_log_event, etc) must be written into a cache and
eventually flushed to the binary log. An incident event that denotes a problem
while using the cache, however, must be written directly into the binary log.

The idea is to avoid holding a lock on the binary log while encoding events and
to know the size of the transaction or statement before starting to write it.
With both in place, WL#5223 becomes straightforward to implement.

CONTEXT
=======

In 5.1, the binary log is locked when writing events, even when the event goes
into a cache and not into the binary log. This is a significant scalability
bottleneck for replication, since every thread has to acquire the binlog lock
for every statement executed.

In 5.5, WL#2687 fixed this behaviour: every DML is written into either the
trx-cache or the stmt-cache before being flushed to the binary log upon commit
or rollback. BEGIN, COMMIT, and ROLLBACK events are written together with the
contents of the cache when the cache is flushed to disk. DDL is written directly
to disk without going through a cache.

In 5.6, in order to ease the development of WL#5223, the idea is that, in most
cases, a cache contains all that is written into the binary log: including
BEGIN, COMMIT, ROLLBACK and DDLs. Incident events, however, will still be
written directly into the binary log.


ASSOCIATED BUGS
===============

BUG#42757:
Redundant use of LOCK_log in MYSQL_BIN_LOG::write(Log_event*)

BUG#43362:
Missing implicit commit after statements that commit active transactions

BACKGROUND
==========

The binary log consists of a stable storage file and two caches, the trx-cache
and the stmt-cache. The trx-cache stores changes to transactional engines and
statements classified as unsafe. The stmt-cache stores changes to
non-transactional engines.

The binary log is implemented as a storage engine and may receive, for example,
notifications when a connection is closed, a savepoint is defined, a
transaction is committed, etc. We exploit these notifications to implement the
routines that store information into a cache and flush its content to disk.

Let's assume that a transaction T1 executes statements S1, S2 and then commits, i.e.

  . T1 (BEGIN S1 S2 COMMIT).

T1's execution generates the following calls/notifications to the binary log:

  . WRITE  S1
  . COMMIT S1
  . WRITE  S2
  . COMMIT S2
  . COMMIT T1.

The per-statement COMMIT/ROLLBACK is used by the stmt-cache to flush its
content into the binary log, because non-transactional changes must be flushed
ahead of any transaction. The per-transaction COMMIT/ROLLBACK is then used by
the trx-cache to flush its content.

When WRITE is called, the event is appended to the appropriate cache. When
COMMIT/ROLLBACK is called, the BEGIN is written into the binary log, the
appropriate cache is flushed to it, and then the COMMIT is written to it.

Let's assume now that a DDL is executed. In this case, the following
calls/notifications to the binary log are generated:

  . WRITE DDL

When WRITE is called, the event is appended directly into the binary log.
Incident and Rotate events behave similarly.
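
The flow described above can be illustrated with a small standalone model. This
is only a sketch under stated assumptions: LegacyBinlogModel and its members are
hypothetical names, events are plain strings, and nothing here corresponds to
the actual server classes. It shows events accumulating in a cache, with BEGIN
and COMMIT/ROLLBACK added directly to the log only at flush time.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model of the flow before this worklog.
// Not server code: events are strings and the "file" is a vector.
struct LegacyBinlogModel {
  std::vector<std::string> file;        // stands in for the binary log file
  std::vector<std::string> trx_cache;   // changes to transactional engines
  std::vector<std::string> stmt_cache;  // changes to non-transactional engines

  // WRITE for a DML: append the event to the appropriate cache.
  void write_dml(const std::string &event, bool transactional) {
    assert(!event.empty());
    (transactional ? trx_cache : stmt_cache).push_back(event);
  }

  // COMMIT/ROLLBACK per statement: only the stmt-cache reacts.
  void end_statement() { flush(stmt_cache, "COMMIT"); }

  // COMMIT/ROLLBACK per transaction: only the trx-cache reacts.
  void end_transaction(bool commit) {
    flush(trx_cache, commit ? "COMMIT" : "ROLLBACK");
  }

  // WRITE for a DDL, Incident or Rotate event: bypass the caches.
  void write_direct(const std::string &event) { file.push_back(event); }

 private:
  // BEGIN and the end token never enter the cache; they are written
  // straight into the log around the flushed cache content.
  void flush(std::vector<std::string> &cache, const std::string &end_token) {
    if (cache.empty()) return;
    file.push_back("BEGIN");
    file.insert(file.end(), cache.begin(), cache.end());
    file.push_back(end_token);
    cache.clear();
  }
};
```

Replaying T1 against this model leaves the log untouched after each
per-statement COMMIT and produces BEGIN, S1, S2, COMMIT only once the
per-transaction COMMIT arrives.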


PROPOSAL
========

. Write BEGIN, COMMIT/ROLLBACK into the appropriate cache.

When WRITE is called, the appropriate cache is initialised if this has not been
done yet. If the cache is empty, the BEGIN is written into the cache,
immediately followed by the event (S1 in the example above). When
COMMIT/ROLLBACK is called, the token is written into the appropriate cache,
which is then flushed.

. Write DDLs into the stmt-cache and flush it immediately.

When WRITE is called, the stmt-cache is initialised and the DDL is appended to
it. The cache is flushed right after.

. Incident events are written directly into the binary log.

Incident events that denote problems while using the cache are still written
directly into the binary log. Although there is no reason to write Rotate events
directly to the binary log, we keep this behaviour.
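
A minimal sketch of the proposed flow, under the same kind of assumptions as any
toy model: CachedBinlogModel, ToyCache and every member name are hypothetical,
events are strings, and the string length stands in for the encoded event size.
It also assumes that a DDL is not wrapped in BEGIN/COMMIT, which the text above
does not state explicitly. The point it illustrates is that the cache holds
everything that reaches the log, so the size of a flush is known before the
write starts.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical cache that tracks how many bytes it currently holds.
struct ToyCache {
  std::vector<std::string> events;
  std::size_t bytes = 0;

  void append(const std::string &e) {
    events.push_back(e);
    bytes += e.size();
  }
  bool empty() const { return events.empty(); }
  void clear() {
    events.clear();
    bytes = 0;
  }
};

// Hypothetical toy model of the proposed flow. Not server code.
struct CachedBinlogModel {
  std::vector<std::string> file;     // stands in for the binary log file
  std::size_t last_flush_bytes = 0;  // size known before the last flush
  ToyCache trx_cache, stmt_cache;

  // WRITE for a DML: BEGIN goes into the cache ahead of the first event.
  void write_dml(const std::string &e, bool transactional) {
    ToyCache &c = transactional ? trx_cache : stmt_cache;
    if (c.empty()) c.append("BEGIN");
    c.append(e);
  }

  void end_statement() { finish(stmt_cache, "COMMIT"); }
  void end_transaction(bool commit) {
    finish(trx_cache, commit ? "COMMIT" : "ROLLBACK");
  }

  // WRITE for a DDL: append to an empty stmt-cache and flush immediately.
  // Assumption: no BEGIN/COMMIT is added around the DDL.
  void write_ddl(const std::string &e) {
    assert(stmt_cache.empty());
    stmt_cache.append(e);
    flush(stmt_cache);
  }

  // Incident events still bypass the caches.
  void write_incident(const std::string &e) { file.push_back(e); }

 private:
  // The end token is written into the cache, then the cache is flushed.
  void finish(ToyCache &c, const std::string &token) {
    if (c.empty()) return;
    c.append(token);
    flush(c);
  }
  void flush(ToyCache &c) {
    last_flush_bytes = c.bytes;  // total size is available up front
    file.insert(file.end(), c.events.begin(), c.events.end());
    c.clear();
  }
};
```

In a real server the flush step is where the binary log lock would be taken;
in this sketch the encoding work (the append calls) happens entirely before it.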


KEY POINTS
==========

. DDL could follow the same life cycle as a DML, i.e. write --> commit/rollback
--> flush. However, it would be odd to associate a commit/rollback with a DDL.

. Some DMLs must be written into the binary log following the life cycle of a
DDL, because COMMIT/ROLLBACK notifications are never generated for them.

. Before MDL (metadata locking) was introduced, DDLs could be written to the
binary log in the wrong order. WL#4986 must carefully check that MDL is working
properly.

PROPOSAL
========

. Refactor sql/binlog.cc and sql/log_event.cc to achieve the goals.

. Use the following values to define whether an event should go through a cache:

    /*
      If possible the event should use a non-transactional cache before
      being flushed to the binary log. This means that it must be flushed
      right after its corresponding statement is completed.
    */
    EVENT_STMT_CACHE,
    /* 
      The event should use a transactional cache before being flushed to
      the binary log. This means that it must be flushed upon commit or 
      rollback. 
    */
    EVENT_TRANSACTIONAL_CACHE,
    /* 
      The event must be written directly to the binary log without going
      through any cache.
    */
    EVENT_NO_CACHE,

. Use the following values to define when an event should be flushed to disk:

    /*
      The event must be written to a cache and upon commit or rollback
      written to the binary log.
    */
    EVENT_NORMAL_LOGGING,
    /*
      The event must be written to an empty cache and immediately written
      to the binary log without waiting for any other event.
    */
    EVENT_IMMEDIATE_LOGGING,
    /*
       If there is a need for different types, introduce them before this.
    */
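
One way the two sets of values could be combined when an event is written is
sketched below. This is an illustration only: the enum type names, EventRoute
and route_event are assumptions introduced for the example, and the enums are
standalone copies that contain just the values listed above.

```cpp
#include <cassert>

// Standalone copies of the values above; the type names are assumptions.
enum enum_event_cache_type {
  EVENT_STMT_CACHE,
  EVENT_TRANSACTIONAL_CACHE,
  EVENT_NO_CACHE
};

enum enum_event_logging_type {
  EVENT_NORMAL_LOGGING,
  EVENT_IMMEDIATE_LOGGING
};

// Hypothetical result type: where an event goes and when it is flushed.
struct EventRoute {
  bool use_cache;          // false: write straight into the binary log
  bool transactional;      // true: trx-cache, false: stmt-cache
  bool flush_immediately;  // true: flush right after appending the event
};

// Hypothetical helper that turns the two values into a routing decision.
inline EventRoute route_event(enum_event_cache_type cache_type,
                              enum_event_logging_type logging_type,
                              bool cache_is_empty) {
  // Events such as incidents bypass the caches; the other fields are unused.
  if (cache_type == EVENT_NO_CACHE) return EventRoute{false, false, false};

  const bool immediate = (logging_type == EVENT_IMMEDIATE_LOGGING);
  // Immediate logging requires an empty cache, as the comment above states.
  assert(!immediate || cache_is_empty);

  return EventRoute{true, cache_type == EVENT_TRANSACTIONAL_CACHE, immediate};
}
```

Under this sketch a DDL would be tagged EVENT_STMT_CACHE with
EVENT_IMMEDIATE_LOGGING, a transactional DML EVENT_TRANSACTIONAL_CACHE with
EVENT_NORMAL_LOGGING, and an incident event EVENT_NO_CACHE.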