WL#5493: Binlog crash-safe when master crashed

Affects: Server-5.6   —   Status: Complete

RATIONALE
=========
* Make the binlog itself crash-safe, so that the content of the binlog
  can be recovered, committed, or rolled back together with the changes
  to the (transactional) tables.


CONTEXT
=======
  It's related to BUG#53530, BUG#53527 and WL#5440.
  In the vicinity: WL#1790.
  Look at BUG#52620 too. /Alfranio

  When pushing this WL, the test for WL#5440 should be pushed 
  as well.

COMMENTS
========
By Yoshinori Matsunobu (12/08/2010):

 The following failure scenarios (1-4) should be considered.
 +----------------------+-----+-----+-----+-----+-----+
 |                      |  0  |  1  |  2  |  3  |  4  |
 +----------------------+-----+-----+-----+-----+-----+
 | Binlog Event Header  |  o  |  x  |  xx |  o  |  o  |
 | Binlog Data          |  o  |  -  |  -  |  x  |  xx |
 +----------------------+-----+-----+-----+-----+-----+

 o: correctly written (normal behavior)
 x: invalid (partly written)
 xx: wrongly written (written with 0x33, etc)

 0: No error (normal behavior)
 1: Failed to read the header
 2: Read the header, but the value (i.e. data_len) was incorrect
 3: Read the header successfully, but failed to read the body
 4: Read the header successfully, but the body was incorrect

 If 1-3 happens, the current Log_event::read_log_event() returns 0, so these
cases are easy to handle inside MYSQL_BIN_LOG::recover() without changing the
binlog format.
 If 4 happens, Log_event::read_log_event() returns the invalid event, so a
slave that receives the event will fail. My understanding is that Andrei's
WL#2540 (Binlog Checksum) will fix this issue.
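
 The scenarios above can be illustrated with a minimal sketch. It assumes a
toy event layout (a 4-byte little-endian length followed by the body), not
the real binlog event header, and read_event(), make_event() and
MAX_BODY_LEN are hypothetical names used only for illustration. It shows why
1-3 can be detected by the reader while 4 cannot be detected without a
checksum. Note that an incorrect length that still looks plausible would not
be caught by the range check used here.

```python
import struct

HEADER_LEN = 4          # toy header: one little-endian uint32 with the body length
MAX_BODY_LEN = 1 << 20  # arbitrary sanity limit chosen for this sketch


def make_event(body: bytes) -> bytes:
    """Serialize one toy event: length header followed by the body."""
    return struct.pack("<I", len(body)) + body


def read_event(buf: bytes, pos: int):
    """Return (body, next_pos) for the event at pos, or None if unreadable."""
    header = buf[pos:pos + HEADER_LEN]
    if len(header) < HEADER_LEN:
        return None  # scenario 1: header only partly written
    (data_len,) = struct.unpack("<I", header)
    if data_len == 0 or data_len > MAX_BODY_LEN:
        return None  # scenario 2: implausible length in the header
    body = buf[pos + HEADER_LEN:pos + HEADER_LEN + data_len]
    if len(body) < data_len:
        return None  # scenario 3: body only partly written
    # Scenario 0, or scenario 4 if the body bytes are garbage: without a
    # checksum the reader has no way to tell the two apart.
    return body, pos + HEADER_LEN + data_len
```

 In this sketch a body overwritten with 0x33 bytes is returned to the caller
as if it were a valid event, which corresponds to scenario 4.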

 Currently three implementation plans can be considered.
   1) Extending Binlog File Header
      Write "actual binary log size" info into the binlog file header. During
crash recovery, check the actual size and trim the binlog if the file is
larger than that value. The final binlog file size is then the actual size
maintained in the binlog header.
     Cons 1: Replication compatibility is broken because additional info needs
to be added to the binlog header.
     Cons 2: Performance will suffer because each event requires writes to two
locations. When I tested, appending 256 bytes could be done 15000
times/sec, but (writing 8 bytes + appending 256 bytes) could be done only 10000
times/sec.
     Cons 3: sync-binlog must be 1. Otherwise the binlog size info inside the
header can no longer be trusted. For example, lots of "successfully written"
events might be trimmed if the binlog size info is not written to disk for a
long time.
     Cons 4: An extra write per event is not actually necessary. Instead, we
should flush the binlog cache as usual and sync the file, then do the extra
write and sync again. The extra activity happens every time the binlog cache
is flushed, not every time an event is written.


   2) Writing binlog event size info into binlog event header *after* binlog
data is written (From Mats) 
      Write zero for the length (or event type) at first, and fill in this
      single byte only after the entire event has been written. This allows
      the file to be scanned for the first zero length (or type), and the
      length of the file to be decided based on that.

   3) Using binlog checksum functionality (WL#2540)
      Use checksums on each event and find the first event that does not
      have a correct checksum and decide the length of the file based on
      that.
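
 As an illustration of plan 3, the scan for the first event with a bad
checksum could look like the sketch below. It assumes a toy layout (4-byte
length, body, 4-byte CRC32 of the body) rather than the format defined by
WL#2540, and make_event() and valid_length() are hypothetical names.

```python
import struct
import zlib


def make_event(body: bytes) -> bytes:
    """Toy event: uint32 length, body, uint32 CRC32 of the body."""
    return (struct.pack("<I", len(body)) + body
            + struct.pack("<I", zlib.crc32(body)))


def valid_length(buf: bytes) -> int:
    """Return the offset just past the last event whose checksum matches."""
    pos = 0
    while True:
        header = buf[pos:pos + 4]
        if len(header) < 4:
            return pos  # truncated header
        (data_len,) = struct.unpack("<I", header)
        end = pos + 4 + data_len + 4
        if data_len == 0 or end > len(buf):
            return pos  # empty or truncated event
        body = buf[pos + 4:pos + 4 + data_len]
        (stored,) = struct.unpack("<I", buf[end - 4:end])
        if zlib.crc32(body) != stored:
            return pos  # first event with an incorrect checksum
        pos = end
```

 The returned offset is the length the file would be trimmed to. Unlike a
plain length check, this also catches an event whose body was fully written
but with wrong content (scenario 4 in the table above).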


PROPOSED SOLUTION
=================
When the MySQL server crashes in the middle of writing the binlog, trim the
crashed binlog file to the last valid transaction or (non-transactional)
event, based on the valid binlog size (valid_pos). In addition, use a
temporary file to make the binlog index file crash-safe.
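
The trimming step can be sketched as follows. This is only an illustration
under stated assumptions: events use a toy layout (4-byte length plus body),
transactions are delimited by events whose body is literally BEGIN or COMMIT,
and find_valid_pos(), recover() and demo() are hypothetical names, not the
server's functions. The index file handling is not covered here.

```python
import os
import struct
import tempfile


def make_event(body: bytes) -> bytes:
    """Toy event: uint32 length followed by the body."""
    return struct.pack("<I", len(body)) + body


def find_valid_pos(buf: bytes) -> int:
    """Offset after the last committed transaction or standalone event."""
    pos = valid_pos = 0
    in_trx = False
    while True:
        header = buf[pos:pos + 4]
        if len(header) < 4:
            break  # truncated header
        (data_len,) = struct.unpack("<I", header)
        end = pos + 4 + data_len
        if data_len == 0 or end > len(buf):
            break  # empty or truncated event
        body = buf[pos + 4:end]
        if body == b"BEGIN":
            in_trx = True
        elif body == b"COMMIT":
            in_trx = False
        pos = end
        if not in_trx:
            valid_pos = pos  # only advance outside an open transaction
    return valid_pos


def recover(path: str) -> int:
    """Truncate the file at path to valid_pos and return valid_pos."""
    with open(path, "rb") as f:
        buf = f.read()
    valid_pos = find_valid_pos(buf)
    if valid_pos < len(buf):
        with open(path, "r+b") as f:
            f.truncate(valid_pos)
            f.flush()
            os.fsync(f.fileno())
    return valid_pos


def demo():
    """Write a log that ends in an uncommitted transaction, then recover it."""
    committed = b"".join(make_event(b) for b in
                         (b"CREATE", b"BEGIN", b"INSERT", b"COMMIT"))
    crashed = committed + make_event(b"BEGIN") + make_event(b"INSERT")[:5]
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "binlog.000001")
        with open(path, "wb") as f:
            f.write(crashed)
        return len(committed), recover(path), os.path.getsize(path)
```

In demo(), the trailing BEGIN and the partly written INSERT are discarded,
so all three returned values are equal.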