WL#2540: Replication event checksums

Affects: Server-5.6   —   Status: Complete

SITUATION
=========

Events are written to the binary log as they are executed. They are
then sent to the slave on an event-by-event basis either through a
socket or over a network.


RATIONALE
=========

Event checksums are a fundamental validity check that replication is
working correctly.

They would make it much clearer to our customers when a replication
failure is due to network/disk/memory failures and when it is due to
bugs in the servers.  See the long list below of customers who may
have been affected.

WISHFUL REQUIREMENT
===================

After this is implemented, it should (preferably) not be possible
to ever crash the slave due to corrupted events.  To make this happen,
one could go through all values in all events and check that they
cannot be illegal.  This could possibly be a second patch.

PROBLEM
=======

Some customers get very strange replication failures and 
it is impossible to know what causes them.  Sinisa says
that they could be "network problems" (e.g. CSC#4792).

The failures cause the slave to corrupt data rather than stop,
which would be the appropriate action.

Some reported incidents (please update these once the patch is pushed):

- BUG#25737
- BUG#26123
- BUG#27048
- BUG#23619
- BUG#29309
- http://forums.mysql.com/read.php?26,148423,148423
- BUG#22889
- BUG#5116
- BUG#38718

This WL depends on the fixes from BUG#49741 so that the tests pass
with the new options supplied to the servers.

SUMMARY
=======

Add a checksum to binary log events.

SINISA WRITES (2005-04-13, CSC#4792):
This points to the missing reliability features, that SHOULD be
implemented in 5.0.

* checksum stored in binary and relay log to check for RAM / disk
  corruption. 

* checksum sent to slave for each event to check for network
  corruption. 


TYPE OF CHECKSUM
================

Alternatives:

  1. 8-12 bit checksum
     PRO: Can use the flags part of common event header 
          and thus not need to change binary log format
     CON: Higher probability of undetected errors than CRC32
 
  2. CRC32
     PRO: Standard, with multiple implemented algorithms.
          Generally computationally inexpensive
 
  3. SHA or MD5
     PRO: Can also be used for authentication of events

     CON: Are very long (at least 128 bits)
          Computationally expensive.
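
To put rough numbers on the CON of alternative 1: assuming corruption
is uniformly random and the checksum mixes its input well (both are
simplifying assumptions, not claims from this WL), an n-bit checksum
lets about 1 in 2^n corrupted events through undetected. The helper
name below is hypothetical.

```python
def undetected_probability(bits: int) -> float:
    """Chance that a random corruption still matches an n-bit checksum.

    Assumes uniformly random corruption and a well-mixing checksum.
    """
    return 2.0 ** -bits


for bits in (8, 12, 32):
    print(f"{bits:2d}-bit checksum: about 1 in {2 ** bits:,} "
          "random corruptions goes undetected")
```

Under these assumptions the gap between a 12-bit flags-field checksum
and a 32-bit one is a factor of 2^20.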

Mats and Lars are currently considering a 32-bit checksum
(described in ISO 3309).
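
A minimal sketch of alternative 2, using the CRC32 from Python's zlib
module. The layout (a 4-byte little-endian trailer appended to the
event) and the function names are assumptions made for illustration;
this WL does not specify where the checksum is stored.

```python
import struct
import zlib


def append_checksum(event: bytes) -> bytes:
    """Return the event followed by its CRC32 as a 4-byte trailer.

    The trailer position and byte order are assumptions of this sketch.
    """
    crc = zlib.crc32(event) & 0xFFFFFFFF
    return event + struct.pack("<I", crc)


def verify_checksum(framed: bytes) -> bool:
    """Recompute the CRC32 over the body and compare it to the trailer."""
    if len(framed) < 4:
        return False
    body, trailer = framed[:-4], framed[-4:]
    (stored,) = struct.unpack("<I", trailer)
    return (zlib.crc32(body) & 0xFFFFFFFF) == stored


event = b"hypothetical event payload"
framed = append_checksum(event)

# Flip one bit in the body to simulate disk, memory or network corruption.
corrupted = bytes([framed[0] ^ 0x01]) + framed[1:]
```

Here verify_checksum(framed) holds and verify_checksum(corrupted) does
not, at a cost of four extra bytes per event.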


REPAIRING EVENTS
================

- Should the checksum be capable of also repairing the event?
  PRO: That would be really nice, especially if the mysqlbinlog 
       client can be extended to do repairs of binary logs
  CON: Then we can't use the bits for authentication

Mats and Lars think that we can store the type of the checksum in the
format_description log event (or imply it from the binary log version).
That way we can, in the future, change to an error-correcting
checksum.
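
A sketch of how a checksum-type field could keep the format open to
later algorithms. The numeric ids, the registry and the function name
are all hypothetical; the only idea taken from the text above is that
the log states which checksum type it uses.

```python
import zlib
from typing import Callable, Dict, Optional

# Hypothetical algorithm ids, as they might be recorded once per log.
CHECKSUM_ALG_OFF = 0
CHECKSUM_ALG_CRC32 = 1

# Registry of known algorithms; an error-correcting code could be
# registered here later under a new id without touching the readers.
CHECKSUM_ALGS: Dict[int, Callable[[bytes], int]] = {
    CHECKSUM_ALG_CRC32: lambda data: zlib.crc32(data) & 0xFFFFFFFF,
}


def compute_checksum(alg_id: int, event: bytes) -> Optional[int]:
    """Compute the checksum selected by alg_id, or None when disabled."""
    if alg_id == CHECKSUM_ALG_OFF:
        return None
    try:
        algorithm = CHECKSUM_ALGS[alg_id]
    except KeyError:
        # A reader that does not know the algorithm must not guess.
        raise ValueError(f"unknown checksum algorithm id {alg_id}")
    return algorithm(event)
```

A reader that meets an unknown id fails loudly instead of treating the
trailing bytes as event data.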


SUGGESTED SOLUTION (By Mats)
============================

To ensure the integrity of each event arriving at the slave, a checksum should
be added to each event.  This allows the slave to check that the event was
transmitted correctly and written correctly to the relay log. If it was not, the
slave can stop and report an error rather than trying to process the event.
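
The stop-on-mismatch behaviour can be sketched as follows. The
exception, the helper names and the 4-byte CRC32 trailer are
assumptions of this sketch, not part of the proposal.

```python
import struct
import zlib


class CorruptEventError(Exception):
    """Raised when an event does not match its stored checksum."""


def frame(event: bytes) -> bytes:
    """Append a CRC32 trailer (layout assumed for this sketch)."""
    return event + struct.pack("<I", zlib.crc32(event) & 0xFFFFFFFF)


def check_event(framed: bytes) -> bytes:
    """Return the event body, or raise if the checksum does not match."""
    if len(framed) < 4:
        raise CorruptEventError("event shorter than checksum trailer")
    body, trailer = framed[:-4], framed[-4:]
    (stored,) = struct.unpack("<I", trailer)
    actual = zlib.crc32(body) & 0xFFFFFFFF
    if actual != stored:
        raise CorruptEventError(
            f"checksum mismatch: stored {stored:#010x}, computed {actual:#010x}"
        )
    return body


def apply_events(framed_events, apply) -> int:
    """Verify each event before applying it; stop at the first bad one."""
    applied = 0
    for framed in framed_events:
        body = check_event(framed)  # raises before anything is applied
        apply(body)
        applied += 1
    return applied
```

Events before the corrupted one are applied and nothing after it is,
so the slave stops at a well-defined position.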

This is particularly important when using row-based replication, since a subtly
corrupted row event can be applied without raising any error.

In my opinion, we should focus on the integrity of the events and ignore issues
that relate to authenticity, since methods for handling authenticity are
computationally expensive and the same goal can be achieved through other means
(e.g., using SSL). Note that the suggestion below (using SSL) does not solve the
issue of corruption in the binary log or in the relay log: it only handles
corruption during transfer of the event.


INTERESTING WORK-AROUND (ANDREI'S TEXT)
=======================================

From reading about SSL's features [1], and from simulating corrupted
packets on the slave by changing data just after libc's recv function
returns the buffer, we can conclude that this idea (using SSL to create
and verify a checksum) should work.

Of course I cannot check all possible situations; for that we would
need to study the algorithms that are used.  It seems almost certain
that the encryption algorithms in SSL take care of a checksum.

Reference: 

    [1] 5.1 Manual, 5.8.7.1. Basic SSL Concepts

    SSL is a protocol that uses different encryption algorithms to
    ensure that data received over a public network can be trusted. It
    has mechanisms to detect any data change, loss, or replay

Since SSL is not used for every connection, we still need to consider per-event
checksums.


OPEN ISSUES
===========

1. How about only having checksums at commit time?  How would that work?
   What about non-transactional tables?  These would probably need a
   checksum in that event.

   Probably the solution is that *whenever* something is committed,
   a checksum is needed, be it a statement (due to autocommit),
   a non-transactional table update, or a real transaction.
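
One way the commit-time variant might work, as a rough sketch only:
keep a running CRC32 over every event in the group and emit it when
the group commits. The function name and the grouping are hypothetical;
zlib.crc32 accepts a starting value, which is what makes the running
form possible.

```python
import zlib
from typing import Iterable


def group_checksum(events: Iterable[bytes]) -> int:
    """Running CRC32 over all events of one committed group.

    A group may be a single autocommitted statement, a
    non-transactional update, or a full transaction.
    """
    crc = 0
    for event in events:
        crc = zlib.crc32(event, crc)  # continue from the previous value
    return crc & 0xFFFFFFFF


transaction = [b"BEGIN", b"row event 1", b"row event 2", b"COMMIT"]
autocommit_statement = [b"single statement"]
```

The result equals the CRC32 of the concatenated events. The trade-off
is that corruption is only detected once the whole group has been read.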