WL#6056: Read/write layers in replication

Affects: Server-Prototype Only
Status: Un-Assigned

Description
High Level Architecture

The goal of this worklog is to create uniform interfaces for reading and writing
binary log data. This step makes the following features very simple to
implement:
 - master with no binary log (for sync or semi-sync replication) and/or slave
   with no relay log
 - master uses table for binary log and/or slave uses table for relay log
 - compress binary log and/or relay log
 - apply filters at any place in the replication chain (on master before/after
   writing to binlog, on slave before/after writing to relay log, or in
   mysqlbinlog)
 - implement checksums in a much cleaner and more efficient way
Without this worklog, each of the above features would be big, difficult and 
error-prone, as the features interact in complicated ways. With this worklog, each 
feature will be relatively easy to implement and there will be no cross-
dependencies between the features.

Other similar features are likely to benefit from this worklog in similar ways.

======== BACKGROUND ========

Transactions are read and written in five places: client threads, dump threads, 
IO thread, SQL thread, and mysqlbinlog. More precisely:

 - a client thread that is going to write to the binary log "reads" the
transaction from the server (i.e., generates a set of events from the THD object)
 - client writes to binary log
 - dump thread reads from binary log
 - dump thread writes to net
 - IO thread reads from net
 - IO thread writes to relay log
 - SQL thread reads from relay log
 - SQL thread "writes" to server (i.e., executes event)
 - mysqlbinlog reads from binary log, relay log, or net
 - mysqlbinlog "writes" transaction as text

Currently, each of these five places has its own ad-hoc code for encoding and
decoding transactions, iterating over transactions, retrieving information from
transactions, etc.

We propose one interface shared by all "read" operations and one interface
shared by all "write" operations.
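The worklog leaves the concrete interfaces open (see Open questions). As a minimal sketch, assuming C++17 and purely hypothetical names (`Reader`, `Writer`, `pump`, `VectorReader`, `VectorWriter`), one read interface and one write interface parameterized by a layer's tuple type would let a single copy loop serve every reader/writer pair listed above:

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical read interface shared by every source (binary log, relay log,
// net connection, server). T is the tuple type of the layer.
template <typename T>
class Reader {
 public:
  virtual ~Reader() = default;
  // Returns the next tuple, or std::nullopt at the end of the stream.
  virtual std::optional<T> read() = 0;
};

// Hypothetical write interface shared by every sink.
template <typename T>
class Writer {
 public:
  virtual ~Writer() = default;
  virtual void write(const T& item) = 0;
};

// Copies a whole stream from any source to any sink and returns how many
// tuples were copied. Only the back-ends differ between call sites.
template <typename T>
std::size_t pump(Reader<T>& in, Writer<T>& out) {
  std::size_t count = 0;
  while (std::optional<T> item = in.read()) {
    out.write(*item);
    ++count;
  }
  return count;
}

// In-memory stand-ins for real back-ends, useful for testing the loop.
template <typename T>
class VectorReader : public Reader<T> {
 public:
  explicit VectorReader(std::vector<T> items) : items_(std::move(items)) {}
  std::optional<T> read() override {
    if (pos_ == items_.size()) return std::nullopt;
    return items_[pos_++];
  }

 private:
  std::vector<T> items_;
  std::size_t pos_ = 0;
};

template <typename T>
class VectorWriter : public Writer<T> {
 public:
  void write(const T& item) override { items.push_back(item); }
  std::vector<T> items;
};
```

Under this assumption, a dump thread would be `pump` from a binary log reader to a net writer, and an IO thread would be `pump` from a net reader to a relay log writer.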

======== PROPOSAL ========

Add two layers: layer 0 and layer 1.

-------- Layer 0 --------

In layer 0, a transaction has the form of a 2-tuple:

  (length, data), where:

 - length is an integer
 - data is a sequence of 'length' bytes that contains both transaction data 
(e.g. the statement) and stream meta-data (e.g., global transaction ID).
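A minimal sketch of a layer 0 tuple, together with one framing a back-end could use to store a stream of such tuples in a file or send it over a connection. The struct, the function names, and the 4-byte little-endian length prefix are assumptions for illustration, not the actual binary log format:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Layer 0 tuple (length, data). The length is data.size(); the bytes are
// opaque at this layer (transaction data and stream meta-data mixed together).
struct Layer0Tuple {
  std::string data;
};

// Hypothetical framing: 4-byte little-endian length, then the data bytes.
void append_frame(std::string& out, const Layer0Tuple& t) {
  const std::uint32_t len = static_cast<std::uint32_t>(t.data.size());
  for (int i = 0; i < 4; ++i)
    out.push_back(static_cast<char>((len >> (8 * i)) & 0xFFu));
  out += t.data;
}

// Reads the frame starting at `pos` and advances `pos` past it.
// Returns std::nullopt if the input ends before the frame is complete.
std::optional<Layer0Tuple> read_frame(const std::string& in, std::size_t& pos) {
  if (pos > in.size() || in.size() - pos < 4) return std::nullopt;
  std::uint32_t len = 0;
  for (int i = 0; i < 4; ++i)
    len |= static_cast<std::uint32_t>(static_cast<unsigned char>(in[pos + i]))
           << (8 * i);
  if (in.size() - pos - 4 < len) return std::nullopt;
  Layer0Tuple t{in.substr(pos + 4, len)};
  pos += 4 + len;
  return t;
}
```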

Layer 0 is used as follows:
 - The binary log is implemented as a back-end capable of serializing and de-
serializing a stream of such tuples on disk.
 - The relay log uses the same back-end as the binary log, only plugged in at 
another place in the code.
 - For the dump thread and IO thread, we implement a 'net' component. The net 
component is a back-end capable of serializing and de-serializing a stream of 2-
tuples to/from a client connection.
 - Checksumming is trivial to implement as an operation on such 2-tuples.
 - Compression is trivial to implement as an operation on such 2-tuples.
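Because a layer 0 tuple is opaque bytes, checksumming can be a function from 2-tuple to 2-tuple that never looks inside the data. A minimal sketch, assuming CRC-32 (IEEE polynomial, reflected form `0xEDB88320`) as the checksum, appended as 4 little-endian bytes; all names are hypothetical. A compression step would have the same shape, replacing the data with its compressed form:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

struct Layer0Tuple {
  std::string data;  // opaque bytes; length is data.size()
};

// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
std::uint32_t crc32_ieee(const std::string& bytes) {
  std::uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char c : bytes) {
    crc ^= c;
    for (int k = 0; k < 8; ++k)
      crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
  }
  return crc ^ 0xFFFFFFFFu;
}

// (n, data) -> (n + 4, data + crc): the checksum travels inside the tuple.
Layer0Tuple add_checksum(const Layer0Tuple& t) {
  const std::uint32_t crc = crc32_ieee(t.data);
  Layer0Tuple out{t.data};
  for (int i = 0; i < 4; ++i)
    out.data.push_back(static_cast<char>((crc >> (8 * i)) & 0xFFu));
  return out;
}

// Inverse operation: verifies and strips the trailing checksum.
// Returns std::nullopt if the tuple is too short or the checksum mismatches.
std::optional<Layer0Tuple> verify_checksum(const Layer0Tuple& t) {
  if (t.data.size() < 4) return std::nullopt;
  const std::size_t n = t.data.size() - 4;
  std::uint32_t stored = 0;
  for (int i = 0; i < 4; ++i)
    stored |= static_cast<std::uint32_t>(
                  static_cast<unsigned char>(t.data[n + i])) << (8 * i);
  std::string payload = t.data.substr(0, n);
  if (crc32_ieee(payload) != stored) return std::nullopt;
  return Layer0Tuple{payload};
}
```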

-------- Layer 1 --------

In layer 1, a transaction has the form of a 4-tuple:

  (length, data, ID, timestamp), where:

 - length is an integer
 - data is a sequence of 'length' bytes that contains transaction data (e.g. the 
statement)
 - ID is the global transaction identifier for the transaction
 - timestamp is the time when the transaction was applied on the nearest
upstream server (i.e., this server if the tuple appears in a binary log, the
master if the tuple appears in a relay log, the slave if the tuple appears in
the binary log of the slave, etc.)

Thus, layer 1 has separated stream meta-data from transaction data.
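A minimal sketch of a layer 1 tuple and one possible canonical encoding into a layer 0 tuple. The worklog does not specify the types of the ID and timestamp or the encoding; here the ID is assumed to be a 64-bit integer, the timestamp a 64-bit count of microseconds, and the encoding a fixed 16-byte header in front of the transaction data. All names are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

struct Layer0Tuple {
  std::string data;  // opaque bytes
};

// Layer 1 tuple (length, data, ID, timestamp); length is data.size().
struct Layer1Tuple {
  std::string data;         // transaction data only
  std::uint64_t id;         // hypothetical numeric global transaction ID
  std::uint64_t timestamp;  // hypothetical: microseconds since the epoch
};

void put_u64(std::string& out, std::uint64_t v) {
  for (int i = 0; i < 8; ++i)
    out.push_back(static_cast<char>((v >> (8 * i)) & 0xFFu));
}

std::uint64_t get_u64(const std::string& in, std::size_t pos) {
  std::uint64_t v = 0;
  for (int i = 0; i < 8; ++i)
    v |= static_cast<std::uint64_t>(static_cast<unsigned char>(in[pos + i]))
         << (8 * i);
  return v;
}

// Hypothetical canonical encoding: [id: 8 bytes LE][timestamp: 8 bytes LE][data].
Layer0Tuple encode(const Layer1Tuple& t) {
  Layer0Tuple out;
  put_u64(out.data, t.id);
  put_u64(out.data, t.timestamp);
  out.data += t.data;
  return out;
}

// Splits the meta-data back out; std::nullopt if the header is incomplete.
std::optional<Layer1Tuple> decode(const Layer0Tuple& t) {
  if (t.data.size() < 16) return std::nullopt;
  return Layer1Tuple{t.data.substr(16), get_u64(t.data, 0), get_u64(t.data, 8)};
}
```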

Layer 1 is used as follows:
 - There will be a canonical way to encode 4-tuples in layer 1 into 2-tuples in 
layer 0.
 - The dump thread de-serializes the binary log into layer 0 data; then decodes 
the data into layer 1 data; then filters out the transactions with IDs that the 
slave requested not to get; then encodes the remaining transactions into layer 0 
data; then serializes them to the net.
 - The IO thread de-serializes the net stream into layer 0 data; then
serializes the layer 0 data into the relay log.
 - The SQL thread de-serializes the relay log into layer 0 data; then decodes 
the data into layer 1 data; then applies the data to the server.
 - mysqlbinlog de-serializes transactions from the binary log into layer 0; then 
decodes the data into layer 1; then filters out transactions with IDs or 
timestamps that should be filtered out according to command-line arguments; then 
encodes into layer 0; then serializes as text.
 - To implement binlog-free master, make the dump thread read directly from the 
server instead of from the binary log (details need to be sorted out carefully 
here).
 - To implement relay log-free slave, make the SQL thread read directly from the 
net instead of from the relay log.
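The dump thread and mysqlbinlog steps above share one pattern: decode layer 0 into layer 1, drop tuples by ID or timestamp, and encode the rest back into layer 0. A minimal sketch of that pattern, assuming an in-memory vector as the stream, a hypothetical encoding with a 16-byte header (ID, then timestamp, then data), and hypothetical names (`filter_stream`, `exclude_ids`, `since`):

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Layer0Tuple { std::string data; };  // opaque bytes
struct Layer1Tuple {
  std::string data;         // transaction data only
  std::uint64_t id;         // hypothetical numeric global transaction ID
  std::uint64_t timestamp;  // hypothetical: microseconds since the epoch
};

void put_u64(std::string& out, std::uint64_t v) {
  for (int i = 0; i < 8; ++i) out.push_back(static_cast<char>((v >> (8 * i)) & 0xFFu));
}
std::uint64_t get_u64(const std::string& in, std::size_t pos) {
  std::uint64_t v = 0;
  for (int i = 0; i < 8; ++i)
    v |= static_cast<std::uint64_t>(static_cast<unsigned char>(in[pos + i])) << (8 * i);
  return v;
}

// Hypothetical encoding: [id: 8 bytes LE][timestamp: 8 bytes LE][data].
Layer0Tuple encode(const Layer1Tuple& t) {
  Layer0Tuple out;
  put_u64(out.data, t.id);
  put_u64(out.data, t.timestamp);
  out.data += t.data;
  return out;
}
std::optional<Layer1Tuple> decode(const Layer0Tuple& t) {
  if (t.data.size() < 16) return std::nullopt;
  return Layer1Tuple{t.data.substr(16), get_u64(t.data, 0), get_u64(t.data, 8)};
}

using Predicate = std::function<bool(const Layer1Tuple&)>;

// Decodes each tuple, keeps those for which `keep` is true, re-encodes them.
// An undecodable tuple fails the whole pass (a policy chosen for this sketch).
std::optional<std::vector<Layer0Tuple>> filter_stream(
    const std::vector<Layer0Tuple>& in, const Predicate& keep) {
  std::vector<Layer0Tuple> out;
  for (const Layer0Tuple& raw : in) {
    std::optional<Layer1Tuple> t = decode(raw);
    if (!t) return std::nullopt;
    if (keep(*t)) out.push_back(encode(*t));
  }
  return out;
}

// Dump-thread style: drop transactions whose IDs the slave asked not to get.
Predicate exclude_ids(std::set<std::uint64_t> ids) {
  return [ids = std::move(ids)](const Layer1Tuple& t) { return ids.count(t.id) == 0; };
}

// mysqlbinlog style: keep transactions applied at or after `start`.
Predicate since(std::uint64_t start) {
  return [start](const Layer1Tuple& t) { return t.timestamp >= start; };
}
```

Failing the whole pass on a corrupt tuple is only one option; reporting and skipping it would fit the same interface.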

======== Open questions ========

The precise interfaces for reading from and writing to each layer remain to be
defined. Considerations to take into account:
 - must support parallel binlog writes
 - must support both waiting and not waiting for an ack (needed for sync and
   semisync replication to work)