WL#3662: Refactoring: Replication Modules

Affects: Server-5.5 — Status: On-Hold

RATIONALE
---------
The replication code is currently integrated with the server. This work
separates it into modules, which makes it easier to maintain and makes
it possible to release replication updates more often than the server.

SOLUTION
--------
There are two options: make replication either a static module or a
dynamically loadable module.

Making it a dynamically loadable module has several advantages and
still allows the replication module to be used as a statically
linked library, at the cost of only minor additional work.

For that reason, it makes sense to go for a dynamically loadable
module directly.
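
As a rough illustration of why one design can serve both linking modes, the
sketch below uses a single module descriptor that a registry installs. All
names here (ReplModuleDescriptor, ModuleRegistry, demo_repl_module) are
hypothetical and are not taken from the server source or the existing plug-in
interface. In a static build the descriptor is referenced directly, as shown;
in a dynamic build the server would instead look up the same descriptor in a
shared library (for example with dlopen/dlsym on POSIX), which is omitted here.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical descriptor that every replication module would export.
struct ReplModuleDescriptor {
  const char* name;
  int (*init)();    // returns 0 on success
  int (*deinit)();  // returns 0 on success
};

// Hypothetical registry; it does not care how the descriptor was obtained.
class ModuleRegistry {
public:
  // Returns false if the name is already taken or the module fails to start.
  bool install(const ReplModuleDescriptor* d) {
    assert(d != nullptr && d->name != nullptr);
    if (modules_.count(d->name) != 0) return false;
    if (d->init != nullptr && d->init() != 0) return false;
    modules_[d->name] = d;
    return true;
  }

  // Returns false if no module with that name is installed.
  bool uninstall(const std::string& name) {
    auto it = modules_.find(name);
    if (it == modules_.end()) return false;
    if (it->second->deinit != nullptr) it->second->deinit();
    modules_.erase(it);
    return true;
  }

  bool is_installed(const std::string& name) const {
    return modules_.count(name) != 0;
  }

private:
  std::map<std::string, const ReplModuleDescriptor*> modules_;
};

// Example module. Statically linked, it is passed to install() directly.
// A dynamically loaded build would export the same descriptor from a
// shared library instead.
static int demo_init_calls = 0;
static int demo_deinit_calls = 0;
static int demo_init() { ++demo_init_calls; return 0; }
static int demo_deinit() { ++demo_deinit_calls; return 0; }

const ReplModuleDescriptor demo_repl_module = {"demo_repl", demo_init,
                                               demo_deinit};
```

With this shape, the only difference between the two modes is where the
descriptor pointer comes from, which is consistent with the claim that static
use needs only minor additional work.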

USE CASES
---------
The following example solutions can serve as use cases.

## Pure row-based replication with side channel for DDL

It is possible to use a transport format other than the binary log as we use it
today. For example, a shared disk could handle parts of the replication: table
changes would be replicated over the normal channel, while DDL statements would
be executed through a side channel, such as the shared disk.
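
A minimal sketch of such a split, assuming a hypothetical Channel interface and
a hypothetical SplitRouter that picks a transport per event (neither exists in
the server): row changes go to one channel and DDL statements to another.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event classification; the real server has many event types.
enum class EventKind { RowChange, Ddl };

struct Event {
  EventKind kind;
  std::string payload;
};

// Hypothetical transport abstraction (binary log stream, shared disk, ...).
class Channel {
public:
  virtual ~Channel() = default;
  virtual void send(const Event& ev) = 0;
};

// Stand-in transport that only records what it was asked to send.
class RecordingChannel : public Channel {
public:
  void send(const Event& ev) override { sent.push_back(ev.payload); }
  std::vector<std::string> sent;
};

// Routes row changes to the normal channel and DDL to the side channel.
class SplitRouter {
public:
  SplitRouter(Channel& rows, Channel& ddl) : rows_(rows), ddl_(ddl) {}

  void dispatch(const Event& ev) {
    Channel& target = (ev.kind == EventKind::Ddl) ? ddl_ : rows_;
    target.send(ev);
  }

private:
  Channel& rows_;
  Channel& ddl_;
};
```

The sketch deliberately ignores ordering: a real design would also have to
order each DDL statement relative to the row changes around it, since the two
channels deliver independently.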

## Automatic fail-over replication

In some cases, it might be an advantage to have a separate replication solution
that handles fail-over transparently, possibly containing a customer-specific
solution for detecting master failure and finding a new master.

## Group communication protocols for replication

In this case, it is essential to be able to time the execution of
statements, since replication using group commits delays the execution of
statements to achieve a global ordering of them. Hence it is not possible to
use an execute-log sequence; it should rather be a prepare-execute-log (or, to
be more generic, prepare-execute-finalize) sequence that allows the replication
machinery to delay the execution of a statement.
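
The prepare-execute-finalize sequence described above can be sketched as
follows. ReplicationHooks, TracingHooks and run_statement are hypothetical
names introduced for illustration only. The point is that prepare() runs
before the statement executes, so a module is free to block there until a
global order has been agreed, while a classic binary-log module would return
immediately and do its work in finalize().

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical hook interface a replication module would implement.
class ReplicationHooks {
public:
  virtual ~ReplicationHooks() = default;
  // Called before execution. May block to delay the statement.
  // Returning false vetoes the statement.
  virtual bool prepare(const std::string& stmt) = 0;
  // Called after execution, e.g. to write the statement to a log.
  virtual void finalize(const std::string& stmt, bool executed_ok) = 0;
};

// Stand-in module that records the order in which the hooks are called.
class TracingHooks : public ReplicationHooks {
public:
  bool prepare(const std::string& stmt) override {
    trace.push_back("prepare:" + stmt);
    return true;  // never delays or vetoes in this sketch
  }
  void finalize(const std::string& stmt, bool executed_ok) override {
    trace.push_back(std::string(executed_ok ? "log:" : "discard:") + stmt);
  }
  std::vector<std::string> trace;
};

// Drives one statement through prepare, execute and finalize, in that order.
bool run_statement(ReplicationHooks& hooks, const std::string& stmt,
                   const std::function<bool(const std::string&)>& execute) {
  assert(execute && "an execute callback is required");
  if (!hooks.prepare(stmt)) return false;  // vetoed: never executed
  const bool ok = execute(stmt);
  hooks.finalize(stmt, ok);
  return ok;
}
```

An execute-log design has no hook before execution, which is why it cannot
express the delay that a globally ordered protocol needs.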

OPEN ISSUES
-----------

We have the following (incomplete) list of open issues:

- The slave side of the code might use threads in an arbitrary manner.
  Do we need to refactor the slave thread handling system to allow
  arbitrary use of threads? Especially consider the case of dynamically
  spawning new threads to parallelize the database update on the slave.

- What use cases do we expect to handle with this worklog?

- Can we use the existing plug-in code, or do we need to extend it?

REFERENCES
----------
See also WL#5377

WL#2582: Handler interface to binary log
WL#2761: MySQL plugin interface
WL#3663: Plug-in interface for slave replication module
WL#3665: Plug-in interface for master replication module

We are planning to create a set of subtasks to handle this large piece of work.
The first subtask, refactoring the replication code into separate libraries,
will be handled by WL#5385. Other tasks will be created and handled later.