WL#3610: Multi-master upgrade 5.1

Affects: Server-6.0   —   Status: Complete

SUMMARY
-------

The purpose of this worklog is to outline the changes necessary to
upgrade a system consisting of several masters from version 5.1 to any
later version, or to outline the situations where this is not possible.
This worklog only contains work to be done for 5.1.

Note added by Trudy Pelzer, 2006-12-08
Per Rafal Somla, this WL is mostly documentation and specification work. 
Any concrete solution of the problem will be filed as a separate WL task.

REQUIREMENT
-----------
Assume that OLD version < NEW version and that the major
version numbers of OLD and NEW differ by at most 1.

R1. It should always be possible to replicate OLD -> NEW,
    e.g. 4.1.10 -> 5.0.12  or  4.1.10 -> 4.1.12.

R2. It should always be possible to replicate NEW -> OLD, 
    where NEW is the latest minor version within its 
    major version, e.g. 5.1.20 -> 5.0.12 (if 5.1.20 is the 
    latest 5.1 release).

R1 is an existing requirement; R2 is a new requirement.
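A minimal Python sketch of R1 and R2 as a predicate, for checking whether a proposed master/slave pair is covered by the requirements. It assumes versions are written as major.minor.patch, takes the "major version" in the "differ by at most 1" rule to be the first component, and takes the release series for R2 to be the first two components (5.1 in 5.1.20). The names is_required, parse_version and the latest table are hypothetical, not part of the server.

```python
from typing import Dict, Tuple

Version = Tuple[int, int, int]  # (major, minor, patch), e.g. 5.1.20 -> (5, 1, 20)


def parse_version(text: str) -> Version:
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def is_required(master: str, slave: str, latest: Dict[Tuple[int, int], Version]) -> bool:
    """True if R1 or R2 requires replication master -> slave to work.

    `latest` (hypothetical) maps a release series such as (5, 1) to its
    latest release, e.g. {(5, 1): (5, 1, 20)}.
    """
    m, s = parse_version(master), parse_version(slave)
    if abs(m[0] - s[0]) > 1:
        return False  # outside the scope of R1/R2
    if m <= s:
        return True  # R1: OLD -> NEW (or the same version)
    # R2: NEW -> OLD is required only when NEW is the latest release of its series
    return latest.get(m[:2]) == m
```

With 5.0.20 as the latest 5.0 release, is_required("5.0.20", "5.0.19", {(5, 0): (5, 0, 20)}) is True, which is why BUG#24674 counts as a counter-example.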

Counter-examples to these requirements
--------------------------------------
- BUG#24674: Not possible to replicate 5.0.20 -> 5.0.19 due to the
  new DEFINER clause in stored procedures (SP).


USE CASE
--------

The following use case is due to Jimmy.

(In the description below read s/Cluster/Server/.)

- Replicate from Cluster A (Master 5.1) to Cluster B (Slave 5.1)
- Stop Cluster B
- Upgrade Cluster B to 5.2
- Start replication from Cluster A (Master 5.1) to Cluster B (Slave 5.2)
- Wait for Cluster B to 'catch up' with missed changes
- Stop replication at Cluster B
- Promote Cluster B to Master/Active
- Replicate all changes to Cluster A (Master 5.2 to Slave 5.1), so that the
upgrade can be backed out if it fails (this step relies on R2)
- Stop replication at Cluster A
- Stop Cluster A
- Upgrade Cluster A to 5.2
- Start replication on Cluster A
- Result: Cluster A (Slave 5.2) replicates from Cluster B (Master 5.2)


------------------------------------------------------------------------------

Version Upgrade Discussion
Replication Meeting, Stockholm, 2 February 2007

Problem
-------

We want to be able to upgrade servers online.

When we upgrade servers, replication should still produce a consistent
copy of the master's data on the slave, i.e. the slave converges to the
master.

In any topology, you can remove a server and change the topology such
that any servers that converged prior to removal can still converge. For
example, the removal of a server from a circular replication topology
cannot result in loss of convergence between the remaining servers.
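A sketch of the topology change only, assuming a circular topology is written as a list in replication order (each server is the slave of the one before it). The helper names are hypothetical. Removing a server creates exactly one new link, from the removed server's master to its slave; whether the remaining servers still converge depends on where that new link starts reading, which is Problem A.2 below.

```python
from typing import List, Tuple

Link = Tuple[str, str]  # (master, slave)


def ring_links(ring: List[str]) -> List[Link]:
    """Master -> slave links of a circular topology listed in replication order."""
    if len(ring) < 2:
        return []
    return [(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring))]


def remove_from_ring(ring: List[str], server: str) -> Tuple[List[str], Link]:
    """Remove `server`; return the smaller circle and the single new link,
    in which the removed server's slave replicates directly from its master."""
    i = ring.index(server)
    master, slave = ring[i - 1], ring[(i + 1) % len(ring)]
    return ring[:i] + ring[i + 1:], (master, slave)
```

For example, remove_from_ring(["A", "B", "C"], "C") gives the circle ["A", "B"] and the new link ("B", "A"), i.e. A->B and B->A.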

Terminology
-----------

For this document, versions are defined as different releases of the
server executable with different semantics of the binary log.

Requirements
------------

The slave shall always have the same contents (state) as the master, or the
contents of the slave shall converge to the contents of the master.

In a circular replication topology, one shall be able to upgrade the
servers without taking all of them down.

Since the masters may be used for load-balancing writes, it must be
possible to change the topology to form a smaller circle while a server is
taken down; all remaining servers shall still converge to the same state.

Slave and master shall always be able to communicate. [What is the purpose of
this requirement? /Matz]

No event shall be applied twice on any server. For example, given a circular
replication topology with three servers, if one server is removed, the
remaining servers shall not apply events that they have already applied.

Implementation Details
----------------------

Record server_id.

Record global position.

Servers maintain a vector of {server_id, binlog_pos}* called a
binlog_vector.
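One possible in-memory shape for a binlog_vector, sketched in Python. It assumes that positions in a given server's binlog only grow and that events from one originating server are applied in order, so a single position per server_id is enough to tell seen events from unseen ones. The class and method names are hypothetical.

```python
from typing import Dict


class BinlogVector:
    """For each originating server_id, the binlog position of the latest
    event from that server that this server has applied."""

    def __init__(self) -> None:
        self.positions: Dict[int, int] = {}

    def has_seen(self, server_id: int, binlog_pos: int) -> bool:
        # Assumes positions from one server increase monotonically.
        return binlog_pos <= self.positions.get(server_id, -1)

    def record(self, server_id: int, binlog_pos: int) -> None:
        # Only move forward; an older position never overwrites a newer one.
        if not self.has_seen(server_id, binlog_pos):
            self.positions[server_id] = binlog_pos
```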

When a slave connects to a master, it sends its binlog_vector to the
master, requesting the latest position in the master's binlog up to which
the slave has already converged. The master returns that position, and the
slave processes the master's binlog starting from it.

Possible Solutions
------------------

1. New->Old, downgrade new.
2. Old->New, upgrade new.
3. Use a binlog_vector covering all servers in the slave's replication
   graph of masters (containing the server_id of the master, its binlog
   position, and the slave's binlog position).

Solution 1
----------

Later versions of the server can always read older versions of the
binary log.

Later versions of the server can always produce older versions of the
binary log.
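A minimal sketch of what producing an older binlog version could mean, using a hypothetical event representation (a dict of named fields) rather than the real binlog format. The KNOWN_FIELDS table, the field name new_attr, and downgrade are invented for illustration. The design choice shown is that a field may be dropped only when it is unset; otherwise the downgrade refuses rather than silently changing the event's meaning.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical: the fields each binlog format version understands.
KNOWN_FIELDS: Dict[int, FrozenSet[str]] = {
    1: frozenset({"server_id", "query"}),
    2: frozenset({"server_id", "query", "new_attr"}),
}


def downgrade(event: Dict[str, Any], target_version: int) -> Dict[str, Any]:
    """Rewrite `event` so that a reader of `target_version` can parse it.

    Raises ValueError if a field unknown to the target version is set,
    since dropping it would change what the event means.
    """
    known = KNOWN_FIELDS[target_version]
    dropped = {k: v for k, v in event.items() if k not in known and k != "version"}
    lossy = [k for k, v in dropped.items() if v is not None]
    if lossy:
        raise ValueError(f"cannot express {lossy} in binlog version {target_version}")
    out = {k: v for k, v in event.items() if k in known}
    out["version"] = target_version
    return out
```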

An upgraded server needs to have input and output transformers.
The input transformer can stop,


Scenario
--------

Consider servers {A, B, C} where all servers are the same version.
Assume old version can replicate to new version, but new version cannot
replicate to old version.

1. Initial topology is A->B, B->C, C->A.
2. When C is removed, A->B and B->A.
3. When C is upgraded to C', A->C'. Note that changes made on C' are not
replicated, since the new version cannot replicate to the old one.
4. When B is upgraded to B', B'<->C'.
5. When A is upgraded to A', A'->B', B'->C', C'->A'.
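The scenario can be checked mechanically. The sketch below encodes each step's links and versions and flags any link that violates the stated assumption (old may replicate to new, new may not replicate to old). Version numbers are reduced to OLD and NEW, primed names mark upgraded servers, keeping A<->B during step 3 is an assumption, and the names are hypothetical.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]  # (master, slave)
OLD, NEW = 1, 2


def invalid_links(links: List[Link], version: Dict[str, int]) -> List[Link]:
    """Links that violate the assumption: a server may only replicate to a
    server running the same or a newer version."""
    return [(m, s) for m, s in links if version[m] > version[s]]


# Links and versions after each numbered step of the scenario.
# Step 3 keeps A<->B from step 2 (an assumption; the step only names A->C').
STEPS: Dict[int, Tuple[List[Link], Dict[str, int]]] = {
    1: ([("A", "B"), ("B", "C"), ("C", "A")], {"A": OLD, "B": OLD, "C": OLD}),
    2: ([("A", "B"), ("B", "A")], {"A": OLD, "B": OLD}),
    3: ([("A", "B"), ("B", "A"), ("A", "C'")], {"A": OLD, "B": OLD, "C'": NEW}),
    4: ([("B'", "C'"), ("C'", "B'")], {"B'": NEW, "C'": NEW}),
    5: ([("A'", "B'"), ("B'", "C'"), ("C'", "A'")], {"A'": NEW, "B'": NEW, "C'": NEW}),
}

for step, (links, version) in STEPS.items():
    print(step, invalid_links(links, version))  # every step prints []
```

Adding the link C'->A in step 3 would be flagged: invalid_links([("C'", "A")], {"C'": NEW, "A": OLD}) returns [("C'", "A")], which is why changes made on C' are not replicated at that point.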

Problems (Rafal)
----------------

A. Removing a server from a replication setup.
   1. Avoid re-execution of replication events.
   2. When a new master->slave connection is established, where does
      replication start?
B. Upgrading all servers in a replication setup.

Solution for Problem A.1.
-------------------------

Use 'global' replication event positions.

Remember position of last seen event for every replication node.

Don't execute events which were seen previously.
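A sketch of the A.1 rule in Python, assuming each event carries a 'global' position in the form (originating server_id, position in that server's binlog) and that positions from one origin only increase. apply_new_events and the tuple layout are hypothetical; appending the payload stands in for executing the event.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, int, str]  # (origin server_id, origin binlog_pos, payload)


def apply_new_events(events: Iterable[Event], last_seen: Dict[int, int]) -> List[str]:
    """Apply only events not seen before; update `last_seen` in place."""
    applied: List[str] = []
    for server_id, pos, payload in events:
        if pos <= last_seen.get(server_id, -1):
            continue  # already seen from this origin: do not execute again
        applied.append(payload)  # stand-in for executing the event
        last_seen[server_id] = pos
    return applied
```

In the three-server circle, this is what stops an event from being applied again when it arrives over the new, shorter path after a server is removed.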

Solution for Problem A.2.
-------------------------

Use binlog_pos vector.

Slave connects to master.

Slave sends binlog_pos vector to the master.

Master searches its binlog for the oldest event that has not been seen
by the slave (via the slave's binlog_pos vector).

Master begins sending binlog starting from the event identified.
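A sketch of the master-side search, assuming each entry in the master's binlog records its own local position and the event's global position (originating server_id and position). The names BinlogEntry and start_position are hypothetical. A linear scan from the start of the binlog is shown for clarity; events after the returned position that the slave has already seen would still be skipped by the rule from Problem A.1.

```python
from typing import Dict, List, NamedTuple, Optional


class BinlogEntry(NamedTuple):
    local_pos: int   # position in the master's own binlog
    origin_id: int   # server_id of the server where the event originated
    origin_pos: int  # position of the event in the originating server's binlog


def start_position(binlog: List[BinlogEntry], slave_vector: Dict[int, int]) -> Optional[int]:
    """Local position of the oldest event the slave has not seen, or None."""
    for entry in binlog:
        if entry.origin_pos > slave_vector.get(entry.origin_id, -1):
            return entry.local_pos
    return None  # the slave has already seen every event in this binlog
```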

Solution for Problem B (#1)
---------------------------

Repeatedly remove an old node and reconnect the remaining nodes using the
solution for Problem A.

Upgrade the removed node, then reconnect it (one direction only).
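One way the loop could be organized, sketched under assumptions the text does not fix: servers are upgraded in a given order, old servers keep replicating in a smaller circle, upgraded servers form their own circle, and a single old->new link (one direction only) feeds the upgraded circle until the last old server is gone. Primed names (C') mark upgraded servers, and the helper names are hypothetical.

```python
from typing import List, Tuple

Link = Tuple[str, str]  # (master, slave)


def ring(nodes: List[str]) -> List[Link]:
    """Links of a circular topology; fewer than two nodes means no links."""
    if len(nodes) < 2:
        return []
    return [(nodes[i], nodes[(i + 1) % len(nodes)]) for i in range(len(nodes))]


def rolling_upgrade(servers: List[str]) -> List[List[Link]]:
    """Topology after each server in `servers` has been upgraded, in order."""
    stages: List[List[Link]] = []
    for i in range(1, len(servers) + 1):
        new = [name + "'" for name in servers[:i]]  # upgraded so far
        old = servers[i:]                          # still on the old version
        links = ring(old) + ring(new)
        if old:
            # One direction only: an old master feeds the upgraded servers.
            links.append((old[-1], new[0]))
        stages.append(links)
    return stages
```

With the upgrade order C, B, A, the first stage of rolling_upgrade(["C", "B", "A"]) is A->B, B->A and A->C', and no stage ever contains a link from an upgraded server to an old one.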

Solution for Problem B (#2)
---------------------------

Allow a new master to talk to an old slave.

Rafal's Ideas for Version Numbering
-----------------------------------

There are 2 kinds of version problems:

1) The binary log format is totally different and cannot be read (a major
   change). The solution is to step the binlog version.

2) A compatible extension to the format is introduced that still permits an
   old slave to read the event. How should a new slave know whether it can
   use the extension or not? Solutions are:

   a) use the length of the event (current solution),

   b) use minor version number, or

   c) use server version number.
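A sketch of option a), using a hypothetical event layout that is NOT the real MySQL binlog format: a 4-byte length followed by two fields every reader understands, optionally followed by a 4-byte extension written by newer servers. The point is that the length lets an old reader skip bytes it does not understand, and lets a new reader tell whether the extension is present. The names encode, decode, BASE and EXTRA are hypothetical.

```python
import struct
from typing import Optional, Tuple

# Hypothetical layout: uint32 event_length | uint32 server_id | uint32 value | [uint32 extra]
BASE = struct.Struct("<III")  # length plus the two fields every reader knows
EXTRA = struct.Struct("<I")   # optional extension added by a newer writer


def encode(server_id: int, value: int, extra: Optional[int] = None) -> bytes:
    tail = b"" if extra is None else EXTRA.pack(extra)
    length = BASE.size + len(tail)
    return BASE.pack(length, server_id, value) + tail


def decode(buf: bytes) -> Tuple[int, int, Optional[int], int]:
    """Return (server_id, value, extra, event_length).

    A reader predating the extension would stop after BASE and use
    event_length to skip the rest; this reader uses the length to decide
    whether the extension is present.
    """
    length, server_id, value = BASE.unpack_from(buf, 0)
    extra = None
    if length >= BASE.size + EXTRA.size:
        (extra,) = EXTRA.unpack_from(buf, BASE.size)
    return server_id, value, extra, length
```

Because each event states its own length, a reader can step to the next event with buf[event_length:] whether or not it understood the extension.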

Mats will add a case where these solutions do not completely work.