WL#3568: Online backup: Stream format and server file handling

Affects: Server-6.0 — Status: Complete

SUMMARY
Module that handles the stream format and saves the stream to, and loads it
from, disk.

DELIVERABLES
- Documentation of stream format (in this WL)
- Class or similar to handle the save/load by the server 

NOTE: This WL documents the stream format used in the prototype code. For the
format used in the first release of online backup, see WL#4063.

This is the format used in the backup prototype tree
mysql-5.1-backup-prototype. Details of the format can be found in the Doxygen
documentation for this tree.

Stream model used by backup kernel
===================================
 
The stream read/written by the backup kernel is divided into chunks of variable
length. These chunks help to encode and restore the structure of the data
stored in the stream.

Access to the stream is fully sequential. It is not possible to seek or rewind a
stream. An application can read bytes from the stream until the end of the
current data chunk is reached. When this happens, the application can proceed to
the next chunk and continue reading bytes. Eventually, the end of the last chunk
is reached and the application learns that the stream has ended.

Similarly, when writing data, bytes are appended to the current data chunk. Upon
request, the current chunk can be "closed" and a new chunk is started.
Further writes append data to the new chunk. When all data is written, the
application closes the stream, which also closes the last chunk of data.

How chunks are physically implemented in a particular stream is not relevant to
the following description. We leave this design decision for later and just
assume that the stream used by the backup system consists of data chunks as
described above.
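
The stream model above can be sketched in a few lines of Python. This is only
an illustration under stated assumptions, not the prototype's code: chunks are
kept as an in-memory list so that no physical chunk framing is implied, and the
names ChunkWriter, ChunkReader, close_chunk() and next_chunk() are hypothetical.

```python
class ChunkWriter:
    """Append-only writer: bytes go to the current chunk until it is closed."""

    def __init__(self):
        self.chunks = []              # closed chunks, in stream order
        self._current = bytearray()   # chunk currently being written
        self._closed = False

    def write(self, data):
        if self._closed:
            raise ValueError("stream is closed")
        self._current += data

    def close_chunk(self):
        # Close the current chunk and start a new, empty one.
        self.chunks.append(bytes(self._current))
        self._current = bytearray()

    def close(self):
        # Closing the stream also closes the last chunk.
        if not self._closed:
            self.close_chunk()
            self._closed = True


class ChunkReader:
    """Forward-only reader: there is no seek and no rewind."""

    def __init__(self, chunks):
        self._chunks = list(chunks)
        self._index = 0   # chunk currently being read
        self._pos = 0     # read position inside that chunk

    def read(self, n):
        # Returns at most n bytes; b"" means the end of the chunk was reached.
        if self._index >= len(self._chunks):
            return b""
        data = self._chunks[self._index][self._pos:self._pos + n]
        self._pos += len(data)
        return data

    def next_chunk(self):
        # Moves to the next chunk; returns False once the stream has ended.
        if self._index + 1 >= len(self._chunks):
            self._index = len(self._chunks)
            return False
        self._index += 1
        self._pos = 0
        return True


# Write two chunks, then read them back strictly in order.
w = ChunkWriter()
w.write(b"head")
w.close_chunk()
w.write(b"cata")
w.write(b"log")
w.close()

r = ChunkReader(w.chunks)
parts = []
while True:
    buf = bytearray()
    while True:
        piece = r.read(3)
        if not piece:
            break
        buf += piece
    parts.append(bytes(buf))
    if not r.next_chunk():
        break
```

The reader exposes only read() and next_chunk(), which mirrors the restriction
that an application cannot move backwards in the stream.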

Format of the global backup image
==================================

Note: The following describes the logical structure of the data stream produced
by the backup system. The actual bytes written will depend on the particular
implementation of the underlying stream. Such an implementation can use
additional bytes to implement checksums, chunk boundaries and similar artifacts.
We leave magic number detection as part of that low-level stream design.

The backup kernel stores the backup image in a stream consisting of consecutive
data chunks. This stream consists of four main parts, each of which is further
divided into smaller subparts:

 1. header
 2. catalog
 3. metadata image
 4. sub-images

The header contains a global version number and a list of the sub-images present
in the image. The catalog lists the tables stored in each of the sub-images. The
metadata image is stored before the table data images so that, when reading the
stream, tables can be created before they are filled with rows.

Backup image header
-------------------

The header occupies exactly one chunk (the first chunk of the stream). It starts
with a backup system version number followed by a list of formats of the
sub-images present in the image. 
 
  +=========================+
  | Version number   : int4 |
  +-------------------------+
  | format of 1st sub-image |
  +-------------------------+
  |                         |
  |          ...            |
  |                         |
  +-------------------------+
  | format of Nth sub-image |
  +=========================+ 

The description of a sub-image format doesn't have a fixed length. The exact
format of this entry will be decided later. However, we assume an application
can read the entries one by one. The end of the list is detected by reaching the
end of the stream chunk.

Format of sub-image entry: see Image_info::write_description().
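
As an illustration of the "read entries until the chunk ends" rule, here is a
minimal Python sketch. It rests on two assumptions that are not part of this
WL: int4 is read as big-endian (one reading of "network transparent"), and the
entry codec is a stand-in (1-byte length followed by an ASCII name). The real
entry format is the one written by Image_info::write_description(). All
function names below are hypothetical.

```python
import struct


def write_header(version, entries, encode_entry):
    # Version number (int4, assumed big-endian) followed by the entries.
    out = struct.pack(">I", version)
    for entry in entries:
        out += encode_entry(entry)
    return out


def read_header(chunk, decode_entry):
    (version,) = struct.unpack_from(">I", chunk, 0)
    pos = 4
    entries = []
    # There is no entry count: the list ends where the chunk ends.
    while pos < len(chunk):
        entry, pos = decode_entry(chunk, pos)
        entries.append(entry)
    return version, entries


# Stand-in entry codec, NOT the real format: 1-byte length + ASCII name.
def encode_name(name):
    raw = name.encode("ascii")
    return bytes([len(raw)]) + raw


def decode_name(chunk, pos):
    n = chunk[pos]
    start = pos + 1
    return chunk[start:start + n].decode("ascii"), start + n


header = write_header(1, ["format-a", "format-b"], encode_name)
version, formats = read_header(header, decode_name)
```

Because the header occupies exactly one chunk, the reader needs no terminator
or count field; the chunk boundary supplied by the stream is enough.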

Backup image catalog
--------------------

Assuming that there are N sub-images in the backup image, the catalog consists
of N parts, each part containing a list of the tables stored in the
corresponding sub-image. The order of tables in this list is important: the
position of a table in the list determines which stream of the sub-image
contains data for that table.

Each table is represented by two strings: database name and table name. To save
some space, database names are stored in a separate list, and only positions
inside that list are used when storing table coordinates. The table name is
stored as a full string. Thus an entry describing a single table consists of
two fields:

  +---------------------+
  | db name pos  : int2 |
  | table name   : str  |
  +---------------------+

A complete list of tables is stored in two chunks. The first chunk is a list of
database names and the second one contains a list of table descriptions.

  +=================+
  | db name 1 : str |
  |      ...        |
  | db name k : str |
  +=================+
  |  table descr 1  |
  +-----------------+
  |      ...        |
  +-----------------+
  |  table descr m  |
  +=================+
 
The catalog consists of 2*N chunks where N is the number of sub-images. There
are 2 chunks per sub-image, used to store the table list.
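
The split into a database name list and a table description list can be
sketched at the logical level as follows. The sketch does not produce bytes,
and it makes assumptions this WL does not state: positions are 0-based, and
database names appear in order of first use. The function names are
hypothetical.

```python
def build_table_list(tables):
    """tables: (db_name, table_name) pairs in the order used by the sub-image.

    Returns the contents of the two chunks: the list of database names and
    the list of (db name position, table name) descriptions.
    """
    db_names = []
    db_pos = {}
    descrs = []
    for db, table in tables:
        if db not in db_pos:
            if len(db_names) > 0xFFFF:
                # The position is stored as int2, so it must fit in 2 bytes.
                raise ValueError("too many databases for an int2 position")
            db_pos[db] = len(db_names)
            db_names.append(db)
        descrs.append((db_pos[db], table))
    return db_names, descrs


def resolve_table(db_names, descrs, index):
    # Turn a table's position in the list back into its full coordinates.
    pos, table = descrs[index]
    return db_names[pos], table


tables = [("shop", "orders"), ("shop", "items"), ("hr", "staff")]
db_names, descrs = build_table_list(tables)
```

Each database name is stored once however many tables it contains, and the
order of the table descriptions is preserved because a table's position in the
list is what ties it to its data stream.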
 
Metadata
--------

See write_meta_data() and Archive_info::Item::save().
  
Sub-images
----------

The rest of the stream contains sub-images created by backup drivers. Each
sub-image consists of streams (one stream per table + one common stream).
Streams consist of data blocks. Each block is stored in a single stream chunk,
together with sub-image and stream numbers:

 +=========================+
 | sub-image number : int2 |
 | stream number    : int4 |
 +-------------------------+
 |                         |
 |      data payload       |
 |                         |
 +=========================+
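
A data block chunk as drawn above could be packed and unpacked as in the
following Python sketch. The byte order is an assumption (big-endian, as one
reading of "network transparent"), and pack_block() and unpack_block() are
hypothetical names.

```python
import struct

# int2 sub-image number followed by int4 stream number, no padding.
BLOCK_HEADER = struct.Struct(">HI")


def pack_block(image_no, stream_no, payload):
    # One data block becomes exactly one stream chunk.
    return BLOCK_HEADER.pack(image_no, stream_no) + payload


def unpack_block(chunk):
    image_no, stream_no = BLOCK_HEADER.unpack_from(chunk, 0)
    # Everything after the 6-byte prefix is the driver's data payload.
    return image_no, stream_no, chunk[BLOCK_HEADER.size:]


chunk = pack_block(1, 3, b"row data")
block = unpack_block(chunk)
```

Since every chunk carries both numbers, blocks from different sub-images and
streams can be interleaved in the backup stream and still be routed to the
right driver and table on restore.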
 
Serialization of basic types
=============================
 
Numbers
 intX: fixed-length unsigned integers in network-transparent format, where
       X in {2,4,8} is the number of bytes used to store the value.
  int: variable-length encoded (VLE) numbers (as coded/decoded by ...)

Strings
  str: length-encoded strings. Each consists of an int field containing the
       number of bytes, followed by the bytes of the string.
   
The prototype backup code stores strings as they are, without taking into
account the character encoding.
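
The basic types can be illustrated with a short Python sketch. Two choices here
are assumptions, since this WL does not fix them: intX is written big-endian,
and the VLE scheme (7 payload bits per byte, low-order group first, high bit
set when more bytes follow) is a stand-in that may differ from the prototype's
coder. All function names are hypothetical. Strings are handled as raw bytes,
matching the prototype's behaviour of ignoring character encoding.

```python
import struct

# intX: fixed-length unsigned integers, X bytes each (assumed big-endian).
_FIXED = {2: ">H", 4: ">I", 8: ">Q"}


def put_fixed(value, width):
    return struct.pack(_FIXED[width], value)


def get_fixed(buf, pos, width):
    (value,) = struct.unpack_from(_FIXED[width], buf, pos)
    return value, pos + width


def put_vle(n):
    # Stand-in VLE coder: emit 7 bits at a time, lowest group first.
    if n < 0:
        raise ValueError("VLE numbers must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def get_vle(buf, pos):
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def put_str(raw):
    # str: an int (VLE) byte count followed by the bytes themselves.
    return put_vle(len(raw)) + raw


def get_str(buf, pos):
    n, pos = get_vle(buf, pos)
    return buf[pos:pos + n], pos + n


# A table description entry: db name pos (int2) + table name (str).
entry = put_fixed(1, 2) + put_str(b"staff")
db_pos, pos = get_fixed(entry, 0, 2)
table_name, pos = get_str(entry, pos)
```

Each decoder returns the new read position along with the value, which suits a
stream that can only be consumed front to back.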

Prototype design for iterative development.  
No LLD needed.
-- Lars