WL#4212: Online Backup : Kernel updates for object metadata changes

Affects: Server-6.0   —   Status: Complete

Implement stability and design improvements for the backup kernel and stream 
library. These changes are necessary to fully support the development of 
WL#3574.

The tasks needed include:
- improve source code organization: put definitions of main classes in separate
   .h/.cc files
- go through the code and make sure that it is consistent
- replace ad-hoc hacks with better designed code
- refactor memory allocation to use mem_root for faster deallocation
- refactor data structures to use common server datatypes
- reorganize storage of objects in the catalogue to improve efficiency of
   iterators (implies refactoring of some iterator implementations)
- prepare Image_info class for dependency handling

Source organization
===================

To clarify the code, a convention will be adopted of keeping the definitions
and implementations of the main classes in separate header and source files.
This applies to the main classes only:

Image_info	- image_info.{h,cc}
Backup_info	- backup_info.{h,cc}
Restore_info	- restore_info.{h,cc}

The backup/restore context class Backup_restore_ctx (see below) will be defined
in backup_kernel.h and implemented in kernel.cc. Classes related to Image_info
such as Snapshot_info and all the internal classes will be defined and
implemented in image_info.{h,cc} files.

Header files
============

There are two forms of #include directive:

a) #include <header.h>
b) #include "header.h"

Form a) searches for the header in the header file search path specified by the
compilation environment. Form b) looks for the header in the directory
containing the file which uses the directive.

This distinction is blurred by modern compilers, which use the search path also
for form b) of the #include directive. However, it is better to take the
distinction into account in our sources. Therefore the following policy for
using the directive will be adopted:

- If one header includes another header, then the included header should be in
the header search path and the #include <...> form should be used.

- Source files should include backup header files using #include "..." form, so
that the version from the current source tree is used.

- If a source file includes some non-backup global header file, it should use
the #include <...> form. However, headers from the sql/ directory are local and
should be included with #include "../sql_header.h" (thus assuming that the
backup code sits in a subdirectory of sql/).

There are some backup header files which are considered global and are intended
to be used outside the backup tree:

backup_driver.h
backup_kernel.h
backup_stream.h

These files should be included (from other headers) using the #include <backup_...h>
directive. From source files they are included with the #include "backup_...h"
directive, like all other backup headers.

The global headers use some of the other backup header files internally. Thus
the other backup headers must also be present in the header search path.
However, to distinguish them from the global headers, it is assumed that all the
other headers are located in a backup/ subdirectory. Thus a local backup header
should be included from other headers using #include <backup/local_header.h>.
As always, from source files all backup headers are included with
#include "local_header.h".
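
For illustration, under this policy a backup source file and a backup header
could include headers as follows (apart from the global headers and
image_info.h listed earlier, the header names are placeholders):

 /* in a backup source file, e.g. kernel.cc */
 #include "../sql_header.h"       // non-backup header from the sql/ directory
 #include "backup_kernel.h"       // global backup header, local form
 #include "image_info.h"          // other backup header, local form

 /* in a backup header file */
 #include <backup_stream.h>       // global backup header, search-path form
 #include <backup/image_info.h>   // non-global backup header, search-path form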


The backup/restore context class
================================

There are several settings and resources which must be created before backup or
restore operation can be performed:

- all DDLs must be blocked
- memory allocator for backup stream library must be initialized
- the backup stream object must be created
- backup/restore operation must be registered so that no other such operation
  can be run
- a catalogue object (Backup/Restore_info) must be created

etc. All these preparations create a context in which a backup or restore
operation can be performed. When the operation is finished or aborted, the
context must be destroyed, reversing the actions done during its preparation.

To support creation and destruction of the backup/restore context in a
consistent and safe way, a class Backup_restore_ctx will be created with an
appropriate constructor and destructor. An instance of this class represents the
context required for performing a backup/restore operation. When it is deleted,
the context is removed and all preparations are undone. Since the context
instance will be deleted automatically when it goes out of scope, this relieves
the programmer of the burden of remembering to clean up after a backup/restore
operation.

Using the backup/restore context class, the backup operation will be performed as follows:

{

 Backup_restore_ctx context(thd); // create context instance
 Backup_info *info= context.prepare_for_backup(location); // prepare for backup

 // select objects to backup: either all databases ...
 info->add_all_dbs();
 // ... or an explicit selection of databases
 info->add_dbs();

 info->close(); // indicate that selection is done

 context.do_backup(); // perform backup
 
 context.close(); // explicit clean-up

} // if code jumps here, context destructor will do the clean-up automatically

Similar code will be used for restore (a bit simpler, as we don't support
selective restore yet):

{

 Backup_restore_ctx context(thd); // create context instance
 Restore_info *info= context.prepare_for_restore(location); // prepare for restore

 context.do_restore(); // perform restore
 
 context.close(); // explicit clean-up

} // if code jumps here, context destructor will do the clean-up automatically

The context object does all the necessary preparations: it opens the backup
stream, creates the Backup/Restore_info instance and also does things like
blocking DDLs. It also implements logging services: it logs the progress of the
operation, reports errors, etc.
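
A rough sketch of the interface implied by the above follows; only the method
names appearing in the examples are taken from the design, while parameter and
return types are illustrative assumptions:

 class Backup_restore_ctx
 {
  public:
   Backup_restore_ctx(THD *thd);   // create the context instance
   ~Backup_restore_ctx();          // destroy the context, undoing all preparations

   Backup_info*  prepare_for_backup(const char *location);   // illustrative signature
   Restore_info* prepare_for_restore(const char *location);  // illustrative signature

   int do_backup();                // perform the backup operation
   int do_restore();               // perform the restore operation
   int close();                    // explicit clean-up
 };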

Backup engine selection algorithm
=================================

When a table is added to the backup catalogue, a backup engine must be chosen
for it. This is done inside the Backup_info::find_backup_engine() method.

Backup engines used in the backup process provide backup drivers, each of which
creates a snapshot of the data stored in the tables handled by that driver. Such
a snapshot is described by an instance of the Snapshot_info class. A
Snapshot_info instance stores the list of tables which belong to that snapshot
and also provides methods for deciding whether a given table can be stored
inside that snapshot.

Notes:

1. Term "snapshot" is also used in the context of REPEATABLE READ isolation
level in the server. When this isolation level is selected, a snapshot of data
is created which is then accessed by SELECT statements inside single
transaction. Since this technique is used in one of the built-in backup engines,
this engine is called "consistent snapshot backup engine" or "CS engine" for
short. This use of term "snapshot" should not be confused with the use in "table
data snapshot" which is a part of backup image as described above.

2. There is 1-1 correspondence between table data snapshots and backup engines.
Each snapshot is created by exactly one engine and each engine creates only one
snapshot.


The algorithm for selecting which snapshot (i.e., which backup engine) will be
used to store a given table's data is as follows:

1. If the table's storage engine has a native backup engine, then this engine is
used.

2. Otherwise, iterate over all snapshots created so far and pick the first one
which accepts the table.

The list of created snapshots always contains the snapshots served by the CS and
default backup engines. Therefore any table will be accepted in step 2, if only
as a last resort by the default engine's snapshot.

The Backup_info::snapshots member is the list of snapshots considered in step 2
of the algorithm. When a Backup_info instance is created, the default and CS
engines' snapshots are put on that list. Later, whenever a new Native_snapshot
object is created, it is added to the list. Obviously, the order of snapshots in
the list determines which of them will be selected. This order is as follows:

- all the native snapshots created so far
- the CS backup engine's snapshot
- the default backup engine's snapshot

When the algorithm selects a snapshot which was not used before, it is added to
the image's snapshot list using the Image_info::add_snapshot() method. At this
time a number is assigned to the snapshot.

The Backup_info::native_snapshots member is a map from storage engines to
Native_snapshot instances. It is used in step 1 of the algorithm to check
whether a native snapshot for a given storage engine has already been created.
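
The selection logic could thus look roughly as follows. This is only a sketch:
the Table_ref type, the storage_engine() and accepts() accessors and the two
helper calls in step 1 are hypothetical names, while snapshots and
native_snapshots are the members described above:

 Snapshot_info* Backup_info::find_backup_engine(const Table_ref &tbl)
 {
   handlerton *se= tbl.storage_engine();     // illustrative accessor

   // Step 1: if the table's storage engine has a native backup engine, use its
   // Native_snapshot, creating it (and adding it to the members above) when the
   // first table from that engine is met.
   Snapshot_info *snap= native_snapshots[se];
   if (!snap && has_native_backup(se))       // hypothetical helper
     snap= create_native_snapshot(se);       // hypothetical helper
   if (snap)
     return snap;

   // Step 2: otherwise pick the first snapshot on the list which accepts the
   // table; the CS and default snapshots are always present, so this succeeds.
   List_iterator<Snapshot_info> it(snapshots);
   while ((snap= it++))
     if (snap->accepts(tbl))
       return snap;

   return NULL; // not reached: the default snapshot accepts any table
 }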

All Snapshot_info instances are created and owned by the Backup_info instance.
They are deleted in Backup_info destructor.

The Restore_info object also creates Snapshot_info instances, after the list of
snapshots has been read from a backup stream. This is done inside the
bcat_reset() function, which is called by the backup stream library after
reading the backup image's header.

Storage for catalogue items
===========================

For each kind of item stored inside the catalogue there is a class whose
instances store information about items of that kind. All these classes inherit
from the Image_info::Obj class (formerly Image_info::Item). The following
classes will be defined (inside Image_info):

- Db	  for databases
- Table   for tables

(more will be added when WL#4239 is implemented). Instances of these classes are
created when items are added to the catalogue, inside the Image_info::add_*()
methods. The Image_info instance owns these objects and is responsible for
deleting them. We will allocate them using a memory root so that no explicit
delete is needed.

Warning: using a memory root for storing object instances means that their
destructors will not be called. Thus these objects cannot rely on their
destructors for clean-up.
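
For example, an add_*() method could allocate such an object on the memory root
roughly as follows; the exact signature, the mem_root member and the final
registration step are assumptions made for illustration:

 Image_info::Table* Image_info::add_table(Db &db, const ::String &name, ulong pos)
 {
   // Allocate the object on the catalogue's MEM_ROOT. All such objects are freed
   // in one operation when the root is cleared; their destructors never run, so
   // no clean-up logic may be placed there.
   Table *t= new (&mem_root) Table(db, name);

   if (!t)
     return NULL;            // allocation error

   // register the object in the catalogue (hypothetical bookkeeping)
   register_table(db, *t, pos);
   return t;
 }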

Addressing objects in the catalogue
===================================

Each object stored in the catalogue is assigned a position by which it can be
identified. For example, each database has a number. Given the number of a
database we need to access the corresponding Db object; this is done using the
get_db() method:

Image_info::Db *db= info.get_db(3);

To implement this we need to store a mapping from database numbers to Db*
pointers. For that purpose we define a Map<A,B> template. An object of type
Map<A,B> can store mappings from values of type A to pointers of type B*. For
databases we will use the member Image_info::m_dbs of such a Map type, which
will store pointers to Db objects indexed by database number. The Map class will
be implemented using either a HASH or a DYNAMIC_ARRAY structure, depending on
the index type A.
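
A minimal sketch of the intended Map interface; the method names are
illustrative, only the mapping behaviour and the two possible backing structures
are specified above:

 template <class A, class B>
 class Map
 {
  public:
   int insert(const A &key, B *ptr);      // store ptr under key
   B*  operator[](const A &key) const;    // return stored pointer, or NULL
   // backed by DYNAMIC_ARRAY for integer index types, HASH otherwise
 };

 // e.g., inside Image_info (the index type shown is an assumption):
 // Map<uint, Db> m_dbs;                  // database number -> Db*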

Another way of accessing objects stored in the catalogue is by means of
iterators. There will be different types of iterators for enumerating different
kinds of objects. Currently only two iterators will be implemented:

Image_info::Db_iterator    - to iterate over databases
Image_info::DbObj_iterator - to iterate over tables inside a database
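
Their intended use might look roughly as follows; the construction and traversal
syntax is illustrative only:

 // enumerate all databases stored in the catalogue
 Image_info::Db_iterator dbit(info);
 while (Image_info::Db *db= dbit++)
 {
   // enumerate all objects (currently only tables) belonging to that database
   Image_info::DbObj_iterator it(info, *db);
   while (Image_info::Obj *obj= it++)
   {
     // process the object
   }
 }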