WL#3473: Online backup: Backup engine API

Affects: Server-6.0   —   Status: Complete   —   Priority: Low

Specification of an interface (API) used by backup kernel to communicate with
storage engines supporting on-line backup.

The interface is used to implement backup and restore algorithms described in
WL#3569 and WL#3571 respectively. The design and functionality of backup kernel
is described in WL#3169.
The backup related functionality is encapsulated in a "backup engine" object. 

A storage engine which supports backup should return a pointer to a
backup engine object. The handlerton structure is extended with function
get_backup_engine which should create a backup engine instance and return a
pointer to it (if on-line backup is supported).
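
The lookup described above can be sketched as follows. This is only an
illustration: handlerton_sketch, Demo_backup_engine and find_backup_engine are
hypothetical names, the real handlerton has many more members, and treating a
null function pointer as "on-line backup not supported" is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a backup engine object.
struct Backup_engine {
  virtual ~Backup_engine() = default;
  virtual unsigned version() const = 0;
};

// Hypothetical, heavily trimmed stand-in for the handlerton structure.
struct handlerton_sketch {
  const char *name;
  // Assumption: left as nullptr when the storage engine has no on-line backup.
  Backup_engine *(*get_backup_engine)();
};

// Example engine that reports backup image format version 1.
struct Demo_backup_engine : Backup_engine {
  unsigned version() const override { return 1; }
};

// Factory function a storage engine would register in its handlerton.
Backup_engine *demo_get_backup_engine() { return new Demo_backup_engine(); }

// Kernel-side helper: returns a backup engine instance, or nullptr if the
// storage engine does not support on-line backup.
Backup_engine *find_backup_engine(const handlerton_sketch &hton) {
  if (hton.get_backup_engine == nullptr) return nullptr;
  return hton.get_backup_engine();
}
```

In this sketch the caller owns the returned object and must delete it.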

Whenever the backup kernel wants to back up a given list of tables, it asks the
backup engine for a backup driver. This is an object representing the process of
creating a backup image of the tables. The backup kernel calls methods of the
driver object to control the process and to fetch the backup image data.

Similarly, when the backup kernel wants to restore tables stored in the engine,
it asks the backup engine for a restore driver. It then calls methods of the
driver to send it backup image data and to control the restore process.

[Only destructive restore will be implemented in this WL.]

Currently, two different types of restore operation are considered
(see WL#3169):
 a) full, destructive restore which replaces all data currently in the instance
    with data from the image.
 b) partial, nondestructive restore in which schemas/tables not being restored 
    are unchanged.
A restore driver might need to perform different operations (clean-up) depending
on what type of restore it is performing. Therefore, when the backup kernel asks
for a restore driver, it specifies the type of restore operation to be performed.

For symmetry and also to give more information to backup implementation, 
similar types for backup operation are provided. 

When creating a backup driver, the kernel passes the type of backup, which
currently can be one of:

 a) full backup, meaning that all stored tables are being backed-up.
 b) partial backup, when only selected schemas/tables are backed-up.

Whatever type of backup/restore is performed, backup kernel will always 
provide a complete list of tables to be backed-up or restored. 
So, if not needed, a backup/restore driver can ignore the type information.

Robin Schumacher's document, however, states that
we'll support the following types of backups in the initial version:
- backups of an entire MySQL instance
- backups of selected schemas/databases
- backups of a single schema/database
  (Support for differential backups, incremental backups and table-level 
  backups isn't required in version one.) 

These different kinds of backup will be translated by the backup kernel into
'get_backup' requests with appropriate list of tables given as an argument:

- all tables stored in the engine or
- all tables from a given schema(s)/database(s) stored in the engine or
- tables selected by user which are stored in the engine (not supported in 
  version 1).
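
A minimal sketch of this translation is given below. All names (Table_ref,
Backup_request, make_request) are hypothetical, and the rule that an empty
schema selection means a backup of the entire instance is an assumption made
for this example only.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical table identifier: schema name plus table name.
struct Table_ref {
  std::string schema;
  std::string name;
};

enum class Backup_type { FULL, PARTIAL };

// What the kernel would hand to get_backup(): the type plus a complete list.
struct Backup_request {
  Backup_type type;
  std::vector<Table_ref> tables;
};

// Builds the request for one engine from the tables it stores and the set of
// schemas chosen by the user (assumption: empty set = entire instance).
Backup_request make_request(const std::vector<Table_ref> &engine_tables,
                            const std::set<std::string> &schemas) {
  Backup_request req;
  if (schemas.empty()) {
    req.type = Backup_type::FULL;
    req.tables = engine_tables;  // all tables stored in the engine
    return req;
  }
  req.type = Backup_type::PARTIAL;
  for (const Table_ref &t : engine_tables) {
    // Keep only tables that belong to one of the selected schemas.
    if (schemas.count(t.schema) != 0) req.tables.push_back(t);
  }
  return req;
}
```

Either way the driver receives an explicit table list, so it can ignore the
type field if it does not need it.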

Methods of backup engine object

- version

  Input: None

  Output: Version number of the backup image format used by this engine 
          (starting from 1). 

- get_backup

  Input: Type of backup (full or partial), list of tables to be backed-up.
  Output: Backup driver responsible for creating a backup image of 
          the given tables. 

- get_restore

  Input: Type of restore (destructive or nondestructive), list of tables to be 
         restored and version number of the backup image.
  Output: Restore driver responsible for restoring the given tables. 

Note: The list of tables passed to get_backup or get_restore is always complete,
regardless of the type of backup/restore operation.

Note: A backup engine *must* support restore from backup images of all versions
less than or equal to the one returned by version().
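
The three methods and the version rule can be summarized in a sketch such as
the one below. The class and enum names, the argument types and the use of
std::unique_ptr for ownership are assumptions made for illustration; the
actual declarations may differ.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// All names below are hypothetical; they only mirror the method list above.
enum class Backup_type { FULL, PARTIAL };
enum class Restore_type { DESTRUCTIVE, NONDESTRUCTIVE };

struct Table_ref {
  std::string schema;
  std::string name;
};
using Table_list = std::vector<Table_ref>;

struct Backup_driver { virtual ~Backup_driver() = default; };
struct Restore_driver { virtual ~Restore_driver() = default; };

class Backup_engine {
 public:
  virtual ~Backup_engine() = default;
  // Version of the backup image format written by this engine (starts at 1).
  virtual unsigned version() const = 0;
  virtual std::unique_ptr<Backup_driver> get_backup(
      Backup_type type, const Table_list &tables) = 0;
  virtual std::unique_ptr<Restore_driver> get_restore(
      Restore_type type, const Table_list &tables, unsigned image_version) = 0;
};

// True if the engine is required to restore an image of the given version.
bool must_support_image(const Backup_engine &engine, unsigned image_version) {
  return image_version >= 1 && image_version <= engine.version();
}

// Example engine whose current image format version is 3.
class Demo_engine : public Backup_engine {
 public:
  unsigned version() const override { return 3; }
  std::unique_ptr<Backup_driver> get_backup(Backup_type,
                                            const Table_list &) override {
    return std::unique_ptr<Backup_driver>(new Backup_driver());
  }
  std::unique_ptr<Restore_driver> get_restore(Restore_type, const Table_list &,
                                              unsigned image_version) override {
    // Assumption: images newer than version() are rejected with a null driver.
    if (!must_support_image(*this, image_version)) return nullptr;
    return std::unique_ptr<Restore_driver>(new Restore_driver());
  }
};
```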

Note: Backup kernel will block all DDL operations during the whole backup
process (at least in version 1). Since some engines (NDB) can manipulate tables
outside the normal server execution path, these engines should participate in
DDL blocking. For that reason, once a backup engine has returned a backup or
restore driver, it is required to refrain from any DDL operations (if it can
perform them) until the driver is released by calling its free() method. In
other words, the DDL blocking interval lies between a call to get_backup() or
get_restore() and the call to the driver's free() method.
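
One possible way for an engine to track this interval is to count the drivers
it has handed out, as in the sketch below. Engine_sketch, Driver_sketch,
ddl_allowed() and driver_released() are hypothetical names, and the counter is
not thread-safe; a real engine would need proper synchronization.

```cpp
#include <cassert>

class Engine_sketch;

// Hypothetical driver that only knows how to release itself.
class Driver_sketch {
 public:
  explicit Driver_sketch(Engine_sketch *owner) : owner_(owner) {}
  void free();  // releases the driver and ends its DDL-blocking interval
 private:
  Engine_sketch *owner_;
};

class Engine_sketch {
 public:
  // Handing out a driver opens a DDL-blocking interval.
  Driver_sketch *get_backup() { ++live_drivers_; return new Driver_sketch(this); }
  Driver_sketch *get_restore() { ++live_drivers_; return new Driver_sketch(this); }

  // The engine may perform its own DDL only when no driver is outstanding.
  bool ddl_allowed() const { return live_drivers_ == 0; }

  // Called by Driver_sketch::free().
  void driver_released() { --live_drivers_; }

 private:
  int live_drivers_ = 0;  // not thread-safe; illustration only
};

void Driver_sketch::free() {
  owner_->driver_released();
  delete this;  // drivers in this sketch are always heap-allocated
}
```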

Unresolved issue: while restoring metadata, backup kernel might need to perform
DDL operations (e.g. create empty tables). This might conflict with the above
requirement that DDL should be blocked at the storage engine level.

Methods of backup driver
[The exact method calls are subject to change during the 
implementation phase. /Lars]

A typical interaction between backup kernel and a single backup driver will look
as follows (see WL#3569 for description of different phases of the backup
process):

// Preparation

driver->size();      // get estimate of the total size of backup image

driver->init_size(); // get estimate of the amount of data to be sent in the 
                     // initial phase of backup (can be 0) 

driver->begin();     // From now on driver should be ready for get_data()      
                     // requests.

// Initial data transfer

driver->get_data(buf);  //  Kernel polls data from driver. 
...                     //  Driver signals that it is done with the
driver->get_data(buf);  //  initial phase by return value of get_data() method.

// Backup image synchronization.

driver->prelock();     // Kernel asks all drivers to prepare for lock() call 
                       // below.

driver->get_data(buf); // If needed, further data is polled until driver signals
                       // that it is ready for locking.

driver->lock();        // This is a request to create local "validity point" of 
                       // driver's backup image. The engine should be frozen so 
                       // that this validity point remains valid while other 
                       // engines process their lock() requests.
                       // Important! This call should return as fast as
                       // possible: within a few seconds at most.

driver->unlock();      // This method is called after all engines have been 
                       // locked and the global validity point established. The 
                       // engine can be unlocked and accept further data 
                       // updates.

// Final data transfer

driver->get_data(buf);   // Kernel polls for further data until driver signals 
...                      // that there is no more left.
driver->get_data(buf);   //

// Epilogue

driver->end();  // End of backup process -- driver can do additional cleaning.

driver->free(); // Free allocated resources (can delete the driver).

During this interaction the driver will be in one of these states:

1. idle    : after creation but before the begin() call.
2. init    : after begin(), during initial data transfer.
3. waiting : when initial data transfer is over but prelock() call
             was not yet made.
4. preparing : after prelock() call, preparation (if any) for synchronization
5. ready   : when driver is ready for synchronization before the lock() call.
6. locked  : after creating the validity point and freezing state.
7. final   : after unlock() call, during the final data transfer.
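
The call sequence and the state transitions can be exercised with a mock driver
such as the one below. This is a sketch under several assumptions: the Result
codes MORE and PHASE_DONE are hypothetical (the real return values of
get_data() are not specified here), chunks are plain strings, and size(),
init_size() and free() are left out for brevity.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Driver states, as listed above.
enum class State { IDLE, INIT, WAITING, PREPARING, READY, LOCKED, FINAL };

// Hypothetical get_data() return codes.
enum class Result { MORE, PHASE_DONE };

class Mock_backup_driver {
 public:
  Mock_backup_driver(int init_chunks, int prep_chunks, int final_chunks)
      : init_(init_chunks), prep_(prep_chunks), final_(final_chunks) {}

  State state() const { return state_; }

  // Each control call is only legal in one state and moves to the next one.
  void begin()   { assert(state_ == State::IDLE);    state_ = State::INIT; }
  void prelock() { assert(state_ == State::WAITING); state_ = State::PREPARING; }
  void lock()    { assert(state_ == State::READY);   state_ = State::LOCKED; }
  void unlock()  { assert(state_ == State::LOCKED);  state_ = State::FINAL; }
  void end()     { assert(state_ == State::FINAL); }

  // Returns MORE while the current phase still has data to hand over.
  Result get_data(std::string &buf) {
    switch (state_) {
      case State::INIT:
        if (take(init_, "init", buf)) return Result::MORE;
        state_ = State::WAITING;  // initial transfer is over
        return Result::PHASE_DONE;
      case State::PREPARING:
        if (take(prep_, "prep", buf)) return Result::MORE;
        state_ = State::READY;    // ready for lock()
        return Result::PHASE_DONE;
      case State::FINAL:
        if (take(final_, "final", buf)) return Result::MORE;
        return Result::PHASE_DONE;
      default:
        buf.clear();
        return Result::PHASE_DONE;
    }
  }

 private:
  // Hands out one chunk if any remain; leaves buf empty otherwise.
  static bool take(int &remaining, const char *chunk, std::string &buf) {
    buf.clear();
    if (remaining == 0) return false;
    --remaining;
    buf = chunk;
    return true;
  }

  State state_ = State::IDLE;
  int init_, prep_, final_;
};

// Kernel side: drives one driver through all phases and collects the image.
std::vector<std::string> run_backup(Mock_backup_driver &driver) {
  std::vector<std::string> image;
  std::string buf;
  driver.begin();
  while (driver.get_data(buf) == Result::MORE) image.push_back(buf);
  driver.prelock();
  while (driver.get_data(buf) == Result::MORE) image.push_back(buf);
  driver.lock();    // validity point is created here
  driver.unlock();
  while (driver.get_data(buf) == Result::MORE) image.push_back(buf);
  driver.end();
  return image;
}
```

With several engines involved, the kernel would call lock() on every driver
before calling unlock() on any of them; the sketch shows a single driver only.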

Methods of restore driver

Upon restore, backup kernel will call the following methods of a restore driver:

driver->begin();        // Prepare for restore process.

driver->send_data(buf); // Kernel sends backup image data to the driver.
...                     //
driver->send_data(buf); //

driver->end();          // When all data is sent, this method is called to 
                        // finalize restore process.

driver->free();         // Free allocated resources.

The backup kernel will drop and re-create all tables which are to be restored
before using the driver. Thus restore driver can assume that at the time it is
processing send_data() requests the tables exist and are empty. Also, all
modifications of the tables being restored will be blocked at the kernel level
during the whole restore process.
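
The restore sequence, together with the assumption that tables are empty when
the driver starts, can be sketched as follows. Mock_restore_driver and
run_restore are hypothetical names, a table is modeled as a vector of strings,
and the boolean return values are an assumption of this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Mock driver that restores chunks into a single in-memory "table".
class Mock_restore_driver {
 public:
  explicit Mock_restore_driver(std::vector<std::string> &table)
      : table_(table) {}

  // The kernel has re-created the table, so it must be empty here.
  bool begin() {
    if (!table_.empty()) return false;
    active_ = true;
    return true;
  }

  // Accepts one chunk of backup image data between begin() and end().
  bool send_data(const std::string &buf) {
    if (!active_) return false;
    table_.push_back(buf);
    return true;
  }

  // Finalizes the restore; no more data is accepted afterwards.
  bool end() {
    if (!active_) return false;
    active_ = false;
    return true;
  }

 private:
  std::vector<std::string> &table_;
  bool active_ = false;
};

// Kernel side: begin(), one send_data() per chunk, then end().
bool run_restore(Mock_restore_driver &driver,
                 const std::vector<std::string> &image) {
  if (!driver.begin()) return false;
  for (const std::string &chunk : image) {
    if (!driver.send_data(chunk)) return false;
  }
  return driver.end();
}
```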

Issue: An additional call may be needed when doing a destructive restore. This
call would tell the engine to wipe out all its data and would be made after
driver->begin() and before the first driver->send_data().
See doxygen documentation of the classes defined in backup_engine.h file at