SUMMARY
=======

The task of the backup kernel is to create an image of the current state of
a server instance, or to restore the state from such an image. The backup
image contains a snapshot of selected elements of the instance state at a
definite point in time. Not all elements of the state are stored in the
backup image, only the ones listed below, which form the so-called "backup
state".

DESTRUCTIVE AND NONDESTRUCTIVE RESTORE
======================================

Note: only destructive restore will be implemented in this WL.

The restore operation has two variants: destructive and nondestructive. The
first variant completely replaces the current state of an instance with the
state stored in the backup image. Nondestructive restore merges the state
from the backup image with the current state, changing the state of only
those components which are stored in the image and leaving other components
unchanged.

In nondestructive restore, the content of tables/databases stored in the
backup image is restored, while any tables/databases present in the current
instance but not stored in the image remain untouched. Note that this may
result in referential inconsistency, as non-restored tables can refer to the
restored ones.

It is assumed that the backup kernel will always create backup images which
can be used for a full, destructive restore, but the user can decide whether
to do a destructive or a nondestructive one.

Additional note on nondestructive restore. Docs team, please take note for
emphasis in RefMan:

The problem of potential inconsistency is inherent in the design, which
allows a user to back up/restore only selected database(s). This is not
related to the implementation but to the semantics of such an operation.
When restoring a single database and leaving others untouched, we create a
situation where some tables are in the restored state (from the past) while
others are in the current state.
Since cross-database references are possible, there can be inconsistencies
of at least two types:

1. State inconsistency: table A refers to B, but A is in a state from time
   t1 while B is in a state from time t2. Thus we create a global state
   which has never occurred before.
2. Referential inconsistency: table A refers to B, but in the restore
   process B was deleted or a column in it was removed/retyped. This
   results in an erroneous state.

BACKUP STATE
============

The "backup state" is the part of the instance state which should be stored
in a backup image. This is not the whole state: for instance, the state of
currently active threads or of ongoing (not committed) transactions is not
part of it.

The backup state consists of two main parts:

1. the instance metadata, and
2. data stored in tables.

The metadata is split into several items listed in WL#3713 -- the main part
of it is the structure of the backed-up tables.

The backup state changes over time, so we should speak about the state at a
given time t. This state reflects the situation resulting from all
transactions which are *committed* at time t. Transactions which are "in
progress" do not affect the backup state as defined here.

Notes
-----

1. For consistency, let us consider statements which are not part of a
   transaction as transactions consisting of that single statement.
2. Due to limitations in the current XA handling code it is possible to
   have "partially committed" transactions. This will limit the
   functionality of the backup system as well.
3. The exact interaction of the backup and replication subsystems must be
   thought over -- possibly when some prototype of backup is ready.

Validity Point
==============

The backup process starts at some time t1 and continues until time t2,
producing a backup image which contains the backup state at a time t with
t1 < t < t2. The time point t is called the validity point of the backup
image. After restoring from the image, the state will be the same as it was
at time t.
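
To make the definition of the validity point concrete, here is a toy model
in Python (not server code). It computes the backup state at a time t from a
list of transactions: only those committed at or before t contribute, and
transactions still in progress are ignored. The `Txn` record and the
`backup_state_at` helper are hypothetical names introduced only for this
illustration, and treating "committed at time t" as "commit time <= t" is an
assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Txn:
    """Hypothetical transaction record used only for this illustration."""
    writes: Dict[str, int]          # row key -> new value
    commit_time: Optional[float]    # None while the transaction is in progress


def backup_state_at(t: float, txns: List[Txn]) -> Dict[str, int]:
    """Return the contents implied by all transactions committed by time t."""
    committed = [x for x in txns
                 if x.commit_time is not None and x.commit_time <= t]
    state: Dict[str, int] = {}
    # Apply committed transactions in commit order; later commits win.
    for txn in sorted(committed, key=lambda x: x.commit_time):
        state.update(txn.writes)
    return state


txns = [
    Txn({"a": 1}, commit_time=1.0),
    Txn({"a": 2, "b": 5}, commit_time=3.0),
    Txn({"c": 9}, commit_time=None),   # in progress: never part of the state
]

# Suppose the backup runs from t1 = 2.0 to t2 = 4.0. Both validity points
# below lie inside that window, but they yield different images.
state_early = backup_state_at(2.5, txns)
state_late = backup_state_at(3.5, txns)
```

Either image is a correct backup in the sense of this section; they simply
correspond to different validity points within the backup window.
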
Backup is considered correct regardless of where its validity point is
located between t1 and t2, but users may prefer to have it as close to t1
as possible.

Handling of errors
==================

In the first version of the backup kernel, whenever an error is detected:

1. the current operations are canceled,
2. the error is reported, and
3. the normal operation of the instance is resumed.

Canceling the creation of a backup image does not affect the database state
-- the instance continues its operation as if the backup request had never
been issued. Canceling a restore process might result in changed content of
the tables being restored. However, any other tables remain unaffected.
Global data such as user accounts should also be unchanged.

LIMITATIONS
===========

- An instance on which a restore operation is performed must have the same
  set of storage engines as the one on which the backup was created (this is
  because the backup image is created by individual storage engines and only
  the engine which created it can restore from it).
- The possibility to restore selected databases/tables can lead to
  referential inconsistencies. This cannot be avoided in a situation where
  some tables are changed and some are not. However, the backup kernel can
  detect this and issue warnings.

EXTERNAL REQUIREMENTS
=====================

0. Should correctly save and restore the backup state as described above.
1. The database should be functional during the backup process as much as
   possible. Specifically:
   a) storage engines should not be locked,
   b) individual tables should not be locked,
   c) it should be possible to process queries (perhaps with some
      restrictions, such as no DDL operations).
   However, it is ok to block operations which refer to data which is
   currently being restored.
2. It should be possible to use a backup image for setting up replication.
3. The format of the image data should be streamable.

INTERNAL REQUIREMENTS
=====================

4. Possibility to back up only a part of the backup state (selected
   databases and tables).
5. Possibility to restore only a part of the state saved in the backup
   image.
6. Possibility to choose between destructive and nondestructive restore.
   (Not in version one.)
7. Extra requirements on the image format:
   a) Possibility to translate to known backup formats like XBSA (what this
      implies needs to be investigated further).
   b) Possibility to analyze and process the image by external tools. This
      can be provided on different levels:
      b1. the image format is completely closed and can be used only for
          the restore operation,
      b2. there is some kind of table of contents listing the databases and
          tables stored in the image,
      b3. the table data is stored in an open format so that it can be
          understood by external tools.
   c) Data compression. This may influence the image format if we want to
      be able to extract partial state (selected databases/tables) and
      still have the data compressed.
   f) Data consistency checking: a possibility to easily detect that the
      image is corrupted.
8. Possibility to use backup/restore functionality to initialize
   replication.
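
The difference between the two restore variants named in requirement 6 can
be illustrated with a toy model in which an instance is simply a mapping
from table name to rows. This is only a sketch of the semantics described
in the section on destructive and nondestructive restore, not a proposed
implementation; the type alias and both function names are hypothetical.

```python
from typing import Dict, List

Instance = Dict[str, List[int]]   # table name -> rows (toy representation)


def restore_destructive(current: Instance, image: Instance) -> Instance:
    """Replace the whole instance state with the state stored in the image."""
    return {name: list(rows) for name, rows in image.items()}


def restore_nondestructive(current: Instance, image: Instance) -> Instance:
    """Overwrite only tables present in the image; leave others untouched."""
    merged = {name: list(rows) for name, rows in current.items()}
    for name, rows in image.items():
        merged[name] = list(rows)
    return merged


current = {"orders": [1, 2, 3], "customers": [10, 20]}
image = {"orders": [1]}   # older image in which only "orders" was backed up

after_destructive = restore_destructive(current, image)
after_nondestructive = restore_nondestructive(current, image)
```

In the nondestructive result, "customers" keeps its current rows while
"orders" reverts to its older content. This is exactly the mix of past and
present state that can produce the inconsistencies discussed earlier.
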
ENGINES CREATE THE BACKUP IMAGE
===============================

The main design decision is that the image of table data is created by
individual storage engines and not by the backup kernel. The engines are
free to choose whatever method they like to create such an image, and they
are also free to put the image data in a format of their choice (which is
mostly opaque to the kernel). The metadata image is created by the backup
kernel.

Given that a set of tables can be stored in several engines, the main duties
of the backup kernel are to:

- back up/restore metadata,
- initialize backup/restore of table data on all involved engines (with
  correct timing to minimize resource consumption),
- ensure that partial backup images from different engines are all
  synchronized and correspond to the same point in time,
- fetch backup images from all engines and put them into the global backup
  image,
- upon restore, extract partial images from the global one and feed them to
  the storage engines,
- create the correct environment for storage engines to perform
  backup/restore tasks (supply arguments, create tables, do necessary
  global locking etc.),
- provide an interface to the SQL layer (handle backup-related SQL commands,
  implement the backup C API),
- detect and react to errors during backup/restore.

BACKUP AND RESTORE ALGORITHMS
=============================

These algorithms implement protocols for correct synchronization of several
backup images created by individual storage engines. They are described in
WL#3569 and WL#3571.

FORMAT OF THE PER ENGINE BACKUP IMAGES
======================================

Each storage engine chooses the backup image format most appropriate for
its internal representation of data and the backup method used. However, to
support selective restore from a given backup image (only selected tables),
the backup image is divided into several "data streams" corresponding to
individual tables. Each stream contains the data needed to restore one
table.
There is a special "shared data stream" to which the engine can write any
data not connected to any particular table. It is completely up to the
storage engine how it distributes its backup image among these streams. It
is for instance possible that all data will be sent into the shared stream
and the per-table streams will be empty, or vice versa. However, it is
important to keep in mind that upon restore only the shared stream and the
streams corresponding to the tables being restored will be sent back to the
engine.

As an example, consider a request for backing up tables t1, t2 and t3 on
some storage engine. It creates a backup image consisting of four data
streams:

  #0: the shared data
  #1: data for table t1
  #2: data for table t2
  #3: data for table t3

Later, a user wants to restore tables t1 and t3 only. The backup kernel
will send to the storage engine streams #0, #1 and #3 but not #2. Hence
stream #2 should not contain any data which would be needed to restore t1
or t3.

[Lars wants the backup image to consist of objects, e.g. tables, config
data, auto_inc state, meta_data, triggers, SP, SF etc. This is to make it
possible, in future releases after release 1.0, to select which objects to
back up and which objects to restore.]

BACKUP IMAGE FORMAT VERSIONS
============================

A backup image created by a storage engine is labelled with the name of the
engine and a version number (obtained when the image was created). It is a
*strict* requirement that the storage engine provides backward
compatibility for image formats. This means that if storage engine X
supports version v of the image format then it *must* be able to restore
data from all images labelled "X" with versions w <= v. Thus introducing
new backup image formats should be done with care.

Questions/issues:

- For differentiation, should we introduce incompatible backup image
  "flavours" (e.g. "community" and "enterprise" backup formats)?
  [Lars thinks not for release 1.0]
- Have backup image format names independent from engine names. For
  instance, the logical backup format created by default algorithms or a
  backup format for many different versioning engines will be
  engine-independent.
  [Lars thinks yes, let's make the default format engine agnostic.]
- Give a storage engine the possibility to handle backup images created by
  a different one.
  [Lars thinks yes, in those cases when the engine has not made it
  impossible.]

DATA TRANSFER PROTOCOL
======================

This is a protocol used to fetch backup image data from storage engines or
send this data to them in a controlled way.

Design goals and decisions:

- memory for data buffers is allocated by the kernel (reason: safer than
  allocation by storage engines),
- the backup server pulls data from engines (reason: gives the server
  precise control over the speed of data transfer from different engines,
  which is needed for synchronization),
- flexibility allowing backup images to be created either in parallel
  threads or in the main thread of the backup kernel, and allowing
  single/multi buffer solutions (reason: more freedom for storage engine
  implementors; parallelism and multiple buffers can increase efficiency),
- no callbacks, the kernel polls engines for information (reason:
  simplicity).

The transfer protocol in both directions is based on placing requests for
reading/filling data buffers with the storage engine. The buffers are
allocated by the kernel and the kernel decides the size of the buffer. An
engine which internally manipulates data of a different size must repack
the data to fit into the buffers supplied by the kernel.

The requests are processed by the storage engine in the order in which they
arrive. It is up to the engine to decide how to process them --
synchronously using the server thread or asynchronously by spawning
dedicated thread(s).
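
The backup direction of the protocol can be sketched as follows. This is a
toy Python model under stated assumptions, not the interface defined in
WL#3473: the class `ToyEngine`, its methods `submit_fill` and `poll`, the
function `pull_image` and the constant `BUF_SIZE` are all hypothetical. The
kernel allocates fixed-size buffers, submits fill requests and polls for
completion; the engine repacks its variable-size chunks to fit. Python's
`id()` of the buffer stands in for a buffer pointer.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

BUF_SIZE = 8  # buffer size is chosen by the kernel (tiny here for illustration)


class ToyEngine:
    """Hypothetical engine processing requests synchronously, in order."""

    def __init__(self, chunks: List[bytes]) -> None:
        self._pending = bytearray(b"".join(chunks))  # data of arbitrary sizes
        self._requests: Deque[bytearray] = deque()
        self._done: Dict[int, int] = {}              # id(buffer) -> bytes filled

    def submit_fill(self, buf: bytearray) -> None:
        """Queue a request to fill a kernel-owned buffer."""
        self._requests.append(buf)

    def poll(self, buf: bytearray) -> Tuple[bool, int]:
        """Return (ready, bytes_filled) for a previously submitted buffer."""
        # A synchronous engine does its work when polled; an asynchronous
        # one could do the same work in its own thread instead.
        while self._requests:
            req = self._requests.popleft()
            n = min(len(req), len(self._pending))
            req[:n] = self._pending[:n]              # repack into kernel buffer
            del self._pending[:n]
            self._done[id(req)] = n
        if id(buf) in self._done:
            return True, self._done.pop(id(buf))
        return False, 0


def pull_image(engine: ToyEngine) -> bytes:
    """Kernel side: allocate buffers, submit requests, poll until drained."""
    image = bytearray()
    while True:
        buf = bytearray(BUF_SIZE)        # memory is allocated by the kernel
        engine.submit_fill(buf)
        ready, filled = engine.poll(buf)
        while not ready:                 # no callbacks: the kernel polls
            ready, filled = engine.poll(buf)
        if filled == 0:                  # engine has no more data
            return bytes(image)
        image += buf[:filled]


engine = ToyEngine([b"abc", b"defghijklm", b"no"])
image = pull_image(engine)
```

The kernel loop never needs to know whether the engine did the work inline
or in another thread; it only observes the status returned by polling.
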
The backup kernel doesn't know whether the engine uses separate threads to
process requests or not, and is designed to behave well in both scenarios.

Requests are identified by pointers to data buffers. Using this
identification, the backup kernel can poll for the status of previously
submitted requests.

Details of the protocol and its implementation are described in WL#3473.

Disclaimer: the fixed buffer size design was specifically requested by
Brian, who thinks that it is necessary for efficiency and correct error
handling. The implementor (Rafal) does not agree with that opinion and
thinks that it is possible and better to allow engines to send/receive
chunks of data of variable sizes chosen by the engine. Anyway, Brian's
solution is being implemented now.

Backup functionality that must be provided by storage engines
-------------------------------------------------------------

1. Giving an estimate of the size of the backup image to be produced and of
   how much of it will be sent in the initial phase of the backup
   synchronization protocol (see WL#3569).
2. Informing about the backup image version used.
3. Creating, upon request, a backup image of a given list of tables stored
   in the engine. The backup data should be split into several data streams
   as described above.
4. Establishing the validity point of the backup image using the
   synchronization protocol (see WL#3569). This requires being able to
   freeze the engine's state, blocking any operations which might change
   it.
5. Restoring selected tables from a previously created backup image, using
   the data from the streams corresponding to these tables (and the shared
   data stream) as described above. Backup image formats of any version
   smaller than or equal to that reported in point 2 should be supported.
6. Implementing the above data transfer protocol for backup data transfers
   from and to the backup kernel.
7. Canceling, upon request, an ongoing restore or backup process, cleaning
   up and resuming normal operation.
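
Points 2 and 5 combine two rules stated earlier: on restore the kernel sends
only the shared stream plus the streams of the requested tables, and an
engine must accept any image version not newer than its own. The sketch
below (Python, for illustration only) expresses both. The function names
are hypothetical, and numbering table streams 1..n in backup order is an
assumption taken from the t1/t2/t3 example, not a specified format.

```python
from typing import List

SHARED_STREAM = 0  # stream #0 carries data not tied to any particular table


def streams_for_restore(image_tables: List[str], wanted: List[str]) -> List[int]:
    """Return the stream numbers the kernel sends to the engine on restore."""
    # Assumption: table streams are numbered 1..n in the order of backup.
    numbers = {name: i + 1 for i, name in enumerate(image_tables)}
    missing = [name for name in wanted if name not in numbers]
    if missing:
        raise KeyError(f"tables not in image: {missing}")
    return [SHARED_STREAM] + sorted(numbers[name] for name in wanted)


def can_restore(engine_name: str, engine_version: int,
                image_engine: str, image_version: int) -> bool:
    """Backward-compatibility rule: engine X at version v restores any w <= v."""
    return engine_name == image_engine and image_version <= engine_version


# The example from the text: back up t1, t2, t3; restore only t1 and t3.
selected = streams_for_restore(["t1", "t2", "t3"], ["t1", "t3"])
```

Here `selected` contains streams #0, #1 and #3, matching the example in the
section on per engine backup images.
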
"Default" backup ---------------- Some storage engines may not support this API. The server then performs the backup for them. It is planned to use mysqlbackup with full lock of involved tables until a better solution is developed.
Design given in other WLs. -- Lars, 2007-07-05