WL#4212: Online Backup : Kernel updates for object metadata changes
Affects: Server-6.0
Status: Complete
Implement stability and design improvements for the backup kernel and stream library. These changes are necessary to fully support the development of WL#3574. The tasks include:

- improve source code organization: put definitions of the main classes in separate .h/.cc files
- go through the code and make sure that it is consistent
- replace ad-hoc hacks with better designed code
- refactor memory allocation to use mem_root for faster deallocation
- refactor data structures to use common server data types
- reorganize storage of objects in the catalogue to improve the efficiency of iterators (implies refactoring of some iterator implementations)
- prepare the Image_info class for dependency handling
Source organization
===================

To clarify the code, a convention will be adopted to keep the definition and implementation of the main classes in separate header and source files. This applies to the main classes only:

  Image_info   - image_info.{h,cc}
  Backup_info  - backup_info.{h,cc}
  Restore_info - restore_info.{h,cc}

The backup/restore context class Backup_restore_ctx (see below) will be defined in backup_kernel.h and implemented in kernel.cc. Classes related to Image_info, such as Snapshot_info and all the internal classes, will be defined and implemented in the image_info.{h,cc} files.

Header files
============

There are two forms of the #include directive:

a) #include <header.h>
b) #include "header.h"

Form a) searches for the header in the header file search path specified by the compilation environment. Form b) looks for the header in the directory containing the file which uses the directive. This distinction is blurred by modern compilers, which use the search path also for form b) of the #include directive. However, it is better to take the distinction into account in our sources. Therefore the following policy will be adopted:

- If one header includes another header, the included header should be inside the header search path and the #include <...> form should be used.
- Source files should include backup header files using the #include "..." form, so that the version from the current source tree is used.
- If a source file includes a non-backup global header file, it should use the #include <...> form. However, headers from the sql/ directory are local and should be included with #include "../sql_header.h" (thus assuming that the backup code sits in a subdirectory of sql/).

There are some backup header files which are considered global and are intended to be used outside the backup tree:

  backup_driver.h
  backup_kernel.h
  backup_stream.h

These files should be included (from other headers) with the #include <backup_...h> directive. From source files they are included with #include "backup_...h", as all other backup headers.

The global headers use some of the other backup header files internally. Thus the other backup headers must also be present in the header search path. However, to distinguish them from the global headers, it is assumed that all other headers are located in a backup/ subdirectory. Thus the local backup headers should be included from other headers with #include <backup/local_header.h>. As always, from source files all backup headers are included with #include "local_header.h".

The backup/restore context class
================================

There are several settings and resources which must be created before a backup or restore operation can be performed:

- all DDLs must be blocked,
- the memory allocator for the backup stream library must be initialized,
- the backup stream object must be created,
- the backup/restore operation must be registered so that no other such operation can be run,
- a catalogue object (Backup/Restore_info) must be created,
- etc.

All these preparations create a context in which the backup or restore operation can be performed. When the operation is finished or aborted, the context must be destroyed, reversing the actions done during its preparation.

To support creation and destruction of the backup/restore context in a consistent and safe way, a class Backup_restore_ctx will be created with an appropriate constructor and destructor. An instance of this class represents the context required for performing a backup/restore operation. When it is deleted, the context is removed and all preparations are undone.
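Schematically, the class can be pictured as follows. This is only a sketch of the intended shape: prepare_for_backup(), prepare_for_restore(), do_backup(), do_restore() and close() come from the design described here, while the member names, parameter types and return types are illustrative assumptions.

  class THD;
  class Backup_info;
  class Restore_info;

  class Backup_restore_ctx
  {
  public:
    Backup_restore_ctx(THD *thd);   // create a context for the given session
    ~Backup_restore_ctx();          // if close() has not been called, undo all
                                    // preparations made for the operation

    Backup_info  *prepare_for_backup(const char *location);   // set up backup context
    Restore_info *prepare_for_restore(const char *location);  // set up restore context

    int do_backup();                // run the backup operation
    int do_restore();               // run the restore operation
    int close();                    // explicit clean-up of the context

  private:
    THD  *m_thd;                    // session running the operation (assumed member)
    bool  m_closed;                 // has close() been called? (assumed member)
  };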
Since a context instance is automatically deleted when it goes out of scope, this removes from the programmer the burden of remembering to clean up after a backup/restore operation. Using the context class, the backup operation will be performed as follows:

  {
    Backup_restore_ctx context(thd);   // create context instance

    Backup_info *info= context.prepare_for_backup(location);   // prepare for backup

    // select objects to backup
    info->add_all_dbs();   or   info->add_dbs(<list of databases>);

    info->close();         // indicate that selection is done

    context.do_backup();   // perform backup

    context.close();       // explicit clean-up
  }
  // if code jumps here, the context destructor will do the clean-up automatically

Similar code will be used for restore (a bit simpler, as we don't support selective restores yet):

  {
    Backup_restore_ctx context(thd);   // create context instance

    Restore_info *info= context.prepare_for_restore(location);   // prepare for restore

    context.do_restore();  // perform restore

    context.close();       // explicit clean-up
  }
  // if code jumps here, the context destructor will do the clean-up automatically

The context object does all necessary preparations: it opens the backup stream, creates the Backup/Restore_info instance and also does things like blocking DDLs etc. It also implements logging services: it logs the progress of the operation, reports errors and so on.

Backup engine selection algorithm
=================================

When a table is added to the backup catalogue, a backup engine must be chosen for it. This is done inside the Backup_info::find_backup_engine() method.

Backup engines used in the backup process provide backup drivers which create a snapshot of the data stored in the tables handled by that driver. Such a snapshot is described by an instance of the Snapshot_info class. A Snapshot_info instance stores the list of tables which belong to that snapshot and also provides methods for deciding whether a given table can be stored inside that snapshot.

Notes:

1. The term "snapshot" is also used in the context of the REPEATABLE READ isolation level in the server. When this isolation level is selected, a snapshot of the data is created which is then accessed by SELECT statements inside a single transaction. Since this technique is used in one of the built-in backup engines, this engine is called the "consistent snapshot backup engine", or "CS engine" for short. This use of the term "snapshot" should not be confused with its use in "table data snapshot", which is a part of the backup image as described above.

2. There is a 1-1 correspondence between table data snapshots and backup engines. Each snapshot is created by exactly one engine and each engine creates only one snapshot.

The algorithm for selecting which snapshot (i.e., which backup engine) will be used to store a given table's data is as follows:

1. If the table's storage engine has a native backup engine, then this engine is used.

2. Otherwise, iterate over all snapshots created so far and pick the first one which accepts that table.

The list of created snapshots always contains the snapshots served by the CS and default backup engines. Therefore any table will be accepted in step 2, as a last resort by the default engine's snapshot.

The Backup_info::snapshots member is the list of snapshots considered in step 2 of the algorithm. When a Backup_info instance is created, the default and CS engines' snapshots are put on that list. Later, whenever a new Native_snapshot object is created, it is added to the list. Obviously, the order of snapshots in the list determines which of them will be selected. This order is as follows:

- all the native snapshots created so far
- the CS backup engine's snapshot
- the default backup engine's snapshot
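The selection logic can be sketched as follows. Only find_backup_engine() and the snapshots list are taken from the description above; the helper names (native_snapshot_for(), accepts(), add_table()), the parameter type and the container interface are assumptions made for illustration.

  Snapshot_info *Backup_info::find_backup_engine(const Table_ref &tbl)
  {
    /*
      Step 1: if the table's storage engine has a native backup engine,
      use the snapshot created by that engine.
    */
    if (Native_snapshot *ns= native_snapshot_for(tbl))   // assumed helper
    {
      ns->add_table(tbl);                                // assumed method
      return ns;
    }

    /*
      Step 2: otherwise scan the snapshot list in order (native snapshots
      first, then the CS snapshot, then the default snapshot) and pick the
      first one which accepts the table.
    */
    for (uint i= 0; i < snapshots.size(); ++i)           // assumed container interface
    {
      Snapshot_info *snap= snapshots[i];
      if (snap->accepts(tbl))                            // assumed method
      {
        snap->add_table(tbl);
        return snap;
      }
    }

    return NULL;   // not reached: the default snapshot accepts any table
  }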
When the algorithm selects a snapshot which has not been used before, the snapshot is added to the image's snapshot list using the Image_info::add_snapshot() method. At this time a number is assigned to the snapshot.

The Backup_info::native_snapshots member is a map from storage engines to Native_snapshot instances. It is used in step 1 of the algorithm to see whether a native snapshot for a given storage engine has already been created.

All Snapshot_info instances are created and owned by the Backup_info instance. They are deleted in the Backup_info destructor. The Restore_info object also creates Snapshot_info instances, after the list of snapshots has been read from a backup stream. This is done inside the bcat_reset() function, which is called by the backup stream library after reading the backup image's header.

Storage for catalogue items
===========================

For each item stored inside the catalogue there is a class whose instances store information about that kind of item. All these classes inherit from the Image_info::Obj class (former Image_info::Item). The following classes will be defined (inside Image_info):

- Db     for databases
- Table  for tables

(More classes will be added when WL#4239 is implemented.)

Instances of these classes are created when items are added to the catalogue, inside the Image_info::add_*() methods. The Image_info instance owns these objects and is responsible for deleting them. We will allocate them using a memory root so that no explicit delete is needed.

Warning: using a memory root for storing object instances means that their destructors will not be called. Thus these objects cannot use destructors for their clean-up.

Addressing objects in the catalogue
===================================

Each object stored in the catalogue is assigned a position by which it can be identified. For example, each database has a number. Given the number of a database, we need to access the corresponding Db object. This is done using the get_db() method:

  Image_info::Db *db= info.get_db(3);

To implement this we need to store a mapping from database numbers to Db* pointers. For that purpose we define the Map template. An object of type Map<A,B> can store mappings from values of type A to pointers of type B*. For databases we will use the member Image_info::m_dbs of type Map<uint,Db>, which will store pointers to Db objects indexed by database number.
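To illustrate the intended semantics, a minimal sketch of such a Map template is given below. The std::map based storage and the method names (insert(), operator[]) are assumptions made for the sketch only; the real class is to be built on server structures, as noted below.

  #include <cstddef>
  #include <map>

  /*
    Sketch of the Map<A,B> idea: store B* pointers indexed by values of
    type A. Method names and the std::map storage are illustrative only.
  */
  template <class A, class B>
  class Map
  {
  public:
    /* Store ptr at position pos, replacing any previous entry. */
    void insert(const A &pos, B *ptr) { m_map[pos]= ptr; }

    /* Return the pointer stored at pos, or NULL if there is none. */
    B *operator[](const A &pos) const
    {
      typename std::map<A, B*>::const_iterator it= m_map.find(pos);
      return it == m_map.end() ? NULL : it->second;
    }

  private:
    std::map<A, B*> m_map;
  };

  /* Usage corresponding to Image_info::m_dbs:
       Map<uint, Image_info::Db> m_dbs;
       m_dbs.insert(3, db);             // register database #3
       Image_info::Db *db3= m_dbs[3];   // look it up by number
  */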
The Map class will be implemented using either the HASH or the DYNAMIC_ARRAY structure, depending on the index type A.

Another way of accessing objects stored in the catalogue is by means of iterators. There will be different types of iterators for enumerating different kinds of objects. Currently only two iterators will be implemented:

  Image_info::Db_iterator     - to iterate over databases
  Image_info::DbObj_iterator  - to iterate over tables inside a database
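For illustration, catalogue traversal with these iterators could look like the sketch below. The way the iterators are constructed and the next()-returns-NULL-at-end protocol are assumptions; the finalized iterator interface may differ.

  /*
    Sketch only: iterator construction and the next()/NULL protocol are
    assumed for illustration purposes.
  */
  void list_image_contents(Image_info &info)
  {
    Image_info::Db_iterator dbs(info);          // enumerate databases

    while (Image_info::Db *db= dbs.next())
    {
      // ... process the database ...

      Image_info::DbObj_iterator objs(*db);     // enumerate tables of this database

      while (Image_info::Obj *obj= objs.next())
      {
        // ... process the table ...
      }
    }
  }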