DESCRIPTION The main backup algorithm. DELIVERABLES - Documentation of backup algorithm (in this WL) - Implementation of synchronization phase of backup NOTES - Sinisa writes 2006-11-01: "lock for all kind of updates" ... But this lock will be fully implemented by new transactional lock, as per Monty's specs. This lock will lock transactional engines in one way and non-transactional in another way. I think we should re-examine WL 3569 in the light of this new development." We need to check with Monty what kind of locks he is working on... Anyone has any reference to a WL? Please add here if you have... LIMITATIONS - Commit blocker is not implemented in this WL. See WL#3956.
The main task of the algorithm is to properly initialize and synchronize creation of local backup images by several storage engines. Engines can use different methods for creating their backup images. These methods can be divided into two broad classes: "At end", or Point Behind methods: Engine scans its data while continuing normal operation. Changes to the data are recorded in a log which is added to the backup image. The validity point of the image can be created only after the data scan is complete. "At begin", or Point Forward methods: Engine can create a snapshot of its data at any given moment and continue to work normally. The snapshot will be sent to backup kernel after its creation. The validity point of the backup image is the moment at which snapshot was created. The backup algorithm treats both methods in a uniform way. From it's point of view a process of creating backup image by a storage engine is divided into three phases: 1. Initial phase in which engine makes preparations before it can create a validity point. For "at begin" engines this phase will be empty as these engines are ready for creating a validity point right away. For "at end" engines this is the data scan phase. 2. Waiting phase. In this phase engine is ready for creating a validity point but it should continue its normal operation since it must wait for other engines to be ready. It is possible that engine (e.g. "at end" type) sends additional log data during that phase. 3. Final phase -- after all engines have created their validity points. For "at end" engines this phase will be empty since they have already sent all their backup image data (they may still need to send small amount of "closing data"). For "at begin" engines this is the phase during which all the backup data will be sent. +- start backup +- ready for creating validity point | | | | +- validity point +- end of backup | | | | |-------------------|------------------|--------------------|---> time initial phase waiting phase final phase In any of the above phases an engine can send data to the backup kernel. All these data will be collected and form the backup image for that engine. As seen already, some engines ("at end") will send majority of the data in the initial phase while other ("ar begin") in the final phase. It might be also necessary to send data in the waiting phase to keep the backup image in sync with the ongoing data changes. Note: in this model it is possible to handle even more general backup schemas in which data is sent in all three phases. As far as the backup kernel is concerned, it doesn't matter what kind of data is sent in each phase of a backup process. These can be logical records, physical data blocks, log entries or any other kind of data used by a particular backup method. What is crucial is that the backup kernel knows when the initial phase ends and engine is ready for creating a validity point. This information is used to synchronize several backup images. Backup image synchronization ============================ The synchronization happens after all engines are finished with the initial phase of their backup process and are ready for creating validity points for their local backup images. Consistency between all the local backup images is achieved by having the backup kernel to decide when the validity points should be created. Backup kernel picks the same moment for all participating engines and the locking mechanism described below ensures that all backup images are valid for exactly the same point in time. For this to work, engines must be able to create validy point of their backup images upon a request from the backup kernel. It is important that: a) The validity point is created instantly, without any waiting. b) The validity point can be created at any time choosen by the kerenel. Requirement a) is important because other engines will be blocked during creation of the validity point and the blocking period should be as short as possible. Reqirement b) is here for the same reason -- backup kernel (and other engines) can not wait until a suitable moment arrives. Guilhem gave example showing why b) is an issue here. In MyISAM engine (and possibly others) it can be important that at the time the validity point is established index files are in sync with data files. Thus there should be no ongoing table updates when one wants to create a validity point. For that reason the engine must do extra preparations (waiting for all ongoing updates to finish and blocking any new updates) before it can satisfy requirement b), that is, be ready for creating a validity point at any given time. Taking into account all these issues, the synchronization protocol which will ensure consistency between different backup images works as follows. 1. Backup kernels ask all engines to prepare for synchronization. The engines should not block during that stage but they should do any preparations they need in order to be ready for instant creation of a validity point in step 2. 2. When all engines signalled that they are ready, backup kernel asks them to create their validity points. Each engine creates it and freezes its current (backup) state so that the validity point remains valid until the end of synchronization process. (In most cases freezing would mean just blocking any commits). 3. When validity points for all images are created, the kernel unfreezes each engine. At that moment the synchronization process is finished. The reason for freezing engines in step 2 is illustrated below: Backup for Backup for engine 1. engine 2. | | | | request to create | | validity point -->+ t1 | : | request to create : | validity point ----------------------->+ t2 : : <--- global validity point continue --------->+ t3 : | : continue ------------------------------>+ t4 | | The local validity points of engines 1 and 2 are at t1 and t2, respectively. If any data changes in engine 1 between time t1 and t2 then its state at t2 will be different from the state at the validity point t1. As a consequence the global image will be inconsistent since it will describe state of engine 2 at time t2 and state of engine 1 at time t1. If the first engine is frozen between time t1 and t2 then the global image is consistent as it describes state of both engines at time t2 (state of engine 1 at t2 is the same as its state at t1). Timing backup image creation ============================ For efficiency reasons it is important that the initial phase of all engines involved in creating a backup image ends at approximately the same time so that no engine waits unnecessarily. For example any "at begin" engines whose initial phase is empty should be initialized only after all the "at end" engines have finished their initial data transfer. As another example consider two "at end" engines such that first of them needs to send 1MB of data in the intial phase and the other one 1TB of data. If both engines are initialized at the same moment, then the first one will have to wait for a long time untill the second one finishes its initial transfer: +- E1 ready for v.p. after sending 1MB of data | | E1 has to wait until E2 finishes its initial transfer E1 |===|----------------------------------------------------------|---- E2 |============================================================|-|---- E2 needs to send 1TB in the initial phase... | +- creating validity point. To minimize the waiting period, backup algorithm initializes backup process on different engines at different times based on the estimates of the amount of data to be sent in the initial phase (if avaliable). In the above example, if the engines report 1MB and 1TB as estimates of their initial data transfer size then the timing will be as follows: E1 initialized after E2 has sent 999MB of data -----+ E1 |===|--|---- E2 |============================================================|-|---- E2 needs to send 1TB in the initial phase... | +- creating validity point. In particular, "at begin" engines will return 0 as the estimate which will result in initializing them *after* any other engines have sent their initial data. Since it might be difficult to estimate size of the data, engine can return UNKNOWN_SIZE as an estimate. Such engine will be assumed to be of "at end" type.
No LLD needed, the most important thing with this WL was to get the algorithm right. -- Lars, 2007-07-05 Note that there has been no push of code assigned to this specific task. Per Lars: "This WL describes the general algorithm, but the code is pushed in other WLs. It was just convenient to document the algorithm in the WL." -- Trudy Pelzer, 2007-07-05