WL#3568: Online backup: Stream format and server file handling
Affects: Server-6.0
—
Status: Complete
SUMMARY

Module to handle the stream format and saving it/loading it to/from disk.

DELIVERABLES

- Documentation of stream format (in this WL)
- Class or similar to handle the save/load by the server

NOTE: This WL documents the stream format used in the prototype code. For the
format of the first release of online backup see WL#4063.
This is the format used in the backup prototype tree
mysql-5.1-backup-prototype. Details of the format can be found in the Doxygen
documentation for this tree.

Stream model used by backup kernel
===================================

The stream read/written by the backup kernel is divided into chunks of
variable length. These chunks help to encode and restore the structure of the
data stored in the stream.

Access to the stream is fully sequential. It is not possible to seek or rewind
a stream. An application can read bytes from the stream until the end of the
current data chunk is reached. When this happens, the application can proceed
to the next chunk and continue reading bytes. Eventually, the end of the last
chunk is reached and the application learns that the stream has ended.

Similarly, when writing data, bytes are appended to the current data chunk.
Upon request, the current chunk can be "closed" and a new chunk is started.
Further writes append data to the new chunk. When all data is written, the
application closes the stream, which also closes the last chunk of data.

How chunks are physically implemented in a particular stream is not relevant
for the following description. We leave this design decision for later and
just assume that the stream used by the backup system consists of data chunks
as described above.

Format of the global backup image
==================================

Note: The following describes the logical structure of the data stream
produced by the backup system. The actual bytes written depend on the
particular implementation of the underlying stream. Such an implementation can
use additional bytes to implement checksums, chunk boundaries and similar
artifacts. We leave magic number detection as part of this low-level design.

The backup kernel stores the backup image in a stream consisting of
consecutive data chunks. This stream consists of four main parts, each of
which is further divided into smaller subparts:

1. header
2. catalog
3. metadata image
4. sub-images

The header contains the global version number and a list of sub-images present
in the image. The catalog lists the tables stored in each of the sub-images.
The metadata image is stored before the table data images so that, when
reading the stream, tables can be created before they are filled with rows.

Backup image header
-------------------

The header occupies exactly one chunk (the first chunk of the stream). It
starts with a backup system version number followed by a list of formats of
the sub-images present in the image.

  +=========================+
  | Version number : int4   |
  +-------------------------+
  | format of 1st sub-image |
  +-------------------------+
  |                         |
  |           ...           |
  |                         |
  +-------------------------+
  | format of Nth sub-image |
  +=========================+

The description of a sub-image format does not have a fixed length. The exact
format of this entry will be decided later. However, we assume that the
application can read the entries one by one. The end of the list is detected
by reaching the end of the stream chunk.

Format of sub-image entry: see Image_info::write_description().

Backup image catalog
--------------------

Assuming that there are N sub-images in the backup image, the catalog consists
of N parts, each part containing a list of the tables which are stored in the
corresponding sub-image. The order of tables in this list is important: the
position of a table in the list determines which stream of the sub-image
contains data for that table.

Each table is represented by two strings: database name and table name. To
save some space, database names are stored in a separate list, and only
positions inside that list are used when storing table coordinates. The table
name is stored as a full string. Thus an entry describing a single table
consists of two fields:

  +---------------------+
  | db name pos : int2  |
  | table name  : str   |
  +---------------------+

A complete list of tables is stored in two chunks. The first chunk is a list
of database names and the second one contains a list of table descriptions.

  +=================+
  | db name 1 : str |
  | ...             |
  | db name k : str |
  +=================+
  | table descr 1   |
  +-----------------+
  | ...             |
  +-----------------+
  | table descr m   |
  +=================+

The catalog consists of 2*N chunks where N is the number of sub-images. There
are 2 chunks per sub-image, used to store the table list.

Metadata
--------

See write_meta_data() and Archive_info::Item::save().

Sub-images
----------

The rest of the stream contains sub-images created by backup drivers. Each
sub-image consists of streams (one stream per table + one common stream).
Streams consist of data blocks. Each block is stored in a single stream chunk,
together with the sub-image and stream numbers:

  +=========================+
  | sub-image number : int2 |
  | stream number    : int4 |
  +-------------------------+
  |                         |
  |      data payload       |
  |                         |
  +=========================+

Serialization of basic types
=============================

Numbers

  intX: where X in {2,4,8}
        fixed length unsigned integers in network transparent format.
        X is the number of bytes used to store the number.

  int:  variable length encoded (VLE) numbers (as coded/decoded by ...)

Strings

  str:  length encoded strings. Consists of an int field containing the
        number of bytes, followed by the bytes of the string.

Prototype backup code stores strings as they are, without taking into account
the character encoding.
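
The chunk model and the data block layout described above can be illustrated
with a small sketch. It is not taken from the prototype tree: the names
ChunkWriter, ChunkReader, write_block and read_blocks are hypothetical, the
stream is held in memory as a plain list of chunks, and little-endian byte
order is an assumption, since this WL only says "network transparent format".
Only the fixed length int2 and int4 fields are used, because the VLE encoding
is not specified here.

```python
import struct

# Assumption: little-endian, unpadded. The worklog does not fix the byte order.
# Layout: sub-image number (int2) followed by stream number (int4).
BLOCK_HEADER = struct.Struct("<HI")


class ChunkWriter:
    """Hypothetical in-memory stand-in for the chunked output stream."""

    def __init__(self):
        self.chunks = []             # chunks that have been closed
        self._current = bytearray()  # chunk still being written

    def write(self, data):
        # Bytes are always appended to the current chunk.
        self._current += data

    def end_chunk(self):
        # Close the current chunk and start a new, empty one.
        self.chunks.append(bytes(self._current))
        self._current = bytearray()

    def close(self):
        # Closing the stream closes the last chunk. A chunk that was never
        # written to is dropped; that is a choice made by this sketch only.
        if self._current:
            self.end_chunk()
        return self.chunks


class ChunkReader:
    """Hypothetical sequential reader: no seeking, no rewinding."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._current = b""
        self._pos = 0

    def next_chunk(self):
        # Proceed to the next chunk; False means the stream has ended.
        chunk = next(self._chunks, None)
        if chunk is None:
            return False
        self._current, self._pos = chunk, 0
        return True

    def read(self, n):
        # Return up to n bytes; an empty result means end of current chunk.
        data = self._current[self._pos:self._pos + n]
        self._pos += len(data)
        return data

    def read_rest(self):
        # Return everything left in the current chunk.
        data = self._current[self._pos:]
        self._pos = len(self._current)
        return data


def write_block(writer, sub_image, stream, payload):
    # One data block occupies exactly one chunk: header fields, then payload.
    writer.write(BLOCK_HEADER.pack(sub_image, stream))
    writer.write(payload)
    writer.end_chunk()


def read_blocks(reader):
    # Yield (sub-image number, stream number, payload) for each chunk.
    while reader.next_chunk():
        sub_image, stream = BLOCK_HEADER.unpack(reader.read(BLOCK_HEADER.size))
        yield sub_image, stream, reader.read_rest()


writer = ChunkWriter()
write_block(writer, 0, 1, b"first block")
write_block(writer, 1, 0, b"second block")
chunks = writer.close()
blocks = list(read_blocks(ChunkReader(chunks)))
```

In this sketch the payload length is never stored: the reader takes whatever
remains in the chunk after the six header bytes, which mirrors the way the
header list above is terminated by the end of its chunk. A real stream
implementation would still need its own way to mark chunk boundaries on disk.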
Prototype design for iterative development. No LLD needed. -- Lars
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.