WL#4901: Ideas for improving the first backup image format.
Affects: Server-6.x — Status: On-Hold — Priority: Medium
This WL collects ideas for improving the backup image format which were accumulated over several months of development of the MySQL Backup system. The backup image format used in the development was essentially frozen from the time it was first proposed (WL#4063). Although this original design of the format proved to be sufficient for the current functionality of the system, some problems and possible enhancements have been discovered. Since the current version of the system does not support multiple image formats, any change to the existing format would break full backward compatibility. This is why the format is frozen.

However, since the MySQL Backup system has not yet been officially released, perhaps it is still possible to update the backup image format used by the first release, taking into account the experience gained while developing the system. This can improve the quality of the first release of the MySQL Backup system and simplify its further development. Even if it is decided that the image format cannot be changed now, the ideas collected here can be used for developing future formats.

Simplify catalog coordinates
============================

Objects saved in backup image are collected in image's catalog and can be identified by catalog coordinates. Currently the coordinates are:

for global objects:
- position in the list of global objects of given type.

for tables:
- snapshot number,
- position in the list of tables belonging to that snapshot.

for per-database objects:
- database number,
- position in the database catalog list.

The different treatment of tables and other per-database objects complicates the format and seems to be redundant. A simpler addressing scheme could be used:

for global objects:
- position in the list of global objects of given type.

for per-database objects:
- database number,
- position in the database catalog list.

Thus tables will be treated the same as other per-database objects. There will be no need to split database catalog into separate lists of tables and other objects, but a single list of all objects belonging to a given database could be used. As in the current format, table's snapshot number and its position within the snapshot will be stored in the catalog entry of that table.

Simplify metadata section of the image
======================================

Metadata section contains a list of entries, each storing metadata for one of the objects. The order of entries is important, as it ensures correct handling of object dependencies. For certain reasons (support for selective restore of selected databases), this list was arranged as follows:

- first metadata for all global items is stored,
- then comes metadata for all tables, grouped by database,
- finally the metadata for all other objects.

Thus the format of the image imposes certain restrictions on the order in which object's metadata is stored. This complicates the code for writing and reading this section of the image while the benefits are doubtful.
A much simpler and cleaner approach would be to store metadata for all objects as a single list of entries. The image format would put no restrictions on the order in which metadata is stored: the application which writes the image would be free to arrange the entries in the most appropriate way.

Remove per-table items
======================

Currently, the metadata section has a dedicated sub-section for storing metadata for per-table objects. However, this sub-section cannot be used, because there is no space in the catalog to store per-table object info. Thus the format could be simplified and confusion avoided by removing this sub-section. If the above simplification of the metadata section is implemented, this will happen automatically and, at the same time, it would be easy to add per-table items later, if needed.

Add flags field to summary section
==================================

The image header contains a flags field. However, the header is written to the stream at the beginning of the backup process, and the values of some flags can be known only at the end of that process. For example, only after the VP will we know if binlog was enabled at that time and whether the image contains a valid VP binlog position. The flags which are known only at the end of the process could be stored in the summary section of the image. A final set of flags would be obtained by a bitwise OR of the flags from the header and from the summary.

Location of the summary section
===============================

The current format allows for storing the summary section at the end of the backup image, or in the preamble (as indicated by a flag in the image header). There are some issues:

- Current code does not support writing/reading a summary inlined in the preamble.
- If an inlined summary is supported, perhaps it will be necessary that it has a fixed length, so that a "hole" of known size can be left in the image for storing the summary there. The current format of the summary makes it variable length (since we don't know the length of the binlog file path).
- Even when the summary is inlined in the preamble, perhaps a copy should be added at the end of the image. This way the reading code could be simpler, because the summary would always be present at the end of the image.
- With the summary both in the preamble and at the end, the variable size problem could be solved as follows: in the preamble, there will be a fixed space reserved for the summary. If some parts of the summary do not fit into that space, this will be indicated by a special flag, and the reading code could get the missing information from the second copy at the end of the image.

Because having an inlined summary is not essential, I (Rafal) would suggest removing this possibility from the first version of the image format. This would agree with the current code, which cannot use this feature even if the format theoretically supports it.

Add image comment field
=======================

It was suggested that it would be good to store a user-provided comment in the backup image. This should be a simple extension of the existing format, as the image header contains a variable length area for extra data. Thus a comment field could probably be added while maintaining full compatibility with the existing format.

Remove group position from binlog coordinates
=============================================

Binlog coordinates of the VP are stored in the summary section. Apart from the coordinates of the last binlog event at VP time, there is also space for storing coordinates of the event group to which this event belongs. Storing of event group coordinates is not implemented in the code. It is also not clear if it is necessary or useful to store group coordinates. If it is decided that they are not needed, the summary section format could be simplified by removing this field.
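
Illustration: merging header and summary flags
==============================================

The rule proposed in "Add flags field to summary section" (final flags = header flags OR summary flags) can be sketched as follows. This is only a minimal Python illustration, not code from the backup stream library: the flag names and bit values are hypothetical, and only the bitwise OR itself comes from the proposal above.

```python
from enum import IntFlag


class ImageFlags(IntFlag):
    """Hypothetical flag bits; the real image format defines its own values."""
    INLINE_SUMMARY = 0x01    # can be decided when the header is written
    BINLOG_POS_VALID = 0x02  # known only after the VP, so stored in the summary


def final_flags(header_flags: int, summary_flags: int) -> ImageFlags:
    """Combine flags from the header and the summary with a bitwise OR."""
    return ImageFlags(header_flags) | ImageFlags(summary_flags)


# Header is written at the start of the backup: binlog state is not yet known.
header = ImageFlags(0)
# Summary is written at the end: binlog was enabled at VP time.
summary = ImageFlags.BINLOG_POS_VALID

flags = final_flags(header, summary)
print(ImageFlags.BINLOG_POS_VALID in flags)
```

With this scheme a reader has to see both the header and the summary before it can rely on the late flags, which matches the assumption that the summary is always available at the end of the image.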