WL#866: Online backup: MyISAM native driver (redo-log algorithm, modify source file, log byte changes)
Affects: Server-6.0
—
Status: Complete
SUMMARY -------- Make MyISAM capable of supporting the Backup API of WL#3169. DESCRIPTION --------- create a backup driver in MyISAM so that MyISAM tables can be backed up online, and restored by the backup API (WL#3169).
Given that the original WL#866 of 2004 had a piece of its logic inside the MySQL table cache, while the backup API's design imposes that the logic be entirely in the engine, a re-design of WL#866 (compared to how it was in 2004) is needed; however big pieces of the original design of WL#866 can be reused. Two variants ============ NO_INDEX -------- = copy only the MYD file (MYI being redundant information) - requires an index rebuild on the backup, which takes a lot of time&disk&CPU on large tables; rebuild can be done at 3 moments: NO_INDEX/a. during backup NO_INDEX/b. after backup, possibly on a spare machine NO_UNDEX/c. during restore. + restored table is optimized (more efficient indexes) FULL_INDEX ---------- = copy the MYD and MYI file - more server-disturbing backup (more copying, more logging in MyISAM) - bigger backup If we call NO_INDEX/b uncommon for many DBAs, at least uncommon for other DBMSs, and NO_INDEX/c unacceptable (that would mean long downtime at restore), then many DBAs will have NO_INDEX/a as only choice, which is at least as much disturbing as FULL_INDEX. Both methods will be coded at the same time. If they will both be accessible to the user and how, is to be decided.
BACKGROUND ---------- As specified by the backup API's design: the MyISAM handlerton receives a list of tables to back up from the upper backup API. The MyISAM handlerton then sends packets of data (containing the backup of requested tables) to the upper backup API. Cooperation between the upper backup API and the MyISAM handlerton ensures that the backup of this handlerton and of other handlertons is of a single point in time (named "validity point"). The upper layer makes sure that no DDL happens on tables to be backed up (note: this must include REPAIR/OPTIMIZE TABLE). Here are proposals for FULL_INDEX, changes to do to have NO_INDEX are written in the end. PROPOSAL FOR BACKUP ------------------- Definition: "backup logging" is that any update to a MyISAM table (data or index file) is logged into a MyISAM-specific log; the catching of updates happens in mi_*() functions and is already partly existing: see option --log-isam; however --log-isam does "logical logging" ("this row inserted") while we need "physical logging" ("these X bytes written at offset Y"). pwrite() write() ftruncate() are the 3 system calls to catch, as well as writes to mmaped memory (--myisam-use-mmap). 1. Make sure all tables (open now or in the future) do backup logging ("begin()" method of WL#3473, calling a new function mi_backup_start_logging_for_tables()) 1a. Old table instances A condition of correctness of online backup is that: after the copy process has started, any update done to a table-to-back-up must be present in the log. A boolean member MYISAM_SHARE::backup_logging is added. All physical writes to the data/index files (my_write, my_pwrite, memcpy to a mmap'ed arean my_chsize) are made to be of this form: @code { write_to_data_or_index_file; if ((atomic read of MYISAM_SHARE::backup_logging)==1) write log record to backup log; } @endcode (for "cached" writes via IO_CACHE (WRITE_CACHE) or via the key cache, post_write callbacks are added to IO_CACHE and to the key cache). mi_backup_start_logging_for_tables() sets MYISAM_SHARE::backup_logging to 1 using an atomic write (include/my_atomic.h). Atomic write happens before or after atomic read above. If before, change will be in the log. If after, it is also after the write_to_data_or_index_file and thus change will be in the copy. So correctness is always guaranteed. Note the importance of checking MYISAM_SHARE::logging always _after_ write_to_data_or_index_file for the reasoning to hold. 1b. New table instances In mi_open(), the name of the table is compared against the list of tables to back up; if it matches, it declares itself ready for backup logging. 2. Write data to the backup ("get_data()" method of WL#3473) Dirty copying of all matching MYD and MYI files is done. Meanwhile, backup logging appends to the log. Upper layer is notified that the copy is done. Copying of the log may start at this point. 3. Create a validity point ("prelock()" method of WL#3473) This requires to wait for all running updates (on matching tables) to finish. Suggestion by Monty: do a LOCK TABLES READ on all matching tables currently open (actually Monty suggested mi_open()+thr_multi_lock(), which is the same as LOCK TABLES READ); for other matching tables (currently closed), if they open, their opener should (in mi_open()) do a thr_lock() on the table and give this thr_lock to the backup thread. Shortcut by Guilhem, before implementing Monty's suggestion: do a LOCK TABLES READ on all matching tables (this has the drawback that it cannot work for a lot of tables, due to OS file descriptor limits; this is why it is not a final solution). "A lot" is not well-defined: a normal Linux can have 200,000 open files. Because this may be a long blocking wait above, prelock() only launches a separate thread which will do this locking. Then the driver will asynchronously monitor the success of of that thread, inside get_data() calls. When finally locks have all been acquired, any remaining cached data (WRITE CACHE, key cache) is flushed to the table (thus goes into the log), backup logging is stopped for all tables and backup log is closed. 4. lock() method of WL#3473 An empty operation as tables have been locked already 4. Finalize ("get_data()" method of WL#3473) After the validity point has been created, copy the rest of the backup log to the upper backup API. Delete the backup log. PROPOSAL FOR RESTORE -------------------- Background: the upper restore API first creates tables via a CREATE TABLE, then asks the engine to restore the table. 1. Restore all data provided by the upper backup API into files (overwriting the empty ones created by CREATE TABLE, but following symbolic links) ("send_data()" method of WL#3473) 2. apply the backup log and delete it ("end()" method of WL#3473) How to do NO_INDEX: - we must still backup the header of the MYI (contains important not-in-MYD information like the latest allocated auto_increment value, the head of the list of deleted rows in the MYD etc); to make it simple we copy 64k as we know that the header always fits in 64k - backup logging must still happen for changes to this header - no need to flush the indexes as the header is not in the key cache - at some moment (NO_INDEX/a NO_INDEX/b or NO_INDEX/c) the backup log is applied and the index is rebuilt. For now we do it at NO_INDEX/c i.e. at restore time. NO_INDEX and FULL_INDEX are both implemented.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.