WL#7142: InnoDB: Simplify tablespace discovery during crash recovery
Status: Complete — Priority: Medium
The objective of this worklog is to eliminate the use of the file system as a ‘data dictionary’ during redo log processing (before applying redo log):

(1) Do not read the first page of all $datadir/*/*.ibd files
(2) Do not check the contents of $datadir/*/*.isl files

After these changes, *.isl files (introduced in WL#5980) will still be consulted when opening tables after the redo log has been applied. Also, the existence of *.ibd files will be checked in dict_check_tablespaces_and_store_max_id(). The *.isl files and the remaining scan for *.ibd files would be removed as part of the data dictionary work.

The changes (1) and (2) will improve reliability as follows:

(3) We will ignore extra *.ibd files that are not attached to the InnoDB instance. For example, if the system crashes before the completion of IMPORT TABLESPACE, there could be files with duplicate space_id that could currently cause trouble. Thanks to the MLOG_FILE_NAME redo log records introduced in this worklog, redo log apply can sometimes safely ignore such files, and sometimes issue an error message that explains how to resolve the problem manually.
(4) We will not silently discard redo log records if some *.ibd file is missing without the redo log containing a MLOG_FILE_DELETE record. For example, if a file rename went bad, the DBA can manually rename the file and restart crash recovery. In innodb_force_recovery mode, missing *.ibd files will continue to be ignored.
(5) Failure scenarios related to inconsistent *.isl files will be eliminated during redo log apply. Redo log records will contain references to *.ibd file names; the *.isl files will only be used after redo log apply when opening tables.

This worklog covers changes to InnoDB redo log processing. It can be implemented independently of the Global Data Dictionary.

The InnoDB redo log format will be changed as follows:

New redo log record types:
MLOG_FILE_NAME(space_id, first_page_number, filename): Identifies a data file.
MLOG_FILE_RENAME2(space_id, first_page_number, filename, new_filename): Rename.
MLOG_CHECKPOINT (1 byte): Indicates the end of log checkpoint activity. At least one MLOG_CHECKPOINT must be present after the latest log checkpoint, or the entire redo log will be ignored.

Repurposed redo log record type (no format change):
MLOG_FILE_DELETE(space_id, first_page_number, filename): Delete a file. Also identifies a data file during redo log scan.

For future compatibility with multi-file tablespaces, the new redo log records will identify the first page number of each file. The first implementation will write and expect first_page_number=0.

All existing file-based redo log records except MLOG_FILE_DELETE will be removed and replaced as follows:
MLOG_FILE_CREATE, MLOG_FILE_CREATE2: Replaced with MLOG_FILE_NAME.
MLOG_FILE_RENAME(space_id, table, new_table): Replaced with MLOG_FILE_RENAME2.
FR1. Redo log can be applied to any tablespace without having access to the data dictionary contents.
FR2. Redo log can be scanned and applied without searching the file system for the location and space_id of all tablespace files upfront. (Only the tablespaces referred to by redo log records since the latest checkpoint will be accessed. Clean tablespaces will not be accessed.)
FR3. We will have applied the redo log before recovering transactions.
Redo log application will be completely detached from data dictionary changes. To support the discovery and renaming of files, we will introduce a file-level redo log record MLOG_FILE_NAME(space_id,name). A mini-transaction that has modified any page (space_id,page_no) in a persistent tablespace file must emit an MLOG_FILE_NAME(space_id,name) record for the file before the MLOG_MULTI_REC_END marker, unless a record MLOG_FILE_NAME(space_id,name) was already emitted since the latest redo log checkpoint. The MLOG_FILE_NAME will also be emitted before renaming a file. Based on these records, the redo log scanner will construct a mapping of possible file names for a given space_id since the latest redo log checkpoint. The space_id will be checked by opening the file. Special handling is needed on a redo log checkpoint, because upon completing a redo log checkpoint there may exist buffered log entries for tablespaces that depend on MLOG_FILE_NAME records. Before starting the log header update to signal a checkpoint, we will re-append MLOG_FILE_NAME records for all tablespaces that were modified since the end of the previous checkpoint. Finally, we will append a MLOG_CHECKPOINT marker record and will mark as ‘clean’ those tablespaces that were not modified after our checkpoint. After this, we will append all pending log records to the redo log file, and actually do the checkpoint. NOTE: We cannot re-append MLOG_FILE_NAME records for tablespaces that were dropped since the previous checkpoint. Therefore, the tablespace discovery will also gather information from MLOG_FILE_DELETE records, noting that the tablespace has been deleted. On crash recovery, we will first scan the log up to the MLOG_CHECKPOINT record. If there is no MLOG_CHECKPOINT record, it means that the system crashed before the log checkpoint completed. In this case, no redo log will be applied, and we will recover the system as it was at the checkpoint LSN. 
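The emission rule and the checkpoint handling described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not InnoDB code: RedoLogModel, LogRecord, commit_page_change() and demo_log() are hypothetical names, the LSN is simply the count of records appended so far, and the position of the MLOG_FILE_NAME record within a mini-transaction is simplified (it is written first here, whereas the text only requires it before MLOG_MULTI_REC_END).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of the rules above; not InnoDB code.
using lsn_t = std::uint64_t;

enum class RecType { FileName, PageChange, Checkpoint };

struct LogRecord {
  RecType type;
  std::uint32_t space_id;  // 0 for Checkpoint in this model
  std::string name;        // set only for FileName
};

class RedoLogModel {
 public:
  void add_space(std::uint32_t space_id, const std::string& name) {
    spaces_[space_id] = Space{name, 0};
  }

  // A mini-transaction that modifies a page of the given tablespace.
  // The file name is logged only if it has not been logged since the
  // latest checkpoint (max_lsn == 0 means "clean").
  void commit_page_change(std::uint32_t space_id) {
    Space& s = spaces_.at(space_id);
    if (s.max_lsn == 0) {
      append({RecType::FileName, space_id, s.name});
    }
    s.max_lsn = append({RecType::PageChange, space_id, ""});
  }

  // Checkpoint: re-append the names of all tablespaces modified since the
  // previous checkpoint, mark as clean those not modified after
  // checkpoint_lsn, then append the marker.
  void checkpoint(lsn_t checkpoint_lsn) {
    for (auto& entry : spaces_) {
      Space& s = entry.second;
      if (s.max_lsn == 0) continue;  // already clean
      append({RecType::FileName, entry.first, s.name});
      if (s.max_lsn <= checkpoint_lsn) s.max_lsn = 0;
    }
    append({RecType::Checkpoint, 0, ""});
  }

  lsn_t current_lsn() const { return log_.size(); }
  const std::vector<LogRecord>& log() const { return log_; }

 private:
  struct Space {
    std::string name;
    lsn_t max_lsn;  // LSN of the latest change, or 0 if clean
  };

  lsn_t append(LogRecord rec) {
    log_.push_back(std::move(rec));
    return log_.size();
  }

  std::map<std::uint32_t, Space> spaces_;
  std::vector<LogRecord> log_;
};

// Usage: two changes, a checkpoint, then one more change.
std::vector<LogRecord> demo_log() {
  RedoLogModel redo;
  redo.add_space(5, "./test/t.ibd");
  redo.commit_page_change(5);           // FileName + PageChange
  redo.commit_page_change(5);           // PageChange only
  redo.checkpoint(redo.current_lsn());  // FileName + Checkpoint
  redo.commit_page_change(5);           // FileName again + PageChange
  return redo.log();
}
```

In this model, the name of tablespace 5 is written once before the checkpoint, copied across the checkpoint, and written again on the first modification afterwards, because the checkpoint marked the tablespace as clean.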
There must be only one MLOG_CHECKPOINT record since the latest checkpoint. Seeing multiple MLOG_CHECKPOINT records is a fatal error (corrupted redo log). Until we scan an MLOG_CHECKPOINT log record, we will allow redo log records to refer to tablespaces for which MLOG_FILE_NAME or MLOG_FILE_DELETE has not been emitted upfront. Once we see the MLOG_CHECKPOINT record, there must be no missing MLOG_FILE_NAME or MLOG_FILE_DELETE for the records scanned so far. After the occurrence of the MLOG_CHECKPOINT record, every log record that refers to a non-predefined tablespace must be preceded by a corresponding MLOG_FILE_NAME record. (It should not make sense to have redo log records after a MLOG_FILE_DELETE, but we do allow this, ignoring the records.)

Long term, InnoDB crash recovery will not apply any file-level redo log records except for MLOG_FILE_RESIZE. MLOG_FILE_RENAME2 records will temporarily be applied. Ultimately, the rollback of RENAME will be handled via the DDL_LOG.

MySQL Enterprise Backup will continue to apply the MLOG_FILE_* records, including MLOG_FILE_RENAME2. After apply-log has been executed on a backup copy and MySQL is started on the copy, InnoDB recovery will continue from PHASE 1 step 2 below. If a transaction that renamed tablespaces was rolled back, the apply-log would have performed the renaming based on MLOG_FILE_RENAME2 records, and InnoDB would roll back the renames based on DDL_LOG.

Because there are combined DDL+DML transactions (such as CREATE TABLE…SELECT) we will have to do the redo log apply in 3 phases.

PHASE 1: Recover the data dictionary and all undo logs.

STEP 0: Scan all redo log since the latest checkpoint. This will construct a complete map of space_id↦filename that will be consulted in subsequent redo log apply (STEP 1, STEP 4, STEP 6). If there is no MLOG_CHECKPOINT marker, discard the redo log (all tablespaces should be in clean state as of the checkpoint). If there are missing or duplicate *.ibd files referred to by the redo log, refuse startup. The DBA can delete or rename files, or force recovery (causing redo log for missing tablespaces to be discarded).

STEP 1: Apply the redo log on the system tablespaces (or all tablespaces). If not all scanned redo log records fit in memory at once, the log will have to be applied in batches, and it will have to be applied on all tablespaces.
NOTE: Currently, STEP 1 will recover all tablespaces, and there is no STEP 3, STEP 4 or STEP 6.

STEP 2: Recover all incomplete transactions from undo logs.

PHASE 2: Roll back incomplete transactions that performed DDL.

STEP 3: (optional, to speed up STEP 4, STEP 6) Drop to-be-dropped tablespaces, and discard the redo log for them.
NOTE: This is not implemented yet. We could execute this step if the server was killed right after committing DROP TABLE, ALTER TABLE or TRUNCATE TABLE, before the post-commit step that actually frees the space. (For ALTER and TRUNCATE, this refers to the "old copy" of the table.)

STEP 4: Start applying the remaining redo log records. Later, this can be in the background and can skip pages that are only modified by DML-only transactions (not modified by any DDL-only or DDL+DML transactions).
NOTE: STEP 4 is currently already performed as part of STEP 1.

STEP 5: Roll back any incomplete DDL-only or DDL+DML transactions.
NOTE: Before Global Data Dictionary, there are no DDL+DML transactions in InnoDB. So, at STEP 5 we currently roll back an incomplete DDL transaction (there can be at most one).

STEP 6: Complete the applying of any remaining redo log records.
NOTE: STEP 6 is currently already performed as part of STEP 1.

NOTE: We could cover all transactions in STEP 4 and STEP 5 above. We defer the rollback of DML-only transactions because we can.
An existing feature of InnoDB crash recovery is to allow connections as soon as possible, rolling back incomplete non-DDL transactions in a background thread.

STEP 7: Apply any operations from the DDL_LOG.
NOTE: As of now STEP 7 will drop orphan auxiliary tables for FULLTEXT INDEX, and drop incomplete or delete-marked indexes (index name starting with TEMP_INDEX_PREFIX).

PHASE 3: Start non-critical background processes.

STEP 8: Start the background tasks on transactions.
8.1 Start the rollback of incomplete DML-only transactions.
8.2 Start the purge of delete-marked records and undo logs.
NOTE: Rollback and purge will have to perform READ COMMITTED of the DD tables in order to look up the table definitions by dd.tables.se_private_id.
[End of tasks that could be controlled by maintenance mode.]

STEP 9: Start accepting connections.

EXAMPLE: Recovering from a DDL operation when "Crash-safe DDL with the global data dictionary" is also implemented.

ALTER TABLE t ... ALGORITHM=COPY will be internally implemented like

    BEGIN;
    CREATE TABLE #sql1;
    INSERT INTO #sql1 SELECT ... FROM t;
    RENAME TABLE t TO #sql2, #sql1 TO t;
    DROP TABLE #sql2;
    COMMIT;

In more detail, the file system operations and the DDL_LOG operations would be as follows:

    BEGIN;
    -- "prepare" of CREATE TABLE #sql1
    BEGIN; -- InnoDB-internal subtransaction
    INSERT INTO ddl_log SET type='DELETE', old_file_name='#sql1.ibd';
    COMMIT;
    DELETE FROM ddl_log WHERE id=LAST_INSERT_ID();
    creat("#sql1.ibd");
    -- Finally at some point before the COMMIT below, update the data dictionary
    -- for the CREATE TABLE (no commit yet!)
    INSERT INTO DD.* VALUES ...;

    INSERT INTO #sql1 SELECT ... FROM t;

    -- RENAME TABLE t TO #sql2, #sql1 TO t;
    -- "backward" logging for the DDL rollback
    BEGIN; -- InnoDB-internal subtransaction
    INSERT INTO ddl_log SET type='RENAME',
      old_file_name='#sql2.ibd', new_file_name='t.ibd';
    COMMIT;
    DELETE FROM ddl_log WHERE id=LAST_INSERT_ID();
    write MLOG_FILE_NAME; mtr_commit();
    rename("t.ibd", "#sql2.ibd")
    -- "backward" logging for the DDL rollback
    BEGIN;
    INSERT INTO ddl_log SET type='RENAME',
      old_file_name='t.ibd', new_file_name='#sql1.ibd';
    COMMIT;
    DELETE FROM ddl_log WHERE id=LAST_INSERT_ID();
    write MLOG_FILE_NAME; mtr_commit();
    rename("#sql1.ibd", "t.ibd")
    UPDATE DD.* ...;

    -- "commit" of DROP TABLE #sql2
    INSERT INTO ddl_log SET type='DELETE', name='#sql1.ibd';
    COMMIT /* marks the DDL operation committed */;

    "post-commit" of DROP TABLE #sql2:
    BEGIN;
    DELETE FROM ddl_log WHERE type='DELETE' AND ...;
    MLOG_FILE_DELETE(old_space_id, "#sql2.ibd")
    unlink("#sql2.ibd");
    COMMIT;

In the redo log, this will generate the following redo log records and file system operations for the old_space_id (t.ibd, which is renamed to #sql2.ibd and then dropped) and new_space_id (#sql1.ibd, which is renamed to t.ibd):

    MLOG_FILE_NAME(new_space_id, "#sql1.ibd") -- for MEB tablespace discovery
    ... commit of the DDL_LOG.type='DELETE' write for undoing the below
    creat("#sql1.ibd")
    ... optional: some redo log records for the bulk INSERT in new_space_id
    ... not needed (not even for page allocations) if we flush new_space_id
    ... before the main COMMIT of the DDL transaction
    MLOG_FILE_NAME(old_space_id, "t.ibd") -- once after latest log checkpoint
    MLOG_FILE_NAME(old_space_id, "#sql2.ibd") -- flushed before the rename!
    ... commit of the DDL_LOG.type='RENAME' write for undoing the below
    rename("t.ibd", "#sql2.ibd")
    MLOG_FILE_RENAME2(old_space_id,FROM t.ibd,TO #sql2.ibd);
    MLOG_FILE_NAME(new_space_id, "#sql1.ibd") -- once after latest log checkpoint
    MLOG_FILE_NAME(new_space_id, "t.ibd") -- flushed before the rename!
    ... commit of the DDL_LOG.type='RENAME' write for undoing the below
    rename("#sql1.ibd", "t.ibd")
    MLOG_FILE_RENAME2(new_space_id,FROM #sql1.ibd,TO t.ibd);
    ... commit of the DDL operation
    ... (deletes above DDL_LOG, writes type='DELETE' for removing old_space_id)
    MLOG_FILE_DELETE(old_space_id,"#sql2.ibd")
    unlink("#sql2.ibd")
    ... commit of the "post-commit" operation

Now, let us consider redo log apply (PHASE 1 step 1) with this example.

In the above example, InnoDB would first perform the file system operation and then write redo log about it if it succeeded. The Hot Backup in MySQL Enterprise Backup (MEB) must obviously replay all of the following, because it is creating a copy of the ‘live’ file system:

    MLOG_FILE_DELETE
    MLOG_FILE_RENAME2

The following records will no longer be written, because MEB will also use MLOG_FILE_NAME for discovering ‘dirty’ tablespaces:

    MLOG_FILE_CREATE
    MLOG_FILE_CREATE2

If redo log apply sees a mtr_commit() that included a file operation, it means that the file system operation would already have been performed successfully. So, it is not necessary to replay the file operations in normal InnoDB recovery. After DDL_LOG based recovery is in place, InnoDB redo log apply will scan MLOG_FILE_DELETE and MLOG_FILE_RENAME2 records but will not replay them. (Until then, InnoDB will be replaying MLOG_FILE_RENAME2, which replaces MLOG_FILE_RENAME.)

NOTE: Because the MLOG_FILE_DELETE and MLOG_FILE_NAME records will be used for reconstructing the space_id→filename mapping, we must emit and flush a MLOG_FILE_DELETE record before attempting to delete a file, even if we do not know yet if the deletion will succeed. This is in preparation for the clean crash recovery semantics introduced in "Crash-safe DDL". In the past, we would emit the MLOG_FILE_DELETE asynchronously some time after deleting a file, and silently ignore redo log records that were emitted for missing tablespace files.
If the server is killed before the mtr_commit() of a file operation gets flushed to the redo log, the redo log scan might not see the operations, even though they had been performed in the file system (depending on whether file system recovery was needed, and how it works). Currently, InnoDB fails to roll back a RENAME operation in the file system, and it can fail to delete *.ibd files when recovering from a crash during ALTER TABLE.

The DDL_LOG based rollback would return the data directory to a consistent state:

• In case of creating a file (MLOG_FILE_NAME), the insertion of a DDL_LOG.type='DELETE' record for undoing the creation would already have been committed, and the commit would have been flushed to the redo log, before we start creating the file.
• In case of MLOG_FILE_RENAME2, the commit of inserting a DDL_LOG.type='RENAME' record for undoing the rename would already have been flushed to the redo log, before we start renaming the file.
• In case of MLOG_FILE_DELETE, as noted above we will write out the MLOG_FILE_DELETE record before actually deleting. Even if we did not do this, the removal of the DDL_LOG.type='DELETE' record is not committed until after we have deleted the file (and written out a MLOG_FILE_DELETE record).

After InnoDB crash recovery step 1 (redo log apply), because InnoDB will be ignoring the MLOG_FILE_RENAME2 and MLOG_FILE_DELETE operations, we will end up with a set of files that may be subject to some ‘rollback’ or ‘roll-forward’ operations. There could be up to one file system operation that was not covered by a MLOG_FILE_RENAME2 record. The operation should be covered by a flushed commit of inserting a corresponding DDL_LOG.type='RENAME' record, so that the operation can be rolled back or rolled forward in recovery PHASE 2 step 3 or 7.

If we are starting up MySQL on a restored hot backup after --apply-log, we should have a situation that is similar to normal InnoDB crash recovery step 1.
In summary, it should not make a difference if PHASE 1 step 1 was executed by MySQL/InnoDB startup, or if the step was avoided because the tablespace files were ‘cleaned’ by MEB --apply-log. Either way, all subsequent recovery steps will lead to a consistent result.
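To make the ALTER TABLE example more concrete, the following sketch builds the space_id to file name mapping from the file-level records of that example. It is only an illustration under assumptions: FileRec, SpaceInfo, build_space_map() and alter_table_example() are hypothetical names, the space ids 10 and 11 are arbitrary, and the record layout is not the InnoDB on-disk format. Following the text, only MLOG_FILE_NAME and MLOG_FILE_DELETE contribute to the map; MLOG_FILE_RENAME2 is scanned but not used, because both names were already announced by MLOG_FILE_NAME records before the rename.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified record model; not the InnoDB on-disk format.
enum class RecType { FileName, FileRename2, FileDelete };

struct FileRec {
  RecType type;
  std::uint32_t space_id;
  std::string name;      // file name (old name for FileRename2)
  std::string new_name;  // used only by FileRename2
};

struct SpaceInfo {
  std::set<std::string> names;  // candidate file names seen in the log
  bool deleted = false;         // an MLOG_FILE_DELETE was seen
};

// Collect the possible file names for each space_id since the checkpoint.
std::map<std::uint32_t, SpaceInfo> build_space_map(
    const std::vector<FileRec>& log) {
  std::map<std::uint32_t, SpaceInfo> spaces;
  for (const FileRec& rec : log) {
    switch (rec.type) {
      case RecType::FileName:
        spaces[rec.space_id].names.insert(rec.name);
        break;
      case RecType::FileDelete:
        spaces[rec.space_id].names.insert(rec.name);
        spaces[rec.space_id].deleted = true;
        break;
      case RecType::FileRename2:
        // Not used for discovery in this sketch (see the lead-in).
        break;
    }
  }
  return spaces;
}

// The file-level records of the ALTER TABLE example, in log order.
std::vector<FileRec> alter_table_example() {
  const std::uint32_t old_space_id = 10;  // arbitrary illustrative ids
  const std::uint32_t new_space_id = 11;
  return {
      {RecType::FileName, new_space_id, "#sql1.ibd", ""},
      {RecType::FileName, old_space_id, "t.ibd", ""},
      {RecType::FileName, old_space_id, "#sql2.ibd", ""},
      {RecType::FileRename2, old_space_id, "t.ibd", "#sql2.ibd"},
      {RecType::FileName, new_space_id, "#sql1.ibd", ""},
      {RecType::FileName, new_space_id, "t.ibd", ""},
      {RecType::FileRename2, new_space_id, "#sql1.ibd", "t.ibd"},
      {RecType::FileDelete, old_space_id, "#sql2.ibd", ""},
  };
}
```

With these records, the old tablespace ends up marked as deleted, and the new tablespace has two candidate names (#sql1.ibd and t.ibd). Per the text, the scanner would resolve which candidate is the right one by opening the file and checking its space_id.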
When the setting innodb_file_per_table=ON was introduced in MySQL 4.1, InnoDB crash recovery was changed so that the directories will be searched for *.ibd files if any redo needs to be applied. The scanning and opening of all *.ibd files (including ones for which no redo log needs to be applied) can be very slow, especially on deployments that contain a large number of *.ibd files. Furthermore, if we allow a more liberal placement of tablespace files in the file system, we might have to extend the search to an even broader range of directories.

This worklog eliminates the *.ibd file scan by guaranteeing the following: If there are redo log records for any non-predefined tablespace, there will also be an MLOG_FILE_NAME record.

The InnoDB redo log format will be changed as follows:

MLOG_FILE_NAME(space_id, filename): A new redo log record. Replaces MLOG_FILE_CREATE, MLOG_FILE_CREATE2.
MLOG_FILE_RENAME2(space_id, old, new): The names will be file names (directory/databasename/tablename.ibd). Replaces MLOG_FILE_RENAME, which used table names (databasename/tablename).

NOTE: We will write MLOG_FILE_NAME once since the latest redo log checkpoint. Immediately after a checkpoint, the log may contain some MLOG_FILE_NAME records that were "copied across the checkpoint" and a MLOG_CHECKPOINT marker to signal the end of a checkpoint.

On redo log apply during crash recovery, we will scan the log up to three times:

Recovery scan 1: Look for the first MLOG_CHECKPOINT marker since the latest checkpoint. If there is no MLOG_CHECKPOINT, we will skip the entire log. The data files will correspond to the system state as of the checkpoint.

Recovery scan 2: Read the redo log since the latest checkpoint. Copy scanned records to recv_sys->addr_hash, and construct a map of recv_spaces, based on MLOG_FILE_NAME and MLOG_FILE_DELETE records.

Before applying the records from recv_sys->addr_hash, we will check if any tablespace files are missing. If there are missing tablespaces, we will refuse to start up, so that the DBA can intervene, for example to manually rename files. This new safeguard of WL#7142 can be disabled by setting innodb_force_recovery.

If not all redo log records fit in recv_sys->addr_hash, we will need a third log scan:

Recovery scan 3: Read the redo log since the latest checkpoint. If recv_sys->addr_hash fills up, apply the batch of log records and read a new one.

mlog_id_t: Remove MLOG_FILE_CREATE, MLOG_FILE_CREATE2, MLOG_FILE_RENAME. Add MLOG_FILE_NAME, MLOG_FILE_RENAME2, MLOG_CHECKPOINT.
MLOG_FILE_FLAG_TEMP: Remove. This was a flag for MLOG_FILE_CREATE*.
enum dict_check_t: Remove DICT_CHECK_ALL_LOADED. Crash recovery no longer loads all tablespaces.
mtr_t::m_named_space: Associates a tablespace with a mini-transaction. A mini-transaction may be associated with up to one non-predefined tablespace. It may also modify predefined tablespaces for change buffering and undo logging.
mtr_t::set_named_space(ulint space): Sets m_named_space. This must be called when a mini-transaction is going to modify a non-predefined tablespace.
mtr_t::is_named_space(ulint space): Checks if the mini-transaction is associated with a given tablespace.
mtr_write_log_t: Add a parameter for the number of bytes to append.
mtr_write_log_t::operator(): Stop appending when the limit is reached.
mtr_t::Command::prepare_write(): Write MLOG_FILE_NAME records if needed. This is executed as part of mtr_commit(). To write MLOG_FILE_NAME, we will invoke fil_names_write() for non-predefined persistent tablespaces. After log_mutex_enter(), discard the data appended by fil_names_write() based on the result of fil_names_dirty(). Return the number of bytes to append, instead of a Boolean. 0 means that finish_write() should not be called.
mtr_t::Command::finish_write(): Take the number of bytes to append as a parameter.
mtr_t::commit_checkpoint(): A special method to emit redo log records to the redo log buffer when the caller already invoked log_mutex_enter(). This is only used by fil_names_clear().
fil_space_t::max_lsn: LSN of the most recent fil_names_write() call, or 0 if the tablespace has not been dirtied since fil_names_clear(). Protected by log_sys->mutex or fil_system->mutex.
fil_space_t::named_spaces, fil_system_t::named_spaces: List of tablespaces for which MLOG_FILE_NAME has been written since the latest checkpoint. Protected by fil_system->mutex.
recv_sys_t::mlog_checkpoint_lsn: The LSN of the first scanned MLOG_CHECKPOINT record, or 0 if none was read yet.
fil_space_create(): If a duplicate tablespace name is found, do not silently free the existing tablespace, but instead return an error.
fil_space_free(): Make this an externally callable function, to free a tablespace from the cache when applying MLOG_FILE_DELETE.
fil_space_free_low(): Renamed from fil_space_free(). The new wrapper fil_space_free() will acquire fil_system->mutex.
fil_op_log_parse_or_replay(): Change the order of parameters. Remove log_flags, and rename parse_only to replay. We no longer attempt to replay log records of a multi-item mini-transaction, unless the MLOG_MULTI_REC_END was seen.
fil_delete_tablespace(): Write a MLOG_FILE_DELETE record before attempting to delete the file.
fil_rename_tablespace(): Change the function signature. Take old_path, new_name, new_path_in. MLOG_FILE_RENAME2 is logging file names, not table names like MLOG_FILE_RENAME was. Also invoke fil_name_write().
enum fil_load_status: Outcomes of fil_load_single_table_tablespace().
fil_load_single_table_tablespace(): Do not exit on failure. Instead, return a status value to the caller. Also, ignore *.isl files.
fil_load_single_table_tablespaces(): Remove. We no longer try to load all *.ibd files.
fil_create_new_single_table_tablespace(): Do not write any MLOG_FILE_CREATE or MLOG_FILE_CREATE2. Instead, invoke fil_name_write() to write MLOG_FILE_NAME.
fil_mtr_rename_log(): Change the signature. Take dict_table_t instead of names. Take a tmp_name.
fil_names_write_low(): Write MLOG_FILE_NAME record(s) for a tablespace. In fil_names_clear(), the fil_space_t will be protected by fil_system->mutex. In fil_names_write(), the fil_space_t will be protected by a buffer-fix on some tablespace pages.
fil_names_write(): Look up a tablespace and write MLOG_FILE_NAME record(s). This is speculatively called during mtr_commit() before log_mutex_enter().
fil_names_dirty(): Update space->max_lsn while only holding log_sys->mutex. If max_lsn was 0, add the space to the named_spaces list, and tell the caller not to discard the records that were appended by fil_names_write().
fil_names_clear(): Write MLOG_FILE_NAME records and MLOG_CHECKPOINT on a log checkpoint or at system startup. If do_write=true, writes MLOG_CHECKPOINT even if no MLOG_FILE_NAME was written. Reset those fil_space_t::max_lsn for which fil_names_write() has not been invoked after the checkpoint LSN. Return true to the caller if any redo log was written.
fil_op_write_log(): Replace log_flags with first_page_no, and replace table names with file paths. The parameter first_page_no is currently being passed as 0, because we do not have non-predefined multi-file tablespaces yet.
fil_name_write(): Write an MLOG_FILE_NAME record for a file.
Datafile::open_read_only(): Add the parameter bool strict.
Datafile::validate_for_recovery(), Datafile::validate_first_page(): Return DB_TABLESPACE_EXISTS on duplicate space_id.
Datafile::init(): Add a variant that takes ownership of the "name", and allows filepath to be initialized.
Datafile::shutdown(): Remove a redundant check for m_name!=NULL. free(NULL) is documented as no-op in the C standard.
is_predefined_tablespace(): Check if a tablespace is a predefined one (system tablespace, undo tablespace or shared temporary tablespace).
enum recv_addr_state: Add RECV_DISCARDED, so that buffered redo log records can be retroactively deleted if an MLOG_FILE_DELETE was later recovered for a tablespace.
btr_free_but_not_root(), btr_free_root(): Call fsp_names_write().
btr_cur_ins_lock_and_undo(), btr_cur_optimistic_insert(), btr_cur_pessimistic_insert(), btr_cur_update_in_place(), btr_cur_optimistic_update(), btr_cur_pessimistic_update(), btr_cur_del_mark_set_clust_rec_log(), btr_cur_del_mark_set_clust_rec(), btr_cur_optimistic_delete_func(), btr_cur_pessimistic_delete(): Call fsp_names_write() after successful locking and undo logging.
btr_store_big_rec_extern_fields(), btr_free_externally_stored_field(), row_ins_index_entry_big_rec_func(): Call fsp_names_write().
dict_build_tablespace(), dict_create_index_tree_step(), dict_recreate_index_tree(), fil_reinit_space_header(): Call fsp_names_write().
page_cur_insert_rec_write_log(), page_copy_rec_list_to_created_page_write(), page_cur_delete_rec_write_log(), page_cur_delete_rec(), page_create(): Assert that fsp_names_write() has been called.
dict_table_rename_in_cache(): Pass old_path to fil_rename_tablespace().
dict_check_tablespaces_and_store_max_id(): Remove the logic for DICT_CHECK_ALL_LOADED. We could probably remove this entire function, given that the maximum is also stored in the DICT_HDR page.
mlog_write_initial_log_record_low(): Replaces mlog_write_initial_log_record_for_file_op(). If some page redo log is being written, assert that fsp_names_write() has been called.
log_checkpoint(): Before invoking log_write_up_to(), invoke fil_names_clear() to copy any MLOG_FILE_NAME records across the checkpoint. Flush the log up to the MLOG_CHECKPOINT marker, instead of only up to the checkpoint LSN. Without this step, the log between oldest_lsn and log_sys->lsn would be essentially corrupted (missing MLOG_FILE_NAME records on redo log apply). When the redo log scanner sees the first MLOG_CHECKPOINT since the latest checkpoint, it knows that there must be no missing MLOG_FILE_NAME record for any page operation on a non-predefined tablespace. If the MLOG_CHECKPOINT marker is missing, no redo log will be applied, and the system would be at the state of the checkpoint.
log_reserve_and_write_fast(): Do not write MLOG_LSN after a MLOG_CHECKPOINT marker, so that we will not get bogus warnings about the data files being newer than the redo log.
fil_name_parse(): New function, to update the recv_spaces map based on MLOG_FILE_NAME and MLOG_FILE_DELETE records during recovery.
recv_parse_or_apply_log_rec_body(), recv_parse_log_rec(): Add the parameter "apply". Do not apply file-level redo log records unless the entire mini-transaction has been recovered. Fail if an MLOG_FILE_NAME record is missing for a page-level operation.
recv_recover_page_func(): Assert that no LSN is after the latest scanned redo log LSN.
recv_parse_log_rec(): Check for some more log corruption.
recv_parse_log_recs(): Add a parameter "store_to_hash" to control whether the records should be stored into recv_sys->addr_hash. Add a parameter "apply" to specify whether log records should be applied (apply=false during the first scan for MLOG_CHECKPOINT). Return true if an MLOG_CHECKPOINT record was seen for the first time. Improve DBUG_PRINT output, and detect some more log corruption.
recv_scan_log_recs(): Add a parameter "store_to_hash" to control whether the records should be stored into recv_sys->addr_hash.
recv_group_scan_log_recs(): Initialize the variables and data structures to begin reading redo log records. Add a parameter "last_phase" that is set when a multi-pass recovery is needed and we are scanning the redo log for a third time. In last_phase, we will invoke recv_apply_hashed_log_recs() to empty recv_sys->addr_hash between passes. If last_phase=false, we would stop filling recv_sys->addr_hash, only processing file-level redo log records.
recv_init_crash_recovery(): Split some code into recv_init_crash_recovery_spaces(), to be invoked after the first call to recv_group_scan_log_recs().
recv_recovery_from_checkpoint_start(): Invoke recv_group_scan_log_recs() up to 3 times if needed. After processing all redo log, write an MLOG_CHECKPOINT marker so that in case we will crash before making a checkpoint, the log will be replayed by subsequent crash recovery.
checkpoint_now_set(): Avoid an infinite loop in case an MLOG_CHECKPOINT marker is the only thing that was written since the latest checkpoint.
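
The scan rules stated in this worklog (skip the log if the MLOG_CHECKPOINT marker is missing, require MLOG_FILE_NAME or MLOG_FILE_DELETE for page records, and discard buffered records of deleted tablespaces) can be summarized in a small model. This is a sketch under assumptions, not the recv_* implementation: LogRec, ScanStatus, ScanOutcome and scan_log() are hypothetical names, space_id 0 stands in for every predefined tablespace, duplicate markers and batching are not modelled, and the two passes are folded into one function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical, simplified model of the scan rules; not InnoDB code.
enum class Rec { FileName, FileDelete, Page, Checkpoint };

struct LogRec {
  Rec type;
  std::uint32_t space_id;  // ignored for Checkpoint
};

enum class ScanStatus { SkipLog, Ok, MissingFileName };

struct ScanOutcome {
  ScanStatus status;
  std::size_t records_to_apply;  // page records that remain buffered
};

// Assumption of this sketch: space_id 0 is the only predefined tablespace.
bool is_predefined(std::uint32_t space_id) { return space_id == 0; }

ScanOutcome scan_log(const std::vector<LogRec>& log) {
  // First pass: without a checkpoint marker, the whole log is skipped.
  bool has_marker = false;
  for (const LogRec& r : log) {
    if (r.type == Rec::Checkpoint) has_marker = true;
  }
  if (!has_marker) return {ScanStatus::SkipLog, 0};

  std::set<std::uint32_t> named, deleted, pending;
  std::map<std::uint32_t, std::size_t> buffered;
  bool marker_seen = false;

  for (const LogRec& r : log) {
    switch (r.type) {
      case Rec::FileName:
        named.insert(r.space_id);
        pending.erase(r.space_id);
        break;
      case Rec::FileDelete:
        named.insert(r.space_id);
        deleted.insert(r.space_id);
        pending.erase(r.space_id);
        buffered.erase(r.space_id);  // retroactively discard its records
        break;
      case Rec::Page:
        if (deleted.count(r.space_id)) break;  // ignored after a delete
        if (!is_predefined(r.space_id) && !named.count(r.space_id)) {
          // Tolerated only until the marker has been seen.
          if (marker_seen) return {ScanStatus::MissingFileName, 0};
          pending.insert(r.space_id);
        }
        ++buffered[r.space_id];
        break;
      case Rec::Checkpoint:
        // At the first marker, nothing scanned so far may lack a name.
        if (!marker_seen && !pending.empty()) {
          return {ScanStatus::MissingFileName, 0};
        }
        marker_seen = true;
        break;
    }
  }

  std::size_t total = 0;
  for (const auto& entry : buffered) total += entry.second;
  return {ScanStatus::Ok, total};
}
```

For instance, a log consisting of a page record for tablespace 5, its MLOG_FILE_NAME, the marker, and another page record is accepted with two records to apply, while a page record for an unnamed tablespace after the marker is rejected.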