WL#9175: Correct recovery of DDL statements/transactions by binary log.
Affects: Server-8.0
—
Status: Complete
With advent of new data-dictionary it becomes possible to package metadata changes and writing binary log into single transaction. We aim to make DDL statements crash-safe by using this feature. However there is a gotcha here. If crash happens after the moment when DDL statement is committed in InnoDB but before binary log is flushed to disk we will get inconsistency between DD and binary log after restart. Notice that for normal DML transactions this problem does not occur because we use 2-phase commit protocol for them and binlog serves as transaction coordinator log in this case. I.e. if transaction is absent from binary log it is discovered as prepared but not yet committed and is rolled back during server restart. The mechanism implementing recovery of DML transactions relies on special type of binary log event Xid_log_event which is not used for DDL statements/transactions. The goal of this task is to implement support for correct recovery for DDL transactions based on the binary log data. This means that we need to use 2-phase commit for such transactions plus add an equivalent of Xid_log_events for them. The worklog limits the crash-recovery support to DDL:s that are 2pc capable. At this point this subclass includes all ACL DDL:s including new CREATE|DROP role.
FR1. The server must be able to recover from crash to commit or rollback every 2pc-capable DDL command that was in progress. Commit is done to commands that had reached Engine-prepared status, and got successfully logged into binary log. Otherwise they are rolled back. FR2. In case (a) GTID is ON, or (b) --relay-log-info-repository=table the slave server must be able to recover from a failure including crash to honor FR1 with GTID of the recoverable DDL:s preserved. Not recoverable DDL:s (a) release their GTID:s or (b) preserve the last executed group position prior the failing query, as it's supposed to upon rollback. Note (a) and (b) are not really disjoint options, any combination of the feature values is allowed. FR3. In case GTID is ON and --skip-log-bin, update to the 'mysql.gtid_executed' table is ensured to a be part of the DDL "auto-commit" transaction. FR4. Cross-replication OLD->NEW and NEW->OLD must be seemless. FR5. Enhanced transactional DDL binary logger does not affect event group marking with (a) logical timestamps for LC MTS and (b) the accessed databases for DB MTS.
HLS/IS Recoverable DDL needs 2PC with a transaction coordinator, such as binlog. The binlog-based recovery is traditional method to tolerate crashes. The gtid-based recovery serves to the slave server or a backup-restoring server. The current WL elaborate both frameworks, which we discuss one by one. *The binlog-based recovery* --------------------------- Prior the WL a DDL query has been logged as Query_log_event and that remains. The query logging is turned to be transactional, very similarly to how takes place with an autocommit DML query, with a difference that there won't any BEGIN nor the terminal Xid_log_event (or Commit query-log-event). The XID info is present in the DDL Query-log-event as a new member of the status var set. XID info is needed only on the "master" side where the recovery procedure is augmented to recongnize such DDL kind of Query_log_event. Old -> New replication is not effected at all. New -> Old works thanks to a general property of the Query-log-event status var:s that are not sensed by the Old slave when the var:s are "new". *The gtid-based recovery* ------------------------- Before this WL the GTID data associated with DDL were never stored into the gtid_executed table transactionally with the DDL operations. This is to change now. When DDL processes to the commit stage, insert into gtid_executed is made as a part of the DDL transaction before the DDL prepares. The insert effectively extends the DDL transaction to not actually require 2pc in this case. Regular DML transaction extension framework was introduced by 'mysql.gtid_executed' table worklog. The current WL makes sure DDL exploit it. *The slave --relay-log-info-repository=table based recovery* ------------------------------------------------------------- When DDL processes to the commit stage and the slave applier is configured with TABLE type for @@global.relay-log-info-repository, the info table gets updated as a part of the DDL transaction. When the applier fails to perform the DDL the info table remains unchanged thanks to transaction rollback, and the slave applier is to restart off the failing event. This must be ensure for both the STS and MTS execution mode.
Find in this section low-level details of implementation of functional requirements. === Xid info association with DDL === The Xid info is recorded into the DDL Query_log_event as a new status var. DDL Query_log_event is turned from demanding the immediate logging to use the transaction cache. In order to avoid caching a terminal Xid_log_event or Commit MYSQL_BIN_LOG::commit() method is refined to check a new THD::binlog_ddl_query member. When its value is not the default that indicates a DDL is being handled and the terminal event recording is skipped. The new THD::binlog_ddl_query is initialized with the default value at THD::THD, changes when the DDL Query gets to binlogging routine to instantiate Query_log_event with the new status var holding xid info, and eventually is reset standardly via THD::reset_for_next_command(). === Binlog-based recovery for DDL === MYSQL_BIN_LOG::recover() employs a "custom" replication event group parser. It gets augmented to recognize a new case when a Query-log-event contains the xid info. No other change to any group parser is done. === Presenting XID info in mysqlbinlog output === It's proposed simply print out a new member along with other attributes that are displayed currently\footnote{% a fancier method can be implemented as well to print out Xid info uniformly like: SET GTID_NEXT=val; Query; SET GTID_NEXT=val; BEGIN; Commit /*Xid*/; === Augmentation of the slave DDL transaction === When --relay-log-info-repository=TABLE the DDL transaction is set to contain an update to the slave info table. This is implemented via arrangement of a new statement to execute at proper time. The augmentation is done only to the 2pc-capable DDL subclass. Other DDL types are continued to be handled by the slave applier as before, that is the info table remains to be updated after the DDL operation has been completed.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.