WL#6860: Binlogging XA-prepared transaction
Affects: Server-5.7
—
Status: Complete
This worklog adds support for XA-transactions to replication. An XA-transaction allows the client to participate in the two-phase commit protocol. The state of a prepared XA-transaction is meant to be persisted in the database; i.e., a prepared XA-transaction should survive both client reconnects and server restarts. Currently, however, a prepared XA-transaction is lost after a client reconnects or the server restarts.

Through this worklog, an XA-transaction is binlogged in two rounds:

R1) In the first round, at XA-prepare, the transaction is logged as prepared but not yet committed. Its state is persisted in the engine (InnoDB).

R2) When the transaction is finally committed (XA-commit) or rolled back (XA-rollback), the second binlogging round completes the logging by writing the XA-commit/rollback into the binlog.

Therefore an XA-transaction consumes two GTIDs, one assigned to the prepare part and one to the commit (or rollback) part, each in its own round. Other transactions (XA or not XA) may be binlogged in between the XA-prepare part and the XA-commit/rollback part.

REFERENCES:
- BUG#12161
FR0 The prepared XA transaction (optionally) survives a server restart or a client disconnection, which ends the former policy of rolling it back at disconnection. This must work with both --skip-log-bin and --log-bin.

FR1 Binary logging of an XA transaction is done in two phases:
    a. A new replication event type, XA-prepare, is logged at the time of XA PREPARE.
    b. XA COMMIT or XA ROLLBACK is logged later, separately.
    Notice that an XA-prepare event is not necessarily followed immediately by its XA COMMIT or XA ROLLBACK, so the binary logging of any two XA transactions may interleave. The prepare part and the commit part of an XA transaction may appear in different binlog files.

FR2 The slave applier must be able to handle multiple interleaved XA transactions, in sequential as well as in any parallel mode.

FR3 The new logging style of XA transactions must be compatible with all existing replication features, including binlog format, GTID (on|off), MTS (any kind of scheduler, including the "legacy" sequential one), mixed engine transactions with direct or cached DML on non-transactional tables, and slave side relay-log repository types.

There is no non-functional requirement.
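The two phases of FR1 and the interleaving they allow can be illustrated with a toy model. This is only a sketch: ToyBinlog, its members and the textual "events" are hypothetical stand-ins invented for illustration, not the server's binary log classes, and the GTID is reduced to a plain counter.

```cpp
#include <string>
#include <vector>

// Toy stand-in for the binary log; it is not the server's MYSQL_BIN_LOG.
struct ToyBinlog {
  std::vector<std::string> events;  // logged events, in order
  unsigned next_gno = 1;            // hypothetical GTID sequence counter

  // Phase a: XA START, the transaction body and XA PREPARE form one group
  // that receives its own GTID.
  unsigned log_xa_prepare(const std::string &xid,
                          const std::vector<std::string> &body) {
    const unsigned gno = next_gno++;
    events.push_back("GTID " + std::to_string(gno));
    events.push_back("XA START " + xid);
    for (const std::string &stmt : body) events.push_back(stmt);
    events.push_back("XA PREPARE " + xid);
    return gno;
  }

  // Phase b: the terminating statement is logged later as a stand-alone
  // group with a GTID different from the one of the prepare part.
  unsigned log_xa_terminate(const std::string &xid, bool commit) {
    const unsigned gno = next_gno++;
    events.push_back("GTID " + std::to_string(gno));
    events.push_back(std::string(commit ? "XA COMMIT " : "XA ROLLBACK ") + xid);
    return gno;
  }
};
```

Calling log_xa_prepare() for two XIDs and only then log_xa_terminate() for each of them yields the interleaved layout that FR2 requires the slave applier to handle.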
=== Summary ===

When the server runs with replication turned on (binary log ON), a prepared XA transaction had to be rolled back at disconnection. The reason was that even though the prepared transaction could be discovered at server recovery, it was impossible to log the transaction content, at least in the formats of @@binlog_format.

The limitation is lifted by binlogging the prepared transaction before a disconnection or a server crash takes place. Once it is certain that the transaction content was logged, the final XA COMMIT or XA ROLLBACK of the transaction can be logged too.

  --connection first
  XA START 'trx';
  /* dml operations */
  XA END 'trx';
  XA PREPARE 'trx';    /* => trx got prepared and binlogged */

  --connection second
  SHOW BINLOG EVENTS;  /* => must display XA START .. XA PREPARE */

  # At this point when either the connection disconnects
  --disconnect first
  # OR
  #
  # the server "disconnects" by crashing or shutdown
  # any external connection (after the server restart in the 2nd case)
  # must be able to find 'trx' in the list
  # of prepared transactions, and commit or rollback:

  --connection any_connection
  XA RECOVER;          # must display `trx'
  XA COMMIT 'trx';     # must commit trx, be logged as a Query-log-event and
                       # return OK

=== Binary logging and Slave applier extension ===

The binary logging is extended to write a prepared XA transaction and its Commit or Rollback decision as separate groups of events into the binary log. That makes XA binlogging possibly interleaved, yet without any harm to data consistency after eventual replaying on the slave.

The slave applier is taught to deal with the XA-prepared group of events and its termination (Commit or Rollback) event.

=== XA transaction caching and recovery extension ===

The XA recovery implementation is extended so that a closing connection leaves a prepared XA transaction in the transaction cache and specially marked in InnoDB. Such a prepared XA transaction can be discovered and terminated as the user wishes, and it can also be handled by the slave applier.
=== Handlerton interface extension ===

A new handlerton interface is added that allows attaching and detaching an SE "internal" transaction to and from the server level transaction handle. It is motivated by the needs of the slave applier, which switches between interleaved XA-transactions, first to prepare them and then to commit (or roll back) them.

=== Innodb changes ===

Besides the initialization of the new handlerton method, the connection close logic in InnoDB is augmented, and changes are made to keep a disconnected transaction's state sane so that it survives a server restart.

=== User Interface ===

There are no new features added. The user is informed about recovered prepared transactions at server startup time by the existing facilities.
Low-level tasks are discussed in the order of the HLS list.

=== Binary logging extension ===

The new style of binary logging of an XA transaction is done in two rounds, see Ra. and Rb:

Ra. New replication event type XA-prepare at time of XA PREPARE

When an XA-transaction starts executing its first DML operation, it registers the binlog hton and initiates the header binlog event, which is a Query-log-event of the XA-start query. The event naturally contains an encoded XID.

The binlog_prepare() hton is extended to execute MYSQL_BIN_LOG::commit() when XA-prepare is handled through trans_xa_prepare(). MYSQL_BIN_LOG::commit() is extended to execute a special XA-prepare branch. The transaction content is sandwiched in between the logged Query-log-event of XA-START and a new XA_PREPARE_LOG_EVENT event. The extension to MYSQL_BIN_LOG::commit() ensures that the XA-prepared transaction is not committed to the engine.

The GTID for the XA-prepared transaction is generated according to the rules of regular committing of a transaction to the binlog.

Rb. Logging XA COMMIT, XA ROLLBACK later separately

The XA COMMIT|XA ROLLBACK Query-log-event is generated through binlog_xa_{commit,rollback}, which is another extension to the binary logging routine. By that point the XA-transaction cache must have been flushed already. The XA-transaction terminal query is logged as a stand-alone query that also contains the encoded XID and a GTID value different from the prepared part's GTID.

Here is an example of a typical interleaved logging of two XA-transactions:

  SET GTID_NEXT=gid_1; XA START xid_1; call update_data(); XA PREPARE xid_1;
  SET GTID_NEXT=gid_2; XA START xid_2; call update_data(); XA PREPARE xid_2;
  SET GTID_NEXT=gid_3; XA COMMIT xid_1;
  SET GTID_NEXT=gid_4; XA COMMIT xid_2;

XA COMMIT|XA ROLLBACK is singled out into a Query-log-event that is logged only when its XA-prepared transaction part was logged. That is indicated through the flag thd->transaction.xid_state.is_binlogged, which is raised when XA-prepared indeed has something to log.
The flag is checked in either branch (the regular or the recovery one) of XA COMMIT|XA ROLLBACK logging. At server restart the flag is set to TRUE because the transaction must have updated data (that is why it was left in the engine) and must have logged its XA-prepared part before the (disconnection) shutdown/crash.

=== Slave applier extension ===

To cope with XA's interleaved logging, the slave applier has to be able to switch from one transaction to another (see the Handlerton interface extension section for details).

In the MTS case the DB scheduler can't simply assign XA-commit to another worker. That is addressed by making XA-commit depend on the magic "max # of accessed db:s" tag, which forces synchronization with all other workers prior to executing XA-commit.

The clock scheduler does not need any changes. The XA-prepare commit timestamp is guaranteed to be less than any possible commit parent of its XA-commit, which makes XA-commit schedulable to any worker.

The Query-log-event of XA-start is made a legal group starter (starts_group() of log_event.h). The new XA-prepare event plays the role of the group terminal event. That is reflected in trx_boundary_parser.

The new XA-prepare log event is made to inherit from Xid-log-event for a few reasons. One of them is to reuse Xid's functionality as the terminal event of a replication event group. The actual dependency is forced to be like the following:

  binary_log:: Binary_log_event
           ^                   "main"::Log_event
           |                          ^
           |                          |
  binary_log::                    "main"::
  XA_prepare_event          Xid_apply_log_event
            \                   /
             \                 /
              \               /
             XA_prepare_log_event

Here the new Xid_apply_log_event is a common parent shared with Xid_log_event, whose dependencies are changed to

  Binary_log_event
         ^
         |
         |
     Xid_event        Xid_apply_log_event
          \               /
           \             /
            \           /
            Xid_log_event

A specific of XA-start is that it replaces the currently associated engine transaction with a new one that the engine must initiate internally. The old association is preserved inside a new member of Transaction_ctx. At the end of XA-prepare the pre-XA-start association is restored.
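The save-and-restore of the engine transaction around XA-start and XA-prepare can be sketched as follows. The sketch borrows only the argument shape of the handlerton method described in the Handlerton interface extension section; ToyThd, saved_engine_trx and the toy_* functions are hypothetical names invented here, and a null engine_trx merely stands for "the engine will initiate a new transaction internally".

```cpp
// Toy stand-in for THD plus the new Transaction_ctx member (hypothetical).
struct ToyThd {
  void *engine_trx = nullptr;        // engine transaction currently attached
  void *saved_engine_trx = nullptr;  // pre-XA-start association
};

// Same argument shape as replace_native_transaction_in_thd(): attach
// new_trx and hand back the old value when old_trx is non-null.
void toy_replace_native_trx(ToyThd *thd, void *new_trx, void **old_trx) {
  if (old_trx != nullptr) *old_trx = thd->engine_trx;
  thd->engine_trx = new_trx;
}

// XA-start on the applier: detach the current engine transaction and keep
// it aside; the engine is expected to create a fresh one on first use.
void toy_applier_xa_start(ToyThd *thd) {
  toy_replace_native_trx(thd, nullptr, &thd->saved_engine_trx);
}

// End of XA-prepare: drop the association with the prepared XA transaction
// (which stays prepared in the engine) and restore the earlier one.
void toy_applier_xa_prepare_end(ToyThd *thd) {
  toy_replace_native_trx(thd, thd->saved_engine_trx, nullptr);
  thd->saved_engine_trx = nullptr;
}
```

Keeping the old pointer on the server-level handle, rather than inside the engine, is what lets one applier thread walk through several interleaved XA transactions in this model.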
=== Handlerton interface extension ===

A new handlerton interface

+  /**
+    The engine's native transaction associated with THD is replaced
+    with that of the 2nd argument.
+    The old value is returned through a buffer if a non-null pointer
+    is provided with the 3rd argument.
+    The method is adapted by XA start and XA prepare handlers to
+    handle an XA transaction that is logged as two parts by the slave applier.
+
+    This interface concerns engines that are aware of XA transactions.
+  */
+  void (*replace_native_transaction_in_thd)(THD *thd, void *new_trx_arg,
+                                            void **ptr_trx_arg);

gives the slave applier the ability to process an XA transaction's phases in an interleaved manner. The applier executes the XA-prepare and disconnects from the current XA. This applier or another one is later scheduled with the XA COMMIT|XA ROLLBACK, for handling which an "external" facility has always been in place.

=== XA transaction caching, recovery extension ===

When the transaction is already prepared and binlogged and the client disconnects, the server won't destroy the Transaction_ctx object (and therefore its XID_STATE) associated with the XA transaction; see the "envisioned" block in sql_class.cc:

-#ifdef ENABLE_WHEN_BINLOG_WILL_BE_ABLE_TO_PREPARE

The Transaction_ctx object remains in the transaction_cache (see xa.{h,cc}) and gets marked with a special new is_binlogged flag that is raised. The flag affects the binary logger execution when it processes the following XA COMMIT|XA ROLLBACK.

For that purpose a new function is introduced:

+/**
+  Transaction is marked in the cache as to be recovered.
+  The method allows to sustain prepared transaction disconnection.
+
+  @param transaction
+                   Pointer to Transaction object that is replaced.
+
+  @return  operation result
+    @retval  false  success or a cache already contains XID_STATE
+                    for this XID value
+    @retval  true   failure
+*/
+
+bool transaction_cache_unrecover(Transaction_ctx *transaction);

It is implemented to reuse the logic of transaction_cache_insert_recovery().
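The return contract documented for transaction_cache_unrecover() can be illustrated with a toy cache. Everything below is an assumption made for illustration: ToyTransactionCtx, ToyCache and toy_cache_unrecover() are hypothetical, the cache is a plain std::map keyed by a string XID, and an empty XID plays the role of the failure case, which in the server would rather be something like an allocation error.

```cpp
#include <map>
#include <string>

// Toy stand-in for Transaction_ctx and its XID_STATE (hypothetical fields).
struct ToyTransactionCtx {
  std::string xid;
  bool in_recovery = false;   // visible through the recovery interface
  bool is_binlogged = false;  // the prepared part reached the binary log
};

using ToyCache = std::map<std::string, ToyTransactionCtx>;

// Follows the documented contract: false on success or when the cache
// already holds an entry for this XID, true on failure.
bool toy_cache_unrecover(ToyCache &cache, const ToyTransactionCtx &trx) {
  if (trx.xid.empty()) return true;  // toy failure condition (assumption)
  ToyTransactionCtx kept = trx;
  kept.in_recovery = true;
  kept.is_binlogged = true;
  cache.emplace(kept.xid, kept);  // leaves an existing entry untouched
  return false;
}
```

In this model a later XA COMMIT|XA ROLLBACK from any connection would look the XID up in the cache and consult is_binlogged before logging the terminating statement.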
Notice that, similarly to the server layer's XA unrecovering, the engine does not destroy its transaction view (see ha_innodb.cc, the changes to the connection_close handlerton method). The preserved Transaction_ctx object remains cross-linked with the prepared InnoDB transaction.

The association is restored at server restart, in which event the server level Transaction_ctx object is reconstructed and inserted into the global Transaction_cache with the new is_binlogged flag raised.

=== Innodb changes ===

1. The new hton method is implemented to clear out or restore the THD to trx association. See innodb_replace_trx_in_thd().

2. trx_disconnect_from_mysql() is added to do what its name suggests, which is essentially a lighter cleanup than that of trx_free_in_mysql(). The main use case is innobase_close_connection(), which starts using the lighter method for the connection of a TRX_STATE_PREPARED transaction.

3. A new flag is introduced to struct trx_t to distinguish between a "cold" transaction, recovered at server recovery time, and a "warm" one, which is accessed via the recovery interface during the same server runtime in which it was created. A use case for this task is

   --connection current
   XA START 'trans';
   ...
   XA PREPARE 'trans';
   --disconnect current
   --connect next
   XA COMMIT 'trans'

   In particular, lock_trx_release_locks() should not do trx_sys->n_prepared_recovered_trx-- when seeing such a "warm" recoverable trx.
   In the current patch this is achieved by adding a new member to

   struct trx_t{
   +       ulint           is_disconnected;/*!< 0=normal transaction,
   +                                       1=prepared and disconnected so could
   +                                       be recovered via xid interface */

   and setting it to 1 in innodb_replace_trx_in_thd() (the slave applier) and in innobase_close_connection() (master side).

4. Notice that trx_deregister_from_2pc(trx) at the end of innobase_{commit,rollback}_by_xid might be fixing a bug.

5. trx->will_lock = 0 has to be added to the cleanup of the disconnected prepared trx to please the trx cache.
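The "cold" versus "warm" distinction of item 3 can be sketched with a toy counter. This is a deliberately simplified model under stated assumptions: ToyTrx, ToyTrxSys and the toy_* functions are hypothetical, and the model assumes that every prepared transaction finished through the xid interface is either "warm" (is_disconnected == 1) or "cold" (counted at server recovery).

```cpp
// Toy stand-ins; the real trx_t and trx_sys carry far more state.
struct ToyTrx {
  bool prepared = false;              // stand-in for TRX_STATE_PREPARED
  unsigned long is_disconnected = 0;  // 0=normal, 1=prepared and disconnected
};

struct ToyTrxSys {
  unsigned long n_prepared_recovered_trx = 0;  // "cold" trx found at startup
};

// Connection close (or the applier detaching): a prepared transaction is
// kept and flagged instead of being freed.
void toy_disconnect(ToyTrx &trx) {
  if (trx.prepared) trx.is_disconnected = 1;
}

// Lock release when the transaction is finished through the xid interface:
// only a "cold" transaction was counted at recovery, so only that kind may
// decrement the counter.
void toy_release_locks_by_xid(ToyTrxSys &sys, const ToyTrx &trx) {
  if (trx.prepared && trx.is_disconnected == 0 &&
      sys.n_prepared_recovered_trx > 0) {
    --sys.n_prepared_recovered_trx;
  }
}
```

Without the flag, committing a "warm" transaction from a second connection would decrement a counter that never included it, which is the situation item 3 guards against.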
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.