WL#2687: Write non-transactional binlog events directly to binary log
Affects: Server-5.5
—
Status: Complete
Rationale ========= Mixing transactional and non-transactional tables is not supported by the current code-base and may lead to inconsistencies between the master and slaves. In a nutshell, the problem stems from the fact that when mixing transactional and non-transactional changes in in a transaction, it is often the case that the non-transactional changes are not written to the binary log when they are made visible, but rather are delayed and written together with the transactional changes when the transaction commits. For instance, consider the following statements and sessions (connections): con1: BEGIN; con1: INSERT INTO innodb_t values (1); con1: UPDATE myisam_t SET a = 10; con2: BEGIN; con2: DELETE FROM myisam_t; con2: COMMIT; con1: COMMIT; The binary log produced would look like this: con2: BEGIN; con2: DELETE FROM myisam_t; con2: COMMIT; con1: BEGIN; con1: INSERT INTO innodb_t values (1); con1: UPDATE myisam_t SET a = 10; con1: COMMIT; Clearly, on the master the table myisam_t would have with no rows. In contrast, the slave would end up with (a = 10) in the same table. Consider the following statements and sessions (connections): con1: BEGIN; con1: INSERT INTO innodb_t values (1); con2: INSERT INTO myisam_t values (1); con1: UPDATE myisam_t SET a = a + 10; con2: UPDATE myisam_t SET a = a * 10; con1: COMMIT; The binary log produced would look like this: con2: INSERT INTO myisam_t values (1); con2: UPDATE myisam_t SET a = a * 10; con1: BEGIN; con1: INSERT INTO innodb_t values (1); con1: UPDATE myisam_t SET a = a + 10; con1: COMMIT; To simplify the reasoning, assume that the only data in table myisam_t is (a = 1). In such a case, the master would end up with (a = 110) and the slave with (a = 20). Goals ===== The goal of this WL is to ensure that it is safe to mix changes on both non-transactional and transactional tables under the ROW and MIXED modes. Note however that MyISAM tables have issues since a crash might leave data in the table on restart. So we can't guarantee results for non-transactional tables unless they are crash-safe (see Section HLS). Unfortunately, there is nothing that can be done to make this mixture completely safe under the STMT mode. So, we will not change the behavior for STMT mode. See bug: - BUG#40278 Decisions regarding this WL =========================== - This is a known issue with ROW in 5.1; the fix will therefore be pushed into the 5.1 tree. -- Trudy Pelzer, 2006-01-24 - Discussed with Mats Kindahl on February 9th. This is a larger job than anticipated and won't be ready in time for the 5.1 beta code freeze. It will be fixed in 5.2 instead. -- Trudy Pelzer, 2006-02-10 - Since this is only for non-transactional table updates within real transactions and the same problem has always existed for statement-based replication, Lars agrees that this is not needed before 5.2. -- Lars, 2007-02-20 -- This work should be pushed to 5.4 -- Lars, 2009-06-09 Documentation ============= After pushing this WL, we need to describe what is described here. Highlights: 1 - Changing in behavior when in ROW or MIXED modes. 2 - The behavior was kept the same in STMT mode. 3 - Tagging statements as unsafe and how this influences MIXED mode. 4 - sync_bin_log does not happen per event but only when both caches are flushed. 5 - In general DDL events are written directly to the binary log. The exceptions are the "CREATE TABLE ... SELECT FROM". BUG REFERENCES / HISTORY ======================== - CLOSED bugs were already pushed to 5.1 - BUGS marked with WL#2687 will be fixed after WL#2687 is pushed. |-----------+-------------------------------------------+---------+---------| | REFERENCE | DESCRIPTION | STATUS | MODE | |-----------+-------------------------------------------+---------+---------| | BUG#28976 | Fixes flag issue that would | CLOSED | SBL | | | log M stmts outside trans. boundaries | | | | BUG#40116 | Fix makes all events a transaction | CLOSED | SBL/RBL | | | to be written into the transaction cache. | | | | BUG#44581 | Fixes the case where M stmts would | CLOSED | SBL | | | lock timeout on N tables on slave | | | | BUG#45694 | Fixes the same case as BUG#44581 | CLOSED | SBL | | | but instead of lock timeout we | | | | | have deadlock | | | | BUG#46129 | Fixes all NTM problems for SBL | CLOSED | SBL | | BUG#46130 | Fixes Slave not correctly handling | CLOSED | SBL | | | "expected errors" | | | | BUG#46864 | Fixes uncovered bug by BUG#46129 | CLOSED | SBL | | | (ignored error flag on slave for | | | | | a given stmt) | | | | BUG#47287 | Reverts RBL behavior changed by BUG#46129 | CLOSED | RBL | |-----------+-------------------------------------------+---------+---------| | BUG#40278 | Too risky for 5.1 (lot of changes | WL#2687 | RBL | | | and refactoring around binlogging). | | | | BUG#43794 | DUPLICATE of BUG#40278 | WL#2687 | RBL | | BUG#47175 | failing INSERT...SELECT in N | WL#2687 | RBL | | | table is not logged | | | |-----------+-------------------------------------------+---------+---------| Pushed to mysql-5.1-rep+3 ========================= revision-id: alfranio.correia@sun.com-20091103190256-637o8qxlveikrt3i
Definitions =========== We are executing a transaction or statement containing a mixture of changes to non-transactional and transactional tables. We refer to tables managed by a non-transactional engine as N-tables and tables managed by transactional engines as T-tables. We further refer to statements accessing T-tables and changing at least one of them as T-statements (or simply T); statements accessing N-tables and changing at least one of them as N-statements (or simply N); statements referencing a mixture of N- and T-tables and changing at least one of them as M-statements (or simply M); M-statement is an abstraction for multi-table updates, triggers, procedures and functions that references a mixture of N- and T-tables and changes at least one of them. A transaction can be either explicitly defined by calling START TRANSACTION or BEGIN (B) or implicitly defined when auto-commit is not on, and is finished by either calling commit (C) or rollback (R). In particular, a single statement that accesses a T-table is implicitly in the boundaries of a transaction and is either automatically committed if its execution succeeds or automatically rolled back if it fails. Any transactional table is by definition crash-safe but not all the non-transactional tables are crash-safe. In MyIsam, for example, a statement that fails do not have its changes rolled back, if there is any. Even by changing individual rows, there is still a chance of having inconsistency. Solution ======== Row manipulations for N-tables are written to the binary log upon committing or rolling back the statement that produced the changes, instead of being written upon committing a transaction. Note that this solution only works for ROW and MIXED mode and there is no feasible solution for STMT mode. For the STMT mode, we just make sure that consistency is preserved for T-tables and the few safe cases involving N-statements are properly handled. Solution in STMT mode ===================== In general, if N-statements were written to the binary log upon committing or rolling back the statement, causality (i.e. dependency among statements) might be harmed thus possibly introducing inconsistency into T-tables even without concurrency. For example, consider the following statements and session (connection): con1: BEGIN; con1: INSERT INTO myisam_t values (1); con1: INSERT INTO myisam_t values (2); con1: INSERT INTO innodb_t values (1); con1: UPDATE innodb_t set v = SELECT sum(*) from myisam_t; con1: INSERT INTO myisam_t values (3); con1: COMMIT; The binary log produced would look like this: con1: BEGIN; con1: INSERT INTO myisam_t values (1); con1: COMMIT con1: BEGIN; con1: INSERT INTO myisam_t values (2); con1: COMMIT con1: BEGIN; con1: INSERT INTO innodb_t values (1); con1: UPDATE innodb_t set v = SELECT sum(*) from myisam_t; con1: INSERT INTO myisam_t values (3); con1: COMMIT; However, if you had written all the T-statements into the binary log upon committing the statement, the innodb_t table would be inconsistent on the master. In general, it is impossible to automatically detect causality/dependency among statements by just analyzing the statements sent to the server. This happen because dependency may be hidden in the application code and it is necessary to know a priori all the statements processed in the boundaries of a transaction such as in a procedure. Moreover, even for the few cases that we could automatically address in the server, the computation effort required could make the approach infeasible. So, we do as follows: - N-statements that happen before all statements that update T-tables in a transaction are written directly to the binary log upon committing or rolling back the statement. - Otherwise, N-statements are classified as unsafe, a warning message is printed out and their writing is delayed until a commit or rollback is requested on behalf of the transaction. - M-statements are always classified as unsafe and delayed. - When a N- or M-statement is rolled back, for example due to a duplicate key found in their N-table, changes may be left over if the table is not crash-safe. In this case, the statement is written to the binary log with the associated error code. - Upon committing the transaction, the delayed statements are written to the binary log wrapped by a BEGIN/COMMIT. - Upon rolling back the transaction, the delayed statements are written to the binary log only if an N-table was changed. In this case, the transaction is written to the binary log wrapped in a BEGIN/ROLLBACK pair. Note that in the STMT mode, the only safe combination happens when statements that update N-tables are executed before all statements that update T-tables in a transaction. Solution in ROW Mode ==================== In ROW mode, we do as follows: - Rows from N-tables are written to the binary log upon committing or rolling back the statement that produced them. - In contrast, rows from T-tables are only written upon committing a transaction. - Clearly, rows from T-tables can be discarded upon rollback. Note that this mode is completely safe and as such no warning message is printed out. Solution in MIXED Mode ====================== In this mode, changes may be written to the binary log as statements or rows. Any statement classified as unsafe is automatically written to the binary log as rows based on the facilities already provided by the server. Thus, the MIXED mode is equivalent to the STMT mode when there is no unsafe statement. Any unsafe statement, however, has its changes logged as rows which makes it completely safe. Note that consistency may not be preserved if a user switches between ROW/MIXED and STMT modes. Note that M-statements such as 1: INSERT INTO myisam_t SELECT * FROM innodb_t; 2: INSERT INTO innodb_t SELECT * FROM myisam_t; are classified as unsafe to ensure that in mixed mode the execution is completely safe and equivalent to the row mode. Consider the following statements and sessions (connections) to understand the reason: con1: INSERT INTO innodb_t VALUES (1); con1: INSERT INTO innodb_t VALUES (100); con1: BEGIN con2: INSERT INTO myisam_t SELECT * FROM innodb_t; con1: INSERT INTO innodb_t VALUES (200); con1: COMMIT; The point is that the concurrent statements may be written into the binary log in a way different from the execution. For example, BINARY LOG: con2: BEGIN; con2: INSERT INTO myisam_t SELECT * FROM innodb_t; con2: COMMIT; con1: BEGIN con1: INSERT INTO innodb_t VALUES (200); con1: COMMIT; .... or BINARY LOG: con1: BEGIN con1: INSERT INTO innodb_t VALUES (200); con1: COMMIT; con2: BEGIN; con2: INSERT INTO myisam_t SELECT * FROM innodb_t; con2: COMMIT; Clearly, this may become a problem in STMT mode and setting the statement as unsafe will make rows to be written into the binary log in MIXED mode and as such the problem will not stand. In STMT mode, although such statement is classified as unsafe, i.e. INSERT INTO myisam_t SELECT * FROM innodb_t; there is no enough information to avoid writing it outside the boundaries of a transaction. This is not a problem if we are considering snapshot isolation level but if we have pure repeatable read or serializable the lock history on the slave will be different from the master. Summary of the solutions ======================== format operation T Before error action ------ --------- -------- ----- ------ STMT T * no delay until commit/rollback is requested STMT T * yes ignore them STMT N no no no-delay write to the binary log STMT N no yes no-delay write to the binary log STMT N yes no delay until commit/rollback and warn "unsafe" STMT N yes yes delay until commit/rollback and warn "unsafe" STMT M * no delay until commit/rollback and warn "unsafe" STMT M * yes delay until commit/rollback and warn "unsafe" ROW T * no delay until commit/rollback is requested ROW T * yes ignore them ROW N * no no-delay write to the binary log ROW N * yes no-delay write to the binary log ROW M * no for rows from N-tables no-delay; for rows from T-tables delay until commit/rollback ROW M * yes for rows from N-tables no-delay; ignore the rows from T-tables STMT C * no write delayed changes to the binary log STMT C * yes write delayed changes if a N-table was update wrapped by a rollback; otherwise ignore them STMT R * * write delayed changes if a N-table was changed; otherwise ignore them ROW C * no write delayed changes ROW C * yes do nothing, rollback will be called ROW R * * ignore them Requirements ============ 1. In STMT mode, causality (i.e. dependency among statements) *shall* be preserved by writing the statements to the binary log by the same order that they were executed in a transaction. 2. In ROW and MIXED modes, changes to N-tables *shall* be available in the binary log when the statement is committed or rolled back. On the other hand, changes to T-tables *shall* be available in the binary log when the transaction is committed. 3. In the STMT or MIXED modes, N-statements that are executed before all T-statements in a transaction *shall* be available in the binary log when the *statement* is committed or rolled back. 4. A rollback of a transaction *shall* leave changes to N-tables in the binary log, if there is any. If along with such changes, changes to T-tables are also written, the transaction *shall* be end with a rollback to keep slaves consistent with the master. 5. On a crash, any changes to N-tables before the last sync point (as specified by ``--sync-binlog`` shall be available/present on disk. No guarantee is made regarding changes to N-tables written after the last sync point.
Classifying a statement as unsafe ================================= Note that the facilities to classify a statement as unsafe are already provided by the server. In this WL, we only ensure that N-statements after an earlier T-statement and also M-statements are classified as unsafe. This means that both statements will trigger the printing out of the following warning message in STMT mode: "Statement may not be safe to log in statement format". And in MIXED mode, their changes will be written to the binary log as rows. Writing to the binary log ========================= To guarantee that changes are corretly written to the binary log as previously decribed, we have two cache structures per transaction: one for transactional changes (trx-cache) and another one for non-transactional changes (stmt-cache). The trx-cache is flushed and truncated upon committing or rolling back a transaction. The stmt-cache is flushed and truncated upon committing or rolling back a statement, i.e. while finishing it. Mixing transactional and no-transactional statements ==================================================== The proposed implementation is divided into three layers as follows: Layer 1 - Decides which format should be used: ROW or STATEMENT. Layer 2 - Decides which type of cache should be used: stmt-cache or trx-cache. Layer 3 - Flushes the cache to the binary log. Note that only the flush operations must be synchronized. We omitted however the synchronization primitives to avoid cluttering the description. We also ommitted here the algorithm of some auxiliary functions. Namely, stmt_tables_of(), row_tables_of(), all_is_transactional() and any_is_mixed() whose description is presented below: DESCRIPTION: Returns a set of tables referenced by an internal representation of a statement. INPUT: stmt is a statement in an internal form. OUTPUT: set of tables referenced by the statement. set of tables <-- stmt_tables_of(stmt); DESCRIPTION: Returns a set with the table referenced by a row. INPUT: a row. OUTPUT: a set with the table referenced by the row. set of tables <-- row_tables_of(row); DESCRIPTION: Checks if all tables in a set of tables passed are transactional. INPUT: set_tables is a set of tables. OUTPUT: TRUE if all the tables are transactional, FALSE otherwise. bool <-- all_is_transactional(set_tables); DESCRIPTION: Checks if in a set of tables, there are both transactional and non-transactional tables. INPUT: set_tables is a set of tables. OUTPUT: TRUE if there are both transactional and non-transactional tables, FALSE otherwise. bool <-- any_is_mixed(set_tables); Note that there are other auxiliary functions whose propose and behavior can be easily inferred by the name of the function and its parameters. For that reason, we don't describe them here. -----(Layer 1 - Deciding format)----- INTRODUCTION: N/A DESCRIPTION: This function decides which format should be used, either row or statement, and is called before executing a statement and after parsing and preparing it. INPUT: mode represents either ROW, STMT or MIXED; stmt is a statement in an internal form. OUTPUT: either ROW or STMT which represent how the changes shall be written to the binary log. *FORMAT (mode, stm)* -> format= (mode == ROW ? ROW : STMT); if (mode == STMT or mode == STMT) { if ((!empty(trx-cache) && !all_is_transactional(stmt_tables_of(stm))) || any_is_mixed(tables_of(stm))) if (mode == MIXED or mode == ROW) format= ROW; else { print out warning message; format= STMT; } } return (format); -----(Layer 1 - Deciding format)----- -----(Layer 2 - Caching data)----- INTRODUCTION: The function to be used is chosen based on the format decision previously described. DESCRIPTION: This function writes a row to a cache. INPUT: row is the changed row to be cached. OUTPUT: N/A *CACHE_ROW(row)* -> if (all_is_transactional(row_tables_of(row))) add_to(trx-cache, row) else add_to(stmt-cache, row) DESCRIPTION: This function writes a statement to a cache. INPUT: stm is an internal representation of the statement to be cached. OUTPUT: N/A CACHE_STMT(stm) -> if (empty(trx-cache) && !all_is_transactional(stmt_tables_of(stm))) add_to(stmt-cache, stm.string) else add_to(trx-cache, stm.string) -----(Layer 2 - Caching data)----- -----(Layer 3 - Flushing data)----- INTRODUCTION: The function to be used is chosen based on the type of the operation: commit or rollback. In what follows, we assume that a single statement, either T, N or M that is successfully executed triggers the following calls: - COMMIT(false) --> COMMIT(true). On the other hand a transaction B T|N|M T|N|M C that is successfully executed triggers the following calls: COMMIT(false) --> COMMIT(false) --> COMMIT(true). The same reasoning applies to the rollback function. DESCRIPTION: This function flushes the content of a cache upon commit. INPUT: If all is equal to FALSE, a statement is being committed, otherwise the transaction is being committed. OUTPUT: N/A COMMIT(all) -> if all then flush_with_boundaries(trx-cache, COMMIT) else flush_with_boundaries(stmt-cache, COMMIT) DESCRIPTION: This function flushes the content of a cache upon rollback. INPUT: If all is equal to FALSE, a statement is being rolled back, otherwise the transaction is being rolled back. OUTPUT: N/A ROLLBACK(all) -> if all then { if (mode == ROW || mode == MIXED) truncate(trx-cache, 0) else flush_with_boundaries(trx-cache, ROLLBACK) } else flush_with_boundaries(stmt-cache, ROLLBACK) -----(Layer 3 - Flushing data)----- Definition table for Layer 1: Format (F): (statement (S), row (R)) Error (E): (error (E), warning (W), ignored (I)) STMT ROW MIXED 1) B T T C/R F - S F - R F - S E - I E - I E - I 2) B T N C/R F - S F - R F - R(N) S(T) E - W(N) I(T) E - I E - I 3) T F - S F - R F - S E - I E - I E - I 4) N F - S F - R F - S E - I E - I E - I 5) M F - S F - R F - R E - W E - I E - I 6) B N N T C/R F - S F - R F - S E - I E - I E - I 7) B N N C/R F - S F - R F - S E - I E - I E - I 8) B M T C/R F - S F - R F - R(N) R(T) S(T) E - W(M) I(T) E - I E - I 9) B M N C/R F - S F - R F - R(N) R(T) E - W(M) W(N) E - I E - I 10) B N M C/R F - S F - R F - S(N) R(N) R(T) E - W(M) I(N) E - I E - I 11) B T M C/R F - S F - R F - R(N) R(T) S(T) E - W(M) I(N) E - I E - I Definition table for Layer 2 and 3: Caching (C): (stmt-cache (S), trx-cache (T), ignored (I)) Flushing (F): (upon committing/rolling back a statement (D), upon committing/rolling back a transaction (C/R), ignored (I)) STMT ROW MIXED C - T C - T C - T 1.a) B T T C F - C F - C F - C C - T C - T C - T 1.b) B T T R F - I F - I F - I C - T C - S(N) T(T) C - S(N) T(T) 2.a) B T N C F - C F - D(N) C(T) F - D(N) C(T) C - T C - S(N) T(T) C - S(N) T(T) 2.b) B T N R F - R F - D(N) I(T) F - D(N) I(T) C - T C - T C - T 3.a) T F - C F - C F - C C - T C - T C - T 3.b) T with error F - I F - I F - I C - S C - S C - S 4.a) N F - D F - D F - D C - T C - S(N) T(T) C - S(N) T(T) 5.a) M F - C F - D(N) C(T) F - D(N) C(T) C - T C - S(N) T(T) C - S(N) T(T) 5.b) M with error F - R F - D(N) I(T) F - D(N) I(T) C - S(N) T(T) C - S(N) T(T) C - S(N) T(T) 6.a) B N N T C F - D(N) C(T) F - D(N) C(T) F - D(N) C(T) C - S(N) T(T) C - S(N) T(T) C - S(N) T(T) 6.b) B N N T R F - D(N) R(T) F - D(N) I(T) F - D(N) I(T) C - S(N) C - S(N) C - S(N) 7.a) B N N C F - D(N) F - D(N) F - D(N) C - S(N) C - S(N) C - S(N) 7.b) B N N R F - D(N) F - D(N) F - D(N) C - T C - S(N) T(T) C - S(N) T(T) 8.a) B M T C F - C F - D(N) C(T) F - D(N) C(T) C - T C - S(N) T(T) C - S(N) T(T) 8.b) B M T R F - R F - D(N) I(T) F - D(N) I(T) C - T C - S(N) T(T) C - S(N) T(T) 9.a) B M N C F - C F - D(N) C(T) F - D(N) C(T) C - S(N) T(M) C - S(N) T(T) C - S(N) T(T) 10.a) B N M C F - D(N) C(M) F - D(N) C(T) F - D(N) C(T) C - S(N) T(M) C - S(N) T(T) C - S(N) T(T) 10.b) B N M R F - D(N) R(M) F - D(N) I(T) F - D(N) I(T) C - T C - S(N) T(T) C - S(N) T(T) 11.a) B T M C F - C F - D(N) C(T) F - D(N) C(T) C - T C - S(N) T(T) C - S(N) T(T) 11.b) B T M R F - R F - D(N) I(T) F - D(N) I(T)
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.