For each statement, we must determine the logging format: row or statement. This is done as follows.
At parse time, the server detects whether the statement is unsafe to log in statement format (that is, whether it requires row format). If so, the THD::Lex::set_stmt_unsafe() function is called. This must be done prior to the call to THD::decide_logging_format() (that is, prior to lock_tables). As a special case, some types of unsafeness are detected inside THD::decide_logging_format(), before the logging format is decided. Note that statements shall be marked unsafe even if binlog_format!=mixed.
THD::decide_logging_format() determines the logging format, based on the value of binlog_format and the unsafeness of the current statement.
THD::decide_logging_format() also determines if the statement is impossible to log, in which case it generates an error and the statement is not executed. The statement may be impossible to log for the following reasons:
both row-incapable engines and statement-incapable engines are involved (ER_BINLOG_ROW_ENGINE_AND_STMT_ENGINE)
BINLOG_FORMAT = ROW and at least one table uses a storage engine limited to statement-logging (ER_BINLOG_ROW_MODE_AND_STMT_ENGINE)
statement is unsafe, BINLOG_FORMAT = MIXED, and storage engine is limited to statement-logging (ER_BINLOG_UNSAFE_AND_STMT_ENGINE)
statement is a row injection (that is, a row event executed by the slave SQL thread or a BINLOG statement) and at least one table uses a storage engine limited to statement-logging (ER_BINLOG_ROW_INJECTION_AND_STMT_ENGINE)
BINLOG_FORMAT = STATEMENT and at least one table uses a storage engine limited to row-logging (ER_BINLOG_STMT_MODE_AND_ROW_ENGINE)
statement is a row injection (that is, a row event executed by the slave SQL thread or a BINLOG statement) and BINLOG_FORMAT = STATEMENT (ER_BINLOG_ROW_INJECTION_AND_STMT_MODE)
more than one engine is involved and at least one engine is self-logging (ER_BINLOG_MULTIPLE_ENGINES_AND_SELF_LOGGING_ENGINE)
See the comment above decide_logging_format for details.
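The decision and the error cases listed above can be summarized as a small decision function. The following Python sketch is a toy model, not the server code: the function signature, the boolean inputs, and in particular the order in which the checks are made are assumptions for illustration, and it only covers statements that can generate row events.

```python
def decide_logging_format(binlog_format, unsafe=False, row_injection=False,
                          stmt_only_engine=False, row_only_engine=False,
                          engine_count=1, self_logging_engine=False):
    """Toy model. Returns ("error", code) or ("ok", "row" | "statement").

    stmt_only_engine: some table uses an engine limited to statement-logging.
    row_only_engine:  some table uses an engine limited to row-logging.
    The order of the checks below is an assumption, not taken from the text.
    """
    if engine_count > 1 and self_logging_engine:
        return ("error", "ER_BINLOG_MULTIPLE_ENGINES_AND_SELF_LOGGING_ENGINE")
    if stmt_only_engine and row_only_engine:
        return ("error", "ER_BINLOG_ROW_ENGINE_AND_STMT_ENGINE")
    if stmt_only_engine:
        if row_injection:
            return ("error", "ER_BINLOG_ROW_INJECTION_AND_STMT_ENGINE")
        if binlog_format == "ROW":
            return ("error", "ER_BINLOG_ROW_MODE_AND_STMT_ENGINE")
        if unsafe and binlog_format == "MIXED":
            return ("error", "ER_BINLOG_UNSAFE_AND_STMT_ENGINE")
        return ("ok", "statement")
    if binlog_format == "STATEMENT":
        if row_injection:
            return ("error", "ER_BINLOG_ROW_INJECTION_AND_STMT_MODE")
        if row_only_engine:
            return ("error", "ER_BINLOG_STMT_MODE_AND_ROW_ENGINE")
        return ("ok", "statement")
    # binlog_format is ROW or MIXED, and no statement-only engine is involved.
    if binlog_format == "ROW" or unsafe or row_injection or row_only_engine:
        return ("ok", "row")
    return ("ok", "statement")
```

For example, under this model an unsafe statement with binlog_format=MIXED yields ("ok", "row"), while the same statement with binlog_format=STATEMENT yields ("ok", "statement") and would only produce a warning.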
THD::decide_logging_format() also determines if a warning shall be issued. A warning is issued for unsafe statements if binlog_format=STATEMENT. Warnings are not issued immediately; instead, THD::binlog_stmt_unsafe_flags is set and the warning is issued in THD::binlog_query(). This prevents warnings in the case that the statement generates an error later so that it is not logged.
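The deferred-warning mechanism can be illustrated with a toy model. In this Python sketch the class Session and its members are hypothetical stand-ins (pending_unsafe_flags plays the role of THD::binlog_stmt_unsafe_flags), and a failing statement is assumed never to be logged, which is a simplification.

```python
class Session:
    """Toy stand-in for the THD members involved; all names are illustrative."""

    def __init__(self):
        self.pending_unsafe_flags = set()  # models THD::binlog_stmt_unsafe_flags
        self.warnings = []
        self.binlog = []

    def decide_logging_format(self, binlog_format, unsafe_reasons):
        # Only remember why the statement is unsafe; do not warn yet.
        if binlog_format == "STATEMENT":
            self.pending_unsafe_flags = set(unsafe_reasons)
        else:
            self.pending_unsafe_flags = set()

    def binlog_query(self, query):
        # The statement is actually being logged, so the warning is justified.
        for reason in sorted(self.pending_unsafe_flags):
            self.warnings.append("unsafe statement: " + reason)
        self.pending_unsafe_flags.clear()
        self.binlog.append(query)

    def execute(self, query, binlog_format, unsafe_reasons=(), fails=False):
        self.decide_logging_format(binlog_format, unsafe_reasons)
        if fails:
            # Simplification: a failed statement is not logged, so no warning.
            self.pending_unsafe_flags.clear()
            return
        self.binlog_query(query)
```

With this model, an unsafe statement that fails under binlog_format=STATEMENT leaves both the warning list and the binlog unchanged.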
Sub-statements. Let T be a statement that invokes an unsafe sub-statement S (S may be a stored function, stored procedure, trigger, view, or prepared statement). Each sub-statement is cached as an sp_head object. The sp_head object stores the Lex that was generated when the statement defining the sub-statement was parsed (that is, when CREATE FUNCTION/CREATE PROCEDURE/CREATE TRIGGER/CREATE VIEW/PREPARE was parsed). Hence, this cached Lex has the unsafe flag set. When T is parsed, it fetches S from the cache. At this point, it calls sp_head::propagate_attributes(), which marks the current Lex object as unsafe if the cached Lex object was unsafe.
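The propagation step can be pictured with a minimal Python model. Lex and CachedSubStatement below are hypothetical simplifications of the server's Lex and sp_head objects, and the helper functions define and parse_top_level are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Lex:
    unsafe: bool = False


@dataclass
class CachedSubStatement:
    """Toy model of sp_head: keeps the Lex produced when S was defined."""
    name: str
    lex: Lex

    def propagate_attributes(self, caller_lex):
        # Mark the invoking statement unsafe if the cached Lex was unsafe.
        if self.lex.unsafe:
            caller_lex.unsafe = True


cache = {}


def define(name, body_is_unsafe):
    # Models parsing CREATE FUNCTION / PROCEDURE / TRIGGER / VIEW / PREPARE.
    cache[name] = CachedSubStatement(name, Lex(unsafe=body_is_unsafe))


def parse_top_level(invoked_names):
    # Models parsing T: every sub-statement fetched from the cache propagates
    # its attributes to the Lex of T.
    lex = Lex()
    for name in invoked_names:
        cache[name].propagate_attributes(lex)
    return lex


define("f_uuid", body_is_unsafe=True)
define("f_plain", body_is_unsafe=False)
```

Here parse_top_level(["f_plain"]) returns a safe Lex, while parse_top_level(["f_plain", "f_uuid"]) returns an unsafe one.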
NOTE: the following list is incomplete; it does not take into account changes made in 2010 or later (roughly).
A statement may be flagged as unsafe. An unsafe statement will be logged in row format if binlog_format=MIXED and will generate a warning if binlog_format=STATEMENT.
The following types of sub-statements are currently marked unsafe:
System functions that may return a different value on slave, including: FOUND_ROWS, GET_LOCK, IS_FREE_LOCK, IS_USED_LOCK, LOAD_FILE, MASTER_POS_WAIT, RAND, RELEASE_LOCK, ROW_COUNT, SESSION_USER, SLEEP, SYSDATE, SYSTEM_USER, USER, UUID, UUID_SHORT.
Note: the following non-deterministic functions are not marked unsafe:
CONNECTION_ID (Query_log_events contain the connection number)
CURDATE, CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, CURTIME, LOCALTIME, LOCALTIMESTAMP, NOW, UNIX_TIMESTAMP, UTC_DATE, UTC_TIME, UTC_TIMESTAMP (Query_log_events contain the time zone and the time when the statement was executed)
LAST_INSERT_ID (this is replicated in an Intvar_log_event)
Also note that most floating-point math functions will return a hardware-dependent result. We do not mark such functions unsafe, because we only support replication between platforms that use identical floating-point math.
System variables, with some exceptions listed at http://dev.mysql.com/doc/en/binary-log-mixed.html
UDFs: since we have no control over what the UDF does, it may be doing something unsafe.
Update from a sub-statement of a table that has an autoincrement column. This is unsafe because the Intvar_log_event is limited to only hold autoincrement values for one table.
INSERT DELAYED, since the rows inserted may interleave with concurrently executing statements.
Updates using LIMIT, since the order in which rows are retrieved is not specified.
Statements referencing system log tables, since the contents of those tables may differ between master and slave.
Non-transactional reads or writes executing after transactional reads or writes in a transaction (see Section 19.4.3, “Logging Transactions”).
Reads or writes to self-logging tables, and all statements executing after reads or writes to self-logging tables in the same transaction.
The following has not yet been implemented:
Statements using fulltext parser plugins (cf. Bug#48183)
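As an illustration of the function-based part of the classification above, the following Python sketch looks up function names in two sets copied from the lists in this section. The helper names are hypothetical, and the sets inherit the incompleteness noted above.

```python
# Functions listed above as unsafe (the list is stated to be incomplete).
UNSAFE_FUNCTIONS = frozenset({
    "FOUND_ROWS", "GET_LOCK", "IS_FREE_LOCK", "IS_USED_LOCK", "LOAD_FILE",
    "MASTER_POS_WAIT", "RAND", "RELEASE_LOCK", "ROW_COUNT", "SESSION_USER",
    "SLEEP", "SYSDATE", "SYSTEM_USER", "USER", "UUID", "UUID_SHORT",
})

# Non-deterministic functions listed above as NOT marked unsafe, because
# the binary log carries enough context to replay them.
SAFE_NONDETERMINISTIC = frozenset({
    "CONNECTION_ID", "CURDATE", "CURRENT_DATE", "CURRENT_TIME",
    "CURRENT_TIMESTAMP", "CURTIME", "LOCALTIME", "LOCALTIMESTAMP", "NOW",
    "UNIX_TIMESTAMP", "UTC_DATE", "UTC_TIME", "UTC_TIMESTAMP",
    "LAST_INSERT_ID",
})


def classify_function(name):
    """Return "unsafe", "safe-nondeterministic", or "not listed"."""
    upper = name.upper()
    if upper in UNSAFE_FUNCTIONS:
        return "unsafe"
    if upper in SAFE_NONDETERMINISTIC:
        return "safe-nondeterministic"
    return "not listed"


def uses_unsafe_function(function_names):
    """True if any function used by the statement is in the unsafe list."""
    return any(classify_function(n) == "unsafe" for n in function_names)
```

Under this sketch a statement calling NOW and CONCAT is not flagged, whereas one calling NOW and UUID is.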
Status of this subsection: complete but not reviewed 2009-10-21
There are several types of statements that require attention because of their special behavior in transactions:
Non-transactional updates that take place inside a transaction present problems for logging because (1) they are visible to other clients before the transaction is committed, and (2) they are not rolled back even if the transaction is rolled back. It is not always possible to log correctly in statement format when both transactional and nontransactional tables are used in the same transaction.
Statements that do an implicit commit (that is, most but not all DDL, and some utility commands) are logged specially due to unspecified requirements by NDB.
Statements that update temporary tables need special treatment since they are not logged in row format.
To reason about logging different table types, we make some preliminary definitions.
(D-T-table) A table that has a transactional engine is called a T-table.
(D-N-table) A table that has a nontransactional engine is called an N-table.
(D-N-write) A statement makes an N-write if it makes any type of change to the server state that will not be changed by a ROLLBACK.
Note: N-writes include updates to N-tables, but also CREATE and DROP statements.
(D-log-target) Events are either appended to the Transaction Cache (TC) or to the Statement Cache (SC) or written directly to the binlog.
The following preliminary rules are actually consequences of the principle that statements shall be correctly logged when binlog_format=MIXED or ROW. They also apply when binlog_format=STATEMENT: this makes statement format work in many practical cases.
(PR-causality) If statement A is executed before statement B, and B is logged in statement format, and B reads tables that A may modify, then B shall be logged after A.
(PR-durability) Events shall be written to the binary log at the moment they take effect. In particular, changes to N-tables shall be written to the binary log when they have been executed, and changes to T-tables shall be written to the binary log on commit. If --sync-binlog has been specified, then it suffices that events are written to the binary log at the next synchronization point.
(PR-causality-precedence) If PR-causality and PR-durability cannot both be fulfilled, then PR-causality is considered more important.
The preliminary rules above, together with the principles for logging format, have been used to construct the following rules.
CALL statements are unrolled (see ???TODO: add section about unrolling???), so that each statement executed by the stored procedure is logged separately. (If a stored procedure A invokes a stored procedure B, then B is unrolled recursively). In the following, we assume that unrolling has already been done, and the word "statement" refers to a non-CALL top-level statement or a non-CALL sub-statement.
Let S be a logged statement that does not have an implicit commit, except CREATE TEMPORARY TABLE...SELECT (This includes all "pure DML": INSERT, UPDATE, DELETE, REPLACE, TRUNCATE, SELECT, DO, CALL, EXECUTE, LOAD DATA INFILE, and BINLOG. It also includes CREATE TEMPORARY TABLE without SELECT, and DROP TEMPORARY TABLE. CREATE TEMPORARY TABLE...SELECT is handled in the next subsection).
Before executing S, determine unsafeness:
(R-unsafe-nontransactional) If S either makes N-writes or reads from an N-table, and either S or a previous statement in the same transaction reads or writes to a T-table then S is marked unsafe.
(R-unsafe-self-logging) If either S or a previous statement in the same transaction reads or writes to a self-logging table, then S is marked unsafe.
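The two unsafeness rules depend only on what the current statement does and on what has already happened in the transaction. A minimal Python sketch under that reading follows; the Stmt fields and the TxnState class are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stmt:
    n_write: bool = False               # makes an N-write
    reads_n_table: bool = False         # reads from an N-table
    touches_t_table: bool = False       # reads or writes a T-table
    touches_self_logging: bool = False  # reads or writes a self-logging table


class TxnState:
    """Tracks what the transaction has touched so far."""

    def __init__(self):
        self.touched_t_table = False
        self.touched_self_logging = False

    def is_unsafe(self, s):
        # Both rules say "either S or a previous statement", so fold S into
        # the transaction history before evaluating them.
        self.touched_t_table = self.touched_t_table or s.touches_t_table
        self.touched_self_logging = (self.touched_self_logging
                                     or s.touches_self_logging)
        # R-unsafe-nontransactional
        nontransactional = ((s.n_write or s.reads_n_table)
                            and self.touched_t_table)
        # R-unsafe-self-logging
        self_logging = self.touched_self_logging
        return nontransactional or self_logging
```

For instance, an N-write at the start of a transaction is safe in this model, but the same N-write issued after a statement that touched a T-table is marked unsafe.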
When logging S, determine where to log it by applying the following rules in order:
(R-log-statement-format) If S is to be logged in statement format (that is, if one of the following holds: (1) @@session.binlog_format=STATEMENT; (2) @@session.binlog_format=MIXED and S is safe; (3) S is of DDL type, that is, CREATE TEMPORARY TABLE):
If S produces an error and does not do any N-write, do not log.
Otherwise, if either S or any previous statement in the same transaction reads or writes in any T-tables, log to TC.
Otherwise, log to SC.
(R-log-row-format) If S is to be logged in row format (that is, if S is DML and one of the following holds: (1) @@session.binlog_format=ROW; (2) @@session.binlog_format=MIXED and S is unsafe):
Do not log row events that write to temporary tables.
Log row events that write to non-temporary N-tables to SC.
Log row events that write to non-temporary T-tables to TC, except rows that are rolled back due to an error. (Note: if there is an error, rows written to a T-table are kept if there are subsequent rows written to an N-table.)
(R-flush-SC) At the end of S, write BEGIN + SC + COMMIT to the binlog and clear the SC.
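The routing rules above can be condensed into three small functions. This Python sketch is only a model with hypothetical names; it does not cover the exception for rows rolled back due to an error, and skipping the flush when the SC is empty is an assumption that the rule text does not state.

```python
def route_statement_format(error, n_write, txn_touched_t_table):
    """R-log-statement-format: return None (do not log), "TC" or "SC".

    txn_touched_t_table: S or any previous statement in the same
    transaction reads or writes a T-table.
    """
    if error and not n_write:
        return None
    return "TC" if txn_touched_t_table else "SC"


def route_row_event(temporary, transactional):
    """R-log-row-format: return None (do not log), "TC" or "SC"."""
    if temporary:
        return None
    return "TC" if transactional else "SC"


def flush_sc(sc, binlog):
    """R-flush-SC: write BEGIN + SC + COMMIT to the binlog, then clear SC."""
    if sc:  # assumption: nothing is written when the SC is empty
        binlog.extend(["BEGIN", *sc, "COMMIT"])
        sc.clear()
```

As a usage example, a failed statement that made no N-write is routed to None, and a row event for a non-temporary N-table is routed to "SC" and reaches the binlog when flush_sc runs at the end of the statement.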
At end of transaction:
(R-log-commit) At COMMIT or implicit commit, where all XA tables in the transaction succeed in the "prepare" phase:
If the TC is non-empty, write BEGIN + TC + COMMIT to the binlog.
If the TC is empty, do nothing.
(R-log-rollback) At ROLLBACK; or at COMMIT or implicit commit where some XA table fails in the "prepare" phase:
If the TC contains any N-write, write BEGIN + TC + ROLLBACK to the binlog.
If the TC does not contain any N-write, do nothing.
(R-log-rollback-to-savepoint) At ROLLBACK TO SAVEPOINT:
If the TC contains any N-write after the savepoint, write ROLLBACK TO SAVEPOINT to the TC.
Otherwise, clear the part of the TC that starts at the savepoint and extends to the end of the TC. (Bug#47327 breaks this rule)
(R-clear-TC) Clear the TC at the end of the transaction.
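The end-of-transaction rules can be modeled with a small cache class. The Python sketch below is illustrative only: TransactionCache, its event representation as (text, is_n_write) pairs, and the use of a list index as the savepoint position are all assumptions, and XA prepare failures are represented simply by calling rollback.

```python
class TransactionCache:
    """Toy model of the TC; each event is a (text, is_n_write) pair."""

    def __init__(self):
        self.events = []

    def append(self, text, is_n_write=False):
        self.events.append((text, is_n_write))

    def savepoint(self):
        # The savepoint is modeled as the current position in the cache.
        return len(self.events)

    def _has_n_write(self, start=0):
        return any(is_n_write for _, is_n_write in self.events[start:])

    def _texts(self):
        return [text for text, _ in self.events]

    def commit(self, binlog):
        # R-log-commit, followed by R-clear-TC.
        if self.events:
            binlog.extend(["BEGIN", *self._texts(), "COMMIT"])
        self.events.clear()

    def rollback(self, binlog):
        # R-log-rollback, followed by R-clear-TC.
        if self._has_n_write():
            binlog.extend(["BEGIN", *self._texts(), "ROLLBACK"])
        self.events.clear()

    def rollback_to_savepoint(self, position, name):
        # R-log-rollback-to-savepoint.
        if self._has_n_write(position):
            self.append("ROLLBACK TO SAVEPOINT " + name)
        else:
            del self.events[position:]
```

In this model, rolling back a transaction whose TC holds only T-table changes writes nothing, whereas a TC containing an N-write is written out terminated by ROLLBACK so that the N-write still reaches the slave.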
First, unsafeness is determined as above (R-unsafe-transaction). Then the logging format is decided. Then the following rules apply.
(R-log-create-select-statement-format) If logging in statement format (that is, one of the following holds: (1) @@session.binlog_format=STATEMENT; (2) @@session.binlog_format=MIXED and statement is safe):
If there is an error, do not write anything.
If there is no error and the TEMPORARY keyword is used, write the entire CREATE...SELECT statement to the TC.
If there is no error and the TEMPORARY keyword is not used, write the entire CREATE...SELECT directly to the binlog.
(R-log-create-select-row-format) If logging in row format (that is, one of the following holds: (1) @@session.binlog_format=ROW; (2) @@session.binlog_format=MIXED and statement is unsafe):
If the TEMPORARY keyword is used, do not write anything.
If the TEMPORARY keyword is not used, write CREATE TABLE (without select) + BEGIN + row events + COMMIT to the TC. If there is an error, clear the TC; otherwise flush the TC to the binlog at the end of the statement and then clear the TC. (Note: currently Bug#47899 breaks this rule)
Note: this breaks the D-rpl-correct rule, because the slave will have an intermediate state that never existed on the master (namely, a state where the new table exists and is empty).
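The CREATE...SELECT rules can be summarized as a function that reports what ends up in the TC and what is written to the binlog by the end of the statement. This Python sketch is a model under assumptions: the function name, its parameters, and the returned dictionary are hypothetical, and the behavior affected by Bug#47899 is not modeled.

```python
def log_create_select(row_format, temporary, error,
                      full_statement, create_only, row_events):
    """Return {"tc": [...], "binlog": [...]} as left by this statement.

    full_statement: the entire CREATE...SELECT text.
    create_only:    the CREATE TABLE text without the SELECT part.
    row_events:     row events produced by the SELECT part.
    """
    result = {"tc": [], "binlog": []}
    if not row_format:
        # R-log-create-select-statement-format
        if error:
            return result
        if temporary:
            result["tc"].append(full_statement)
        else:
            result["binlog"].append(full_statement)
        return result
    # R-log-create-select-row-format
    if temporary:
        return result
    if error:
        # The events were cached in the TC, which is then cleared.
        return result
    # No error: the TC content is flushed to the binlog at the end of the
    # statement and the TC is cleared.
    result["binlog"] = [create_only, "BEGIN", *row_events, "COMMIT"]
    return result
```

For a successful non-temporary CREATE...SELECT in row format, the model returns an empty TC and a binlog holding the bare CREATE TABLE followed by the bracketed row events, which is exactly the intermediate empty-table state the note above refers to.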
(R-log-commit-statement) All other statements that have a pre-commit are written directly to the binlog. (Note: this is semantically equivalent to writing it to the SC and flushing the SC. However, due to requirements by NDB (which have not been clarified), we write directly to the binlog.)
Status of this subsection: not started 2009-10-21
User variables: User variables (@variable) are logged as
User-defined functions
Server variables
Built-in functions
Status of this subsection: not started 2009-10-21
INSERT DELAYED
LIMIT
System tables
Status of this subsection: finished, not reviewed, not fully implemented 2009-10-21
The sets of columns recorded in the BI and AI are determined by the value of binlog_row_image. To specify the sets of columns, we define the PKE (for Primary Key Equivalent), as follows:
If a PK exists, the PKE is equal to the PK.
Otherwise, if there exists a UK where all columns have the NOT NULL attribute, then that is the PKE (if there is more than one such UK, then one is chosen arbitrarily).
Otherwise, the PKE is equal to the set of all columns.
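The PKE definition translates directly into a three-way choice. In the Python sketch below, the function name and the table representation (a dict mapping column name to a nullable flag) are assumptions, and "chosen arbitrarily" is modeled by taking the first qualifying unique key.

```python
def primary_key_equivalent(columns, primary_key=None, unique_keys=()):
    """Return the PKE as a list of column names.

    columns:     dict mapping column name -> True if the column is nullable.
    primary_key: sequence of column names, or None if there is no PK.
    unique_keys: sequence of unique keys, each a sequence of column names.
    """
    if primary_key:
        return list(primary_key)
    for uk in unique_keys:
        # A UK qualifies only if every one of its columns is NOT NULL.
        if all(not columns[col] for col in uk):
            return list(uk)  # assumption: the first such UK is picked
    return list(columns)
```

For a table with no PK, a unique key on a nullable column, and a second unique key on NOT NULL columns, the function returns the second key; with no qualifying key at all it returns every column.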
The sets of columns included in the BI and AI are defined as in the following tables:
write event
binlog_row_image | Before image | After image |
minimal | - | All columns where a value was specified, and the autoincrement column if there is one |
noblob | - | All columns where a value was specified, and the autoincrement column if there is one, and all non-blob columns |
full | - | All columns |
update event
binlog_row_image | Before image | After image |
minimal | PKE | All columns where a value was specified |
noblob | PKE + all non-blob columns | All columns where a value was specified, and all non-blob columns |
full | All columns | All columns |
delete event
binlog_row_image | Before image | After image |
minimal | PKE | - |
noblob | PKE + all non-blob columns | - |
full | All columns | - |
Cf. WL#5092.
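The three tables above can be expressed as one function over sets of column names. The following Python sketch is a model only: the function name and parameters are hypothetical, and the PKE is assumed to have been computed already according to the definition given earlier.

```python
def row_images(event, binlog_row_image, all_cols, pke, specified,
               blob_cols=(), autoinc=None):
    """Return (before_image, after_image) as sets of column names.

    event:            "write", "update" or "delete".
    binlog_row_image: "minimal", "noblob" or "full".
    specified:        columns for which the statement supplied a value.
    """
    all_cols = set(all_cols)
    non_blob = all_cols - set(blob_cols)
    pke = set(pke)
    specified = set(specified)

    if binlog_row_image == "full":
        before, after = set(all_cols), set(all_cols)
    elif binlog_row_image == "noblob":
        before, after = pke | non_blob, specified | non_blob
    elif binlog_row_image == "minimal":
        before, after = set(pke), set(specified)
    else:
        raise ValueError("unknown binlog_row_image: " + binlog_row_image)

    if event == "write":
        # Write events have no BI; the autoincrement column joins the AI.
        if autoinc is not None:
            after = after | {autoinc}
        return set(), after
    if event == "update":
        return before, after
    if event == "delete":
        # Delete events have no AI.
        return before, set()
    raise ValueError("unknown event type: " + event)
```

As an example, for a table with columns a (the PK), b, and a blob column c, an update that sets only b under binlog_row_image=minimal yields BI = {a} and AI = {b}; under noblob it yields BI = {a, b} and AI = {a, b}.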
