WL#7083: GTIDS: set gtid_mode=ON online
Affects: Server-5.7
—
Status: Complete
EXECUTIVE SUMMARY
=================

This worklog provides a way to turn on GTIDs online, so that:

 1. reads and writes are always allowed during the procedure; and
 2. servers do not need to synchronize.

Before this worklog, the user had to stop updates, then synchronize all servers, then restart all servers simultaneously. Thus, turning on GTIDs implied several minutes of planned downtime.

After this worklog, we still require the server to restart, but it is enough to restart one server at a time, so the replication cluster can still be online and accept updates. Thus, the mode of operation is similar to that of a rolling upgrade.

REFERENCES
==========

 - BUG#69059: GTIDS LACK A REASONABLE DEPLOYMENT STRATEGY
 - WL#6559: Optimize GTIDs for passive slave - store GTIDs in table
Functional Requirements
=======================

FR1. The procedures for turning ON or OFF GTIDs must not impose a requirement to synchronize the entire topology at any given point in time.

FR2. The procedures for turning ON or OFF GTIDs must not impose a requirement to restart any server at any time (other than possibly upgrading the server version to one that contains the feature).

FR3. This is a secondary goal / positive side effect. The new functionality shall make it possible for a multi-source slave to have both masters with GTID_MODE=OFF and masters with GTID_MODE=ON.

Non-functional Requirements
===========================

NFR1. The procedures must work in arbitrary replication topologies.

NFR2. If the user makes any mistake, it should be detected as soon as possible, if at all possible. Mistakes must never lead to situations where wrong transactions are applied on the database.

NFR3. The extra functionality should not introduce new ways for the user to make the kind of mistakes that cause the slave to go out of sync with the master at a later fail-over operation.
1. ANALYSIS OF REQUIREMENTS
===========================

If servers that generate GTIDs coexist in the same topology as old servers that do not generate GTIDs, we will have a mixture of transactions that have identifiers and transactions that do not have identifiers.

Terminology. There are two types of transactions:

 - A transaction that has a GTID in the form UUID:NUMBER is called a *GTID-transaction*. In binary and relay logs, every GTID-transaction is always preceded by a Gtid_log_event. GTID-transactions can be addressed using either the GTID or using filename and position.

 - A transaction that does not have a GTID assigned is called an *anonymous transaction*. After WL#7592, anonymous transactions are always preceded by an Anonymous_gtid_log_event. Before WL#7592, anonymous transactions are not preceded by any particular event at all. So after this worklog, transactions in a relay log that was received from an old master may not be preceded by any particular event at all, but after being replayed and logged in the slave's binary log, they will be preceded by an Anonymous_gtid_log_event. See also section 2.7 for a description of how to detect anonymous transactions in new and old binary and relay logs. Anonymous transactions can only be addressed using filename and position.

1.1. Requirement: ANONYMOUS TRANSACTION MUST REMAIN ANONYMOUS ON RE-EXECUTION
-----------------------------------------------------------------------------

An anonymous transaction must be kept anonymous when re-executed. If an old master generates an anonymous transaction, then the new slave must preserve anonymity and not generate a new GTID.

The following example illustrates what would happen if anonymous transactions were not kept anonymous when replicated:

                             +----------> Server B
                             |            GTID_MODE=ON
   Server A          --------+            Binlog: T1 (GTID=B:1)
   GTID_MODE=OFF             |
   Binlog: T1 (GTID=anon)    +----------> Server C
                                          GTID_MODE=ON
                                          Binlog: T1 (GTID=C:1)

Server A is a master, Servers B and C are slaves of A.
A is old and generates only anonymous transactions. B and C are new, and each of them generates a GTID for any anonymous transaction that it re-executes. A has executed one transaction, T1. B and C have re-executed T1 and each of them has generated a new GTID. The GTIDs are different since the server UUIDs for B and C are different.

Suppose A crashes. Then B should become a new master and C should become a slave of B. Since C does not have any transaction with GTID B:1, B will send T1 to C and C will re-execute it. This will lead to inconsistent data on C since T1 is executed twice.

1.2. Requirement: GTID-TRANSACTION MUST KEEP ITS GTID WHEN RE-EXECUTED
----------------------------------------------------------------------

If a master with GTID_MODE=ON generates a GTID for a transaction, then an error should be generated if an old server or a server with GTID_MODE=OFF tries to process the transaction.

The following example illustrates what would happen if an old server removed the GTID from a transaction that it re-executes and which originates from a new server.

   Server A                -------------------> Server B
   GTID_MODE=ON                                 GTID_MODE=OFF
   Binlog: T1 (GTID=A:1)                        Binlog: T1 (GTID=anon)

Server A is a master, Server B is a slave of A. A has GTID_MODE=ON whereas B has GTID_MODE=OFF (or is old). A has executed one transaction, T1, and assigned it GTID A:1. B has re-executed T1 and stripped away the GTID, converting T1 to an anonymous transaction.

Suppose that B is upgraded and starts to use the AUTO_POSITION protocol. Then, the next time B reconnects using the AUTO_POSITION protocol, A:1 will be retransmitted and B will execute it again. This leads to inconsistent data on B.

1.3. Requirement: GENERATE GTIDS/ANONYMOUS TRANSACTIONS ACCORDING TO GTID_MODE
------------------------------------------------------------------------------

A server that has GTID_MODE=ON must only generate GTIDs. If it receives anonymous transactions from a master, it must fail to execute the transactions.
A server that has GTID_MODE=OFF must only generate anonymous transactions. If it receives GTID-transactions from a master, it must fail to execute the transactions.

1.4. Requirement: GTIDS MUST NEVER BE LOST
------------------------------------------

Even if the user switches between GTID_MODE=ON and GTID_MODE=OFF several times, the GTID state (i.e., the value of @@GLOBAL.GTID_EXECUTED) must not be lost. So if e.g. one server by accident generates an anonymous transaction, other servers can temporarily change GTID_MODE so that they can replicate the anonymous transaction (and in the meantime they cannot perform an automatic fail-over); but this does not lose any of the existing GTID state, so the servers can continue to work correctly after setting GTID_MODE=ON again.

1.5. Requirement: GTID-TRANSACTIONS MUST BE GTID-CONSISTENT
-----------------------------------------------------------

The variable @@GLOBAL.ENFORCE_GTID_CONSISTENCY (which already exists in the server) disallows certain types of transactions that cannot be safely logged using GTIDs. Therefore, we require that any GTID-transaction is subject to the checks implied by @@GLOBAL.ENFORCE_GTID_CONSISTENCY.

1.6. Requirement: ALLOW MULTIPLE MASTERS WITH OPPOSITE GTID_MODES
-----------------------------------------------------------------

The architecture should be open-ended. In particular, if in the future slaves are capable of connecting to multiple sources, this should not be limited by the feature.

Suppose that a future slave is connecting to two masters. One of the masters has GTID_MODE = ON and the other has GTID_MODE = OFF. This may happen e.g. if the masters are owned by two different DBAs (maybe even different organizations) and the slave DBA has no influence over them. Still the slave DBA may need to aggregate the data from the two masters. There must be a way to configure the slave so that this scenario is possible.

1.7.
Requirement: SLAVE MUST BE ABLE TO PROCESS WHAT IT GETS FROM THE MASTER
----------------------------------------------------------------------------

This is a requirement on the way the feature is used. Any slave must be configured so that it is able to process the transactions coming from its master. In particular, if master is ON, slave cannot be OFF, and if slave is OFF, master cannot be ON. (These are only examples of restrictions; there will be more restrictions; see below.)

1.8. Requirement: AUTO_POSITION REQUIRES GTID-ONLY MASTER
---------------------------------------------------------

In order to use the AUTO_POSITION protocol, the master must only generate GTID-transactions, not anonymous transactions. Thus, if the AUTO_POSITION protocol is enabled, connecting to a master that has GTID_MODE!=ON must fail. This is because anonymous transactions cannot be addressed using GTIDs; they can only be addressed using (filename, offset) pairs.

1.9. SUMMARY OF REQUIREMENTS
----------------------------

This is a summary of the requirements listed in the sections above.

First, the implementation requirements:

IR1. An anonymous transaction must be kept anonymous when re-executed.
IR2. A GTID-transaction must keep its GTID when re-executed.
IR3. When GTID_MODE = OFF, only anonymous transactions must be generated.
IR4. When GTID_MODE = ON, only GTID-transactions must be generated.
IR5. GTID_EXECUTED shall always be persisted and never lose GTIDs.
IR6. All GTID-transactions must be subject to ENFORCE_GTID_CONSISTENCY checks.
IR7. It shall be possible to configure a slave so that it can accept updates both from masters that have GTID_MODE = ON and masters that have GTID_MODE = OFF.

Second, the requirements for the procedure to turn on GTIDs itself:

PR1. All servers must generate anonymous transactions until all servers know how to preserve GTIDs.
PR2. All anonymous transactions in the topology must have been processed before using AUTO_POSITION.

2.
PROPOSED SOLUTION
====================

To satisfy the implementation requirements and the procedure requirements listed in the REQUIREMENTS section, the act of turning ON GTIDs needs to be done in multiple steps. As such, the DBA needs to tell the server which step of the procedure it is in. For that we resort to the GTID_MODE variable.

2.1. MAKING GTID_MODE SETTABLE
------------------------------

To instruct the server in which mode it should operate, we change the GTID_MODE system variable:

 - Make GTID_MODE accept more values than just ON and OFF. (In fact, it already does, but these modes are not implemented yet):

    - OFF: Both new and replicated transactions must be anonymous.
    - OFF_PERMISSIVE: New transactions are anonymous. Replicated transactions can be either anonymous or GTID-transactions.
    - ON_PERMISSIVE: New transactions are GTID-transactions. Replicated transactions can be either anonymous or GTID-transactions.
    - ON: Both new and replicated transactions must be GTID-transactions.

 - Make the variable settable dynamically, except in the following case:

    - Changing GTID_MODE from ON_PERMISSIVE to ON requires a server restart. The reason is explained in Section 2.8.

The default remains GTID_MODE = OFF. The variable remains global-only. The variable can only be set by SUPER, from a top-level statement, outside a transaction.

To see why we need the intermediate steps, consider bidirectional replication between two servers:

 - Initially the two servers have GTID_MODE = OFF. You cannot switch any of the servers directly to GTID_MODE = ON_PERMISSIVE or ON, because the other server would still have GTID_MODE = OFF and thus it would generate an error when it tried to process the transactions. So the first step must be to set GTID_MODE = OFF_PERMISSIVE.

 - Suppose the two servers have GTID_MODE = OFF_PERMISSIVE.
   You cannot switch any of the servers directly to GTID_MODE = ON, because that server would still receive anonymous transactions from the other server and therefore it would generate an error. So the second step must be to set GTID_MODE = ON_PERMISSIVE.

 - Once the two servers have GTID_MODE = ON_PERMISSIVE, it suffices to process all anonymous transactions of all relay logs, and after that all transactions are GTID-transactions. So then it is safe to switch to GTID_MODE = ON.

The same procedure works in any topology.

2.2. PROCEDURE FOR TURNING ON GTIDS WITH THE REPLICATION CLUSTER ONLINE
-----------------------------------------------------------------------

The procedure to start using GTIDs is as follows. Note that it is crucial that you complete every step before continuing to the next step.

U1. The pre-conditions for using GTIDs are:

    U1.1. *All* servers in your topology must use MySQL 5.6.X or later. You cannot use the GTID feature if one server in the topology is old. *All* servers that need to switch GTID_MODE to ON online must use MySQL 5.7.Y or later. If any server is older than that, it can still switch GTID_MODE to ON, but all such old servers have to be offline at the same time, during part of the procedure (see below). Here, 5.6.X is the first release that supports GTIDs and 5.7.Y is the first release that supports the four GTID operation modes.

    U1.2. All servers leave GTID_MODE with the default value OFF.

U2. On each server, execute:

      SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = WARN;

    Then, let it run for a while with your normal workload. If this causes any warnings in the log, adjust your application so that it only uses GTID-compatible features and does not generate any warning.

U3. On each server, execute:

      SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = ON;

U4.
    On each server, execute:

      SET @@GLOBAL.GTID_MODE = OFF_PERMISSIVE;

    It does not matter which server executes this statement first, but it is important that all servers complete this step before any server begins the next step.

    U4.1. If any servers use a version older than 5.7.Y, switch them off at this point. Steps U5 and U6 apply only to the servers of version 5.7.Y or later.

U5. On each server, execute:

      SET @@GLOBAL.GTID_MODE = ON_PERMISSIVE;

    It does not matter which server executes this statement first.

U6. On each server, wait until the status variable ANONYMOUS_TRANSACTION_COUNT is zero. This can be checked using:

      SHOW STATUS LIKE 'ANONYMOUS_TRANSACTION_COUNT';

    On a replication slave, it is theoretically possible that this shows zero and then non-zero again. This is ok; it suffices that it shows zero once.

U7. Wait for all transactions generated up to step U6 to replicate to all servers. You can do this without stopping updates: the only important thing is that all anonymous transactions get replicated. There are several possible ways to wait for transactions to replicate:

    U7.1. A simple method which works regardless of your topology, but relies on timing: if you are sure that the slave never lags more than N seconds, just wait for a bit more than N seconds. Or wait for a day, or whatever time period you consider safe for your deployment.

    U7.2. A safer method in the sense that it does not depend on timing: if you only have a master with one or more slaves, do the following:

          U7.2.1. On the master, execute:

                    SHOW MASTER STATUS;

                  Note down the values in the File and Position columns.

          U7.2.2. On every slave, execute the following, using the File and Position values noted in U7.2.1:

                    SELECT MASTER_POS_WAIT(<file>, <position>);

    U7.3. If you have a master and multiple levels of slaves (slaves of the slaves), repeat U7.2 on each level, starting from the master, then all the direct slaves, then all the slaves-of-slaves, etc.

    U7.4.
          If you use a circular replication topology where multiple servers may have write clients, perform step U7.2 for each master-slave connection, until you have completed the full circle. Repeat so that you do the full circle *twice*.

          Here is an example: Suppose you have three servers A, B, and C, replicating in a circle like A -> B -> C -> A. The procedure is then:

           - Do step U7.2.1 on A and step U7.2.2 on B.
           - Do step U7.2.1 on B and step U7.2.2 on C.
           - Do step U7.2.1 on C and step U7.2.2 on A.
           - Do step U7.2.1 on A and step U7.2.2 on B.
           - Do step U7.2.1 on B and step U7.2.2 on C.
           - Do step U7.2.1 on C and step U7.2.2 on A.

U8. On each server, execute:

      SET @@GLOBAL.GTID_MODE = ON;

U9. On each server, add gtid-mode=ON to my.cnf.

U10. You are now guaranteed that all transactions have a GTID (except transactions generated in step U5 or earlier, which have already been processed). To start using the GTID protocol so that you can later perform automatic fail-over, execute on each slave:

    U10.1. Wait until at least one GTID-transaction has been replicated to all slaves.
    U10.2. STOP SLAVE;
    U10.3. CHANGE MASTER TO MASTER_AUTO_POSITION = 1;
    U10.4. START SLAVE;

    (If a future version of the server supports replication from multiple masters, step U10.3 must be performed once for each replication channel.)

    (Step U10.1 is needed to avoid spurious errors due to C5.3; see below.)

2.3. PROCEDURE FOR TURNING OFF GTIDS WITH THE REPLICATION CLUSTER ONLINE
------------------------------------------------------------------------

Users who want to turn off GTIDs can follow almost the same procedure as in the previous section, but backwards. The only thing that differs is the point at which you wait for logged transactions to replicate.

D1. On each slave, execute:

    D1.1. STOP SLAVE;
    D1.2. CHANGE MASTER TO MASTER_AUTO_POSITION = 0, MASTER_LOG_FILE = <file>, MASTER_LOG_POS = <position>;
    D1.3.
          START SLAVE;

    (If a future version of the server supports replication from multiple masters, step D1.2 must be performed once for each replication channel.)

D2. On each server, execute:

      SET @@GLOBAL.GTID_MODE = ON_PERMISSIVE;

D3. On each server, execute:

      SET @@GLOBAL.GTID_MODE = OFF_PERMISSIVE;

D4. On each server, wait until the variable @@GLOBAL.GTID_OWNED is equal to the empty string. This can be checked using:

      SELECT @@GLOBAL.GTID_OWNED;

    On a replication slave, it is theoretically possible that this is empty and then nonempty again. This is ok; it suffices that it is empty once.

D5. Wait for all transactions that currently exist in any binary log to replicate to all slaves. Use the same method as in U7 of the procedure for turning on GTIDs.

D6. On each server, execute:

      SET @@GLOBAL.GTID_MODE = OFF;

D7. On each server, set gtid-mode=OFF in my.cnf.

D8. If you want to set ENFORCE_GTID_CONSISTENCY = OFF, you can do so now.

D9. If you want to downgrade to an earlier version of MySQL, you can do so now, using the normal downgrade procedure.

2.4. COMBINATIONS OF MASTER GTID_MODE, SLAVE GTID_MODE, AND AUTO_POSITION
-------------------------------------------------------------------------

As exemplified in section 2.1, the only rule that works in all topologies is that master and slave differ by at most one step. However, in order to satisfy IR7, it is necessary to relax this restriction and allow master and slave to differ by more than one step:

 - Suppose a future slave is capable of replicating from two masters at the same time, and suppose it has one master with GTID_MODE = OFF and another master with GTID_MODE = ON.

 - A slave that has GTID_MODE = OFF_PERMISSIVE or ON_PERMISSIVE can accept any GTID_MODE from the master without problems.

 - A slave with GTID_MODE = OFF_PERMISSIVE or ON_PERMISSIVE can do fail-over, as long as the old and new master have GTID_MODE = ON.
 - So the only combinations of master GTID_MODE and slave GTID_MODE that must be disallowed are when the slave has GTID_MODE = OFF and the master has GTID_MODE = ON or ON_PERMISSIVE; or when the slave has GTID_MODE = ON and the master has GTID_MODE = OFF or OFF_PERMISSIVE. These settings do not make sense because the slave cannot handle the identifiers of the transactions committed on the master.

 - The combinations of AUTO_POSITION and GTID_MODE that are necessary to disallow are (1) the slave has AUTO_POSITION = 1 and the master has GTID_MODE != ON; (2) the slave has AUTO_POSITION = 1 and the slave has GTID_MODE == OFF.

The following table illustrates the allowed combinations.

                                  Master GTID_MODE
                      OFF   OFF_PERMISSIVE   ON_PERMISSIVE   ON
   Slave GTID_MODE
     OFF              Y     Y                N               N
     OFF_PERMISSIVE   Y     Y                Y               Y*
     ON_PERMISSIVE    Y     Y                Y               Y*
     ON               N     N                Y               Y*

   N - Slave thread will stop with an error instead of connecting
   Y - GTID_MODEs are compatible
   * - AUTO_POSITION can be used

Notice that there is a use case where this additional flexibility is needed:

 - Suppose that a future slave is connecting to two masters. One of the masters has GTID_MODE = ON and the other has GTID_MODE = OFF. This may happen e.g. if the masters are owned by two different DBAs (maybe even different organizations) and the slave DBA has no influence over them. Still the slave DBA may need to aggregate the data from the two masters.

 - In this case, the slave must run with either GTID_MODE = OFF_PERMISSIVE or GTID_MODE = ON_PERMISSIVE. So a slave that has one of these two modes must be compatible with a master that has GTID_MODE = ON or OFF, i.e., with all modes.

 - The slave DBA may want to perform a fail-over for the channel connected to the GTID_MODE = ON master, to another GTID_MODE = ON master. So AUTO_POSITION = 1 should be allowed as long as the slave has GTID_MODE != OFF, and the slave SQL thread should generate an error if master has GTID_MODE != ON.
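The rules of this section can be written down as two small predicates. The following Python model is only an illustrative sketch: the function and constant names are hypothetical and do not correspond to anything in the server source. It encodes the two disallowed GTID_MODE pairs and the two AUTO_POSITION restrictions listed above, and prints the resulting combination matrix.

```python
# Illustrative model of the section 2.4 rules. All names are hypothetical;
# nothing here is taken from the server implementation.
GTID_MODES = ("OFF", "OFF_PERMISSIVE", "ON_PERMISSIVE", "ON")


def _check_mode(mode):
    if mode not in GTID_MODES:
        raise ValueError(f"unknown GTID_MODE: {mode!r}")


def modes_compatible(master_mode, slave_mode):
    """Return True if a slave in slave_mode may replicate from master_mode."""
    _check_mode(master_mode)
    _check_mode(slave_mode)
    if slave_mode == "OFF":
        # A strict OFF slave cannot process GTID-transactions.
        return master_mode in ("OFF", "OFF_PERMISSIVE")
    if slave_mode == "ON":
        # A strict ON slave cannot process anonymous transactions.
        return master_mode in ("ON_PERMISSIVE", "ON")
    # Permissive slaves accept both kinds of transactions.
    return True


def auto_position_allowed(master_mode, slave_mode):
    """AUTO_POSITION needs a GTID-only master and a slave that is not OFF."""
    _check_mode(master_mode)
    _check_mode(slave_mode)
    return master_mode == "ON" and slave_mode != "OFF"


def table_cell(master_mode, slave_mode):
    """Return 'N', 'Y' or 'Y*' using the legend of the table above."""
    if not modes_compatible(master_mode, slave_mode):
        return "N"
    return "Y*" if auto_position_allowed(master_mode, slave_mode) else "Y"


if __name__ == "__main__":
    # One row per slave mode, one column per master mode.
    for slave in GTID_MODES:
        cells = [table_cell(master, slave) for master in GTID_MODES]
        print(f"{slave:<15}", *cells)
```

Keeping the GTID_MODE check and the AUTO_POSITION check as separate predicates mirrors the text, which lists them as two independent sets of disallowed combinations.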
Note: When CHANGE MASTER TO MASTER_AUTO_POSITION = 1 is executed, the slave is not connected to the master, so the only check we can do at that time is to give an error if GTID_MODE = OFF. Then, we may have an additional point of error generation in the IO thread connect code, if the master is using GTID_MODE != ON.

2.5. COMBINATIONS OF GTID_MODE AND GTID_NEXT
--------------------------------------------

The following table shows the behavior of the server for the different values of GTID_MODE and GTID_NEXT. This summarizes the discussion above.

                                      GTID_NEXT
                      AUTOMATIC    AUTOMATIC    ANONYMOUS    UUID:NUMBER
                      binlog on    binlog off
   GTID_MODE
     OFF              anonymous    anonymous    anonymous    error
     OFF_PERMISSIVE   anonymous    anonymous    anonymous    UUID:NUMBER
     ON_PERMISSIVE    new GTID     anonymous    anonymous    UUID:NUMBER
     ON               new GTID     anonymous    error        UUID:NUMBER

Legend:

   anonymous   - Generate an anonymous transaction.
   error       - Generate an error and fail to execute 'SET GTID_NEXT'.
   UUID:NUMBER - Generate a GTID with the specified UUID:NUMBER.
   new GTID    - Generate a GTID with an automatically generated number.

Note: When the binary log is off and GTID_NEXT = 'AUTOMATIC', then no GTID is generated. This is consistent with how the server works now.

2.6. ENFORCE_GTID_CONSISTENCY
-----------------------------

ENFORCE_GTID_CONSISTENCY shall be changed to be a dynamic variable. The variable shall be global and settable only by SUPER in a top-level statement outside a transaction. The default shall remain OFF.

The variable shall increase its range to values 0, 1, 2, with the following symbolic names:

   0 = OFF    All transactions are allowed to violate GTID consistency.
   1 = ON     No transaction is allowed to violate GTID consistency.
   2 = WARN   All transactions are allowed to violate GTID consistency, but a warning is generated in this case.

The WARN value is useful in order to pre-check the workload before turning this variable to ON.
If this were not possible, there would be a risk of downtime when switching the value to ON, and a lot of errors would be generated.

When GTID_MODE = ON, only ENFORCE_GTID_CONSISTENCY = ON is allowed. When GTID_MODE != ON, all values are allowed. However, even if ENFORCE_GTID_CONSISTENCY != ON, the check is enforced in the following cases:

 - For transactions that use GTID_NEXT = 'UUID:NUMBER', an error is generated if the transaction violates GTID consistency, regardless of the value of ENFORCE_GTID_CONSISTENCY.

 - When GTID_MODE = ON or ON_PERMISSIVE, for transactions that use GTID_NEXT = 'AUTOMATIC', an error is generated if the transaction violates GTID consistency, regardless of the value of ENFORCE_GTID_CONSISTENCY.

Notice that this logic ensures that IR7 is satisfied, i.e., a slave that uses GTID_MODE = ON_PERMISSIVE or OFF_PERMISSIVE can have two masters, one that uses GTID_MODE = OFF and another that uses GTID_MODE = ON. Such a slave can use ENFORCE_GTID_CONSISTENCY = OFF or WARN. Then, transactions coming from the GTID_MODE = ON master are still subject to the consistency checks (since such transactions use GTID_NEXT = 'UUID:NUMBER'), while transactions coming from the GTID_MODE = OFF master are accepted even if they violate GTID consistency.

The following error conditions are checked:

 - An error shall be generated if GTID_MODE = ON and the user executes SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = OFF or WARN.

 - An error shall be generated if GTID_MODE = OFF_PERMISSIVE and there is an ongoing transaction that uses GTID_NEXT = 'AUTOMATIC' and violates GTID consistency, and the user executes SET @@GLOBAL.GTID_MODE = ON_PERMISSIVE. See subsection 2.

 - An error shall be generated if ENFORCE_GTID_CONSISTENCY is changed from OFF or WARN to ON and there are ongoing transactions that violate GTID consistency. See also subsection 2.8.1.
 - A warning shall be generated if ENFORCE_GTID_CONSISTENCY is changed from OFF to WARN and there are ongoing transactions that violate GTID consistency.

2.7. DETECTING ANONYMOUS TRANSACTIONS
-------------------------------------

This is a clarification of existing behavior; all the following is implemented in the server prior to this worklog.

There are two components that generate SQL statements from binary logs or relay logs: the slave applier thread and mysqlbinlog. Both of these must take care to detect which transactions are anonymous and which are GTID-transactions, and they must set GTID_NEXT accordingly.

In binary logs or relay logs that originate from a server where the present worklog is implemented, this is detected only based on the type of event that precedes the transaction: either a Gtid_log_event or an Anonymous_gtid_log_event.

In binary logs or relay logs that originate from an old server, this is detected by noticing that the file contains transactions without any Gtid_log_event. The implementation is as follows:

 - Slave thread: When it applies a Format_description_log_event that originates from a master, it sets THD.variables.gtid_next.type to a special value, NOT_YET_DETERMINED_GROUP. This will be converted to a correct value later; there are two cases:

    1. If a Gtid_log_event or Anonymous_gtid_log_event appears, then that event will set THD.variables.gtid_next.type accordingly.

    2. If no Gtid_log_event or Anonymous_gtid_log_event appears, then the next time an SQL statement is executed, it will set THD.variables.gtid_next.type to ANONYMOUS_GROUP. This is done in gtid_pre_statement_checks, which is called from mysql_parse for SQL statements and from Rows_log_event::do_apply_event for row events.

 - mysqlbinlog: When it reads a binary log, it will output a BINLOG statement containing a base64-encoding of the initial Format_description_log_event.
   When a client replays this, the Format_description_log_event does the same as it does in the slave thread, i.e., sets THD.variables.gtid_next.type to NOT_YET_DETERMINED_GROUP. After that it works exactly as in the case for the slave thread above.

2.8. ONGOING TRANSACTIONS AND SERVER RESTARTS
---------------------------------------------

In section 2.1, we mentioned that server restart is required when changing from GTID_MODE = ON_PERMISSIVE to ON. The reason for this is that we need to handle ongoing transactions correctly. In this section we explain how ongoing transactions are handled by all the steps.

2.8.1. ENFORCE_GTID_CONSISTENCY: OFF -> ON or WARN -> ON:

When ENFORCE_GTID_CONSISTENCY = ON, the server is not allowed to execute any transaction that violates GTID consistency. However, ENFORCE_GTID_CONSISTENCY is checked at transaction start. So if we allow ongoing transactions while changing from ENFORCE_GTID_CONSISTENCY = OFF or WARN to ON, it is possible to have the following erroneous execution:

  1. trx1 violates GTID consistency.
  2. trx1 passes the ENFORCE_GTID_CONSISTENCY check because ENFORCE_GTID_CONSISTENCY = OFF.
  3. Another client executes SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = ON.
  4. trx1 commits.

One solution to this problem would be to require a server restart in order to change ENFORCE_GTID_CONSISTENCY to ON, since that definitely ensures that there are no ongoing transactions. However, in order to reduce the number of restarts, we use a different method. We maintain a counter of the number of ongoing transactions that violate GTID consistency. When a violating transaction begins to execute, the counter is incremented, and when a violating transaction ends, the counter is decremented. The statement SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = ON is only allowed when the counter is zero.

There is a cost for maintaining a global counter, even if we use lock-free atomic integer operations.
However, we estimate that the cost is not significant since it only affects those transactions that violate GTID consistency.

2.8.2. ENFORCE_GTID_CONSISTENCY: ON -> OFF, ON -> WARN, or WARN -> OFF:

These transitions are not problematic for ongoing transactions. Since we go from more restrictive to more permissive modes, any transaction that was ongoing before the SET statement will be allowed after the statement as well.

2.8.3. ENFORCE_GTID_CONSISTENCY: OFF -> WARN:

Here we go from a more permissive to a more restrictive mode. Consider the following execution:

  1. trx1 violates GTID consistency, but no warning is generated since ENFORCE_GTID_CONSISTENCY = OFF when the transaction begins to execute.
  2. Another client executes: SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = WARN
  3. trx1 commits.

Then, trx1 has committed when ENFORCE_GTID_CONSISTENCY = WARN, without generating any warning. To ensure that there is some warning, we generate a warning for the statement SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = WARN, if there is any ongoing transaction that violates GTID consistency.

2.8.4. GTID_MODE: OFF -> OFF_PERMISSIVE:

This transition is not problematic in 5.7. Since we go from a more restricted to a more permissive mode, any transaction that was ongoing before the SET statement will be allowed after the SET statement too.

In 5.6, this would have been problematic. The reason is:

 - GTID-transactions always have a Gtid_log_event.
 - Anonymous transactions have an Anonymous_gtid_log_event.
 - There is one exception to the above rules: in 5.6, anonymous transactions do not have any event when GTID_MODE = OFF.
 - In 5.6, the Gtid event is allocated at the beginning of the transaction, but written only at the end of the transaction.

Hence, in 5.6, if we allowed ongoing transactions while changing GTID_MODE from OFF to OFF_PERMISSIVE, the following erroneous execution would be possible:

  1. trx1 begins to execute. Since GTID_MODE = OFF, it does not allocate any event.
  2.
     Another client executes SET @@GLOBAL.GTID_MODE = OFF_PERMISSIVE.
  3. trx1 commits. Then it tries to write the Gtid_log_event to unallocated space.

This is not a problem in 5.7, because the allocation of the Gtid/Anonymous event was moved to the end of the transaction (this was a big refactoring).

2.8.5. GTID_MODE: OFF_PERMISSIVE -> ON_PERMISSIVE:

 - Both anonymous transactions and GTID-transactions are allowed in both modes, so ongoing replicated transactions (which execute with GTID_NEXT = 'ANONYMOUS' or GTID_NEXT = 'UUID:NUMBER') can commit without problem.

 - The GTID of a new transaction (which executes with GTID_NEXT = 'AUTOMATIC') is generated when the transaction prepares (if the binary log is disabled) or flushes (if the binary log is enabled). So it is possible to have the following execution:

    1. trx1 prepares or flushes, and is determined to be 'ANONYMOUS' since GTID_MODE = OFF_PERMISSIVE.
    2. Another client executes SET @@GLOBAL.GTID_MODE = ON_PERMISSIVE.
    3. trx1 commits.

   Then, trx1 commits as an anonymous transaction even if GTID_MODE = ON_PERMISSIVE at the time of the commit. This could cause problems in the upgrade procedure in case step U7.2.1 of the procedure is performed before trx1 has been fully flushed to the binary log. To prevent this from happening, we have introduced step U6.

 - GTID_MODE = ON_PERMISSIVE is more restrictive than GTID_MODE = OFF_PERMISSIVE for transactions that use GTID_NEXT = 'AUTOMATIC' and violate GTID consistency (cf. subsection 2.6). Specifically, such transactions must generate an error when GTID_MODE = ON_PERMISSIVE but not when GTID_MODE = OFF_PERMISSIVE. Thus, if there is any ongoing transaction that uses GTID_NEXT = 'AUTOMATIC' and violates GTID consistency, SET @@GLOBAL.GTID_MODE = ON_PERMISSIVE shall generate an error.

2.8.6. GTID_MODE: ON_PERMISSIVE -> ON:

When GTID_MODE = ON, all transactions must be GTID-transactions; anonymous transactions are disallowed.
Transactions can be set to anonymous before they begin to execute, using SET @@SESSION.GTID_NEXT = 'ANONYMOUS'. So if we allow ongoing transactions while changing GTID_MODE = ON_PERMISSIVE to ON, we can have the following erroneous execution: 1. trx1 sets GTID_NEXT = 'ANONYMOUS'. This is allowed because GTID_MODE = ON_PERMISSIVE. 2. Another client executes SET @@GLOBAL.GTID_MODE = ON. 3. trx1 commits. One solution to this problem would be to require a server restart in order to change GTID_MODE to ON, since that definitely ensures that there are no ongoing transactions. However, in order to reduce the number of restarts, we use a different method, similar to the one we use to allow online SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = ON. We maintain a counter of the number of ongoing transactions that are anonymous. When an anonymous transaction starts to execute, the counter is incremented, and when it ends, the counter is decremented. The statement SET @@GLOBAL.GTID_MODE = ON fails with an error if the counter is not zero. The counter is exposed as the status variable ANONYMOUS_TRANSACTION_COUNT so that the user can know when it is allowed to set GTID_MODE = ON. (In fact, this status variable should be zero already at step U6 of the upgrade procedure; see section 2.2.) Maintaining a global counter has a cost, as it requires either a global lock or an atomic integer operation. This has a performance impact on all transactions, even on a server that has GTID_MODE = OFF and does not plan to ever turn GTID_MODE to ON. We shall do a performance test to see if this is a significant problem. If it is, we can remove the counter and require a server restart instead. (In contrast, the counter used for ENFORCE_GTID_CONSISTENCY is only incremented for the small set of unusual statements/transactions that violate GTID consistency; we consider the cost acceptable for such cases.) This would change steps U6 and U7 of the upgrade procedure: see Appendix A. 2.8.7.
GTID_MODE: ON -> ON_PERMISSIVE: This transition does not cause any problems for ongoing transactions, since ON is more restrictive than ON_PERMISSIVE. 2.8.8. GTID_MODE: ON_PERMISSIVE -> OFF_PERMISSIVE: The considerations for this step are similar to those in step 2.8.5: - Both anonymous transactions and GTID-transactions are allowed in both modes, so ongoing replicated transactions (which execute with GTID_NEXT = 'ANONYMOUS' or GTID_NEXT = 'UUID:NUMBER') can commit without problem. - The GTID of a new transaction (which executes with GTID_NEXT = 'AUTOMATIC') is generated when the transaction prepares (if the binary log is disabled) or flushes (if the binary log is enabled). So it is possible to have the following execution: 1. trx1 prepares or flushes, and is determined to be 'UUID:NUMBER' since GTID_MODE = ON_PERMISSIVE. 2. Another client executes SET @@GLOBAL.GTID_MODE = OFF_PERMISSIVE. 3. trx1 commits. Then, trx1 commits as a GTID-transaction even if GTID_MODE = OFF_PERMISSIVE at the time of the commit. This could cause problems in the downgrade procedure in case step U7.2.1 of the procedure (part of D5) is performed before trx1 has been fully flushed to the binary log. To prevent this from happening, we have introduced step D4. - ON_PERMISSIVE is more restrictive for transactions that violate GTID consistency compared to OFF_PERMISSIVE (cf. subsections 2.6 and 2.8.6). Therefore, GTID consistency does not cause any trouble for this transition. 2.8.9. GTID_MODE: OFF_PERMISSIVE -> OFF: When GTID_MODE = OFF, all transactions that commit must be anonymous; GTID-transactions are disallowed. Transactions can be set to GTID-transactions before they begin to execute. If we allowed GTID-transactions to execute while changing GTID_MODE to OFF, we could have the following erroneous execution: 1. trx1 sets GTID_NEXT = 'UUID:NUMBER'. This is allowed because GTID_MODE = OFF_PERMISSIVE. 2. Another client executes SET @@GLOBAL.GTID_MODE = OFF. 3. trx1 commits.
Luckily, it is easy to detect if there are any ongoing GTID-transactions: this can be checked using the function gtid_state->owned_gtids->is_empty(). So we disallow setting GTID_MODE = OFF when this function returns false. In 5.6, this transition would have been problematic for the same reason as in 2.8.4: an ongoing transaction would have allocated space for a Gtid_log_event, but since it commits when GTID_MODE = OFF it would never write to the allocated memory, and eventually the uninitialized memory would get written to the binary log. In 5.7 this problem does not exist since a Gtid_log_event is generated unconditionally at the end of the transaction. 2.9. BINLOG_CONFIGURATION_LOG_EVENT ----------------------------------- In order to perform some of the safety checks, we need to know what value of GTID_MODE was in effect when the binary log was generated. To make this possible, we introduce a new event type: Binlog_configuration_log_event. The Binlog_configuration_log_event is stored in the beginning of the binary log, just after the Format_description_log_event and Previous_gtids_log_event. It shall contain a single field: GTID_MODE, with value 0 (OFF), 1 (OFF_PERMISSIVE), 2 (ON_PERMISSIVE), or 3 (ON). When GTID_MODE is changed, the binary log shall be rotated, so that the Binlog_configuration_log_event correctly matches the GTID_MODE. Binlog_configuration_log_event shall have the LOG_EVENT_IGNORABLE_F flag set. It shall be replicated to the slave as any other event. The member function do_apply_event shall do nothing. When the relay log is generated, Binlog_configuration_log_event shall be generated. It shall be generated just after the Previous_gtids_log_event. When AUTO_POSITION is enabled, the master send thread and the slave receive thread generate errors if a Binlog_configuration_log_event is found which has GTID_MODE != ON. 2.10. 
ANONYMOUS_TRANSACTION_COUNT --------------------------------- In step U6 of the upgrade procedure, user has to wait for possible ongoing anonymous transactions to commit, so that they are not missed in the synchronization step U7. In order to know when this is done, we introduce the status variable ANONYMOUS_TRANSACTION_COUNT. This will be equal to the number of ongoing transactions for which it has been decided that they must be anonymous. For transactions coming from a user session, it is decided at the time of transaction prepare whether the transaction is going to be anonymous or have a GTID. The decision is based on GTID_MODE. Therefore, it is possible that there are N ongoing transactions, all of which will eventually be committed as anonymous transactions, but at the same time ANONYMOUS_TRANSACTION_COUNT < N since the decision to make the transactions anonymous has not yet been taken. 3. COMPATIBILITY CHECKS ======================= There are a number of places where the server needs to check compatibility of GTID_MODE, GTID_NEXT, AUTO_POSITION, ENFORCE_GTID_CONSISTENCY, GTID_MODE of a master, GTID_MODE of a slave, and GTIDs of running transactions. 3.1. CHECKS PERFORMED WHEN SETTING GTID_MODE -------------------------------------------- When user changes GTID_MODE, the following compatibility checks are possible to implement: C1.1. GTID_MODE must only change one step. Rationale: It would conceivably be possible to allow changing directly from OFF to ON_PERMISSIVE and from ON to OFF_PERMISSIVE. However, this would not have any significant advantage since: 1. It is not needed in the recommended procedure. 2. The workaround is obvious (use the intermediate step). Moreover, enabling this would also have significant drawbacks: 1. It is more uniform and easy to understand that a variable can change one step at a time, rather than one step in some cases and one or two steps in other cases. 2. 
If we allow two steps, it is easier for the user to make a mistake in the upgrade or downgrade procedure. 3. The analysis of changing just one step at a time is complex as it is (cf. section 2.8). Allowing more than one step at a time would imply even more case analysis, would be harder to maintain, etc. Error message: "The value of GTID_MODE can only change one step at a time: OFF <-> OFF_PERMISSIVE <-> ON_PERMISSIVE <-> ON. Also note that this value must be stepped up or down simultaneously on all servers. See the Manual for instructions." C1.2. SET @@GLOBAL.GTID_MODE = ON is not allowed when there are ongoing anonymous transactions. See subsection 2.8.6. Error message: "SET GTID_MODE = ON is not allowed when there are ongoing, anonymous transactions. Before setting GTID_MODE = ON, wait until SHOW STATUS LIKE 'ANONYMOUS_TRANSACTION_COUNT' shows zero on all servers. Then wait for all existing, anonymous transactions to replicate to all slaves, and then execute SET @@GLOBAL.GTID_MODE = ON on all servers. See the Manual for details." C1.3. SET @@GLOBAL.GTID_MODE = OFF is not allowed if there are ongoing GTID-transactions. That is, generate an error if @@GLOBAL.GTID_OWNED != ''. Error message: "SET GTID_MODE = OFF is not allowed when there are ongoing transactions that have a GTID. Before you set GTID_MODE = OFF, wait until SELECT @@GLOBAL.GTID_OWNED is empty on all servers. Then wait for all GTID-transactions to replicate to all servers, and then execute SET @@GLOBAL.GTID_MODE = OFF on all servers. See the Manual for details." C1.4. When GTID_MODE changes from OFF_PERMISSIVE to ON_PERMISSIVE, and there is any ongoing transaction that uses GTID_NEXT = 'AUTOMATIC' and violates GTID consistency, an error shall be generated. Note: this means that users have to adjust their workload to be GTID-consistent before setting the option.
Error message: "SET GTID_MODE = ON_PERMISSIVE is not allowed when there are ongoing transactions that use GTID_NEXT = 'AUTOMATIC', which violate GTID consistency. Make sure to adjust your workload to be GTID-consistent before setting GTID_MODE = ON_PERMISSIVE. See the Manual for @@GLOBAL.ENFORCE_GTID_CONSISTENCY for details." C1.5. The AUTO_POSITION mode of every replication channel must be compatible with the new GTID_MODE. I.e., if AUTO_POSITION = 1 and the new GTID_MODE is OFF, then generate an error. Error message: "SET GTID_MODE = OFF is not allowed since replication channel '%.192s' is configured in AUTO_POSITION mode. Execute CHANGE MASTER TO MASTER_AUTO_POSITION = 0 FOR CHANNEL '%.192s' before you set GTID_MODE = OFF." C1.6. SQL_SLAVE_SKIP_COUNTER must be 0, since SQL_SLAVE_SKIP_COUNTER=1 is not allowed when GTID_MODE=ON (see section 4.2). Error message: "SET GTID_MODE = ON is only allowed when SQL_SLAVE_SKIP_COUNTER = 0." * CHECKS NOT PERFORMED WHEN SETTING GTID_MODE Some checks which seem desirable to perform when setting GTID_MODE are too difficult to implement and not strictly necessary. In particular, the following checks will NOT be implemented: C1.7. Unprocessed transactions in the relay log must be compatible with the new GTID_MODE. The check could use one or both of the following methods: - Read the Binlog_configuration_log_event of all unprocessed relay logs. - Let the receiver thread store the position of the last received anonymous transaction, and let the applier thread store the position of the last committed anonymous transaction. If the former is greater than the latter, generate an error. Both these checks are a little bit complex, and the error will be detected by the applier thread anyway (see C8.1 and C8.2). C1.8. The GTID_MODE of connected masters must be compatible with the new GTID_MODE. Since the upgrade procedure is online, masters will change GTID_MODE without the slave reconnecting.
There is currently no way for slaves to read master configuration at other points than reconnect time. In any case, the receiver thread will stop once it receives transactions generated by the master in the incompatible mode (see C7.1 - C7.6). C1.9. The GTID_MODE of connected slaves must be compatible with the new GTID_MODE. Since the upgrade procedure is online, slaves will change GTID_MODE without the master knowing about it. There is currently no way for masters to read slave configuration other than what the slave specifies at reconnect time. In any case, the slave's receiver thread will check compatibility with the master's GTID_MODE (see C7.1 - C7.6), so the error will be detected by the slave when it receives transactions generated using the incompatible mode. C1.10. The AUTO_POSITION of connected slaves must be compatible with the new GTID_MODE. I.e., if GTID_MODE = ON and there is some slave running with AUTO_POSITION = 1, then SET @@GLOBAL.GTID_MODE = ON_PERMISSIVE shall generate an error. This is hard to implement because it requires iterating over all connected send threads. In any case, the sender and receiver threads will check compatibility with AUTO_POSITION (see C6.1, C6.4, C7.1, and C7.4). 3.2. CHECKS PERFORMED WHEN SETTING AUTO_POSITION ------------------------------------------------ When the user executes CHANGE MASTER TO MASTER_AUTO_POSITION = 1, the following check must be performed: C2.1. If GTID_MODE == OFF, an error is generated and the CHANGE MASTER TO command fails. Error message: "CHANGE MASTER TO MASTER_AUTO_POSITION = 1 cannot be executed because GTID_MODE = OFF." This check is already performed by the server (but with a slightly different error message). 3.3. CHECKS PERFORMED WHEN SETTING GTID_NEXT -------------------------------------------- The checks performed when setting GTID_NEXT are already in place, so there is nothing to change.
(The checks are: generate error when setting GTID_NEXT = 'ANONYMOUS' and GTID_MODE = ON, and generate error when setting GTID_NEXT = 'UUID:NUMBER' and GTID_MODE = OFF.) 3.4. CHECKS PERFORMED BY SLAVE WHEN CONNECTING TO A MASTER ---------------------------------------------------------- In the master-slave handshake, the slave shall read the master's GTID_MODE and perform the following checks: (this is mostly a summary of section 2.4) C4.1. If slave has GTID_MODE = OFF and master has GTID_MODE = ON_PERMISSIVE or ON, the slave receiver thread shall generate an error and stop. Error message: "The replication receiver thread cannot start because the master has GTID_MODE = %.192s and this server has GTID_MODE = %.192s." This message already exists in the server. C4.2. If slave has GTID_MODE = ON and master has GTID_MODE = OFF_PERMISSIVE or OFF, the slave receiver thread shall generate an error and stop. Error message: Same as for C4.1. C4.3. If slave is using the AUTO_POSITION protocol and master does not have GTID_MODE = ON, the slave receiver thread shall generate an error and stop. Error message: "The replication receiver thread cannot start in AUTO_POSITION mode: the master has GTID_MODE = %.192s instead of ON." This error already exists in the server. C4.4. If slave has GTID_MODE = OFF and AUTO_POSITION = 1, the slave IO thread shall generate an error and stop. This cannot normally happen because of the checks performed when setting GTID_MODE = OFF and AUTO_POSITION = 1. However, it can happen if user changes GTID_MODE from ON to OFF in the configuration file when the server is offline and then starts the server with --force-gtid-mode-on-startup. Error message: "The replication receiver thread cannot start in AUTO_POSITION mode: this server uses GTID_MODE = OFF." 3.5. CHECKS PERFORMED BY MASTER WHEN A SLAVE CONNECTS ----------------------------------------------------- In the master-slave handshake, the master performs the following checks: C5.1. 
When a server connects as slave using the AUTO_POSITION protocol, and the master does not have GTID_MODE = ON, the master shall generate an error and stop the send thread. This may seem redundant because we have C4.3, but it is not, because there is a race: the master may change GTID_MODE after the slave has checked it and before the slave connects. To avoid surprises, the master should take a lock to prevent changing GTID_MODE, then perform the check, then perform its initialization, and then release the lock. Error message: "The replication sender thread cannot start in AUTO_POSITION mode: this server has GTID_MODE = %.192s instead of ON." The check already exists in the server, with a typo in the error message. C5.2. When a server connects as slave using the AUTO_POSITION protocol, and the master finds that the slave is missing transactions that were generated when the master was using GTID_MODE != ON (as detected by reading the Binlog_configuration_log_event), the master shall generate an error and stop the send thread. Error message: "The replication sender thread cannot start in AUTO_POSITION mode: the binary log file '%.256s' contains GTIDs that are missing on slave, and which were generated using GTID_MODE = %.192s instead of ON." C5.3. When a server connects as slave using the AUTO_POSITION protocol, and the master finds that the last transaction that is *not* to be sent is anonymous, then the master shall generate an error and stop the send thread. This prevents loss of transactions in the following case: 1. Master and slave are in sync and both use GTID_MODE = ON and auto_position protocol. 2. Slave threads stop. 3. Master server changes to GTID_MODE = OFF or OFF_PERMISSIVE. 4. Master server generates some transactions. 5. Master server restarts and sets GTID_MODE = ON again. 6. Slave threads start. If the check was not there, the anonymous transactions would be silently skipped. 
Error message: "The replication sender thread cannot start in AUTO_POSITION mode: the first transaction to send is preceded by an anonymous transaction. Replicate at least one GTID-transaction to the slave before you enable AUTO_POSITION." This step implies that step 9.1 is needed in the upgrade procedure. If user forgets step 9.1, the only thing that will happen is that you get an error in the applier thread after step 9.4, and you just have to redo steps 9.1-9.4. There is no risk for data loss, only a minor inconvenience. 3.6. CHECKS PERFORMED BY A RUNNING SEND THREAD ON MASTER -------------------------------------------------------- The master does not know the slave's GTID_MODE, as it may legally change during the online procedure for turning on or off GTIDs. Therefore, the master's send thread shall not perform any checks for GTID_MODE. When using the auto-positioning protocol, the send thread should perform the following checks: C6.1. If AUTO_POSITION = 1 and the send thread reads a Binlog_configuration_log_event that contains GTID_MODE != ON, an error is generated and the send thread is stopped. Error message: "Cannot replicate binary log generated with GTID_MODE = %.192s when AUTO_POSITION is enabled, at file %.512s, position %lld." C6.2. If GTID_MODE = ON and the send thread reads a Binlog_configuration_log_event with GTID_MODE = OFF, an error is generated and the send thread is stopped. Error message: "Cannot replicate binary log generated with GTID_MODE = %.192s when GTID_MODE = %.192s, at file %.512s, position %lld." C6.3. If GTID_MODE = OFF and the send thread reads a Binlog_configuration_log_event with GTID_MODE = ON, an error is generated and the send thread is stopped. Error message: same as C6.2. C6.4. If AUTO_POSITION = 1 and the send thread reads an Anonymous_gtid_log_event, an error is generated and the send thread is stopped. Error message: "Cannot replicate anonymous transaction when AUTO_POSITION = 1, at file %.512s, position %lld." C6.5. 
If GTID_MODE = ON and the send thread reads an Anonymous_gtid_log_event, an error is generated and the send thread is stopped. Error message: "Cannot replicate anonymous transaction when GTID_MODE = ON, at file %.512s, position %lld." C6.6. If GTID_MODE = OFF and the send thread reads a Gtid_log_event, an error is generated and the send thread is stopped. Error message: "Cannot replicate GTID-transaction when GTID_MODE = OFF, at file %.512s, position %lld." 3.7. CHECKS PERFORMED BY A RUNNING RECEIVE THREAD ON SLAVE ---------------------------------------------------------- The slave must not receive transactions that are incompatible with the current GTID_MODE. Therefore, the slave receive thread shall implement the following checks: C7.1. If AUTO_POSITION = 1 and the receive thread receives a Binlog_configuration_log_event that contains GTID_MODE != ON, the receive thread shall stop with an error. Error message: same as for C6.1. C7.2. If the server uses GTID_MODE = ON and the receive thread receives a Binlog_configuration_log_event that contains GTID_MODE = OFF_PERMISSIVE or OFF, then the receive thread shall stop with an error. Error message: Same as for C6.2 C7.3. If the server uses GTID_MODE = OFF and the receive thread receives a Binlog_configuration_log_event that contains GTID_MODE = ON_PERMISSIVE or ON, then the receive thread shall stop with an error. Error message: Same as for C6.3. C7.4. If AUTO_POSITION = 1 and the receive thread receives an Anonymous_gtid_log_event, the receive thread shall stop with an error. Error message: same as for C6.4. C7.5. If GTID_MODE = ON and the receive thread receives an Anonymous_gtid_log_event, the receive thread shall stop with an error. Error message: same as for C6.5. C7.6. If GTID_MODE = OFF and the receive thread receives a Gtid_log_event, the receive thread shall stop with an error. Error message: same as for C6.6. The check already exists, but with a slightly different error message. 3.8. 
CHECKS PERFORMED BY A RUNNING APPLIER THREAD ------------------------------------------------- The applier thread automatically performs the checks imposed by setting GTID_NEXT according to the events it reads from the relay log (see 3.3). The checks for setting GTID_NEXT are done already in MySQL 5.6. In addition, the following checks are performed: C8.1. If the server is using GTID_MODE = OFF and the applier thread reads a Binlog_configuration_log_event that contains GTID_MODE = ON_PERMISSIVE or ON, then the applier thread shall stop with an error. Error message: Same as for C6.2. C8.2. If the server is using GTID_MODE = ON and the applier thread reads a Binlog_configuration_log_event that contains GTID_MODE = OFF_PERMISSIVE or OFF, then the applier thread shall stop with an error. Error message: Same as for C6.2. 3.9. CHECKS PERFORMED BY SERVER STARTUP --------------------------------------- C9.1. If server starts with GTID_MODE = OFF and the replication connection has AUTO_POSITION = 1, then a warning shall be generated. The channel will still have AUTO_POSITION = 1, which will later cause an error when starting the slave receiver thread (see C5.1). Warning message: "Detected misconfiguration: replication channel '%.192s' was configured with AUTO_POSITION = 1, but the server was started with --gtid-mode=off. Either reconfigure replication using CHANGE MASTER TO MASTER_AUTO_POSITION = 0 FOR CHANNEL '%.192s', or change GTID_MODE to some value other than OFF, before starting the slave receiver thread." C9.2. It is easy for the user to forget updating my.cnf after performing the online upgrade procedure. Changing the GTID_MODE in a server restart is dangerous and can lead to the DBA having to temporarily downgrade to AUTO_POSITION = 0 and ON_PERMISSIVE on all servers in order to process the anonymous transactions. This is unwanted and presents a risk since automatic fail-over will not be allowed in the meantime.
To protect against such mistakes, the following check shall be performed at server startup: If the Binlog_configuration_log_event of the last binary log contains 'GTID_MODE = ON', and the server starts with a GTID_MODE other than ON, the server shall generate an error and fail to start. Error message: "Cannot start the server because the server was last running with GTID_MODE = ON and is now being started with --gtid-mode=%.192s. Unintentionally starting in the wrong GTID_MODE can be harmful. If you are intentionally changing the GTID_MODE, suppress the check using --force-gtid-mode-on-startup." C9.3. A command-line option shall be provided so that the user can circumvent C9.2 and start the server anyway: --force-gtid-mode-on-startup If this option is enabled, and GTID_MODE is different from the GTID_MODE of the last binary log, then only a warning is generated instead of an error. Warning message: "The server was last using GTID_MODE = ON and is now being started with --gtid-mode=%.192s. This is allowed because --force-gtid-mode-on-startup is used." C9.4. If the server starts with GTID_MODE = ON and ENFORCE_GTID_CONSISTENCY != ON, an error shall be generated. Error message: "GTID_MODE = ON requires ENFORCE_GTID_CONSISTENCY = ON." 3.10 CHECKS PERFORMED WHEN SETTING ENFORCE_GTID_CONSISTENCY ----------------------------------------------------------- C10.1. When GTID_MODE = ON, and the user tries to change ENFORCE_GTID_CONSISTENCY to OFF or WARN, the statement shall fail and an error shall be generated. The error message shall be the same as in C9.4. C10.2. When ENFORCE_GTID_CONSISTENCY is changed from OFF or WARN to ON, and there is any ongoing transaction that violates GTID consistency, the statement shall fail and an error shall be generated.
Error message: "Cannot set ENFORCE_GTID_CONSISTENCY = ON because there are ongoing transactions that violate GTID consistency." C10.3. When ENFORCE_GTID_CONSISTENCY is changed from OFF to WARN, and there are ongoing transactions that violate GTID consistency, the statement shall generate a warning. Warning message: "There are ongoing transactions that violate GTID consistency." 3.11 CHECKS PERFORMED WHEN STARTING AN APPLIER THREAD ----------------------------------------------------- No compatibility checks will be performed when starting an applier thread. We could conceivably check that the Binlog_configuration_log_events of all unprocessed relay logs are compatible with the current GTID_MODE, but this will be detected when the thread reaches the relevant relay log in any case. 4. OTHER GTID FEATURES ====================== 4.1. WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS -------------------------------------- WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS should be possible to execute whenever GTID_MODE != OFF. Rationale: If multi-source is implemented, a slave can use multi-source to aggregate data from two masters. If one master has GTID_MODE = ON and the other has GTID_MODE = OFF, the slave must have GTID_MODE = ON_PERMISSIVE or OFF_PERMISSIVE. Still the slave may want to wait for a given GTID set from the master that uses GTID_MODE = ON. There is nothing to change: the server already works this way. In case similar functions are implemented, e.g., to wait for transactions to be received, then the same restriction shall apply. 4.2. SQL_SLAVE_SKIP_COUNTER --------------------------- SQL_SLAVE_SKIP_COUNTER should be possible to set whenever GTID_MODE != ON. Rationale: It is currently not allowed to set it when GTID_MODE = ON, since the correct and only GTID-safe way to skip transactions is using an empty transaction. However, when GTID_MODE != ON, there can be anonymous transactions which cannot be skipped using empty transactions. 
There is nothing to change: the server already works this way. 4.3. GTID_EXECUTED, GTID_PURGED, and FIELDS IN SHOW SLAVE STATUS, PERFORMANCE_SCHEMA, ETC ---------------------------------------------------------------- Fields displaying GTID sets (e.g. GTID_EXECUTED, GTID_PURGED, SHOW SLAVE STATUS / RETRIEVED_GTID_SET, PERFORMANCE_SCHEMA.replication_connection_status.RECEIVED_TRANSACTION_SET, etc) should work the same way regardless of GTID_MODE. It is possible that these sets are nonempty even when GTID_MODE = OFF, because GTID_MODE may have been ON earlier. If GTID_MODE has always been OFF and the server has not executed any GTID-transactions, the sets should simply be empty (''). Before this worklog, GTID_EXECUTED and GTID_PURGED are empty if GTID_MODE is OFF and binlog is disabled. This must change so that they are initialized on server startup regardless of GTID_MODE and binlog being enabled. Notation: - Fields displaying a GTID set should contain an empty string if the set is empty. - Fields displaying a single GTID (e.g., PERFORMANCE_SCHEMA.REPLICATION_EXECUTE_STATUS_BY_WORKER / CURRENT_TRANSACTION) should display "ANONYMOUS" if the current transaction is anonymous. 4.4. SET GTID_PURGED -------------------- It should be allowed to execute SET GTID_PURGED regardless of the GTID_MODE, since GTID_EXECUTED and GTID_PURGED are preserved e.g. when going from GTID_MODE = ON to OFF to ON. 4.5. THE GTID_EXECUTED TABLE ---------------------------- The table mysql.gtid_executed was introduced in WL#6559. When a transaction is committed, its GTID is stored in the table mysql.gtid_executed. The table is range-compressed once for every N committed transactions. The range compression is performed by a separate thread. Currently, when GTID_MODE = OFF, the thread is not started at all; otherwise, it is started when the server starts and stopped when the server stops. Since GTID_MODE is now dynamic, we need to change the logic.
To make it simple, we start the thread unconditionally when the server starts and stop it unconditionally when the server stops. Since no transactions are committed when GTID_MODE = OFF, the thread will never wake up in this case and thus will not use any CPU. 5. SUMMARY OF USER-VISIBLE CHANGES ---------------------------------- - GTID_MODE is now dynamic. It can be set by SUPER from a top-level statement. - GTID_MODE now takes the following values: - 0 = OFF: Both new and replicated transactions must be anonymous. - 1 = OFF_PERMISSIVE: New transactions are anonymous. Replicated transactions can be either anonymous or GTID-transactions. - 2 = ON_PERMISSIVE: New transactions are GTID-transactions. Replicated transactions can be either anonymous or GTID-transactions. - 3 = ON: Both new and replicated transactions must be GTID-transactions. - GTID_MODE can only be altered one step at a time: OFF <-> OFF_PERMISSIVE <-> ON_PERMISSIVE <-> ON - GTID_MODE cannot be altered dynamically from ON_PERMISSIVE to ON. This step requires a server restart. - ENFORCE_GTID_CONSISTENCY is now dynamic. It can be set by SUPER from a top-level statement. - ENFORCE_GTID_CONSISTENCY now takes the following values: 0 = OFF All transactions are allowed to violate GTID consistency. 1 = ON No transaction is allowed to violate GTID consistency. 2 = WARN All transactions are allowed to violate GTID consistency, but a warning is generated in this case. Additionally, transactions that use GTID_NEXT = 'UUID:NUMBER' are not allowed to violate GTID consistency, regardless of the value of ENFORCE_GTID_CONSISTENCY. Transactions that use GTID_NEXT = 'AUTOMATIC' are not allowed to violate GTID consistency when GTID_MODE = ON_PERMISSIVE or ON. - GTID_MODE = ON is only allowed when ENFORCE_GTID_CONSISTENCY = ON - The binary log contains a new event type: Binlog_configuration_log_event. - The existing binary log event Previous_gtids_log_event has been extended with one more field.
- A new command line option --force-gtid-mode-on-startup has been introduced. - The status variable ANONYMOUS_TRANSACTION_COUNT has been introduced. This shows the number of transactions for which it has been determined that they will be anonymous. ==== APPENDIX A: alternative that requires one server restart ==== The above algorithm requires every transaction to increase and decrease an atomic counter. This is needed in order to allow ON_PERMISSIVE -> ON without restarting the server. If the overhead imposed by the atomic operations is deemed unacceptable, we could remove them and require a server restart for the ON_PERMISSIVE -> ON step. This means that steps 6 and 7 of the upgrade procedure become more complex: U6'. Wait for any anonymous transactions that may still be executing to commit. This cannot be checked with 100% certainty. However, transactions are only assigned their anonymity a very short time before they get committed (normally a fraction of a second). Therefore, you can wait a minute to be safe. U8'. Restart each server with gtid-mode=ON. If any servers older than 5.7.X were switched off in step 4.1, then they can be switched on now as well. It does not matter which server executes this step first. When performing this step on a master, typically a switch-over will be needed. In a tree topology, this can be done as follows: U8.1. Ensure the slaves are not lagging too much behind the master. U8.2. Stop updates on the master. U8.3. Wait until some slave is up to date with the master; we call this slave the stand-in. Record the binlog positions shown in SHOW MASTER STATUS on the stand-in. U8.4. Allow clients to connect to the stand-in to do updates. U8.5. If there are other slaves of the master, wait for them to catch up with the master. Then redirect all other slaves to the stand-in, using MASTER_LOG_FILE and MASTER_LOG_POS as recorded in step U8.3. U8.6. Restart the master with GTID_MODE = ON. U8.7.
           Connect the master as a slave of the stand-in, using
           MASTER_LOG_FILE and MASTER_LOG_POS as recorded in step U8.3.
     U8.8. Wait until the master does not lag too much behind the stand-in.
     U8.9. Stop updates on the stand-in.
     U8.10. Wait until the master is up to date with the stand-in. Record
            the binlog positions shown in SHOW MASTER STATUS on the master.
     U8.11. Allow clients to connect to the master to do updates.
     U8.12. Wait until all direct slaves of the stand-in are up to date
            with the stand-in.
     U8.13. Connect the slaves and the stand-in as slaves of the master,
            using MASTER_LOG_FILE and MASTER_LOG_POS as recorded in step
            U8.10.

Moreover, check C1.2 needs to be replaced by an error generated
unconditionally when the user executes SET @@GLOBAL.GTID_MODE = ON.
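As an illustration of the rules above, the following Python sketch models
which dynamic GTID_MODE changes are accepted: a change may move only one step
along OFF <-> OFF_PERMISSIVE <-> ON_PERMISSIVE <-> ON, and under the Appendix
A alternative a dynamic change to ON always fails. This is a minimal model
and not the server implementation. The names GtidMode,
check_gtid_mode_change and on_requires_restart are hypothetical, and treating
an assignment of the current value as allowed is an assumption.

```python
from enum import IntEnum


class GtidMode(IntEnum):
    """The four GTID_MODE values, numbered as in the summary above."""
    OFF = 0
    OFF_PERMISSIVE = 1
    ON_PERMISSIVE = 2
    ON = 3


def check_gtid_mode_change(current, requested, on_requires_restart=True):
    """Return None if the dynamic change is allowed, else a reason string.

    on_requires_restart=True models the Appendix A alternative, where
    SET @@GLOBAL.GTID_MODE = ON is rejected unconditionally.
    """
    if requested == current:
        # Assumption: re-assigning the current value is a harmless no-op.
        return None
    if abs(int(requested) - int(current)) != 1:
        return "GTID_MODE can only be altered one step at a time"
    if on_requires_restart and requested == GtidMode.ON:
        return "ON_PERMISSIVE -> ON requires a server restart"
    return None
```

For example, check_gtid_mode_change(GtidMode.OFF, GtidMode.ON_PERMISSIVE)
returns a reason string because it skips OFF_PERMISSIVE, while moving from
ON to ON_PERMISSIVE is accepted.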
SUMMARY OF CHANGES
==================

1. Small simplifications. While debugging the feature, a few small things
   had to be fixed, e.g. more DBUG output. This patch collects all such
   simplifications, so that they don't distract from the rest of the
   worklog.

2. Currently, the code uses numeric constants instead of enumeration values
   for GTID_MODE. Also, it uses the names UPGRADE_STEP_1 and UPGRADE_STEP_2
   instead of OFF_PERMISSIVE and ON_PERMISSIVE.

   Change to use the new names and to use enumeration values always. Use an
   enum type to give better compilation checks (e.g. a warning for a
   missing enumeration value in a switch). Encapsulate all access to the
   global variable using getter functions.

3. Make GTID_MODE settable and allow OFF_PERMISSIVE and ON_PERMISSIVE.

   Make reads of gtid_mode be guarded by global_sid_lock.rdlock and writes
   by global_sid_lock.wrlock.

   Make the GTID table compression thread start and stop unconditionally.

   Make GTID_EXECUTED and GTID_PURGED be initialized unconditionally.

   Allow SET GTID_PURGED when GTID_MODE = OFF.

   Rotate the binary log when changing GTID_MODE.

4. Implement check C1.2. This requires that we implement the following
   counter:

   - anonymous_gtid_count: The number of active transactions that use
     GTID_NEXT = 'ANONYMOUS'.

   To implement this, we need:

   - Update anonymous_gtid_count whenever thd->gtid_owned changes to
     THD::OWNED_SIDNO_ANONYMOUS, or when it changes from
     THD::OWNED_SIDNO_ANONYMOUS to something else.

5. Make ENFORCE_GTID_CONSISTENCY settable, allow WARN.

   Use an enum instead of a boolean. Use symbolic names always. Encapsulate
   all access to the global variable using getter functions.

   Make reads of ENFORCE_GTID_CONSISTENCY be protected by
   global_sid_lock.rdlock and writes by global_sid_lock.wrlock.

   Make every transaction check for GTID consistency regardless of the
   value of GTID_MODE. Then, it is enough to take global_sid_lock.rdlock
   for transactions that violate GTID consistency.

6.
   Implement simple checks: C1.1-C1.3, C1.6, C2.1, C4.1-C4.4, C5.1,
   C6.4-C6.6, C7.4-C7.6, C9.4, C10.1

7. Implement checks that need to read AUTO_POSITION from multiple channels:
   C1.5, C9.1

8. Introduce two global counters:

   - automatic_gtid_consistency_violation_count: The number of active
     transactions that use GTID_NEXT = 'AUTOMATIC' and violate GTID
     consistency.

   - anonymous_gtid_consistency_violation_count: The number of active
     transactions that use GTID_NEXT = 'ANONYMOUS' and violate GTID
     consistency.

   To implement these, we need:

   - Introduce a flag THD::gtid_consistency_violation that indicates if the
     current transaction has increased one of these counters.

   - Currently, consistency is checked only if ENFORCE_GTID_CONSISTENCY =
     ON. We shall change this so that consistency is checked
     unconditionally. If consistency is violated, do this:

     - Fail with an error if one of the following holds:
       - ENFORCE_GTID_CONSISTENCY = ON, or
       - GTID_NEXT = 'UUID:NUMBER', or
       - GTID_NEXT = 'AUTOMATIC' and GTID_MODE = ON_PERMISSIVE or ON

     - Otherwise:

         if GTID_NEXT = 'AUTOMATIC':
           increase automatic_gtid_consistency_violation_count
           set thd->gtid_consistency_violation = 1
         if GTID_NEXT = 'ANONYMOUS':
           increase anonymous_gtid_consistency_violation_count
           set thd->gtid_consistency_violation = 2
         if ENFORCE_GTID_CONSISTENCY = WARN:
           generate a warning
         allow the statement to execute

     At the end of the statement, if thd->gtid_consistency_violation != 0,
     decrease the corresponding global counter and set
     thd->gtid_consistency_violation = 0.

9. Implement checks that depend on the counters introduced in the previous
   step: C1.4, C10.2, C10.3

10. Add a new event type, Binlog_configuration_log_event, that contains the
    GTID_MODE in use when the binary log was created. (But make the event
    format extensible so that other fields can be added if needed.) The
    event should have the LOG_EVENT_IGNORABLE_F flag set.

    Flush the binary log every time GTID_MODE is changed.
    This is needed so that every binary log contains the correct GTID_MODE
    in the Binlog_configuration_log_event.

11. Implement checks that depend on Binlog_configuration_log_event: C5.2,
    C6.1-C6.3, C7.1-C7.3, C8.1, C8.2, C9.2

12. Implement C9.3, which depends on C9.2 implemented in the previous step.

13. Implement C5.3. We postpone it until this step because it is somewhat
    complex: it requires adding functionality to
    MYSQL_BIN_LOG::read_gtids_from_binlog.
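The handling of GTID consistency violations, described in the step that
introduces the two global counters, can be sketched in Python as follows.
This is a simplified single-threaded model, not the server code: the classes
Counters and Session and the functions on_violation and end_of_statement are
hypothetical names, modes are passed as plain strings, and the locking and
atomic operations of the real server are left out. The sketch counts
GTID_NEXT = 'ANONYMOUS' violations in the anonymous counter, in line with the
counter definitions. Skipping the increment when the flag is already set is
an assumption made here to avoid double counting.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Models the two global violation counters."""
    automatic: int = 0   # automatic_gtid_consistency_violation_count
    anonymous: int = 0   # anonymous_gtid_consistency_violation_count


@dataclass
class Session:
    """Models THD::gtid_consistency_violation: 0 none, 1 automatic, 2 anonymous."""
    gtid_consistency_violation: int = 0


def on_violation(session, counters, enforce, gtid_next, gtid_mode):
    """Handle a statement that violates GTID consistency.

    Returns "error" if the statement must fail, "warning" if it runs with a
    warning, and "ok" if it runs silently.
    """
    if (enforce == "ON"
            or gtid_next == "UUID:NUMBER"
            or (gtid_next == "AUTOMATIC"
                and gtid_mode in ("ON_PERMISSIVE", "ON"))):
        return "error"
    # Assumption: only count once per statement, guarded by the flag.
    if session.gtid_consistency_violation == 0:
        if gtid_next == "AUTOMATIC":
            counters.automatic += 1
            session.gtid_consistency_violation = 1
        elif gtid_next == "ANONYMOUS":
            counters.anonymous += 1
            session.gtid_consistency_violation = 2
    return "warning" if enforce == "WARN" else "ok"


def end_of_statement(session, counters):
    """Undo the counter increase recorded in the session flag, if any."""
    if session.gtid_consistency_violation == 1:
        counters.automatic -= 1
    elif session.gtid_consistency_violation == 2:
        counters.anonymous -= 1
    session.gtid_consistency_violation = 0
```

In this model, a check that depends on the counters would read
counters.automatic or counters.anonymous and refuse a mode change while the
relevant counter is nonzero.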