WL#4832: Improve scalability of binary logging
Affects: Server-5.6
Status: Complete
Priority: Medium
GOAL
====

In order to ease the development of WL#5223, regular events (e.g. Rows_log_event, Query_log_event, etc.) must be written into a cache and eventually flushed to the binary log. An incident event that denotes a problem while using the cache, however, must be written directly into the binary log.

The idea is to avoid holding a lock on the binary log while encoding events, and to know the size of the transaction or statement before actually starting to write it. With this in place, WL#5223 can be implemented easily.

CONTEXT
=======

In 5.1, the binary log is locked when writing events, even when the event goes into a cache rather than to the binary log. This causes a significant bottleneck for the scalability of replication, since every thread has to acquire the binlog lock for every statement it executes.

In 5.5, WL#2687 fixed this behaviour: every DML is written into either the trx-cache or the stmt-cache before being flushed to the binary log upon commit or rollback. BEGIN, COMMIT, and ROLLBACK events are written together with the contents of the cache when the cache is flushed to disk. DDL is written directly to disk without going through a cache.

In 5.6, in order to ease the development of WL#5223, the idea is that, in most cases, a cache contains everything that is written into the binary log, including BEGIN, COMMIT, ROLLBACK and DDLs. Incident events, however, will still be written directly into the binary log.

ASSOCIATED BUGS
===============

. BUG#42757: Redundant use of LOCK_log in MYSQL_BIN_LOG::write(Log_event*)
. BUG#43362: Missing implicit commit after statements that commit active transactions
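The GOAL and CONTEXT above come down to one pattern: encode events into a per-thread cache without holding the binary log lock, and take the lock only once per cache when it is flushed, at which point the size of the whole statement or transaction is already known. The following is a minimal C++ sketch of that pattern, not the server's code. The `BinaryLog` and `ThreadCache` classes are hypothetical, a `std::mutex` stands in for the binary log lock (compare LOCK_log in BUG#42757), and a `std::string` stands in for encoded events.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>

// Hypothetical model of the shared binary log file: only appends take the lock.
class BinaryLog {
 public:
  // Appends an already-encoded group of events in a single critical section
  // and returns the offset at which the group starts.
  std::size_t append(const std::string &group) {
    std::lock_guard<std::mutex> guard(lock_);  // stands in for the binlog lock
    std::size_t start = contents_.size();
    contents_ += group;
    return start;
  }

  std::string contents() {
    std::lock_guard<std::mutex> guard(lock_);
    return contents_;
  }

 private:
  std::mutex lock_;
  std::string contents_;
};

// Hypothetical per-thread cache: events are encoded here without ever
// touching the binary log lock.
class ThreadCache {
 public:
  void write(const std::string &encoded_event) { buffer_ += encoded_event; }

  // Size of the whole statement/transaction, known before writing starts.
  std::size_t size() const { return buffer_.size(); }
  bool empty() const { return buffer_.empty(); }

  // One lock acquisition per cache instead of one per event.
  std::size_t flush(BinaryLog &log) {
    assert(!buffer_.empty());  // callers only flush caches holding events
    std::size_t start = log.append(buffer_);
    buffer_.clear();
    return start;
  }

 private:
  std::string buffer_;
};
```

Because each cache is appended in one critical section, groups written by different threads never interleave, and `size()` reports the length of a statement or transaction before any byte of it reaches the log.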
WL#2687: Write non-transactional binlog events directly to binary log
WL#3726: DDL locking for all metadata objects
BACKGROUND
==========

The binary log is composed of a stable storage file and two caches, the trx-cache and the stmt-cache. The trx-cache stores changes to transactional engines and statements classified as unsafe. The stmt-cache stores changes to non-transactional engines.

The binary log is implemented as a storage engine and may receive notifications, for example, when a connection is dropped, a savepoint is defined or a transaction is committed. We exploit these notifications to implement the routines that store information into a cache and flush its contents to disk.

Let's assume that a transaction T1 executes statements S1 and S2 and then commits, i.e.:

  . T1 (BEGIN S1 S2 COMMIT)

T1's execution generates the following calls/notifications to the binary log:

  . WRITE S1
  . COMMIT S1
  . WRITE S2
  . COMMIT S2
  . COMMIT T1

The per-statement COMMIT/ROLLBACK is used by the stmt-cache, which flushes its contents into the binary log at that point, because non-transactional changes must be flushed ahead of any transaction. The per-transaction COMMIT/ROLLBACK is used by the trx-cache to flush its contents.

When WRITE is called, the event is appended to the appropriate cache. When COMMIT/ROLLBACK is called, the BEGIN is written into the binary log, the appropriate cache is flushed to it, and then the COMMIT (or ROLLBACK) is written to it.

Let's assume now that a DDL is executed. In this case, the following call/notification to the binary log is generated:

  . WRITE DDL

When WRITE is called, the event is appended directly to the binary log. Incident and Rotate events behave similarly.

PROPOSAL
========

. Write BEGIN and COMMIT/ROLLBACK into the appropriate cache.
  When WRITE is called, the appropriate cache is initialised, if this was not done yet. If the cache is empty, the BEGIN is written into the cache, immediately followed by the event (e.g. S1). When COMMIT/ROLLBACK is called, the token is written into the appropriate cache and then the cache is flushed.

. Write DDLs into the stmt-cache and flush it immediately.
  When WRITE is called, the appropriate stmt-cache is initialised and the DDL is appended to it. Right after that, the cache is flushed.

. Incident events are written directly into the binary log.
  Incident events that denote problems while using the cache are still written directly into the binary log. Although there is no reason to write Rotate events directly to the binary log, we keep this behaviour.

KEY POINTS
==========

. DDL could follow the same life-cycle as a DML, i.e. write --> commit/rollback --> flush. However, it seems odd to associate a commit/rollback with a DDL.

. Some DMLs must be written into the binary log following the life-cycle of a DML, because COMMIT/ROLLBACK notifications are never generated.

. Before the introduction of MDL, i.e. Meta Data Locking, DDLs may be written to the binary log in the wrong order. WL#4986 must carefully check whether MDL is working properly.
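The proposed life-cycle can be modelled in a few lines. Below is a hypothetical, single-threaded C++ sketch in which events are plain strings; the names `BinlogSketch`, `Target`, `write_dml`, `write_ddl`, `write_incident`, `end_statement` and `end_transaction` are illustrative only and are not server APIs. It follows the PROPOSAL literally: BEGIN is added when a cache receives its first event, the COMMIT/ROLLBACK token is appended before the cache is flushed, a DDL goes through an empty stmt-cache that is flushed at once, and incident events bypass the caches. As a simplification, it does not model a rollback that merely discards the trx-cache.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical routing target for a DML event (sketch only).
enum class Target { TRX_CACHE, STMT_CACHE };

class BinlogSketch {
 public:
  // WRITE for a DML: BEGIN is written first if the cache is still empty.
  void write_dml(Target target, const std::string &event) {
    std::vector<std::string> &cache = cache_for(target);
    if (cache.empty()) cache.push_back("BEGIN");
    cache.push_back(event);
  }

  // WRITE for a DDL: appended to an empty stmt-cache, then flushed at once.
  void write_ddl(const std::string &ddl) {
    assert(stmt_cache_.empty());  // the stmt-cache is flushed after every statement
    stmt_cache_.push_back(ddl);
    flush(stmt_cache_);
  }

  // Incident events are written directly, without any cache.
  void write_incident(const std::string &incident) { log_.push_back(incident); }

  // Per-statement COMMIT/ROLLBACK: only the stmt-cache reacts.
  void end_statement(const std::string &token) { finish(stmt_cache_, token); }

  // Per-transaction COMMIT/ROLLBACK: the trx-cache reacts.
  void end_transaction(const std::string &token) { finish(trx_cache_, token); }

  const std::vector<std::string> &log() const { return log_; }

 private:
  std::vector<std::string> &cache_for(Target target) {
    return target == Target::TRX_CACHE ? trx_cache_ : stmt_cache_;
  }

  // Writes the token into the cache and flushes it; empty caches are skipped.
  void finish(std::vector<std::string> &cache, const std::string &token) {
    if (cache.empty()) return;
    cache.push_back(token);
    flush(cache);
  }

  // Moves the whole cache to the log in one step.
  void flush(std::vector<std::string> &cache) {
    log_.insert(log_.end(), cache.begin(), cache.end());
    cache.clear();
  }

  std::vector<std::string> trx_cache_;
  std::vector<std::string> stmt_cache_;
  std::vector<std::string> log_;
};
```

Replaying the T1 notifications from the BACKGROUND section, with an extra non-transactional statement N1 executed between S1 and S2, leaves BEGIN N1 COMMIT in the log before BEGIN S1 S2 COMMIT: the stmt-cache is flushed at the end of its own statement, ahead of the still-open transaction.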
PROPOSAL
========

. Refactor sql/binlog.cc and sql/log_event.cc to achieve the goals.

. Use the following values to define whether an event should go through a cache or not:

    /*
      If possible, the event should use a non-transactional cache before
      being flushed to the binary log. This means that it must be flushed
      right after its corresponding statement is completed.
    */
    EVENT_STMT_CACHE,
    /*
      The event should use a transactional cache before being flushed to
      the binary log. This means that it must be flushed upon commit or
      rollback.
    */
    EVENT_TRANSACTIONAL_CACHE,
    /*
      The event must be written directly to the binary log without going
      through any cache.
    */
    EVENT_NO_CACHE,

. Use the following values to define when an event should be flushed to disk:

    /*
      The event must be written to a cache and, upon commit or rollback,
      written to the binary log.
    */
    EVENT_NORMAL_LOGGING,
    /*
      The event must be written to an empty cache and immediately written
      to the binary log without waiting for any other event.
    */
    EVENT_IMMEDIATE_LOGGING,
    /*
      If there is a need for different types, introduce them before this.
    */
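The two sets of values can be read as a small routing decision taken for every event. The following C++ sketch reuses the enumerator names listed above, but the enum type names `CacheType` and `LoggingType`, the `Plan` struct and `plan_for()` are hypothetical. The mapping itself (an immediate-logging event is appended to its cache and flushed at once; a no-cache event bypasses caching whatever its logging type) is an interpretation of the comments above, not the server's code.

```cpp
#include <cassert>

// Enumerator names come from the proposal; the type names are assumptions.
enum CacheType {
  EVENT_STMT_CACHE,           // flushed right after its statement completes
  EVENT_TRANSACTIONAL_CACHE,  // flushed upon commit or rollback
  EVENT_NO_CACHE              // written directly to the binary log
};

enum LoggingType {
  EVENT_NORMAL_LOGGING,    // cached, written upon commit or rollback
  EVENT_IMMEDIATE_LOGGING  // written to an empty cache and flushed at once
};

// Hypothetical description of what happens to a single event.
struct Plan {
  bool bypass_cache;       // true: append straight to the binary log
  CacheType cache;         // cache that receives the event otherwise
  bool flush_immediately;  // true: flush that cache right after appending
};

Plan plan_for(CacheType cache_type, LoggingType logging_type) {
  // No-cache events (e.g. incidents) ignore the logging type.
  if (cache_type == EVENT_NO_CACHE) return Plan{true, EVENT_NO_CACHE, false};
  assert(cache_type == EVENT_STMT_CACHE ||
         cache_type == EVENT_TRANSACTIONAL_CACHE);
  return Plan{false, cache_type, logging_type == EVENT_IMMEDIATE_LOGGING};
}
```

Under this reading, a DDL would be planned as `plan_for(EVENT_STMT_CACHE, EVENT_IMMEDIATE_LOGGING)` (stmt-cache, flushed at once), a row event as `plan_for(EVENT_TRANSACTIONAL_CACHE, EVENT_NORMAL_LOGGING)` (trx-cache, flushed on commit or rollback), and an incident event with `EVENT_NO_CACHE`.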
Copyright (c) 2000, 2016, Oracle Corporation and/or its affiliates. All rights reserved.