This is a step to modularize replication. The goal is to create an interface between the core server ("core") and the replication library ("rpl-lib") that is identical in 5.5 and 5.6, such that the 5.6 rpl-lib can be linked to the 5.5 core. Hence, the interface has to be flexible enough that it can handle both the 5.5 rpl-lib and the 5.6 rpl-lib. In this worklog, it suffices to make 5.6 rpl-lib link statically to 5.5 core. It is not a goal to make replication a plugin; that will be done in future worklogs. It is an open question whether the 5.6 rpl-lib for 5.5 core should be compiled with 5.6 core headers or with 5.5 core headers. We will do whatever turns out to be easiest.
The interface will be partitioned into three logically separate pieces:

- Binlog: Functionality used by core when it writes to the binary log, as well as SQL statements for maintenance of binary logs: FLUSH BINARY LOGS, SHOW BINARY LOGS, SHOW BINLOG EVENTS, PURGE BINARY LOGS, RESET MASTER.
- Master: Functionality used by the master dump thread, as well as the related SQL statement: SHOW SLAVE HOSTS.
- Slave: Functionality used by the slave threads, as well as the related SQL statements: START SLAVE, STOP SLAVE, CHANGE MASTER, SHOW SLAVE STATUS, FLUSH RELAY LOGS, SHOW RELAYLOG EVENTS, BINLOG; as well as the function MASTER_POS_WAIT().

Each of these pieces will be represented in C++ by a class. The core will not refer to any symbols other than members of these classes.

We will do the work in the following steps:

(1) Create the three classes and add the class functions that we need. To minimize the number of changes, the interface will include function arguments of unstable types like THD* and TABLE*. Later steps will replace such arguments with something more stable.
    This step has three sub-steps:
    (1.1) Binlog class [WL#5778]
    (1.2) Slave class [WL#5779]
    (1.3) Master class [WL#5789]

(2) Remove references to server variables and user variables (through the THD object) from the implementation of Binlog::log_statement. Instead, add an argument to Binlog::log_statement that contains a list of all server variables to be replicated.
    This step has two sub-steps:
    (2.1) server variables [WL#5790]
    (2.2) user variables [WL#5791]

(3) Replace THD* by MYSQL_THD in all functions that are part of the interface. Add thd_* functions to core through which we can get the relevant data.
    This step has four sub-steps:
    (3.1) Binlog::log_statement [WL#5792]
    (3.2) Other functions in Binlog class [WL#5793]
    (3.3) Slave class [WL#5794]
    (3.4) Master class [WL#5795]

(4) The function for writing a block of data loaded by LOAD DATA INFILE to the binary log currently has the prototype
      Binlog::log_loaded_block(IO_CACHE *io_cache)
    This is bad because:
    (a) it assumes that IO_CACHE is used to load data, but we may want to replace that;
    (b) it assumes IO_CACHE is stable;
    (c) it assumes that the generic datatype IO_CACHE contains information specific to LOAD DATA INFILE statements, which it currently does, but that is bad design.
    So we should change the prototype to:
      Binlog::log_loaded_block(MYSQL_THD thd, int length, char *data)
    [WL#5798]

(5) Move definitions of replication-specific server variables to the library. [WL#5796]

(6) Copy error codes from 5.6 to 5.5. [WL#5802]

(7) Refactor first rpl_master_has_bug and then Field::compatible_field_size. [WL#5805, WL#5815]

(8) Use Binlog::log_statement instead of other ways to do the same thing. [WL#5816]

(9) Make it possible for the plugin to dynamically add new clauses to CHANGE MASTER without modifying the parser. [WL#5755]

(10) Create a new sub-directory 'rpl' in the 'sql' directory in the source tree. Move rpl_* to the new sub-directory and remove the prefix rpl_ from the files. [WL#5797]

(11) Create interface for replication filters. [WL#5817]

(12) Create interface for writing LOAD DATA to binary log. [WL#5818]

(13) Remove miscellaneous cross-references between rpl-lib and core. [WL#5819]

(14) Separate replication plugin interfaces from core plugin interfaces. [WL#5820]

(99) Link replication as a library. [WL#5814]

More tasks may need to be added here. We cannot determine all tasks in advance, because that would require inspecting the entire replication codebase. We will likely discover the remaining tasks during the coding phase.

Note: some of the above work can be done in parallel.
The only dependencies are:

- (1.1) must be pushed before (2.1) and (2.2) can be pushed. However, (2.1) and (2.2) can be started before (1.1) is pushed (we can create internal mechanisms to pass server and user variables to Binlog::log_statement without using the Binlog class).
- (2.1) and (2.2) must be done before (3.1) and (3.2). (The easiest way to do (3.1) and (3.2) is to replace THD by MYSQL_THD in the function prototype, then compile and see what needs to be done.)
- (1.2) must be pushed before (3.3) can be started.
- (1.3) must be pushed before (3.4) can be started.

==== Open questions ====

The following is an unsorted list of miscellaneous things we need to do before core and rpl-lib are separated. Most of them should probably be moved into new worklogs.

- What is MYSQL_BIN_LOG::start_union_events and friends? How do we expose this functionality in the interface?
  Preliminary decision: this function seems misplaced. We should probably move it to THD, and the functionality it provides should be entirely in core. (It's not completely clear what this does, but it seems to be related to logging of SP invocations.)
- THD::rli_fake is of type Relay_log_info, which is defined in rpl-lib. This is no good. We will need to add the following:
  - core should expose an interface to attach custom data to THD.
  - core should expose hooks that will be invoked when THD is created and destroyed.
  - rpl-lib should register a callback for the "THD::destroy" hook. The callback should free rli_fake.
- The following functions need to be moved into rpl-lib:
    THD::binlog_write_row
    THD::binlog_update_row
    THD::binlog_delete_row
  In fact, Binlog::log_write_row is a wrapper around THD::binlog_write_row, so we can just move the body of THD::binlog_write_row into Binlog::log_write_row.
- The following THD member functions are only used internally by rpl-lib and can just be moved into rpl-lib:
    THD::binlog_setup_trx_data
    THD::binlog_set_stmt_begin
    THD::binlog_get_pending_rows_event
    THD::binlog_set_pending_rows_event
    THD::binlog_prepare_pending_rows_event
- The following THD member functions are used by both core and rpl-lib. We need to figure out the best interface for this.
    THD::flush_pending_rows_event
    THD::binlog_start_trans_and_stmt
    THD::binlog_write_table_map
- ha_ndbcluster.cc uses active_mi to get binlog positions. This is not nice; it would be better if binlog positions were internal to rpl-lib (e.g., because of WL#3584). We need to determine why ndb needs to know binlog positions. Then, either find a way for ndb to not rely on binlog positions, or expose binlog positions from rpl-lib to core via Binlog, and from core to ndb via an interface in core.
- How can we get rid of declarations of replication-specific mutexes, condition variables and files from mysqld.cc? Examples: key_BINLOG_LOCK_index, key_BINLOG_COND_prep_xids, key_file_binlog
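
As an illustration of the THD::rli_fake item in the list above, the following is a minimal sketch of how core could expose a custom-data slot and a destroy hook, and how rpl-lib could use them to free its own object. It is not the actual server code: Thd, Thd_hooks, Fake_rli, rpl_data and rpl_register are hypothetical names invented for this sketch, and the real interface may look quite different.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the core session object; not the real THD.
struct Thd {
  void *rpl_data = nullptr;  // opaque slot; core never looks inside it
};

// Hypothetical stand-in for rpl-lib's Relay_log_info.
struct Fake_rli {
  static int live;  // number of live instances, only used for demonstration
  Fake_rli() { ++live; }
  ~Fake_rli() { --live; }
};
int Fake_rli::live = 0;

// Core side (assumed design): a registry of callbacks that are invoked
// just before a session object is destroyed.
class Thd_hooks {
 public:
  using Callback = std::function<void(Thd *)>;

  // Register a callback for the destroy hook.
  void on_destroy(Callback cb) { destroy_callbacks_.push_back(std::move(cb)); }

  // Run all registered callbacks, then free the session object itself.
  void destroy(Thd *thd) {
    for (auto &cb : destroy_callbacks_) cb(thd);
    delete thd;
  }

 private:
  std::vector<Callback> destroy_callbacks_;
};

// rpl-lib side: register a callback that frees the data rpl-lib attached.
// Core does not need to know the type Fake_rli at all.
void rpl_register(Thd_hooks &hooks) {
  hooks.on_destroy([](Thd *thd) {
    delete static_cast<Fake_rli *>(thd->rpl_data);
    thd->rpl_data = nullptr;
  });
}
```

In this sketch, rpl-lib would call rpl_register once at startup and store its object in thd->rpl_data when needed; core would call Thd_hooks::destroy instead of deleting the session directly. A hook for THD creation could be added in the same way.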