WL#6056: Read/write layers in replication
Affects: Server-Prototype Only
Status: Un-Assigned
The goal of this worklog is to create uniform interfaces for reading and writing binary log data. This is a step that will make the following features very simple to implement:
- master with no binary log (for sync or semi-sync replication) and/or slave with no relay log
- master uses a table for the binary log and/or slave uses a table for the relay log
- compression of the binary log and/or relay log
- filters applied at any place in the replication chain (on the master before/after writing to the binary log, on the slave before/after writing to the relay log, or in mysqlbinlog)
- checksums implemented in a much cleaner and more efficient way

Without this worklog, each of the above features would be big, difficult, and error-prone, because the features interact in complicated ways. With this worklog, each feature will be relatively easy to implement and there will be no cross-dependencies between the features. Other similar features are likely to benefit from this worklog in the same way.
======== BACKGROUND ========
Transactions are read and written in five places: client threads, dump threads, the IO thread, the SQL thread, and mysqlbinlog. More precisely:
- a client that is going to write to the binary log "reads" the transaction from the server (i.e., generates a set of events from the THD object)
- the client writes to the binary log
- the dump thread reads from the binary log
- the dump thread writes to the net
- the IO thread reads from the net
- the IO thread writes to the relay log
- the SQL thread reads from the relay log
- the SQL thread "writes" to the server (i.e., executes the events)
- mysqlbinlog reads from the binary log, the relay log, or the net
- mysqlbinlog "writes" the transaction as text

Currently, each of these five places has its own ad-hoc code for encoding and decoding transactions, iterating over transactions, retrieving information from transactions, etc. We will propose one interface shared by all "read" operations and one interface shared by all "write" operations.

======== PROPOSAL ========
Add two layers: layer 0 and layer 1.

-------- Layer 0 --------
In layer 0, a transaction has the form of a 2-tuple (length, data), where:
- length is an integer
- data is a sequence of 'length' bytes that contains both transaction data (e.g., the statement) and stream meta-data (e.g., the global transaction ID)

Layer 0 is used as follows:
- The binary log is implemented as a back-end capable of serializing and de-serializing a stream of such tuples on disk.
- The relay log uses the same back-end as the binary log, only plugged in at another place in the code.
- For the dump thread and IO thread, we implement a 'net' component: a back-end capable of serializing and de-serializing a stream of 2-tuples to/from a client connection.
- Checksumming is trivial to implement as an operation on such 2-tuples.
- Compression is trivial to implement as an operation on such 2-tuples.
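The layer 0 ideas (one read interface, one write interface, a log back-end that serializes (length, data) tuples, and checksumming as a tuple-to-tuple operation) can be illustrated with a minimal sketch. All names here (Layer0Tuple, Layer0Reader, Layer0Writer, MemoryLog, add_checksum, verify_checksum) are hypothetical, the 4-byte little-endian length prefix is an assumed format rather than the real binary log format, and CRC-32 is just one possible checksum.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Layer 0 transaction: (length, data). The length is data.size(); it only
// becomes explicit when a back-end serializes the tuple.
struct Layer0Tuple {
  std::string data;  // opaque bytes: transaction data plus stream meta-data
};

// Hypothetical uniform interfaces: every layer 0 source or sink (binary log,
// relay log, net) would implement one of these.
struct Layer0Reader {
  virtual ~Layer0Reader() = default;
  virtual std::optional<Layer0Tuple> read() = 0;  // nullopt = end of stream
};
struct Layer0Writer {
  virtual ~Layer0Writer() = default;
  virtual void write(const Layer0Tuple &t) = 0;
};

static void put_u32(std::string &out, uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
}
static uint32_t get_u32(const std::string &in, std::size_t pos) {
  uint32_t v = 0;
  for (int i = 0; i < 4; ++i)
    v |= static_cast<uint32_t>(static_cast<unsigned char>(in[pos + i])) << (8 * i);
  return v;
}

// In-memory stand-in for a log file back-end (assumed format: 4-byte
// little-endian length followed by 'length' data bytes). The same class
// could serve as "binary log" or "relay log", only plugged in elsewhere.
class MemoryLog : public Layer0Writer, public Layer0Reader {
 public:
  void write(const Layer0Tuple &t) override {
    put_u32(bytes_, static_cast<uint32_t>(t.data.size()));
    bytes_ += t.data;
  }
  std::optional<Layer0Tuple> read() override {
    if (pos_ + 4 > bytes_.size()) return std::nullopt;
    std::size_t len = get_u32(bytes_, pos_);
    if (pos_ + 4 + len > bytes_.size()) return std::nullopt;  // truncated tuple
    Layer0Tuple t{bytes_.substr(pos_ + 4, len)};
    pos_ += 4 + len;
    return t;
  }

 private:
  std::string bytes_;
  std::size_t pos_ = 0;
};

// Standard CRC-32 (reflected polynomial 0xEDB88320), bitwise version.
uint32_t crc32_ieee(const std::string &s) {
  uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char c : s) {
    crc ^= c;
    for (int k = 0; k < 8; ++k) crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
  }
  return ~crc;
}

// Checksumming as a tuple -> tuple operation: append the CRC to the data...
Layer0Tuple add_checksum(const Layer0Tuple &t) {
  Layer0Tuple out = t;
  put_u32(out.data, crc32_ieee(t.data));
  return out;
}

// ...and verify and strip it on the way back; nullopt signals corruption.
std::optional<Layer0Tuple> verify_checksum(const Layer0Tuple &t) {
  if (t.data.size() < 4) return std::nullopt;
  std::size_t n = t.data.size() - 4;
  std::string body = t.data.substr(0, n);
  if (get_u32(t.data, n) != crc32_ieee(body)) return std::nullopt;
  return Layer0Tuple{body};
}
```

Because the back-end only sees opaque tuples, a writer could call `log.write(add_checksum(t))` and a reader `verify_checksum(*log.read())` without either the log or the net component knowing about checksums; compression would slot in the same way as another tuple-to-tuple function.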
-------- Layer 1 --------
In layer 1, a transaction has the form of a 4-tuple (length, data, ID, timestamp), where:
- length is an integer
- data is a sequence of 'length' bytes that contains transaction data (e.g., the statement)
- ID is the global transaction identifier of the transaction
- timestamp is the time when the transaction was applied on the nearest upstream server (i.e., this server if the tuple appears in a binary log, the master if the tuple appears in a relay log, the slave if the tuple appears in the binary log of the slave, etc.)

Thus, layer 1 separates stream meta-data from transaction data. Layer 1 is used as follows:
- There will be a canonical way to encode 4-tuples in layer 1 into 2-tuples in layer 0.
- The dump thread de-serializes the binary log into layer 0 data; then decodes the data into layer 1 data; then filters out the transactions whose IDs the slave requested not to get; then encodes the remaining transactions into layer 0 data; then serializes them to the net.
- The IO thread de-serializes the net into a stream of layer 0 data; then serializes the layer 0 data into the relay log.
- The SQL thread de-serializes the relay log into layer 0 data; then decodes the data into layer 1 data; then applies the data to the server.
- mysqlbinlog de-serializes transactions from the binary log into layer 0; then decodes the data into layer 1; then filters out transactions whose IDs or timestamps should be filtered out according to command-line arguments; then encodes into layer 0; then serializes as text.
- To implement a binlog-free master, make the dump thread read directly from the server instead of from the binary log (details need to be sorted out carefully here).
- To implement a relay log-free slave, make the SQL thread read directly from the net instead of from the relay log.

======== Open questions ========
The precise interfaces for reading from and writing to each layer remain to be defined.
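As one possible illustration of the layer 1 steps described above (decode, filter, re-encode), here is a minimal sketch. It assumes a hypothetical canonical encoding in which the layer 0 data is an 8-byte ID, then an 8-byte timestamp, then the transaction data. Real global transaction identifiers are not plain integers, so `uint64_t id` is a simplification, and `encode`, `decode` and `filter_stream` are illustrative names only.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <vector>

struct Layer0Tuple {
  std::string data;  // opaque bytes: meta-data and transaction data together
};

struct Layer1Tuple {
  std::string data;    // transaction data only (length = data.size())
  uint64_t id;         // simplified stand-in for a global transaction ID
  uint64_t timestamp;  // time applied on the nearest upstream server
};

static void put_u64(std::string &out, uint64_t v) {
  for (int i = 0; i < 8; ++i) out.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
}
static uint64_t get_u64(const std::string &in, std::size_t pos) {
  uint64_t v = 0;
  for (int i = 0; i < 8; ++i)
    v |= static_cast<uint64_t>(static_cast<unsigned char>(in[pos + i])) << (8 * i);
  return v;
}

// Assumed canonical encoding: [8-byte ID][8-byte timestamp][data].
Layer0Tuple encode(const Layer1Tuple &t) {
  std::string d;
  put_u64(d, t.id);
  put_u64(d, t.timestamp);
  d += t.data;
  return Layer0Tuple{d};
}

// Inverse of encode(); nullopt if the tuple is too short to hold the header.
std::optional<Layer1Tuple> decode(const Layer0Tuple &t) {
  if (t.data.size() < 16) return std::nullopt;
  return Layer1Tuple{t.data.substr(16), get_u64(t.data, 0), get_u64(t.data, 8)};
}

// Decode each tuple, keep those accepted by 'keep', re-encode the survivors.
// Tuples that fail to decode are dropped in this sketch.
std::vector<Layer0Tuple> filter_stream(const std::vector<Layer0Tuple> &in,
                                       const std::function<bool(const Layer1Tuple &)> &keep) {
  std::vector<Layer0Tuple> out;
  for (const auto &t0 : in) {
    auto t1 = decode(t0);
    if (t1 && keep(*t1)) out.push_back(encode(*t1));
  }
  return out;
}
```

The same `filter_stream` covers both callers in the list above: the dump thread would pass a predicate on `id` (the IDs the slave asked not to receive), while mysqlbinlog would pass one built from its command-line arguments, for example `[](const Layer1Tuple &t) { return t.timestamp >= start; }` with a hypothetical `start` value.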
Considerations to take into account:
- must support parallel binlog writes
- must support both waiting and not waiting for an ack (for sync and semi-sync replication to work)
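One way a write interface could leave the "wait or don't wait for ack" decision to the caller is sketched below, under stated assumptions: `AckingWriter`, `write` and `ack_up_to` are hypothetical names, acknowledgments are assumed to arrive in sequence-number order, and the mutex only makes concurrent writes from several client threads safe. It does not show a truly parallel binlog write such as group commit.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <future>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical layer 0 sink shared by several writer threads.
class AckingWriter {
 public:
  // Appends a tuple and returns a future that becomes ready once the
  // downstream acknowledges it. The lock is released before the caller
  // decides whether to wait, so one waiting thread does not block writers.
  std::shared_future<void> write(const std::string &data) {
    std::lock_guard<std::mutex> lock(mu_);
    const uint64_t seqno = ++last_seqno_;
    log_.push_back(data);
    std::promise<void> ack;
    std::shared_future<void> done = ack.get_future().share();
    pending_.emplace(seqno, std::move(ack));
    return done;
  }

  // Called by whatever receives acks from downstream: fulfils every pending
  // write with a sequence number up to and including 'seqno'.
  void ack_up_to(uint64_t seqno) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = pending_.begin();
    while (it != pending_.end() && it->first <= seqno) {
      it->second.set_value();
      it = pending_.erase(it);
    }
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mu_);
    return log_.size();
  }

 private:
  mutable std::mutex mu_;
  uint64_t last_seqno_ = 0;
  std::vector<std::string> log_;
  std::map<uint64_t, std::promise<void>> pending_;
};
```

A sync or semi-sync client would call `w.write(trx).wait()` and rely on another thread calling `ack_up_to` when the slave responds; an asynchronous client would simply ignore the returned future. Calling `wait()` in the same thread that is responsible for delivering the ack would deadlock, so the two roles must live in different threads.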
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.