WL#10956: Binary log storage access API
Affects: Server-8.0
—
Status: Complete
EXECUTIVE SUMMARY
=================
This worklog implements a storage layer abstraction interface that decouples
capturing changes from persisting them. It is also a stepping stone toward an
encrypted persistence layer, which would provide encryption at rest for the
binary log.

DETAILS
=======
Currently, IO_CACHE is used everywhere in the code to access binlog/relaylog
files. That code assumes the files are uncompressed and unencrypted, which
makes it difficult to add encryption or compression to the binlog/relaylog.

This worklog encapsulates binlog/relaylog storage in a separate layer. A group
of interfaces is defined for accessing binlog/relaylog files, and the
interfaces can have different implementations. IO_CACHE is encapsulated and
provided as the implementation of the binlog storage access API for plain
binlog/relaylog files. In the future, an encrypting storage layer
implementation can be provided.

DEV STORIES
===========
- Developers want a common API for writing binlog/relay log events to
  destinations other than IO_CACHE. Binlog events shall be writable to any
  object that supports the common API.
  * It makes it easy to implement new access features (e.g. encryption).
  * New implementations will not compromise existing implementations.
- Developers want a common API for reading binlog/relay log events from
  sources other than IO_CACHE. Binlog events shall be readable from any
  object that supports the common API.
  * It makes it easy to implement new access features (e.g. encryption).
  * New implementations will not compromise existing implementations.
- Developers want a layered/chained/pipeline model for binlog/relay log files.
  * It is easy to add a feature to the pipeline or remove one from it.
Functional Requirements
=======================
NONE

Non-Functional Requirements
===========================
NF1. It shall define interfaces for reading and writing binlog files. Calling
     code reads and writes binlog events through these interfaces without
     knowing any detail of their implementation.
NF2. It shall have an implementation for accessing events in binlog files.
NF3. It shall have an implementation for accessing events in binlog caches.
NF4. Adding a new implementation of the interfaces (e.g. encryption) shall not
     change the existing interfaces. Implementing the existing interfaces
     exactly should be enough.
NF5. The patch shall not impact performance.
UPGRADE/DOWNGRADE
=================
No impact.

SECURITY
========
No impact.

OBSERVABILITY
=============
No impact.

PROTOCOLS
=========
No impact on protocols. No impact on the on-the-wire format of the binlog.

DEPLOYMENT
==========
No new persistent or temporary files after this worklog. No files removed.

Interface Specification
=======================
No interface change for users.

High-Level Specification
========================
* Binlog accesses can be summarized as:
  - Serialize binlog events to a storage. The storage could be a binlog
    cache, a binlog file or a buffer. This is done by the
    Log_event::write[_xxx]() functions.
  - Deserialize a binlog event from a storage, which could be a file or a
    memory buffer.
  - Read byte events from a binlog file. This is the pattern used in
    binlog_sender.
  - Control the binlog process: prepare, flush, sync and rotate related
    operations.

  This design abstracts all storages, such as files, the binlog cache and
  even memory buffers, as streams. A few access interfaces are defined for
  the streams. Any stream that follows the interfaces can be accessed like
  the storages mentioned above. More streams can be added in the future, and
  events can easily be written into or read from them.

  Most of the time, reading from and writing into binary logs are separate
  tasks. A transaction only writes. Reading events never writes. So this
  worklog separates reading and writing into input streams and output
  streams.

  With the stream concept in place, binlog accesses can be summarized as:
  - Serialize binlog events to an output stream.
  - Read a byte event from an input stream. The byte event is stored in a
    memory buffer (uchar *).
  - Deserialize binlog events from a memory buffer. An event cannot be
    deserialized directly from an input stream, because the deserialization
    of every event is based on a memory buffer.

* Stream Design
  - Basic Stream for Serialization and Deserialization
    Serialization and deserialization need only very simple interfaces.
    They are:
    - Basic_ostream, which only supports write()
    - Basic_istream, which only supports read()

  - Stream Chain
    Serialization and deserialization can write to or read from a stream
    chain. The stream chain itself is not part of this design, but the design
    supports it. For example:
      Serialize -> Buffer Output Stream -> File Output Stream
      Read byte event <- Buffer Input Stream <- File Input Stream

  - Binlog File Streams
    Since there are many binlog file accesses, a few binlog file streams are
    designed to hide file access details from binlog code.
    - Binlog_ifile for reading a binlog file
    - Binlog_ofile for controlling and writing a binlog file

    Implementation details are hidden from binlog code. When binlog code
    needs to write, it just calls Binlog_ofile::write. When binlog code needs
    to flush/truncate/open/close, it just calls the corresponding functions
    of Binlog_ofile. Internally, Binlog_ofile could have an output stream
    chain like:
      Buffer Output Stream -> Encryption Output Stream -> File Output Stream
    Binlog code doesn't need to know the details. Binlog_ifile is similar.
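The chained output stream idea described above can be illustrated with a
minimal sketch. It is not the worklog's implementation. Basic_ostream here is
a simplified stand-in that uses std::size_t instead of my_off_t, and
Memory_ostream, Xor_ostream and write_payload are hypothetical names
introduced only for this example. The XOR stage merely stands in for a
transforming stage such as an encryption stream.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified stand-in for Basic_ostream (assumption: std::size_t length).
// As in the worklog, write() returns false on success and true on error.
class Basic_ostream {
 public:
  virtual ~Basic_ostream() = default;
  virtual bool write(const unsigned char *buffer, std::size_t length) = 0;
};

// Hypothetical sink: collects bytes in memory instead of a real file.
class Memory_ostream : public Basic_ostream {
 public:
  bool write(const unsigned char *buffer, std::size_t length) override {
    if (length == 0) return false;
    m_data.append(reinterpret_cast<const char *>(buffer), length);
    return false;
  }
  const std::string &data() const { return m_data; }

 private:
  std::string m_data;
};

// Hypothetical pipeline stage: XORs every byte with a key and forwards the
// result downstream. XOR is a placeholder, not real encryption.
class Xor_ostream : public Basic_ostream {
 public:
  Xor_ostream(Basic_ostream *downstream, unsigned char key)
      : m_downstream(downstream), m_key(key) {}

  bool write(const unsigned char *buffer, std::size_t length) override {
    if (length == 0) return false;
    std::string transformed(reinterpret_cast<const char *>(buffer), length);
    for (char &c : transformed) c = static_cast<char>(c ^ m_key);
    return m_downstream->write(
        reinterpret_cast<const unsigned char *>(transformed.data()),
        transformed.size());
  }

 private:
  Basic_ostream *m_downstream;
  unsigned char m_key;
};

// The caller only sees Basic_ostream. It cannot tell whether it writes to a
// plain sink or to a chain of stages.
bool write_payload(Basic_ostream *out, const std::string &payload) {
  return out->write(reinterpret_cast<const unsigned char *>(payload.data()),
                    payload.size());
}
```

Because write_payload() depends only on the abstract interface, a stage can be
added to or removed from the chain without touching the calling code.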
Output Streams
==============
* Class Basic_ostream
  It is the pure abstract class which declares the write interface for
  writing bytes to a destination.

  /**
    Write some data into the output stream.

    @param[in] buffer  the data to be written into the stream
    @param[in] length  the length of the data
    @return returns false if it succeeds, otherwise returns true
  */
  - virtual bool write(const unsigned char *buffer, my_off_t length) = 0;

  Binary events have the following functions for writing themselves to an
  IO_CACHE:
  - Log_event::write_header(IO_CACHE* file, ...)
  - Log_event::write_footer(IO_CACHE* file, ...)
  - Xxx_log_event::write_body(IO_CACHE* file, ...)
  - Xxx_log_event::write(IO_CACHE* file, ...)

  In this design, the IO_CACHE argument is replaced by Basic_ostream. The
  functions become:
  - Log_event::write_header(Basic_ostream* ostream, ...)
  - Log_event::write_footer(Basic_ostream* ostream, ...)
  - Xxx_log_event::write_body(Basic_ostream* ostream, ...)
  - Xxx_log_event::write(Basic_ostream* ostream, ...)

  With this design, events can easily be written to different destinations:
  a binlog file, a binlog cache, or any other class which derives from
  Basic_ostream.

  This worklog includes four different basic output streams:
  - class StringBuffer_ostream
  - class Transaction_message
  - class Binlog_ofile
  - class Binlog_cache

* Class StringBuffer_ostream
  A wrapper of StringBuffer that implements a simple output stream for
  writing data into stack memory. If the data is too big, enough heap memory
  is allocated for it.

  It is used to simplify group replication code. Group replication used
  IO_CACHE to serialize an event into a memory buffer. With this class,
  events can be serialized into a memory buffer directly.

* Class Transaction_message
  It is a class in group replication which serializes binary events into a
  message. This worklog makes it derive from Basic_ostream, so the binary
  events can be serialized into its buffer directly.
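As a rough illustration of how an event can serialize itself into any
Basic_ostream, and of a buffer stream that starts in inline storage and moves
to the heap when the data grows, consider the sketch below. It makes several
assumptions: Basic_ostream is a simplified stand-in using std::size_t,
Inline_buffer_ostream is a hypothetical analogue of StringBuffer_ostream and
not its real implementation, and Toy_event uses a made-up wire format (1-byte
type, 4-byte little-endian body length, body, 1-byte additive checksum) that
is not the binlog event format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Simplified stand-in for Basic_ostream: false means success.
class Basic_ostream {
 public:
  virtual ~Basic_ostream() = default;
  virtual bool write(const unsigned char *buffer, std::size_t length) = 0;
};

// Hypothetical buffer stream: data stays in the inline array while it fits
// and is moved to heap storage once it no longer does.
template <std::size_t INLINE_SIZE>
class Inline_buffer_ostream : public Basic_ostream {
 public:
  bool write(const unsigned char *buffer, std::size_t length) override {
    if (length == 0) return false;
    if (!m_on_heap && m_length + length <= INLINE_SIZE) {
      std::memcpy(m_inline + m_length, buffer, length);
    } else {
      if (!m_on_heap) {
        m_heap.assign(m_inline, m_inline + m_length);  // spill existing data
        m_on_heap = true;
      }
      m_heap.insert(m_heap.end(), buffer, buffer + length);
    }
    m_length += length;
    return false;
  }
  const unsigned char *data() const {
    return m_on_heap ? m_heap.data() : m_inline;
  }
  std::size_t length() const { return m_length; }
  bool on_heap() const { return m_on_heap; }

 private:
  unsigned char m_inline[INLINE_SIZE];
  std::vector<unsigned char> m_heap;
  std::size_t m_length = 0;
  bool m_on_heap = false;
};

// Toy event with a made-up format; it only knows about Basic_ostream.
class Toy_event {
 public:
  Toy_event(unsigned char type, std::vector<unsigned char> body)
      : m_type(type), m_body(std::move(body)) {}

  bool write_header(Basic_ostream *ostream) const {
    const std::uint32_t n = static_cast<std::uint32_t>(m_body.size());
    const unsigned char header[5] = {
        m_type, static_cast<unsigned char>(n & 0xFF),
        static_cast<unsigned char>((n >> 8) & 0xFF),
        static_cast<unsigned char>((n >> 16) & 0xFF),
        static_cast<unsigned char>((n >> 24) & 0xFF)};
    return ostream->write(header, sizeof(header));
  }
  bool write_body(Basic_ostream *ostream) const {
    return ostream->write(m_body.data(), m_body.size());
  }
  bool write_footer(Basic_ostream *ostream) const {
    unsigned char checksum = 0;
    for (unsigned char byte : m_body)
      checksum = static_cast<unsigned char>(checksum + byte);
    return ostream->write(&checksum, 1);
  }
  // Stops at the first failing part and returns true in that case.
  bool write(Basic_ostream *ostream) const {
    return write_header(ostream) || write_body(ostream) ||
           write_footer(ostream);
  }

 private:
  unsigned char m_type;
  std::vector<unsigned char> m_body;
};
```

The same Toy_event::write() call would work unchanged against a file-backed
or cache-backed stream, which is the point of replacing the IO_CACHE argument
with Basic_ostream.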
* Class Binlog_ofile
  It defines a logical binlog file which wraps and hides the real storage
  layer operations. It provides the operations for controlling binlog files,
  like open, close, write, flush etc. When a Binlog_ofile is opened, it
  initializes the real storage. When an event is written to a Binlog_ofile,
  it writes the event into the real storage. MYSQL_BIN_LOG code operates on
  plain binlogs. It doesn't need to know or care about the details of the
  low-level storage operations (e.g. whether the file is encrypted or not).

  It derives from Basic_ostream, so events can be written into it directly.

  /**
    Open the binlog file. It opens the file output stream.

    @param[in] log_file_key  The PSI_file_key for this stream
    @param[in] binlog_name   The file to be opened
    @param[in] flags         The flags used by IO_CACHE.
    @return returns false if it succeeds, otherwise returns true.
  */
  - bool open(PSI_file_key log_file_key, const char* binlog_name, myf flags);

  /** Close the binlog file and the file output stream. */
  - void close();

  /**
    The binlog has a concept of position, which is used in many places. The
    position is the logical offset in the binlog, not the real position at
    which the data is stored in an output stream. So this class maintains
    the binlog position.
  */
  - my_off_t m_position

  - bool write(const unsigned char *buffer, my_off_t length)
    Same as Basic_ostream::write, which it overrides.

  /**
    Update some data in the binlog file.

    @param[in] buffer  the data to be written into the binlog file
    @param[in] length  the length of the data
    @param[in] offset  the start position at which the data will be updated
    @return returns false if it succeeds, otherwise returns true
  */
  - bool update(const unsigned char *buffer, my_off_t length, my_off_t offset)

  /**
    Truncate some data from the end of the binlog file.

    @param[in] offset  the position the file will be truncated to
    @return returns false if it succeeds, otherwise returns true
  */
  - bool truncate(my_off_t offset)

  /**
    Flush buffered data into the file system.

    @return returns false if it succeeds, otherwise returns true
  */
  - bool flush()

  /**
    Flush the data from the file system to disk.

    @return returns false if it succeeds, otherwise returns true
  */
  - bool sync()

  /** Returns the current position at which it is writing. */
  - my_off_t position()

  /** Returns true if the binlog file is empty. */
  - bool is_empty()

  /** Returns true if the real storage is opened. */
  - bool is_open()

* Class Binlog_cache
  It defines a binlog cache container for storing binlog events. It provides
  a few interfaces for writing binlog events into the container and reading
  them back. It hides the low-level storage details which binlog code
  doesn't need to know.

  It derives from Basic_ostream, so it can be passed as an argument to
  Xxx_log_event::write() for writing events into the binlog cache.

  Binlog_cache has both read and write interfaces.

  ----------------
  Write Interfaces
  ----------------
  - bool write(const unsigned char *buffer, my_off_t length)
    Same as Basic_ostream::write, which it overrides.

  - virtual bool truncate(my_off_t offset) = 0;
    Same as Binlog_ofile::truncate.

  /**
    Drop the data in the cache.

    @return returns false if it succeeds, otherwise returns true
  */
  - virtual bool reset() = 0;

  /** Returns the number of disk writes in the transaction. */
  - virtual size_t disk_writes() = 0;

  /** Returns the temporary file name if it has one. */
  - virtual const char* tmp_file_name() = 0;

  ---------------
  Read Interfaces
  ---------------
  /**
    Copy the entire data to another stream.

    @param[in] ostream  the stream the data will be written into
  */
  - bool copy_to(Basic_ostream *ostream);

  /** Returns the length of the data in the cache. */
  - virtual my_off_t length() = 0;

  /** Returns true if the cache is empty. */
  - bool is_empty() { return length() == 0; }

  It is used in the binlog_cache_data class to replace IO_CACHE:
    Binlog_cache *m_cache;

* IO_CACHE_ostream
  It is a wrapper of IO_CACHE that implements the necessary file operations.
  It is a low-level output stream used in Binlog_ofile.

* IO_CACHE_binlog_cache
  It is a wrapper of IO_CACHE that implements the features required for a
  binlog cache. It is a low-level stream used in Binlog_cache.

Input Streams
=============
* Class Basic_istream
  It is the pure abstract class which declares the basic interface for
  reading data from a source.

  /**
    Read some data from the stream.

    @param[out] buffer     where the data read will be stored
    @param[in,out] length  As input, the size of the buffer. As output, the
                           number of bytes read.
    @return returns false if it succeeds, otherwise returns true
  */
  - virtual bool read(unsigned char *buffer, my_off_t *length) = 0;

  It reads some data from the source. The caller passes the number of bytes
  wanted through 'length', and read() returns the number of bytes it actually
  read through 'length'.

  Two classes derive from it:
  - Basic_seekable_istream
    It is introduced below.
  - Stdin_istream
    It encapsulates stdin behind the Basic_istream interface.

* Class Basic_seekable_istream
  It is the pure abstract class which declares the basic seek interface for
  setting the read position in a stream.

  - virtual bool seek(my_off_t offset) = 0;

  Three classes derive from it:
  - class IO_CACHE_istream
    IO_CACHE_istream is a file input stream based on IO_CACHE.
  - class Stdin_seekable_istream
    A wrapper of Stdin_istream that derives from Basic_seekable_istream and
    implements its seek() interface. In fact, it only supports seeking
    forward. Seeking backward is invalid. It is used by mysqlbinlog.
  - Basic_binlog_ifile
    It is introduced below.

* Class Basic_binlog_ifile
  It defines a logical binlog file which wraps and hides the real storage
  layer operations. It provides the operations for controlling binlog files,
  like open, close, read, seek etc. When an event is read from a
  Binlog_ifile, it reads the event from the real storage.
  Binlog access code operates on plain binlogs. It doesn't need to know or
  care about the details of the low-level storage operations (e.g. whether
  the file is encrypted or not).

  It derives from Basic_seekable_istream, so readers can use it as a byte
  stream.

  /**
    It points to the underlying input stream created by the reader. When the
    read() or seek() function is called, it calls m_istream to operate on the
    underlying storage.
  */
  - Basic_seekable_istream *m_istream

  - my_off_t m_position
    Same as in MYSQL_BIN_LOG::Binlog_ofile.

  - bool read(unsigned char* buffer, my_off_t *length)
    Same as Basic_istream::read, which it overrides.

  /**
    Set the start position for the next read().

    @param[in] position  the position it should seek to
  */
  - bool seek(my_off_t position)
    Same as Basic_seekable_istream::seek.

  - my_off_t position()
    Same as in MYSQL_BIN_LOG::Binlog_ofile.

  /** Returns true if the underlying input stream is opened. */
  - bool opened()

  /**
    Open the system layer file. It is the entry of the stream pipeline.
    Implementation is delegated to sub-classes, which open system layer
    files in different ways.

    @param[in] file_name  name of the binlog file to be opened
  */
  - virtual Basic_seekable_istream *open_file(const char *file_name) = 0;

  /** Close the system layer file. */
  - virtual void close_file() = 0;

  Three classes derive from it:
  - Binlog_ifile
    It is for the binlog files generated on the master side.
  - Relaylog_ifile
    It is for the binlog files generated on the slave side.
  - Mysqlbinlog_ifile
    It is for mysqlbinlog. Mysqlbinlog can read files from stdin.

Binary Event Readers
====================
* Class Binlog_read_error
  It defines the error types which can occur when reading binlog files or
  deserializing binlog events. The string error messages of the error types
  are also defined. It has a member variable that stores an error type and
  provides a few functions to check the stored error type.

* Class Binlog_event_data_istream
  Event data is a serialized event object. It is a chunk of data in a buffer.
  Binlog_event_data_istream fetches byte data from a Basic_istream and
  divides it into event data chunks according to the format.

  /** The stream the events are read from. */
  - Basic_istream *m_istream

  /**
    Stores the error encountered when reading the header or body. It is a
    pointer. It must be initialized by the caller in the constructor.
  */
  - Binlog_read_error *m_error

  /**
    Read an event data from the stream and verify its checksum if
    verify_checksum is true.

    @param[out] data            The pointer to the event data
    @param[out] length          The length of the event data
    @param[in] allocator        It is used to allocate memory for the event
                                data.
    @param[in] verify_checksum  Verify the event data's checksum if it is
                                true.
    @param[in] checksum_alg     Checksum algorithm for verifying the event
                                data. It is used only when verify_checksum
                                is true.
    @retval false  Success
    @retval true   Error
  */
  template <class ALLOCATOR>
  bool read_event_data(unsigned char **data, unsigned int *length,
                       ALLOCATOR *allocator, bool verify_checksum,
                       enum_binlog_checksum_alg checksum_alg);

  /**
    Read the event header from the Basic_istream.

    @retval false  Success
    @retval true   Error
  */
  virtual bool read_header();

  It is virtual so that a subclass can override it.

  One class derives from it:
  - Mysqlbinlog_event_data_istream
    It overrides read_header() to skip multiple binlog magic numbers for the
    case where mysqlbinlog reads binlog files through stdin. It also
    reimplements read_event_data() to rewrite database names in the event
    data.

* Class Binlog_event_object_istream
  It reads event data from an event data stream and deserializes it into
  event objects.

  /**
    Stores the error encountered when reading the header or body. It is a
    pointer. It must be initialized by the caller in the constructor.
  */
  - Binlog_read_error *m_error

  /**
    Read an event object from the stream.

    @param[in] fde              The Format_description_event for
                                deserialization.
    @param[in] verify_checksum  Verify the checksum of the event data before
                                deserializing it.
    @param[in] allocator        It is used to allocate memory for the event
                                data.
    @return A valid event object if it succeeds.
    @retval nullptr  Error
  */
  template <class ALLOCATOR>
  Log_event *read_event_object(const Format_description_event &fde,
                               bool verify_checksum, ALLOCATOR *allocator);

* Class Basic_binlog_file_reader
  It owns a byte stream, an event data stream and an event object stream.
  The stream pipeline is set up in the constructor. All the objects required
  for reading a binlog file are initialized in the reader class. It also
  includes a few convenience functions that encapsulate access to
  BINLOG_IFILE, BINLOG_EVENT_DATA_ISTREAM and BINLOG_EVENT_OBJECT_ISTREAM.
  It makes the code for reading a binlog file simpler.

  /**
    Open a binlog file and set the read position to offset. It reads and
    stores the Format_description_event automatically if offset is bigger
    than the current position and fde is nullptr. If fde is not nullptr, fde
    is used instead of finding the Format_description_event in the file.

    @param[in] file_name  name of the binlog file to be opened
    @param[in] offset     The position where it starts to read.
    @param[in] fde        The Format_description_event for reading events.
  */
  - bool open(const char *file_name, my_off_t offset = 0,
              const Format_description_event *fde = nullptr);

  /** Close the binlog file. */
  - void close()

  /** Wrapper of BINLOG_EVENT_DATA_ISTREAM::read_event_data. */
  - bool read_event_data(unsigned char **data, unsigned int *length);

  /** Wrapper of BINLOG_EVENT_OBJECT_ISTREAM::read_event_object. */
  - Log_event *read_event_object();

* Allocators
  There are two allocator classes in this worklog.
  - Class Default_binlog_event_allocator
    It is an allocator which uses my_malloc to allocate memory.
  - Class Binlog_sender::Event_allocator
    It uses a String as shared memory for reading event data.

* Class Binlog_file_reader
  The class for reading binlog files generated on the master side.

  typedef Basic_binlog_file_reader<...> Binlog_file_reader;

* Class Relaylog_file_reader
  The class for reading binlog files generated on the slave side.

  typedef Basic_binlog_file_reader<...> Relaylog_file_reader;

* Class Mysqlbinlog_file_reader
  typedef Basic_binlog_file_reader<...> Mysqlbinlog_file_reader;

* Class Binlog_sender::File_reader
  typedef Basic_binlog_file_reader<...> File_reader;
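The read-side pipeline described above (byte stream, then event data stream,
all owned by a reader that is specialized through typedefs) can be sketched as
follows. This is only an illustration under stated assumptions: the interfaces
are simplified stand-ins using std::size_t, Memory_istream, Event_data_istream,
Basic_reader and Memory_reader are hypothetical names, and the framing (a
1-byte length followed by the body) is a toy format, not the binlog event
header. Error reporting, checksums, allocators and the event object stream
are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Simplified stand-ins for the worklog interfaces: false means success.
class Basic_istream {
 public:
  virtual ~Basic_istream() = default;
  // In: *length is the number of bytes wanted. Out: the number actually read.
  virtual bool read(unsigned char *buffer, std::size_t *length) = 0;
};

class Basic_seekable_istream : public Basic_istream {
 public:
  virtual bool seek(std::size_t offset) = 0;
};

// Hypothetical byte source backed by memory instead of a binlog file.
class Memory_istream : public Basic_seekable_istream {
 public:
  explicit Memory_istream(std::vector<unsigned char> bytes)
      : m_bytes(std::move(bytes)) {}

  bool read(unsigned char *buffer, std::size_t *length) override {
    const std::size_t n = std::min(*length, m_bytes.size() - m_position);
    if (n > 0) std::memcpy(buffer, m_bytes.data() + m_position, n);
    m_position += n;
    *length = n;
    return false;
  }
  bool seek(std::size_t offset) override {
    if (offset > m_bytes.size()) return true;
    m_position = offset;
    return false;
  }

 private:
  std::vector<unsigned char> m_bytes;
  std::size_t m_position = 0;
};

// Splits the byte stream into event data chunks using the toy framing.
class Event_data_istream {
 public:
  explicit Event_data_istream(Basic_istream *istream) : m_istream(istream) {}

  // Returns true at end of stream or when an event is truncated.
  bool read_event_data(std::vector<unsigned char> *data) {
    unsigned char header = 0;
    std::size_t length = 1;
    if (m_istream->read(&header, &length) || length != 1) return true;
    data->resize(header);
    length = header;
    if (m_istream->read(data->data(), &length) || length != header)
      return true;
    return false;
  }

 private:
  Basic_istream *m_istream;
};

// Reader that owns the pipeline and is parameterized on the byte stream
// type, in the spirit of the reader typedefs above.
template <class IFILE>
class Basic_reader {
 public:
  explicit Basic_reader(std::vector<unsigned char> bytes)
      : m_ifile(std::move(bytes)), m_data_istream(&m_ifile) {}
  // The data stream points at m_ifile, so copying is disabled.
  Basic_reader(const Basic_reader &) = delete;
  Basic_reader &operator=(const Basic_reader &) = delete;

  bool seek(std::size_t offset) { return m_ifile.seek(offset); }
  bool read_event_data(std::vector<unsigned char> *data) {
    return m_data_istream.read_event_data(data);
  }

 private:
  IFILE m_ifile;  // declared first so it is constructed before its consumer
  Event_data_istream m_data_istream;
};

typedef Basic_reader<Memory_istream> Memory_reader;
```

A caller would construct a Memory_reader over some bytes, optionally seek()
to an event boundary, and call read_event_data() until it returns true.
Swapping the byte source only requires another typedef.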
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.