WL#2735: Refactor Replication
Affects: Server-7.1
—
Status: On-Hold
SUMMARY
-------
This WL contains various ideas for refactoring replication. We'll probably
do them piece by piece, and not all of them might be done.

SOME IDEAS (More below)
-----------------------
1. The new file master.cc should only contain code for the master. All
   slave code should move to slave.cc.

2. Move log_loaded_block into sql_load.cc.

   Guilhem wrote:
   > [G] This log_loaded_block() was in sql_repl.cc but it had nothing to do
   > [G] there. Maybe in a further reorg we could move it to where it belongs:
   > [G] binlogging of LOAD DATA on the master side, so maybe sql_load.cc

   Should be very easy to fix after WL#1697 is pushed.
Copyright (c) 2001-2007 by MySQL AB. All rights reserved.

-------------------------------------------------------------------

SPLIT INTO FILES
----------------
- sql_slave.* (was sql_repl.*)
  Interface from MySQL server to slave functionality
  (roughly one call per SQL statement)

- sql_master.* (was sql_repl.*)
  Interface from MySQL server to master functionality
  (roughly one call per SQL statement)

- binlog.* (to be created; move code from sql_base, sql_class and away
  from the thd object)
  Interface from MySQL server to binlogging. The idea is that the server
  does not know how things are logged and does not know if there are
  multiple binlogs below this interface. The future handler binlog
  interface will be below this interface (there could, in the far future,
  be multiple handlers storing the binlog here). I need to look at this
  in some more detail to figure out if there are any obstacles to doing
  this.

- rpl_mi.* (already created)
  Master info functionality (will, in the far future, no longer be
  directly accessible from MySQL server)

- rpl_rli.* (already created)
  Relay log info functionality (will, in the far future, no longer be
  directly accessible from MySQL server)

repl_failsafe.* is to be moved into sql_slave.* and sql_master.* (the parts
that actually are interface functions). Some code is to be removed.

-------------------------------------------------------------------

REFACTOR LOG_EVENT.CC
---------------------
I think log_event.h/cc needs some improvements. Frequently we get
questions about the binlog format, and it usually takes hours to
understand things that should only take seconds to look up if the code
were simpler.

I think that
- it takes hours to try to understand what the binary encoding of an
  event is,
- it is hard to test the binary format of the events (it would be super
  to have readers and writers of events),
- functionality of events, e.g.
  header_len is spread into multiple classes, making it hard to read,
- versioning info is in multiple places, making it hard to understand
  what each version needs, hard to deprecate code, and hard to make sure
  the code works when the master and slave have different versions, and
- both client and server code are mixed in the same file, filling the
  code with #ifdef and making it harder to read.

I'm thinking of changing it in some way like this:

1. Separate client-server code (example: Query_log_event):
   - Let all event classes (e.g. Log_event_query) only contain things
     that the *client* needs.
   - Create a server class for each event (e.g. Server_event_query)
     that inherits from the client class.

   Negative: the server will then have print functions compiled into it.
   Positive: the client object can be used as a binlog API.

   Mats expands this into:
   Move common items into a common base class: let the client inherit
   from the common parts and add the print member function, which will
   be based on the common parts. Let the server class inherit from the
   common base class and add whatever it needs.

   It is common to use a POD as the root base class, and maybe add a
   layer of common functionality in a subclass. That way, there is an
   easy and efficient way to process the raw data, if necessary (for
   serialization purposes, for example).

2. Move headlen info into the events.
   The event itself should know its header length dependent on the
   binlog version. (The idea is that one should only need to look at
   the event class to see the binary encoding.)

   Mats: Agree. That will also allow it to be used with template
   functions, if the need arises. In general, it's a good idea to put
   all information associated with an event *in* the event class. That
   allows us to build generic methods for handling events that are *far*
   more efficient and easy to maintain, and also with good locality,
   which makes cached architectures work more efficiently.

3.
   Institute a policy that binlog version X and event type Y might
   correspond to a different class than binlog version X' with event
   type Y. This is so that we can move into future versions in two ways:
   - Adding new event types for new events.
   - If the number of event types exceeds 256, we can increase the
     binlog version to get a whole new range of types. (These are
     thoughts for the far future, though.)

4. Move some of the code into:
     log_event_rbr.h/cc  : RBR events
     log_event_load.h/cc : LOAD DATA INFILE (3 different implementations!)
   so that events that belong together are in the same file, but not
   all events are in the same huge file.

5. Factor out decode/encode (Mats' idea)
   Factor out the parts that handle the decoding/encoding of log events
   into a separate class. That way the events do not have to carry
   around a lot of representational information, making them faster to
   build and destroy. It would also be easy to look up encodings, since
   they will be written separately and probably in a condensed format
   (just to give an idea, something along these lines):

     class Query_log_event_code : public Log_event_code {
       void encode(Query_log_event const& ev, Output& out) {
         Log_event_code::encode(ev, out);
         out.write(ev.slave_proxy_id);
         out.write(ev.exec_time);
         out.write(ev.sql_mode);
         . . .
       }
     };

   This would also make it easy to switch encoding, or keep two
   different encodings for the same events (essentially creating an
   Abstract Factory and implementing different concrete factories for
   each encoding).

6. (Mats' idea) Develop a little language that can take a format
   description and generate an encoding and decoding function. Changing
   or adding fields to the log events is then done in an instant. With
   a generic tool, we can do the same for other parts of the server.
   That way, data format documentation and coding are one, and the code
   can be organized radically differently. If the format description
   format is (X)HTML, or something similar, generation of documentation
   is immediate.
   Lars thinks this is cool but might be overkill.

7. Problem: the query log event contains a lot of state information, and
   this makes the events fat, i.e. many bytes. Refactor into:
   1) the master starts by writing its global state in the log (possibly
      in the format description event),
   2) if master session != master global, add info to the event,
   3) if the global state changes, then this must be propagated to all
      slaves (same as item 1),
   4) the slave needs to remember the global state of the master (for
      the log it is reading)...
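
The session-versus-global idea in item 7 could look roughly like the sketch below. This is only an illustration under assumptions, not existing server code: MasterState, StateField, StateDelta, make_delta and apply_delta are hypothetical names, and the three fields are an arbitrary sample of the state a query event might carry. The master attaches only the fields where the session differs from the global state (step 2), and the slave rebuilds the full state from the global state it remembered (step 4).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sample of the state a query event carries (assumption).
struct MasterState {
  std::uint64_t sql_mode;
  std::uint32_t auto_increment_increment;
  std::uint32_t charset_id;
};

// One bit per field, marking which fields an event overrides.
enum StateField : std::uint8_t {
  F_SQL_MODE = 1 << 0,
  F_AUTO_INC = 1 << 1,
  F_CHARSET  = 1 << 2
};

// What would be attached to an event: only the overridden fields count.
struct StateDelta {
  std::uint8_t present;  // bitmask of StateField values
  MasterState  values;   // only fields flagged in `present` are meaningful
};

// Master side (step 2): flag the fields where session != global.
inline StateDelta make_delta(const MasterState& global,
                             const MasterState& session) {
  StateDelta d = {0, session};
  if (session.sql_mode != global.sql_mode) d.present |= F_SQL_MODE;
  if (session.auto_increment_increment != global.auto_increment_increment)
    d.present |= F_AUTO_INC;
  if (session.charset_id != global.charset_id) d.present |= F_CHARSET;
  return d;
}

// Slave side (step 4): start from the remembered global state of the
// master and apply only the fields the event overrides.
inline MasterState apply_delta(const MasterState& remembered_global,
                               const StateDelta& d) {
  MasterState s = remembered_global;
  if (d.present & F_SQL_MODE) s.sql_mode = d.values.sql_mode;
  if (d.present & F_AUTO_INC)
    s.auto_increment_increment = d.values.auto_increment_increment;
  if (d.present & F_CHARSET) s.charset_id = d.values.charset_id;
  return s;
}
```

In this sketch an event whose session matches the global state carries an empty bitmask, so only the flagged fields would need to be written to the log. How the global state itself is logged and propagated (steps 1 and 3) is left open here, as it is in the item above.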