WL#2735: Refactor Replication
Affects: Server-7.1
Status: On-Hold
SUMMARY
-------
This WL contains various ideas for refactoring replication. We will
probably do them piece by piece, and not all of them may be done.

SOME IDEAS (More below)
-----------------------
1. The new file master.cc should only contain code for the master. All
slave code should move to slave.cc.
2. Move log_loaded_block into sql_load.cc. Guilhem wrote:
> [G] This log_loaded_block() was in sql_repl.cc but it had nothing to do
> [G] there. Maybe in a further reorg we could move it to where it belongs:
> [G] binlogging of LOAD DATA on the master side, so maybe sql_load.cc
Should be very easy to fix after WL#1697 is pushed.
Copyright (c) 2001-2007 by MySQL AB. All rights reserved.
-------------------------------------------------------------------
SPLIT INTO FILES
----------------
- sql_slave.* (was sql_repl.*)
Interface from MySQL server to slave functionality
(roughly one call per SQL statement)
- sql_master.* (was sql_repl.*)
Interface from MySQL server to master functionality
(roughly one call per SQL statement)
- binlog.* (to be created; move code from sql_base and sql_class,
and away from the THD object)
Interface from MySQL server to binlogging. The idea is that
the server does not know how things are logged and does not
know if there are multiple binlogs below this interface.
The future handler binlog interface will be below this interface
(there could, in the far future, be multiple handlers storing the
binlog here). I need to look at this in some more detail
to figure out if there are any obstacles to doing this.
- rpl_mi.* (already created)
Master info functionality (will, in the far future, no longer be
directly accessible from MySQL server)
- rpl_rli.* (already created)
Relay log info functionality (will, in the far future, no longer be
directly accessible from MySQL server)
- repl_failsafe.* (to be moved into sql_slave.* and sql_master.*)
The parts that actually are interface functions move there.
Some code is to be removed.
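The binlog.* interface above is meant to hide two things from the server: how events are logged, and whether more than one binlog sits below the interface. Here is a minimal C++ sketch of that idea. It assumes hypothetical classes Binlog_sink, Binlog and Memory_sink (none of them exist in the server) and uses a plain string payload in place of real events.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical storage backend below the binlog interface (for example a
// file binlog today, or a handler-based binlog in the future).
class Binlog_sink {
public:
  virtual ~Binlog_sink() = default;
  virtual bool write(const std::string &payload) = 0;  // true on success
};

// Hypothetical facade: the only binlog entry point the server would see.
// Callers do not know how many sinks exist or how each one stores data.
class Binlog {
public:
  void add_sink(std::unique_ptr<Binlog_sink> sink) {
    sinks_.push_back(std::move(sink));
  }
  // Writes to every sink; returns true only if all of them succeeded.
  bool log_statement(const std::string &query) {
    bool ok = true;
    for (auto &sink : sinks_) ok = sink->write(query) && ok;
    return ok;
  }

private:
  std::vector<std::unique_ptr<Binlog_sink>> sinks_;
};

// Toy in-memory sink, useful for testing the facade in isolation.
class Memory_sink : public Binlog_sink {
public:
  explicit Memory_sink(std::vector<std::string> *out) : out_(out) {}
  bool write(const std::string &payload) override {
    out_->push_back(payload);
    return true;
  }

private:
  std::vector<std::string> *out_;
};
```

Code above the interface only ever calls something like log_statement(). Adding a second sink, such as a future handler binlog, changes nothing for those callers.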
-------------------------------------------------------------------
REFACTOR LOG_EVENT.CC
---------------------
I think log_event.h/cc needs some improvements. We frequently get
questions about the binlog format, and it usually takes hours to
understand things that should take only seconds to look up if the code
were simpler. I think the current code:
- makes it take hours to understand what the binary encoding of an
event is,
- makes it hard to test the binary format of the events (this would be
much easier with readers and writers of events),
- spreads the functionality of events, e.g. header_len, across
multiple classes, making it hard to read,
- keeps versioning info in multiple places, making it hard to
understand what each version needs, hard to deprecate code, and hard
to make sure the code works when the master and slave have different
versions, and
- mixes client and server code in the same file, filling the code
with #ifdefs that make it harder to read.
I'm thinking of changing it in some way like this:
1. Separate client and server code (example: Query_log_event):
- Let all event classes (e.g. Log_event_query) only contain things
that the *client* needs.
- Create a server class for each event (e.g. Server_event_query)
that inherits from the client class.
Negative: the server will then have print functions compiled into it.
Positive: the client object can be used as a binlog API.
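Here is a minimal C++ sketch of the split in item 1, using the two class names proposed there. The members, the print() format and the apply() method are assumptions for illustration only. apply() is a hypothetical stand-in for executing the event on a slave.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>

// Client-side event: only what a reader such as mysqlbinlog needs.
// It contains no server-only code, so it needs no #ifdef.
class Log_event_query {
public:
  Log_event_query(std::string db, std::string query, std::uint32_t exec_time)
      : db_(std::move(db)), query_(std::move(query)), exec_time_(exec_time) {}
  virtual ~Log_event_query() = default;

  // Human-readable output; inherited by the server class as well
  // (the "negative" mentioned above).
  std::string print() const {
    std::ostringstream os;
    os << "use " << db_ << "; " << query_ << " /* exec_time=" << exec_time_
       << " */";
    return os.str();
  }
  const std::string &query() const { return query_; }

protected:
  std::string db_;
  std::string query_;
  std::uint32_t exec_time_;
};

// Server-side event: adds execution logic on top of the client class.
class Server_event_query : public Log_event_query {
public:
  using Log_event_query::Log_event_query;
  // Hypothetical stand-in for applying the event on a slave: it only
  // records the statement it would execute.
  void apply(std::string *executed) const { *executed = query_; }
};
```

A Server_event_query can be handed to any code that expects a Log_event_query. That is what lets the client class double as a binlog API.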
Mats expands this into: Move common items into a common base
class: let the client inherit from the common parts and add the print
member function, which will be based on the common parts. Let the
server class inherit from the common base class and add whatever it
needs. It is common to use a POD as the root base class, and maybe add
a layer of common functionality in a subclass. That way, there is an
easy and efficient way to process the raw data, if necessary (for
serialization purposes, for example).
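Mats's variant differs from item 1 in one way: the server class does not inherit from the client class. Both derive from a common layer on top of a POD root. The sketch below assumes hypothetical names (Query_event_data, Query_event_common, Client_query_event, Server_query_event) and a reduced field set. Because the root is trivially copyable, its raw bytes can be copied directly. That only holds in-process, though. A struct memcpy is not a portable on-disk encoding because of padding and endianness.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>

// Root: plain data only. Trivially copyable, so it can be copied as raw
// bytes within one process (not a portable on-disk encoding).
struct Query_event_data {
  std::uint32_t thread_id;
  std::uint32_t exec_time;
  std::uint16_t error_code;
};
static_assert(std::is_trivially_copyable<Query_event_data>::value,
              "the root must stay a POD-like type");

// Common layer: functionality shared by client and server.
class Query_event_common : public Query_event_data {
public:
  explicit Query_event_common(const Query_event_data &d) : Query_event_data(d) {}
  bool failed() const { return error_code != 0; }
};

// Client adds printing, built only from the common parts.
class Client_query_event : public Query_event_common {
public:
  using Query_event_common::Query_event_common;
  std::string print() const {
    return "thread_id=" + std::to_string(thread_id) +
           (failed() ? " (failed)" : "");
  }
};

// Server adds whatever it needs; it does not inherit print().
class Server_query_event : public Query_event_common {
public:
  using Query_event_common::Query_event_common;
  void set_exec_time(std::uint32_t t) { exec_time = t; }  // hypothetical
};
```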
2. Move headlen info into the events. The event itself should know its
header length for each binlog version. (The idea is that one should
only need to look at the event class to see the binary encoding.)
Mats: Agree. That will also allow it to be used with template
functions, if the need arises. In general, it's a good idea to put all
information associated with an event *in* the event class. That allows
us to build generic methods for handling events that are *far* more
efficient and easier to maintain, and that also have good locality,
which makes cache-based architectures work more efficiently.
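This sketch covers item 2 together with Mats's remark about template functions. Each event class states its own post-header length per binlog version through a static member. A generic template then computes the total header length for any event class without a lookup table. The class names are hypothetical, and all byte counts are illustrative placeholders rather than authoritative binlog values.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical event classes: each one owns its post-header length for
// every binlog version. The numbers are illustrative placeholders.
struct Query_event {
  static constexpr std::size_t post_header_len(int binlog_version) {
    return binlog_version >= 4 ? 13 : 11;
  }
};

struct Rotate_event {
  static constexpr std::size_t post_header_len(int binlog_version) {
    return binlog_version >= 4 ? 8 : 0;
  }
};

// Illustrative common-header length, also version dependent.
constexpr std::size_t common_header_len(int binlog_version) {
  return binlog_version == 1 ? 13 : 19;
}

// Generic template: works for any event class exposing post_header_len(),
// so no per-type switch statement or external table is needed.
template <class Event>
constexpr std::size_t total_header_len(int binlog_version) {
  return common_header_len(binlog_version) +
         Event::post_header_len(binlog_version);
}

// Everything is constexpr, so it can even be checked at compile time.
static_assert(total_header_len<Query_event>(4) == 32, "19 + 13");
```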
3. Institute a policy that binlog version X and event type Y might
correspond to a different class than binlog version X' with
event type Y. This lets us evolve the format in two ways:
- Adding new event types for new events.
- If the number of event types exceeds 256, we can increase the
binlog version to get a whole new range of types.
(These are thoughts for the far future, though.)
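The policy in item 3 amounts to keying event classes by the pair (binlog version, event type) rather than by the type code alone. Below is a minimal registry sketch of that. The registry, the factory type and the Query_event_v4/Query_event_v5 classes are hypothetical, and binlog version 5 is invented purely to show one type code mapping to two classes.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Event {
  virtual ~Event() = default;
  virtual std::string name() const = 0;
};
// Hypothetical classes for the same type code in two binlog versions.
struct Query_event_v4 : Event {
  std::string name() const override { return "Query_event_v4"; }
};
struct Query_event_v5 : Event {
  std::string name() const override { return "Query_event_v5"; }
};

// Key is (binlog version, event type code): the same type code may map to
// a different class in a different binlog version.
using Event_key = std::pair<int, std::uint8_t>;
using Event_factory = std::function<std::unique_ptr<Event>()>;

class Event_registry {
public:
  void add(int version, std::uint8_t type, Event_factory f) {
    factories_[Event_key(version, type)] = std::move(f);
  }
  // Returns nullptr for an unknown (version, type) combination.
  std::unique_ptr<Event> create(int version, std::uint8_t type) const {
    auto it = factories_.find(Event_key(version, type));
    if (it == factories_.end()) return nullptr;
    return it->second();
  }

private:
  std::map<Event_key, Event_factory> factories_;
};
```

A decoder would read the binlog version once, from the format description, and then look up each event's class through create().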
4. Move some of the code into:
log_event_rbr.h/cc : RBR events
log_event_load.h/cc : LOAD DATA INFILE (3 different implementations!)
so that events that belong together are in the same file, but not
all events are in one huge file.
5. Factor out decode/encode (Mats idea)
Factor out the parts that handle the decoding/encoding of log
events into a separate class. That way the events do not have to
carry around a lot of representational information, making them faster
to build and destroy. It would also be easy to look up encodings,
since they will be written separately and probably in a condensed
format (just to give an idea, something along these lines):
    class Query_log_event_code : public Log_event_code {
    public:
      void encode(Query_log_event const& ev, Output& out) {
        Log_event_code::encode(ev, out);  // parts shared by all events
        out.write(ev.slave_proxy_id);
        out.write(ev.exec_time);
        out.write(ev.sql_mode);
        // ... remaining Query_log_event fields
      }
    };
This would also make it easy to switch encoding, or keep two
different encodings for the same events (essentially creating an
Abstract Factory and implementing different concrete factories for
each encoding).
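Mats's snippet leaves Log_event_code and Output undefined. The self-contained sketch below uses hypothetical names of its own. The event is plain data, and a separate codec interface has two concrete encodings: fixed-width little-endian binary and a readable text form. A simple factory function picks the encoding. A full Abstract Factory would instead return a whole family of codecs, one per event type. The round trip in the test is the kind of reader/writer check the problem list above asks for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <sstream>
#include <string>

// The event carries only its values, no representational information.
struct Query_event_data {
  std::uint32_t slave_proxy_id = 0;
  std::uint32_t exec_time = 0;
  std::string query;
};

// Encoding is factored out of the event into its own class hierarchy.
class Query_event_codec {
public:
  virtual ~Query_event_codec() = default;
  virtual std::string encode(const Query_event_data &ev) const = 0;
  virtual Query_event_data decode(const std::string &buf) const = 0;
};

// Encoding 1: two little-endian 4-byte integers, then the query text.
class Binary_codec : public Query_event_codec {
public:
  std::string encode(const Query_event_data &ev) const override {
    std::string out;
    put_u32(out, ev.slave_proxy_id);
    put_u32(out, ev.exec_time);
    return out + ev.query;
  }
  // Assumes buf holds at least the 8 fixed bytes.
  Query_event_data decode(const std::string &buf) const override {
    Query_event_data ev;
    ev.slave_proxy_id = get_u32(buf, 0);
    ev.exec_time = get_u32(buf, 4);
    ev.query = buf.substr(8);
    return ev;
  }

private:
  static void put_u32(std::string &out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) out += static_cast<char>((v >> (8 * i)) & 0xff);
  }
  static std::uint32_t get_u32(const std::string &buf, std::size_t pos) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
      v |= static_cast<std::uint32_t>(static_cast<unsigned char>(buf[pos + i]))
           << (8 * i);
    return v;
  }
};

// Encoding 2: "id exec_time query" as text (assumes no newline in query).
class Text_codec : public Query_event_codec {
public:
  std::string encode(const Query_event_data &ev) const override {
    std::ostringstream os;
    os << ev.slave_proxy_id << ' ' << ev.exec_time << ' ' << ev.query;
    return os.str();
  }
  Query_event_data decode(const std::string &buf) const override {
    std::istringstream is(buf);
    Query_event_data ev;
    is >> ev.slave_proxy_id >> ev.exec_time;
    is.get();                    // skip the single separating space
    std::getline(is, ev.query);  // the rest is the query text
    return ev;
  }
};

// Factory function selecting the concrete encoding.
enum class Encoding { binary, text };
std::unique_ptr<Query_event_codec> make_codec(Encoding e) {
  if (e == Encoding::binary) return std::make_unique<Binary_codec>();
  return std::make_unique<Text_codec>();
}
```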
6. (Mats idea) Develop a little language that can take a format
description and generate an encoding and decoding function. Changing
or adding fields to the log events is then done in an instant. With a
generic tool, we can do the same for other parts of the server. That
way, data format documentation and code are one, and the code can be
organized radically differently. If the format description format is
(X)HTML, or something similar, generation of documentation is
immediate.
Lars thinks this is cool but might be overkill.
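Item 6 proposes a small language that generates encoding and decoding functions. The sketch below uses a different, smaller technique: an interpreted, table-driven one rather than a code generator. A single format description drives a generic encoder, a generic decoder and a generated layout table for the documentation. The Field_desc type, the helper names and the example field list are hypothetical, and only fixed-width unsigned integer fields are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One row of a format description: field name and width in bytes.
struct Field_desc {
  std::string name;
  std::size_t width;  // 1 to 4 bytes, little-endian unsigned integer
};

using Format = std::vector<Field_desc>;
using Values = std::map<std::string, std::uint32_t>;

// Generic encoder driven entirely by the description.
std::string encode(const Format &fmt, const Values &vals) {
  std::string out;
  for (const Field_desc &f : fmt) {
    std::uint32_t v = vals.at(f.name);
    for (std::size_t i = 0; i < f.width; ++i)
      out += static_cast<char>((v >> (8 * i)) & 0xff);
  }
  return out;
}

// Generic decoder driven by the same description.
Values decode(const Format &fmt, const std::string &buf) {
  Values vals;
  std::size_t pos = 0;
  for (const Field_desc &f : fmt) {
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < f.width; ++i)
      v |= static_cast<std::uint32_t>(static_cast<unsigned char>(buf[pos + i]))
           << (8 * i);
    vals[f.name] = v;
    pos += f.width;
  }
  return vals;
}

// Documentation generated from the same description: offset, width, name.
std::string document(const Format &fmt) {
  std::ostringstream os;
  std::size_t offset = 0;
  for (const Field_desc &f : fmt) {
    os << offset << '\t' << f.width << '\t' << f.name << '\n';
    offset += f.width;
  }
  return os.str();
}
```

Adding or changing a field means editing a single row, and the encoder, decoder and documentation all follow.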
7. Problem: the query log event contains a lot of state information,
and this makes the events fat, i.e. many bytes.
Refactor into:
1) the master starts by writing its global state to the log (possibly
in the format description event),
2) if the master's session state differs from its global state, add
the difference to the event,
3) if the global state changes, the change must be propagated to all
slaves (same as item 1),
4) the slave needs to remember the global state of the master (for the
log it is reading)...
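Here is a sketch of steps 1 to 4, assuming state is modelled as a map from variable name to value; the real event fields would differ. The master sends its global state once, and each query event carries only the session values that differ from it. The slave keeps the master's global state for the log it is reading and rebuilds the full state per event. session_diff and Slave_state_tracker are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

using State = std::map<std::string, std::string>;

// Master side, step 2: keep only session values that differ from global.
State session_diff(const State &global, const State &session) {
  State diff;
  for (const auto &kv : session) {
    auto it = global.find(kv.first);
    if (it == global.end() || it->second != kv.second) diff.insert(kv);
  }
  return diff;
}

// Slave side, step 4: remembers the master's global state for the log it
// is reading and rebuilds the full session state for each event.
class Slave_state_tracker {
public:
  // Steps 1 and 3: called for the event carrying the global state,
  // initially and again whenever the master's global state changes.
  void set_master_global(const State &global) { master_global_ = global; }

  // Overlay an event's session differences on the remembered global state.
  State effective_state(const State &event_diff) const {
    State s = master_global_;
    for (const auto &kv : event_diff) s[kv.first] = kv.second;
    return s;
  }

private:
  State master_global_;
};
```

In the common case, where a session runs with the global settings, the diff is empty and the event carries no state at all.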
Copyright (c) 2000, 2025, Oracle Corporation and/or its affiliates. All rights reserved.