WL#4008: Embed scripting support in replication threads
Affects: Server-Prototype Only
—
Status: Un-Assigned
Rationale
=========

Possible uses for this technology:

1. Filtering (on master before binlog, on master when sending, and on slave when receiving)

Reference: BUG#36429

Summary
=======

By adding an interface to the replication threads (the dump thread, the slave I/O thread, and the SQL thread), it is possible to add scripting support that allows a user to write simple scripts to read or manipulate the replication stream at various stages of replication.

By providing such support, we will be able to handle a number of outstanding requests for improvements to replication in a simple manner. In addition, the feature will allow MySQL to internally focus on developing clear and versatile interfaces, which will further promote the continuous improvement of the server's internal structure.

Instead of deciding on a scripting language to support, we should focus on:

1. Defining interfaces that allow reading and manipulation of the replication stream at various points of the replication process. Where and how we provide this access is important for security reasons, for making the server resilient to change, and for usability reasons.

2. Deciding on the measures that need to be taken to allow the dynamic loading of plug-ins to support any one of several candidate scripting languages (we should at least aim to support Perl, Python, and Lua, since they have different approaches to embedding).

3. Implementing a plug-in for one scripting language to provide initial support and also to act as an example of how a plug-in should be used.

Reasons for providing this support
==================================

We have identified the following reasons for providing this support [tentative description, this needs to be elaborated]:

1. It allows users to solve problems without needing to write external scripts.

2. It encourages providing support for other scripting languages.

3. It allows MySQL to quickly solve problems and identify candidate features for implementation inside the server code.

Example use cases
=================

The following use cases can serve as examples of what should be possible to do with scripts.

Tag events with extra information
---------------------------------

For events that are being written to the binary log, it is possible to write extra information, e.g., updating tables to indicate the progress of the master. This is similar to placing a trigger on the binary log, if the binary log had been put in a table. Requires mutable events.

Fail-over to passive master
---------------------------

If a slave detects that a master has timed out and potentially stopped, a script in the slave threads can automatically switch to the passive master. Requires notifications in the I/O thread, but no access to events at all.

Tapping the replication stream
------------------------------

It is also possible to tap the replication stream, sending events that are being written to the binary log to other hosts in order to secure the binary log. Requires read access to events.

Rewriting queries before applying to slave
------------------------------------------

Occasionally, a user wants to rewrite a query into another form, for instance when there is a need to replicate to a table with a different definition. By adding the necessary means to replace the query string with another one, or alternatively just feeding the new query using a separate client thread, this can be achieved using a simple script attached to the SQL thread. Requires mutable events.
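As a rough illustration of the query-rewriting use case, the sketch below shows what a hook operating on a mutable event might look like. Everything in it is hypothetical: ``Query_event``, ``rewrite_table_name``, and the table names are invented for this example and are not part of the server or of this specification.

```cpp
#include <string>

// Hypothetical stand-in for a mutable event that carries a query string.
struct Query_event {
  std::string query;
};

// Hypothetical hook: replace every occurrence of `from` with `to` in the
// query string and return the number of replacements made.
// Assumption: plain substring replacement; a real hook would have to be
// SQL-aware to avoid touching string literals or unrelated identifiers.
inline int rewrite_table_name(Query_event &ev, const std::string &from,
                              const std::string &to) {
  if (from.empty()) return 0;
  int count = 0;
  std::string::size_type pos = 0;
  while ((pos = ev.query.find(from, pos)) != std::string::npos) {
    ev.query.replace(pos, from.size(), to);
    pos += to.size();  // continue after the inserted text
    ++count;
  }
  return count;
}
```

A script attached to the SQL thread would play the role of ``rewrite_table_name`` here, receiving the event before it is applied and modifying the query in place.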
Definitions
===========

For the purpose of this specification:

- An *extension* is a software component that adds functionality to the server in some manner. This can be by means of a script, a shared library, or other ways.

- An *event* (not to be confused with binary log events) is a notification of the occurrence of something inside the server. The events contain information about the occurrence in the form of a set of values.

- Inside the server, there are a number of *producers* that produce events.

- The events are passed from the producer to one or more *observers* or *consumers* (the two terms are used interchangeably). Typically, an observer allows the event to be further processed while a consumer absorbs the event, but the distinction between these two concepts is often blurred.

- An *observer interface* is a collection of functions in the server that can be called for events in the server. Each extension registers for one or more observer interfaces. The idea is that each interface represents one coherent unit that needs a full implementation to function correctly. The typical example is the `binary log interface`_ below, which needs to be fully implemented to work reasonably.

.. _`binary log interface`: `Replication Observer Interfaces`_

Task decomposition
==================

In order to split the project into separate tasks, we identify the following:

- Provide a mechanism to load support for a specific scripting language into the server. This requires the server to define the placement of scripts as well as a way for the plugin to inform the server what files it is able to handle.

- Adding an actual observer interface is a task in itself. In this case, we are going to specify observer interfaces for the replication process.

- Creating a plugin for a specific language consists of creating a plugin that registers with the server and contains adapters for each observer interface. The adapters will convert events into a form that can be passed to the scripts and is specific to each scripting language.

Open Issues
===========

* When receiving an event from the master, some processing is done to decide whether it is one of a set of special events (e.g., a Rotate). Shall the call site be before or after this checking code?

* Suppose that it is necessary to have different logic for reading from a master and writing to a slave. How shall this code be organized? If these two actions belong to a single interface, the code ends up in an awkward situation.

Replication Callback Points
===========================

We have identified the following callback points in the server, where a scripting solution might be interested in getting access to the binary log event:

In a client thread:

- Before writing the event to the binary log. At this point, the event can be altered, but nothing else in the control flow can be altered.

In a dump thread:

- After reading the event from the binary log, but before sending the event to the slave.

In the I/O thread:

- After receiving an event, but before it is put in the relay log.

- If a timeout occurs while waiting for an event.

In the SQL thread:

- After reading the event from the relay log, but before sending it off to ``mysql_parse()``.

Replication Observer Interfaces
===============================

There are three interfaces in play here: one for writing to and reading from the binary log, one for writing to and reading from the network, and one for writing to and reading from the relay log.
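To make the I/O thread callback points more concrete, here is a minimal sketch of how an observer interface and its dispatching could be shaped. All names (``Binlog_event``, ``Io_thread_observer``, ``Counting_observer``, ``Io_thread_dispatcher``) are assumptions made for illustration; the specification does not define them.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a binary log event received from the master.
struct Binlog_event {
  std::string payload;
};

// Hypothetical observer interface covering the two I/O thread callback points.
class Io_thread_observer {
public:
  virtual ~Io_thread_observer() {}
  // Called after an event is received, before it is put in the relay log.
  virtual void after_receive_event(Binlog_event &ev) = 0;
  // Called when a timeout occurs while waiting for an event.
  virtual void on_timeout() = 0;
};

// Example observer that only counts what it sees.
class Counting_observer : public Io_thread_observer {
public:
  int events = 0;
  int timeouts = 0;
  void after_receive_event(Binlog_event &) override { ++events; }
  void on_timeout() override { ++timeouts; }
};

// Hypothetical dispatcher: forwards each callback to every registered
// observer, in registration order. It does not own the observers.
class Io_thread_dispatcher {
public:
  void register_observer(Io_thread_observer *obs) { observers_.push_back(obs); }
  void after_receive_event(Binlog_event &ev) {
    for (Io_thread_observer *obs : observers_) obs->after_receive_event(ev);
  }
  void on_timeout() {
    for (Io_thread_observer *obs : observers_) obs->on_timeout();
  }
private:
  std::vector<Io_thread_observer *> observers_;
};
```

In this shape, a fail-over script would implement ``on_timeout()`` only, while a filtering script would act in ``after_receive_event()``. Whether these belong in one interface or two is exactly the kind of question raised under Open Issues.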
Implementation support class
============================

In order to support some basic lookup services, a base class with common functionality is supplied. The class is intended to be inherited from, and offers only basic support functions for ease of implementation.

::

  class Extension_interface {
  public:
    const char* get_name() const;
  protected:
    Extension_interface(const char *name);
  };

Extension interface
===================

Synopsis
--------

Each extension interface is represented as a class of the following form::

  class Slave_threads_extension : public Extension_interface {
  public:
    Slave_threads_extension()
      : Extension_interface("replication/threads") {}
    ~Slave_threads_extension();

    *extension interface functions*
  };

The design deliberately uses explicit classes rather than generic callback functions, since this allows strong type-checking. The design does not limit usability, since it is known at each call site which extension interface will be used.

In order to handle generic parts of the extension interface handling (such as dispatching calls to all registered extension functions), support libraries will be written.

Usage
-----

Extension interfaces are identified by a class. Here is a typical usage of an extension interface::

  Binlog_extension binlog_ext;
  binlog_ext.start_statement(query);
  ...
  binlog_ext.end_statement(query);

Note that the extension interface class follows the RAII idiom, and that resources associated with the actual implementation are handled elsewhere. The intention is to avoid restricting scalability by unnecessarily requiring a memory allocator, as would be the case if the instances were allocated using ``new``.
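The following is one possible way to flesh out the usage shown above, given only as a sketch. ``Extension_interface`` follows the declaration in this specification; everything else is assumed for illustration, including the interface name ``"replication/binlog"``, the ``Binlog_observer`` type, the ``binlog_observers()`` registry, and ``Trace_observer``.

```cpp
#include <string>
#include <vector>

// Base class as declared in the specification, with a minimal body filled in.
class Extension_interface {
public:
  const char *get_name() const { return name_; }
protected:
  explicit Extension_interface(const char *name) : name_(name) {}
private:
  const char *name_;
};

// Hypothetical implementation side: what an extension would implement.
struct Binlog_observer {
  virtual ~Binlog_observer() {}
  virtual void start_statement(const std::string &query) = 0;
  virtual void end_statement(const std::string &query) = 0;
};

// Hypothetical registry, kept outside the interface object so that the
// interface object itself stays cheap to create on the stack.
inline std::vector<Binlog_observer *> &binlog_observers() {
  static std::vector<Binlog_observer *> registry;
  return registry;
}

// Call-site facade: strongly typed functions that dispatch to all
// registered observers. The name "replication/binlog" is an assumption.
class Binlog_extension : public Extension_interface {
public:
  Binlog_extension() : Extension_interface("replication/binlog") {}
  void start_statement(const std::string &query) {
    for (Binlog_observer *obs : binlog_observers()) obs->start_statement(query);
  }
  void end_statement(const std::string &query) {
    for (Binlog_observer *obs : binlog_observers()) obs->end_statement(query);
  }
};

// Example extension that records the calls it receives.
struct Trace_observer : Binlog_observer {
  std::vector<std::string> log;
  void start_statement(const std::string &q) override { log.push_back("start: " + q); }
  void end_statement(const std::string &q) override { log.push_back("end: " + q); }
};
```

Because ``Binlog_extension`` holds no resources of its own, it can be declared as a local variable at the call site, which matches the intention of avoiding allocation with ``new``.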
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.