WL#5286: Support for server-specific extension directory

Affects: Server-Prototype Only — Status: On-Hold


Rationale
=========

With the introduction of service interfaces, several new interfaces
have started to emerge. These interfaces allow the server to be
extended dynamically by loading dynamic libraries (a.k.a. DSO) into
the server.

In order to support the extension of the server using scripts
that are executed on certain events, it is necessary to have a
structure for where script files are placed and how they are loaded.


Background
==========

It is desirable to be able to extend the server with support for
executing small (or not so small) scripts whenever certain events
occur. Typical examples of such events are:

* When the network is lost
* When a client connects
* When a transaction is processed

In order to do this, it is necessary to add hooks inside the server
that are called when these events occur. To allow the hooks to be
implemented in languages such as Python, Perl, Lua, or Tcl, it is
also necessary to have a method for installing plugins that provide
this support.


Description
===========

Within the server, we have a number of *observer interfaces*, each consisting of
a set of functions as described in WL#4008. For each observer interface, it is
possible to register an observer that will be called whenever the corresponding
event occurs. For example, we could have a transaction observer interface with
the functions *transaction start*, *transaction commit*, and *transaction abort*
that are called whenever a transaction starts, commits, or aborts,
respectively.

With scriptable replication it will be possible to load support into
the server for executing small scripts in different languages as a
reaction to various events occurring inside the server.  For the
purpose of this worklog, we will use Python support as an example, but
the same ideas apply to other scripting languages such as Lua, Perl,
Tcl, JavaScript, etc.

To arrange for scripts in a specific language, such as Python, to be called as
a result of certain events occurring inside the server, it is necessary to
load an *adapter library* that supports this language. The library
contains adapters that register themselves for each observer interface
and in turn perform the necessary job of calling the Python scripts.
The adapter is generic and is written and distributed by, for example,
MySQL.

The adapter library for Python will be developed in WL#5288.

Since database administrators need to tailor the server to their
specific needs, it is necessary to have a structure where the scripts
can be installed.

The purpose of this worklog is to:

* Specify where and how these scripts are organized and installed

* Specify service interfaces for adapter libraries to use

Open Issues
===========

None

Resolved Issues
===============

* What caching strategy should be used?

  Mats: Loading scripts on server startup (or plugin load) was chosen.


Observer Interfaces
===================

Within the server, there are a number of observer interfaces, each consisting of
a set of functions as described in WL#4008. At designated points in the server
execution, these functions will be called.

For example, there could be an observer interface defined as follows::

  interface transaction {
    int begin(MYSQL_THD, TRANS_ID);
    int prepare(MYSQL_THD, TRANS_ID);
    int commit(MYSQL_THD, TRANS_ID);
    int abort(MYSQL_THD, TRANS_ID);
  }

This is a very simplified interface for observing transaction processing (in
reality, several more functions are necessary to properly trace
transaction handling). Note that introducing these interfaces is not
part of this worklog.

Each interface is identified by a multi-position name called the *(observer)
interface identifier*. For example, the observer interface to handle relay-log
reading and writing could be identified by the name 'replication/relay_log'.
Note that each interface consists of a set of functions.

The multi-position name consists of a sequence of C identifiers separated by
``/``. The name is case-insensitive, so "Replication/Relay_Log" and
"replication/relay_log" represent the same observer interface.

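The naming rules above can be sketched in C as follows. This is only an
illustration under stated assumptions: the function names
``interface_id_is_valid`` and ``interface_id_equal`` are hypothetical and not
part of any existing server interface, and the sketch assumes plain ASCII
identifiers in the default "C" locale.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `id` is a sequence of '/'-separated
 * C identifiers, 0 otherwise. */
static int interface_id_is_valid(const char *id)
{
    int at_start = 1;                     /* at the start of a component */

    if (id == NULL || *id == '\0')
        return 0;
    for (; *id != '\0'; ++id) {
        unsigned char c = (unsigned char)*id;
        if (c == '/') {
            if (at_start)                 /* empty component, e.g. "a//b" */
                return 0;
            at_start = 1;
        } else if (at_start) {
            if (!(isalpha(c) || c == '_'))  /* must not start with a digit */
                return 0;
            at_start = 0;
        } else if (!(isalnum(c) || c == '_')) {
            return 0;
        }
    }
    return !at_start;                     /* reject a trailing '/' */
}

/* Hypothetical helper: case-insensitive comparison, returns 1 when both
 * names identify the same interface. */
static int interface_id_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    return *a == *b;                      /* both must end together */
}
```

With these helpers, "Replication/Relay_Log" and "replication/relay_log"
compare equal, while a name such as "replication//relay_log" is rejected.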

File Locations
==============

In order to configure the server and allow the replication threads to read and
compile scripts, the scripts have to be placed at a location that is not
accessible to anybody except administrators with access to the machine. For
example, placing the scripts at a location accessible using ``LOAD DATA INFILE``
is not advisable since that poses a potential security risk. (Currently, ``LOAD
DATA INFILE`` can be used to read a file from anywhere.)

For this purpose, we assume that the extensions should be placed in a directory
parallel to the ``my.cnf`` file, meaning:

  ===================  ======================================================
  Filename             Purpose
  ===================  ======================================================
  /etc/mysql/ext       Global extensions directory (since 5.1.15)
  *SYSCONFDIR*/ext     Global extensions directory
  $MYSQL_HOME/ext      Server-specific extensions directory
  *EXTRADIR*/ext       Extra extensions, if ``--defaults-extra-file=path``
                       was specified.
  ===================  ======================================================

The *SYSCONFDIR* represents the directory specified with the
``--sysconfdir`` option to configure when MySQL was built. By default, this is
the ``etc`` directory located under the compiled-in installation directory. Note
that the sysconfdir can be specified in addition to the ``/etc/mysql/ext``
directory.

*EXTRADIR* represents the directory containing the file specified with
``--defaults-extra-file=path``.

The $MYSQL_HOME/ext directory is relative to the MYSQL_HOME, which is the
installation directory of the server and has the value described in

  http://dev.mysql.com/doc/refman/5.1/en/option-files.html


Mapping interface identifiers to file names
-------------------------------------------

In order to support a generic and modular structure for the various scripts, we
introduce a few new concepts and compute the file to read in the
following manner:

- For each observer interface, we associate a *base file name* that represents
  a location in the file system. The transformation of an observer interface
  identifier to a base file name can be handled differently depending on the
  installation. For our purposes, we map the identifier to a file name under
  .../ext/*observer interface identifier*. For example, on a Unix system, the
  identifier 'replication/relay_log' could be mapped to the base file name
  '/etc/mysql/ext/replication/relay_log'.

- For each base file name, we add a file name extension that is determined by
  the language used. So, for Lua extensions, the file name extension (without
  the dot) is 'lua', for Perl it is 'pl', and for shared libraries it is
  'so'.

For each extension handler (see below), each observer interface thus has a
one-to-one mapping to a file in the file system, and each file contains a set
of function definitions for that extension handler corresponding to the
functions in the observer interface. It is therefore possible to register
multiple files for each observer interface as long as they are for different
extension handlers.
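
The mapping from an interface identifier to a file name could be sketched as
below. The function name ``interface_file_name`` is hypothetical, the sketch
assumes a Unix-style ``/`` path separator, and lower-casing the identifier to
obtain one canonical file name per interface is an assumption of this sketch
(the worklog only states that identifiers are case-insensitive).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes "<ext_dir>/<lower-cased identifier>.<extension>"
 * into `buf`. Returns 0 on success and -1 if the result does not fit.
 */
static int interface_file_name(char *buf, size_t size, const char *ext_dir,
                               const char *interface_id, const char *extension)
{
    size_t start = strlen(ext_dir) + 1;         /* first byte after "<ext_dir>/" */
    size_t end = start + strlen(interface_id);  /* one past the identifier */
    size_t i;
    int n = snprintf(buf, size, "%s/%s.%s", ext_dir, interface_id, extension);

    if (n < 0 || (size_t)n >= size)
        return -1;                              /* encoding error or truncated */

    /* Lower-case only the identifier part; the directory is left untouched. */
    for (i = start; i < end; ++i)
        buf[i] = (char)tolower((unsigned char)buf[i]);
    return 0;
}
```

For example, calling the function with the directory '/etc/mysql/ext', the
identifier 'Replication/Relay_Log', and the extension 'lua' yields
'/etc/mysql/ext/replication/relay_log.lua'.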

This design allows us to separate the concepts as follows:

- We separate the call site from the observer interface function. This is
  important since there can conceptually be several points in the code that map
  to the same observer interface function. For example, "committing
  a transaction" is done at several places in the code, but it will call the
  same function in each case.

- We separate the files from the server configuration, allowing
  us to keep different scripts for different servers.

- We separate the languages used for extending the server from what is
  actually available on the server. For example, this allows us to install
  multiple solutions to the same problem (for example, both a Perl and a Lua
  script) and select the correct file based on what the server can handle
  (for example, there might be Perl support installed, but no Lua support).


Reading and compiling files
---------------------------

When locating all observer interface files to load, only the "most specific" of
the files is used (more specific directories are later in the list above). This
will allow the global configuration directory to contain default
implementations, but allow servers to override this behavior by providing their
own version of an extension interface.
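
The "most specific wins" rule can be sketched as a small filter over the
collected candidates. The ``Candidate`` structure and the function
``keep_most_specific`` are hypothetical names introduced for this sketch;
``dir_rank`` is assumed to be the position of the directory in the table
above (a higher rank is more specific), and the relative names are assumed to
be already normalized so that a plain ``strcmp`` suffices.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one file found in an extensions directory. */
struct Candidate {
    const char *relative_name;  /* e.g. "replication/relay_log.lua" */
    int dir_rank;               /* row in the directory table, higher = more specific */
};

/*
 * Copies to `out` every candidate that is not shadowed by a candidate with
 * the same relative name in a more specific directory. `out` must have room
 * for `n` entries. Returns the number of entries written.
 */
static size_t keep_most_specific(const struct Candidate *in, size_t n,
                                 struct Candidate *out)
{
    size_t i, j, kept = 0;

    for (i = 0; i < n; ++i) {
        int shadowed = 0;
        for (j = 0; j < n && !shadowed; ++j) {
            if (in[j].dir_rank > in[i].dir_rank &&
                strcmp(in[j].relative_name, in[i].relative_name) == 0)
                shadowed = 1;
        }
        if (!shadowed)
            out[kept++] = in[i];
    }
    return kept;
}
```

The quadratic scan keeps the sketch short; the number of extension files is
expected to be small, and the scan runs only when scripts are loaded.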

When loading/compiling the files, there are three basic strategies on when to
read information about what extensions there are and when to load the files:

1. When the server is started
2. When a thread is started, e.g., a client thread or a slave thread
3. Whenever an extension function is called

If approach 1 is picked, it is necessary to load and compile all the scripts
when the server starts. This means that it is not possible to add new extensions
without restarting the server. This is similar to how the ``my.cnf`` file is
handled, so it is a natural approach.

If approach 2 is picked, the scripts are loaded/compiled when a thread starts.
Even though this seems easy, it would require tracking different versions of the
compiled script since otherwise different threads might potentially have
different behaviors. The alternative would be that the behavior of a thread
might change because another thread is started.

Approach 3 would ensure consistency among the threads and allow extensions to be
changed on-line. It will, however, require file access with each call to check
if the file requires loading.

For these reasons, we pick approach 1, which is consistent and also the most
efficient. The one exception is when a new extension handler plugin is
loaded using the ``INSTALL PLUGIN`` command: in this case, the init function is
called and the scripts are loaded as if the server had just been started.


Loadable Scripting Support
==========================

To support loadable scripting languages, a plugin must be able to register
that it supports a certain language, and the server must be able to detect and
load all recognizable scripts on startup.

To communicate with the server, a plugin can register an *extension handler*,
which is just a structure with callback functions.

Procedure for registering scripting support
-------------------------------------------

In order to be able to load scripting support into the server, it is necessary
to have a means of informing the server that the plugin provides extension
handling for one or more extensions.

For the purpose of this worklog, the interfaces of Python, Perl, Lua, and Tcl
have been studied, but the interface should be applicable to other languages as
well.

The initialization and de-initialization of a plugin is already handled by the
plugin's init() and deinit() functions, so it is not necessary to incorporate
these into the interface. Typically, a plugin would create the interpreter and
register an extension handler for one (or more) extensions.

In addition, a plugin has to register observers with any observer interfaces
that it supports.

When receiving an extension handler registration from a plugin, the server
records the extension supported by the extension handler as well as a handle (a
pointer) to the extension handler.
 

Procedure for loading scripts
-----------------------------

When starting the server (or installing a plugin that adds scripting support),
it is necessary to find all files that need to be loaded.

Collect the files that are to be loaded by recursively traversing the
directories given above. If two files map to the same observer interface
identifier and have the same file name extension, the one from the latest (most
specific) directory in the table above is used and the other is discarded.

For each file name collected, extract the extension of the file and call the
load function of the extension handler for that extension. If there is no
extension handler registered for that extension, skip the file and proceed with
the next file.

Note that if files are available for multiple extension handlers and all of
those handlers are registered, all of the files will be loaded.

For example, if there is a replication/relay_log.lua (Lua) and a
replication/relay_log.pl (Perl) available and both Lua and Perl scripting
support are installed, both scripts will be loaded into the server and
registered with the observer interface.
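
The loading step described in this section could look roughly like the
following sketch. The helper names ``file_extension``, ``load_files``, and
``demo_load_file`` are hypothetical, and treating a return value of 0 from
``load_file`` as success is an assumption, since the worklog does not define
the return convention. The handlers are passed in as an array to keep the
sketch self-contained.

```c
#include <stddef.h>
#include <string.h>

struct Extension_handler {
    const char *extension;
    int (*load_file)(const char *filename);
};

/* Returns the text after the last '.' of the final path component,
 * or NULL if that component has no extension. */
static const char *file_extension(const char *filename)
{
    const char *slash = strrchr(filename, '/');
    const char *base = slash ? slash + 1 : filename;
    const char *dot = strrchr(base, '.');

    return (dot != NULL && dot[1] != '\0') ? dot + 1 : NULL;
}

/* Calls load_file for each file whose extension has a handler and skips
 * all other files. Returns the number of files loaded successfully
 * (assumption: load_file returns 0 on success). */
static size_t load_files(const char *const *files, size_t nfiles,
                         const struct Extension_handler *handlers,
                         size_t nhandlers)
{
    size_t i, j, loaded = 0;

    for (i = 0; i < nfiles; ++i) {
        const char *ext = file_extension(files[i]);
        if (ext == NULL)
            continue;                      /* no extension: skip the file */
        for (j = 0; j < nhandlers; ++j) {
            if (strcmp(handlers[j].extension, ext) == 0) {
                if (handlers[j].load_file(files[i]) == 0)
                    ++loaded;
                break;                     /* at most one handler per extension */
            }
        }
    }
    return loaded;
}

/* Hypothetical stand-in for an adapter's load function; always succeeds. */
static int demo_load_file(const char *filename)
{
    (void)filename;
    return 0;
}
```

With only a 'lua' handler available, a file list containing
``relay_log.lua`` and ``relay_log.pl`` results in one loaded file; the Perl
script is skipped without error.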

Extension handling service interface
====================================

Synopsis
--------

::

  struct Extension_handler {
      const char *extension;
      int (*load_file)(const char *filename);
  };

  int extension_register(const Extension_handler *handler);
  int extension_unregister(const Extension_handler *handler);

Field descriptions
------------------

extension
  This is the extension (without the dot) for files that this extension handler
  is able to handle. It is not possible to register multiple extension handlers
  for the same extension.

load_file
  This is a pointer to a function that will be called when the file should be 
  loaded/compiled.  This function will be called with the fully qualified file
  name of the file to load.


Function descriptions
---------------------

extension_register(handler)
  This function is called to register a new extension handler.

extension_unregister(handler)
  This function is called to unregister an already registered extension handler.
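
As a rough illustration of how the server side of this service could behave,
the sketch below keeps registered handlers in a fixed-size table and rejects
a second handler for the same extension. Everything beyond the two function
names and the structure is an assumption of the sketch: the table size, the
return values (0 for success, 1 for failure), the lack of locking, and the
stub ``lua_load_file`` standing in for a real adapter.

```c
#include <stddef.h>
#include <string.h>

struct Extension_handler {
    const char *extension;
    int (*load_file)(const char *filename);
};

#define MAX_HANDLERS 16                 /* arbitrary limit for this sketch */

static const struct Extension_handler *registry[MAX_HANDLERS];

/* Returns 0 on success; 1 if the handler is invalid, the extension is
 * already taken, or the table is full. Not thread-safe. */
int extension_register(const struct Extension_handler *handler)
{
    size_t i, free_slot = MAX_HANDLERS;

    if (handler == NULL || handler->extension == NULL ||
        handler->load_file == NULL)
        return 1;
    for (i = 0; i < MAX_HANDLERS; ++i) {
        if (registry[i] == NULL) {
            if (free_slot == MAX_HANDLERS)
                free_slot = i;          /* remember the first free slot */
        } else if (strcmp(registry[i]->extension, handler->extension) == 0) {
            return 1;                   /* only one handler per extension */
        }
    }
    if (free_slot == MAX_HANDLERS)
        return 1;
    registry[free_slot] = handler;
    return 0;
}

/* Returns 0 if the handler was registered and has been removed, 1 otherwise. */
int extension_unregister(const struct Extension_handler *handler)
{
    size_t i;

    for (i = 0; handler != NULL && i < MAX_HANDLERS; ++i) {
        if (registry[i] == handler) {
            registry[i] = NULL;
            return 0;
        }
    }
    return 1;
}

/* Hypothetical adapter stub: a real plugin would compile the script here. */
static int lua_load_file(const char *filename)
{
    (void)filename;
    return 0;
}

static const struct Extension_handler lua_handler = { "lua", lua_load_file };
```

A plugin would typically call ``extension_register(&lua_handler)`` from its
init() function and ``extension_unregister(&lua_handler)`` from deinit().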


Security issues
===============

If a file to be loaded is owned by a different user from the one the server
executes as, it might be possible to trick the server into loading potentially
malicious scripts. For that reason, the following checks are made by the server
before calling ``load_file``:

- The owner of the file should be the same as the user id of the process
  running the server.

- The file to be loaded should only be writable by the owner and not by anybody
  else.
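
On a POSIX system, the two checks could be sketched with ``stat()`` as shown
below. The function name ``file_is_trusted`` is hypothetical, and comparing
against the effective user id (``geteuid()``) is an assumption about what
"the user id of the process" means here.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if `filename` passes both checks and 0
 * otherwise (including when the file cannot be examined).
 */
static int file_is_trusted(const char *filename)
{
    struct stat st;

    if (stat(filename, &st) != 0)
        return 0;                          /* missing or inaccessible file */
    if (st.st_uid != geteuid())
        return 0;                          /* not owned by the server's user */
    if (st.st_mode & (S_IWGRP | S_IWOTH))
        return 0;                          /* writable by group or others */
    return 1;
}
```

Because the check and the later ``load_file`` call are separate steps, a
stricter implementation might open the file first and apply the same checks
to the open descriptor with ``fstat()``.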