John David Duncan is a consulting engineer with MySQL Inc., and lives in San Francisco, CA.
Learn about new MySQL releases, technical articles, events and more.
By John David Duncan
The MySQL database server features a distinctive modular architecture. The core "upper level" features of the server, such as the network layer, SQL parser, and query optimizer, are separated from the underlying data storage by a well-defined interface. Most MySQL databases store data either in MyISAM tables or in a transactional InnoDB data store, but the server can include several other storage engines in addition to MyISAM and InnoDB. Adding a storage engine to MySQL is a fairly straightforward task.
This article will outline the process of adding a storage engine to MySQL using a hypothetical "ABC" storage engine as an example. Without ever really defining or implementing the ABC data store, we will provide an overview of all the steps that would be necessary to integrate it into MySQL, including all of the files that will be created or modified. The bulk of a storage engine lies in the implementation a new subclass of the C++ class handler. The handler class is described briefly below, but the most essential documentation for it is actually found in the MySQL source tree in the form of an example storage engine.
Any new storage engine also requires a certain fixed set of formulaic
changes, which prepare the core server to interact with the new handler. Those
changes are described here in some detail. The amount of work required beyond
the basic steps varies according to the amount of functionality the new engine
will provide. The most important function is the ability to read data from a
file; for a non-indexed, read-only storage format, that ability would mark the
end of the development work. Writable storage formats require a bit more work
to allow sequential INSERT statements to append to the end of a file, and a bit more work
still to provide random-access UPDATE and DELETE handling. Adding indexed
access or transactional abilities requires more work still.
Beginning with MySQL release 4.1.3, the sql/examples directory of the MySQL
source distribution contains an example storage engine implementation. The
example engine code and comments in the files ha_example.h and ha_example.cc
serve as the primary documentation for MySQL's storage engine API. The example
storage engine cannot actually store any data, but it does compile; supplying
the --with-example-storage-engine option to configure will produce a mysqld
binary containing the example code. The example storage engine first appears in
MySQL versions 4.1.3 and 5.0.1, but the example files and the instructions here
also apply to MySQL 4.0. At some point after MySQL version 5.1, it should be
possible to dynamically link new storage engines into a server, and these
instructions will become at least partially obsolete.
Creating the ABC storage engine for MySQL requires modifying the following 15 files:
config.h.in (or acconfig.h, depending on the source version) acinclude.m4 configure.in
sql/
sql/Makefile.am sql/handler.h sql/handler.cc sql/set_var.cc sql/mysql_priv.h sql/mysqld.cc
sql/examples/
sql/examples/ha_abc.h sql/examples/ha_abc.cc
mysql-test/
mysql-test/include/have_abc.inc mysql-test/r/have_abc.require mysql-test/t/abc.test mysql-test/r/abc.result
The changes in sections 1 and 2 of this list are the formulaic
"housekeeping" modifications, needed to support a new storage engine in the
build system and core server. The .h and .cc files in section 3 implement the
storage engine as a C++ subclass of the class handler; these files are the
focus of most of the real work. The files in section 4 represent new test cases
within MySQL's regression test framework, usable to verify the new handler's
functionality.
With each file, the code for existing storage engines is available as a model for the implementation. Along with the example storage engine files, the MySQL 4.1 and 5.0 source trees contain a fully-functional storage engine called the archive storage engine, which uses the zlib compression library for data storage. The archive storage engine can also be used to help find your place in the source tree. At each step in the development process, searching within files for strings such as "archive", "ARCHIVE", and "EXAMPLE" will reveal a useful model for the needed code.
MySQL uses GNU autoconf to manage cross-platform code portability and
compile-time options. Introducing a new feature into the code tree begins with
modifying some autoconf files.
To implement the ABC storage engine, start by "undefining" a new configure
variable in the file config.h.in (or acconfig.h, as it is called in some source
trees); the variable will be used to include the new storage engine in a
server:
/* Builds Example DB */ #undef HAVE_EXAMPLE_DB /* Builds abc storage engine */ #undef HAVE_ABC_DB
Next, in acinclude.m4, create a new rule for MYSQL_CHECK_ABCDB based on the
exsiting rule for the example engine or the archive engine. All that is
required is a simple copy of the following section, followed by a
search-and-replace operation. The final product will look something like the
following (though the exact rule varies depending on the source tree):
dnl -----------------------------------------------------
dnl Macro: MYSQL_CHECK_ABCDB
dnl Sets HAVE_ABC_DB if --with-abc-storage-engine is used
dnl -----------------------------------------------------
AC_DEFUN([MYSQL_CHECK_ABCDB], [
AC_ARG_WITH([abc-storage-engine],
[
... some lines omitted here ...
;;
esac
])
Finally, add a rule for MYSQL_CHECK_ABCDB to configure.in. Where the file
reads,
MYSQL_CHECK_ISAM MYSQL_CHECK_BDB MYSQL_CHECK_INNODB MYSQL_CHECK_EXAMPLEDB MYSQL_CHECK_ARCHIVEDB MYSQL_CHECK_NDBCLUSTER
... add a line stating:
MYSQL_CHECK_ABCDB
sql/The sql/ subdirectory of the MySQL source tree contains many of the core
source files for parsing and processing queries. Here, the file Makefile.am
requires at least two changes, and possibly three.
examples/ha_abc.h) to the definition of $noinst_HEADERSexamples/ha_abc.cc) to the definition of $mysqld_SOURCES$LDADDThe file handler.h defines the handler class — the layer that sits between
the MySQL server and the underlying storage. This file requires one change:
add DB_TYPE_ABC_DB to the end of the enumerated list definition, db_type. The
exact ordering of the items in the list is of crucial significance, as the
constants defined by the order of the list are used in .frm files to specify a
tables' storage engine.
The file handler.cc, the main implementation file for the handler class,
also requires three changes.
#ifdef HAVE_ABC_DB #include "examples/ha_abc.h" #endif
sys_table_types[]:
struct show_table_type_st sys_table_types[]=
{
{"MyISAM", &have_yes,
"Default type from 3.23 with great performance", DB_TYPE_MYISAM},
...
{"ABC",&have_abc_db,
"ABC storage engine", DB_TYPE_ABC_DB},
};
case DB_TYPE_ABC_DB to the switch in the function get_new_handler():
#ifdef HAVE_ABC_DB
case DB_TYPE_ABC_DB:
return new ha_abc(table);
#endif
Along with the ABC storage engine, you should create an SQL variable called
have_abc. At runtime, have_abc will be available to the SHOW VARIABLES
command, and its value will be set to YES whenever the ABC engine
is enabled in the server. The variable also plays an important role in the
process of developing and debugging the new code; when creating unit test cases
for testing the ABC storage engine, you will configure MySQL's test suite to
check the value of have_abc in order to determine whether to run them.
Still in the sql/ directory, the code for managing SQL variables is in the
file set_var.cc. In this file, create the the variable by adding it to the
appropriate place in the alphabetically-ordered init_vars[] array:
struct show_var_st init_vars[]= {
...
{"have_abc", (char*) &have_abc_db, SHOW_HAVE },
}
Note that while the user-visible name of the variable is the string
have_abc, its internal storage is the variable have_abc_db. The next step in
creating have_abc is to inform the core server about have_abc_db. The file
mysql_priv.h is a "private" header file for the core server; in mysql_priv.h,
add a declaration for have_abc_db as an external variable of type
SHOW_COMP_OPTION. The declaration looks like this:
extern SHOW_COMP_OPTION have_abc_db;
It goes in the block of the file that contains similar definitions for other storage engines:
/* optional things, have_* variables */ extern SHOW_COMP_OPTION have_isam, have_innodb, have_berkeley_db; extern SHOW_COMP_OPTION have_example_db, have_archive_db; extern SHOW_COMP_OPTION have_raid, have_openssl, have_symlink;
Finally, have_abc_db has to be mentioned in two places in mysqld.cc, the
core server implementation file.
SHOW_COMP_OPTION have_example_db, have_archive_db, have_abc_db ;
mysql_init_variables() function:
#ifdef HAVE_ABC_DB have_abc_db= SHOW_OPTION_YES; #else have_abc_db= SHOW_OPTION_NO; #endif
Finally, we get to the real work. The storage engine itself (or, at least,
the interface between MySQL and the underlying data store) is defined in two
files. The templates for these are the files ha_example.h and ha_example.cc in
the sql/examples/ directory.
To build your storage engine, make a copy of ha_example.h named ha_abc.h,
and also copy ha_example.cc to ha_abc.cc. Then perform some global search and
replace operations on each step, to transform the uppercase string EXAMPLE and
the all-lowercase example into the strings ABC and abc.
If you are familiar with the sed utility, you might do this all at once:
sed s/EXAMPLE/ABC/g ha_example.h | sed s/example/abc/g > ha_abc.h sed s/EXAMPLE/ABC/g ha_example.cc | sed s/example/abc/g > ha_abc.cc
If everything up to this point is correctly in place, you can actually build
a MySQL server containing ABC. The new handler won't do anything (it's
implemented entirely in stubs, copied from the ha_example files), but it should
at least compile cleanly.
Because we have edited some autoconf files, the first step in building the
new server is to run the suite of autoconf commands and rebuild any
dependencies. There are four commands required:
aclocal autoheader autoconf automake
Then, configure the build:
./configure --with-abc-storage-engine \
--prefix=/home/jdd/mysql-builds/abc \
--without-innodb
Notice the three parameters in this command. --with-abc-storage-engine is
the switch to enable the in-progress storage engine (and should appear in the
documentation, if you should run ./configure --help.) You created it when you
edited acinclude.m4. --prefix=/home/jdd/mysql-builds/abc allows the build to
be installed in a special development directory, so that it does not conflict
with some other installation of MySQL on the same machine. Finally,
--without-innodb disables InnoDB, which allows the test server to start up and
shut down more quickly while you're doing development work on it.
When configuration has finished, run make. If everything has gone well, the
server will compile without errors, and make install will install a copy into
the designated build directory. You can even go into the build directory,
start up the server, and check out the new have_abc variable:
% cd ~/myqsl-builds/abc % share/mysql/mysql.server start % bin/mysql Welcome to the MySQL monitor... mysql> show variables like 'have_abc'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | have_abc | YES | +---------------+-------+ 1 row in set (0.01 sec)
You can even create a table with type=ABC, since the base handler class
(implemented in handler.cc) is able to create a .frm file for the table.
However, you cannot go so far as trying to insert any data into it.
mysql> use test; Database changed mysql> create table t1 (a int not null) type=abc; Query OK, 0 rows affected, 1 warning (0.00 sec) mysql> insert into t1 values (1); ERROR 1031 (HY000): Table storage engine for 't1' doesn't have this option
mysql-test/Back in the source directory, it's time to create some test cases. This is
test-driven development: create a test case for a new feature, and then write
code until you can pass it. The README file in the mysql-test/ directory gives
an overview of MySQL's regression test framework, which is based on a
specialized MySQL client application, mysqltest, designed to execute a query
and compare its results to a known result file.
The test suite performs a lot of conditional execution in order to choose
which tests to run according to what set of features are compiled in. Indeed,
the first file you should create is one to test whether ABC has been compiled
into the server. In the directory mysql-test/include/, look at the file called
have_archive.inc (we use the archive storage engine as our example, now, since
the example engine has no test cases). The file is only a few lines long. Create a
copy of it called have_abc.inc and edit this as needed to test for have_abc
rather than have_archive. Similarly, in the results directory
mysql-test/include/r/, make a copy of have_archive.require called
have_abc.require and perform the same change.
Next, create the actual test case for ABC. In the test-case directory
— mysql-test/t/ — create a file called abc.test like this:
-- source include/have_abc.inc --disable_warnings drop table if exists t1; --enable_warnings CREATE TABLE t1 ( col1 int unsigned NOT NULL ) ENGINE=abc; INSERT INTO t1 VALUES (42);
In this file, the lines beginning with two dashes denote special mysqltest
commands, and all other lines contain plain SQL. Once the test exists, you can
run it using the mysql-test-run command:
% ./mysql-test-run abc
The test will fail until you have succesfully implemented INSERT statements, but once
it succeeds, you will be able to use the --record switch to mysql-test-run to
record the results for future use. Running mysql-test-run --record abc causes
the test results to be stored under the r/ subdirectory in a file called
abc.result.
ha_abc filesNext, it is time to really read the ha_example.h and ha_example.cc files,
which serve not only as a template for new handler code but also as the primary
documentation for implementing a handler. In many ways, they begin where this
document leaves off.
When you are ready to begin coding for the handler, one early step is to
define table_flags() and index_flags() in ha_abc.h, using the set of flags
defined in handler.h to delineate precisely which features are supported in
ABC. The CSV storage engine, for instance — ha_tina.h — defines them like
this:
ulong table_flags() const
{
return (HA_REC_NOT_IN_SEQ | HA_NOT_EXACT_COUNT |
HA_NO_AUTO_INCREMENT );
}
ulong index_flags(uint idx, uint part, bool all_parts) const
{
/* We will never have indexes */
return 0;
}
The functions implemented in the handler class provide what is essentially a
row-level interface to the data. Just above the handler in the server stack is
an execution plan, generated by the optimizer from the parsed SQL query; below
it, a set of data access calls such as Berkeley DB, zlib, or standard
filesystem I/O. Many of a handler's methods resemble the I/O calls: open() to
open a table; update_row() changes a row of data; store_lock() gets a
particular sort of lock.
Here's the minimum set of five functions you must implement in order to perform a full table scan:
| Function | Description |
|---|---|
open() | Open the table |
info() | Provide the optimizer with some information about the table |
rnd_init() | Prepare for a table scan |
rnd_next() | Read the next row into a buffer |
extra() | Called with HA_EXTRA_RESET to reset the table when the scan is complete |
The most serious of these is rnd_next(), which is passed a pointer to a row
buffer and is expected to fill in that buffer with a row of data encoded in
MySQL's internal data format. Second to providing access to the underlying
storage mechanism beneath the handler, the handler's biggest task is that of
converting a row of data between its underlying format and the internal format
used by the server.
The server uses three data formats: fixed-length rows, variable-length rows,
and variable-length rows with BLOB pointers. In each format, the columns
appear in the order in which they were defined by the CREATE TABLE statement.
(The table definition is stored in the .frm file, and the optimizer and the
handler are both able access table metadata from the same source, its TABLE
structure).
Each format begins with a "NULL bitmap," of one bit per column. A table with as many as eight columns will have a 1-byte bitmap; a table with nine to sixteen columns will have a two-byte bitmap, and so forth. To indicate that a particular value is NULL, you set that column's NULL bit to 1.
After the NULL bitmap come the columns, one by one. Each column is of the
size inidicated in the "MySQL Data Types" section of the MySQL manual. In the
server, column data types are defined in the sql/field.cc file. In the fixed
length row format, the columns are simply laid out one by one. In a
variable-length row, VARCHAR columns are coded as a one-byte length, followed
by a string of characters. In a variable length row with BLOB columns, each blob is
represented by two parts: first an integer representing the actual size of the
BLOB, and then a pointer to the BLOB in memory.
Examples of row conversion (or "packing") can be found by starting at
rnd_next() in any table handler. In ha_tina.cc, for example, the code in
find_current_row() illustrates how the TABLE structure (pointed to by table)
and a String object (named buffer) can be used to pack character data from a
CSV file. Writing a row back to disk requires the opposite conversion,
unpacking from the internal format.
You should now have a high-level understanding of MySQL storage engines, a reference guide to the steps needed to implement a new one, and knowledge of where to find the more detailed documentation for the handler API and internal data formats.
