MySQL Internals Custom Engine

From MySQLForge Wiki

Jump to: navigation, search

← Back to MySQL Internals overview page

Contents

[edit] Writing a Custom Storage Engine

With MySQL 5.1, MySQL AB has introduced a pluggable storage engine architecture that makes it possible to create new storage engines and add them to a running MySQL server without recompiling the server itself.

This architecture makes it easier to develop new storage engines for MySQL and deploy them.

This chapter is intended as a guide to assist you in developing a storage engine for the new pluggable storage engine architecture.

[edit] Additional resources

[edit] Overview

The MySQL server is built in a modular fashion:

Image:Custom-engine-overview.png

The storage engines manage data storage and index management for MySQL. The MySQL server communicates with the storage engines through a defined API.

Each storage engine is a class with each instance of the class communicating with the MySQL server through a special handler interface.

Handlers are instanced on the basis of one handler for each thread that needs to work with a specific table. For example: If three connections all start working with the same table, three handler instances will need to be created.

Once a handler instance is created, the MySQL server issues commands to the handler to perform data storage and retrieval tasks such as opening a table, manipulating rows, and managing indexes.

Custom storage engines can be built in a progressive manner: Developers can start with a read-only storage engine and later add support for INSERT, UPDATE, and DELETE operations, and even later add support for indexing, transactions, and other advanced operations.

[edit] Creating Storage Engine Source Files

The easiest way to implement a new storage engine is to begin by copying and modifying the EXAMPLE storage engine. The files ha_example.cc and ha_example.h can be found in the storage/example directory of the MySQL 5.1 source tree. For instructions on how to obtain the 5.1 source tree, see MySQL Installation Using a Source Distribution.

When copying the files, change the names from ha_example.cc and ha_example.h to something appropriate to your storage engine, such as ha_foo.cc and ha_foo.h.

After you have copied and renamed the files you must replace all instances of EXAMPLE and example with the name of your storage engine. If you are familiar with sed, these steps can be done automatically (in this example, the name of your storage engine would be FOO):

sed s/EXAMPLE/FOO/g ha_example.h | sed s/example/foo/g ha_foo.h
sed s/EXAMPLE/FOO/g ha_example.cc | sed s/example/foo/g ha_foo.cc

[edit] Creating the handlerton

The handlerton (short for handler singleton) defines the storage engine and contains method pointers to those methods that apply to the storage engine as a whole, as opposed to methods that work on a per-table basis. Some examples of such methods include transaction methods to handle commits and rollbacks.

Here's an example from the EXAMPLE storage engine:

handlerton example_hton= {
  "EXAMPLE",
  SHOW_OPTION_YES,
  "Example storage engine",
  DB_TYPE_EXAMPLE_DB,
  NULL,    /* Initialize */
  0,       /* slot */
  0,       /* savepoint size. */
  NULL,    /* close_connection */
  NULL,    /* savepoint */
  NULL,    /* rollback to savepoint */
  NULL,    /* release savepoint */
  NULL,    /* commit */
  NULL,    /* rollback */
  NULL,    /* prepare */
  NULL,    /* recover */
  NULL,    /* commit_by_xid */
  NULL,    /* rollback_by_xid */
  NULL,    /* create_cursor_read_view */
  NULL,    /* set_cursor_read_view */
  NULL,    /* close_cursor_read_view */
  example_create_handler,    /* Create a new handler */
  NULL,    /* Drop a database */
  NULL,    /* Panic call */
  NULL,    /* Release temporary latches */
  NULL,    /* Update Statistics */
  NULL,    /* Start Consistent Snapshot */
  NULL,    /* Flush logs */
  NULL,    /* Show status */
  NULL,    /* Replication Report Sent Binlog */
  HTON_CAN_RECREATE
};

This is the definition of the handlerton from handler.h:

typedef struct
  {
    const char *name;
    SHOW_COMP_OPTION state;
    const char *comment;
    enum db_type db_type;
    bool (*init)();
    uint slot;
    uint savepoint_offset;
    int  (*close_connection)(THD *thd);
    int  (*savepoint_set)(THD *thd, void *sv);
    int  (*savepoint_rollback)(THD *thd, void *sv);
    int  (*savepoint_release)(THD *thd, void *sv);
    int  (*commit)(THD *thd, bool all);
    int  (*rollback)(THD *thd, bool all);
    int  (*prepare)(THD *thd, bool all);
    int  (*recover)(XID *xid_list, uint len);
    int  (*commit_by_xid)(XID *xid);
    int  (*rollback_by_xid)(XID *xid);
    void *(*create_cursor_read_view)();
    void (*set_cursor_read_view)(void *);
    void (*close_cursor_read_view)(void *);
    handler *(*create)(TABLE *table);
    void (*drop_database)(char* path);
    int (*panic)(enum ha_panic_function flag);
    int (*release_temporary_latches)(THD *thd);
    int (*update_statistics)();
    int (*start_consistent_snapshot)(THD *thd);
    bool (*flush_logs)();
    bool (*show_status)(THD *thd, stat_print_fn *print, enum ha_stat_type stat);
    int (*repl_report_sent_binlog)(THD *thd, char *log_file_name, my_off_t end_offset);
    uint32 flags;
  } handlerton;

There are a total of 30 handlerton elements, only a few of which are mandatory (specifically the first four elements and the create() method).

  1. The name of the storage engine. This is the name that will be used when creating tables (CREATE TABLE ... ENGINE = FOO;).
  2. The value to be displayed in the status field when a user issues the SHOW STORAGE ENGINES command.
  3. The storage engine comment, a description of the storage engine displayed when using the SHOW STORAGE ENGINES command.
  4. An integer that uniquely identifies the storage engine within the MySQL server. The constants used by the built-in storage engines are defined in the handler.h file. Custom engines should use DB_TYPE_UNKOWN.
  5. A method pointer to the storage engine initializer. This method is only called once when the server starts to allow the storage engine class to perform any housekeeping that is necessary before handlers are instanced.
  6. The slot. Each storage engine has its own memory area (actually a pointer) in the thd, for storing per-connection information. It is accessed as thd->ha_data[foo_hton.slot]. The slot number is initialized by MySQL after foo_init() is called. For more information on the thd, see #Implementing ROLLBACK.
  7. The savepoint offset. To store per-savepoint data the storage engine is provided with an area of a requested size (0, if no savepoint memory is necessary).

The savepoint offset must be initialized statically to the size of the needed memory to store per-savepoint information. After foo_init it is changed to be an offset to the savepoint storage area and need not be used by the storage engine. For more information, see #Specifying the Savepoint Offset.

  1. Used by transactional storage engines, clean up any memory allocated in their slot.
  2. A method pointer to the handler's savepoint_set() method. This is used to create a savepoint and store it in memory of the requested size.

For more information, see #Implementing the savepoint_set Method.

  1. A method pointer to the handler's rollback_to_savepoint() method. This is used to return to a savepoint during a transaction. It's only populated for storage engines that support savepoints.

For more information, see #Implementing the savepoint_rollback() Method.

  1. A method pointer to the handler's release_savepoint() method. This is used to release the resources of a savepoint during a transaction. It's optionally populated for storage engines that support savepoints.

For more information, see #Implementing the savepoint_release() Method.

  1. A method pointer to the handler's commit() method. This is used to commit a transaction. It's only populated for storage engines that support transactions.

For more information, see #Implementing COMMIT.

  1. A method pointer to the handler's rollback() method. This is used to roll back a transaction. It's only populated for storage engines that support transactions.

For more information, see #Implementing ROLLBACK.

  1. Required for XA transactional storage engines. Prepare transaction for commit.
  2. Required for XA transactional storage engines. Returns a list of transactions that are in the prepared state.
  3. Required for XA transactional storage engines. Commit transaction identified by XID.
  4. Required for XA transactional storage engines. Rollback transaction identified by XID.
  5. Called when a cursor is created to allow the storage engine to create a consistent read view.
  6. Called to switch to a specific consistent read view.
  7. Called to close a specific read view.
  8. MANDATORY - Construct and return a handler instance.

For more information, see #Handling Handler Instantiation.

  1. Used if the storage engine needs to perform special steps when a schema is dropped (such as in a storage engine that uses tablespaces).
  2. Cleanup method called during server shutdown and crashes.
  3. InnoDB-specific method.
  4. InnoDB-specific method called at start of SHOW ENGINE InnoDB STATUS.
  5. Method called to begin a consistent read.
  6. Called to indicate that logs should be flushed to reliable storage.
  7. Provides human readable status information on the storage engine for SHOW ENGINE foo STATUS.
  8. InnoDB-specific method used for replication.
  9. Handlerton flags that indicate the capabilities of the storage engine. Possible values are defined in sql/handler.h and copied here:
#define HTON_NO_FLAGS                 0
 #define HTON_CLOSE_CURSORS_AT_COMMIT (1 << 0)
 #define HTON_ALTER_NOT_SUPPORTED     (1 << 1)
 #define HTON_CAN_RECREATE            (1 << 2)
 #define HTON_FLUSH_AFTER_RENAME      (1 << 3)
 #define HTON_NOT_USER_SELECTABLE     (1 << 4)
 

HTON_ALTER_NOT_SUPPORTED is used to indicate that the storage engine cannot accept ALTER TABLE statements. The FEDERATED storage engine is an example. HTON_FLUSH_AFTER_RENAME indicates that FLUSH LOGS must be called after a table rename. HTON_NOT_USER_SELECTABLE indicates that the storage engine should not be shown when a user calls SHOW STORAGE ENGINES. Used for system storage engines such as the dummy storage engine for binary logs.

[edit] Handling Handler Instantiation

The first method call your storage engine needs to support is the call for a new handler instance.

Before the handlerton is defined in the storage engine source file, a method header for the instantiation method must be defined. Here is an example from the CSV engine:

static handler* tina_create_handler(TABLE *table);

As you can see, the method accepts a pointer to the table the handler is intended to manage, and returns a handler object.

After the method header is defined, the method is named with a method pointer in the create() handlerton element, identifying the method as being responsible for generating new handler instances.

Here is an example of the MyISAM storage engine's instantiation method:

static handler *myisam_create_handler(TABLE *table)
  {
    return new ha_myisam(table);
  }

This call then works in conjunction with the storage engine's constructor. Here is an example from the FEDERATED storage engine:

ha_federated::ha_federated(TABLE *table_arg)
  :handler(&federated_hton, table_arg),
  mysql(0), stored_result(0), scan_flag(0),
  ref_length(sizeof(MYSQL_ROW_OFFSET)), current_position(0)
  {}

And here's one more example from the EXAMPLE storage engine:

ha_example::ha_example(TABLE *table_arg)
  :handler(&example_hton, table_arg)
  {}

The additional elements in the FEDERATED example are extra initializations for the handler. The minimum implementation required is the handler() initialization shown in the EXAMPLE version.

[edit] Defining Filename Extensions

Storage engines are required to provide the MySQL server with a list of extensions used by the storage engine with regard to a given table, its data and indexes.

Extensions are expected in the form of a null-terminated string array. The following is the array used by the CSV engine:

static const char *ha_tina_exts[] = {
  ".CSV",
  NullS
};

This array is returned when the [custom-engine.html#custom-engine-api-reference-bas_ext bas_ext()] method is called:

const char **ha_tina::bas_ext() const
{
  return ha_tina_exts;
}

By providing extension information you can also omit implementing DROP TABLE functionality as the MySQL server will implement it for you by closing the table and deleting all files with the extensions you specify.

[edit] Creating Tables

Once a handler is instanced, the first operation that will likely be required is the creation of a table.

Your storage engine must implement the [custom-engine.html#custom-engine-api-reference-create create()] virtual method:

virtual int create(const char *name, TABLE *form, HA_CREATE_INFO *info)=0;

This method should create all necessary files but does not need to open the table. The MySQL server will call for the table to be opened later on.

The *name parameter is the name of the table. The *form parameter is a TABLE structure that defines the table and matches the contents of the tablename.frm file already created by the MySQL server. Storage engines must not modify the tablename.frm file.

The *info parameter is a structure containing information on the CREATE TABLE statement used to create the table. The structure is defined in handler.h and copied here for your convenience:

typedef struct st_ha_create_information
{
    CHARSET_INFO *table_charset, *default_table_charset;
    LEX_STRING connect_string;
    const char *comment,*password;
    const char *data_file_name, *index_file_name;
    const char *alias;
    ulonglong max_rows,min_rows;
    ulonglong auto_increment_value;
    ulong table_options;
    ulong avg_row_length;
    ulong raid_chunksize;
    ulong used_fields;
    SQL_LIST merge_list;
    enum db_type db_type;
    enum row_type row_type;
    uint null_bits;                       /* NULL bits at start of record */
    uint options;                         /* OR of HA_CREATE_ options */
    uint raid_type,raid_chunks;
    uint merge_insert_method;
    uint extra_size;                      /* length of extra data segment */
    bool table_existed;                /* 1 in create if table existed */
    bool frm_only;                        /* 1 if no ha_create_table() */
    bool varchar;                         /* 1 if table has a VARCHAR */
} HA_CREATE_INFO;

A basic storage engine can ignore the contents of *form and *info, as all that is really required is the creation and possibly the initialization of the data files used by the storage engine (assuming the storage engine is file-based).

For example, here is the implementation from the CSV storage engine:

int ha_tina::create(const char *name, TABLE *table_arg,
  HA_CREATE_INFO *create_info)
{
    char name_buff[FN_REFLEN];
    File create_file;
    DBUG_ENTER("ha_tina::create");

    if ((create_file= my_create(fn_format(name_buff, name, "", ".CSV",
          MY_REPLACE_EXT|MY_UNPACK_FILENAME),0,
          O_RDWR | O_TRUNC,MYF(MY_WME))) < 0)
    DBUG_RETURN(-1);

    my_close(create_file,MYF(0));

    DBUG_RETURN(0);
}

In the preceding example, the CSV engine does not refer at all to the *table_arg or *create_info parameters, but simply creates the required data files, closes them, and returns.

The my_create and my_close methods are helper methods defined in src/include/my_sys.h.

[edit] Opening a Table

Before any read or write operations are performed on a table, the MySQL server will call the [custom-engine.html#custom-engine-api-reference-open handler::open()] method to open the table data and index files (if they exist).

int open(const char *name, int mode, int test_if_locked);

The first parameter is the name of the table to be opened. The second parameter determines what file to open or what operation to take. The values are defined in handler.h and are copied here for your convenience:

O_RDONLY  -  Open read only
O_RDWR    -  Open read/write

The final option dictates whether the handler should check for a lock on the table before opening it. The following options are available:

#define HA_OPEN_ABORT_IF_LOCKED   0   /* default */
 #define HA_OPEN_WAIT_IF_LOCKED    1
 #define HA_OPEN_IGNORE_IF_LOCKED  2
 #define HA_OPEN_TMP_TABLE         4   /* Table is a temp table */
 #define HA_OPEN_DELAY_KEY_WRITE   8   /* Don't update index */
 #define HA_OPEN_ABORT_IF_CRASHED  16
 #define HA_OPEN_FOR_REPAIR        32  /* open even if crashed */
 

Typically your storage engine will need to implement some form of shared access control to prevent file corruption is a multi-threaded environment. For an example of how to implement file locking, see the get_share() and free_share() methods of sql/examples/ha_tina.cc.

[edit] Implementing Basic Table Scanning

The most basic storage engines implement read-only table scanning. Such engines might be used to support SQL queries of logs and other data files that are populated outside of MySQL.

The implementation of the methods in this section provide the first steps toward the creation of more advanced storage engines.

The following shows the method calls made during a nine-row table scan of the CSV engine:

ha_tina::store_lock
ha_tina::external_lock
ha_tina::info
ha_tina::rnd_init
ha_tina::extra - ENUM HA_EXTRA_CACHE   Cache record in HA_rrnd()
ha_tina::rnd_next
ha_tina::rnd_next
ha_tina::rnd_next
ha_tina::rnd_next
ha_tina::rnd_next
ha_tina::rnd_next
ha_tina::rnd_next
ha_tina::rnd_next
ha_tina::rnd_next
ha_tina::extra - ENUM HA_EXTRA_NO_CACHE   End caching of records (def)
ha_tina::external_lock
ha_tina::extra - ENUM HA_EXTRA_RESET   Reset database to after open

[edit] Implementing the store_lock() Method

The [custom-engine.html#custom-engine-api-reference-store_lock store_lock()] method is called before any reading or writing is performed.

Before adding the lock into the table lock handler mysqld calls store lock with the requested locks. Store lock can modify the lock level, for example change blocking write lock to non-blocking, ignore the lock (if we don't want to use MySQL table locks at all) or add locks for many tables (like we do when we are using a MERGE handler).

When releasing locks, store_lock() is also called. In this case, one usually doesn't have to do anything.

If the argument of store_lock is TL_IGNORE, it means that MySQL requests the handler to store the same lock level as the last time.

The potential lock types are defined in includes/thr_lock.h and are copied here:

enum thr_lock_type
{
  TL_IGNORE=-1,
  TL_UNLOCK,                  /* UNLOCK ANY LOCK */
  TL_READ,                    /* Read lock */
  TL_READ_WITH_SHARED_LOCKS,
  TL_READ_HIGH_PRIORITY,      /* High prior. than TL_WRITE. Allow concurrent insert */
  TL_READ_NO_INSERT,          /* READ, Don't allow concurrent insert */
  TL_WRITE_ALLOW_WRITE,       /*   Write lock, but allow other threads to read / write. */
  TL_WRITE_ALLOW_READ,        /* Write lock, but allow other threads to read / write. */
  TL_WRITE_CONCURRENT_INSERT, /* WRITE lock used by concurrent insert. */
  TL_WRITE_DELAYED,           /* Write used by INSERT DELAYED.  Allows READ locks */
  TL_WRITE_LOW_PRIORITY,      /* WRITE lock that has lower priority than TL_READ */
  TL_WRITE,                   /* Normal WRITE lock */
  TL_WRITE_ONLY               /* Abort new lock request with an error */
};

Actual lock handling will vary depending on your locking implementation and you may choose to implement some or none of the requested lock types, substituting your own methods as appropriate. The following is the minimal implementation, for a storage engine that does not need to downgrade locks:

THR_LOCK_DATA **ha_tina::store_lock(THD *thd,
                                     THR_LOCK_DATA **to,
                                     enum thr_lock_type lock_type)
{
   if (lock_type != TL_IGNORE && lock.type == TL_UNLOCK)
     lock.type=lock_type;
   *to++= &lock;
   return to;
}

See also ha_myisammrg::store_lock() for a more complex implementation.

[edit] Implementing the external_lock() Method

The [custom-engine.html#custom-engine-api-reference-external_lock external_lock()] method is called at the start of a statement or when a LOCK TABLES statement is issued.

Examples of using external_lock() can be found in the sql/ha_innodb.cc file, but most storage engines can simply return 0, as is the case with the EXAMPLE storage engine:

int ha_example::external_lock(THD *thd, int lock_type)
{
   DBUG_ENTER("ha_example::external_lock");
   DBUG_RETURN(0);
}

[edit] Implementing the rnd_init() Method

The method called before any table scan is the [custom-engine.html#custom-engine-api-reference-rnd_init rnd_init()] method. The rnd_init() method is used to prepare for a table scan, resetting counters and pointers to the start of the table.

The following example is from the CSV storage engine:

  int ha_tina::rnd_init(bool scan)
{
      DBUG_ENTER("ha_tina::rnd_init");

      current_position= next_position= 0;
      records= 0;
      chain_ptr= chain;

      DBUG_RETURN(0);
}

If the scan parameter is true, the MySQL server will perform a sequential table scan, if false the MySQL server will perform random reads by position.

[edit] Implementing the info() Method

Prior to commencing a table scan, the [custom-engine.html#custom-engine-api-reference-info info()] method is called to provide extra table information to the optimizer.

The information required by the optimizer is not given through return values but instead by populating certain properties of the storage engine class, which the optimizer reads after the info() call returns.

In addition to being used by the optimizer, many of the values set during a call to the info() method are also used for the SHOW TABLE STATUS statement.

The public properties are listed in full in sql/handler.h; several of the more common ones are copied here:

ulonglong data_file_length;      /* Length off data file */
ulonglong max_data_file_length;  /* Length off data file */
ulonglong index_file_length;
ulonglong max_index_file_length;
ulonglong delete_length;         /* Free bytes */
ulonglong auto_increment_value;
ha_rows records;                 /* Records in table */
ha_rows deleted;                 /* Deleted records */
ulong raid_chunksize;
ulong mean_rec_length;           /* physical reclength */
time_t create_time;              /* When table was created */
time_t check_time;
time_t update_time;

For the purposes of a table scan, the most important property is records, which indicates the number of records in the table. The optimizer will perform differently when the storage engine indicates that there are zero or one rows in the table than it will when there are two or more. For this reason it is important to return a value of two or greater when you do not actually know how many rows are in the table before you perform the table scan (such as in a situation where the data may be externally populated).

[edit] Implementing the extra() Method

Prior to some operations, the [custom-engine.html#custom-engine-api-reference-extra extra()] method is called to provide extra hints to the storage engine on how to perform certain operations.

Implementation of the hints in the extra call is not mandatory, and most storage engines return 0:

int ha_tina::extra(enum ha_extra_function operation)
{
   DBUG_ENTER("ha_tina::extra");
   DBUG_RETURN(0);
}

[edit] Implementing the rnd_next() Method

After the table is initialized, the MySQL server will call the handler's [custom-engine.html#custom-engine-api-reference-rnd_next rnd_next()] method once for every row to be scanned until the server's search condition is satisfied or an end of file is reached, in which case the handler returns HA_ERR_END_OF_FILE.

The rnd_next() method takes a single byte array parameter named *buf. The *buf parameter must be populated with the contents of the table row in the internal MySQL format.

The server uses three data formats: fixed-length rows, variable-length rows, and variable-length rows with BLOB pointers. In each format, the columns appear in the order in which they were defined by the CREATE TABLE statement. (The table definition is stored in the .frm file, and the optimizer and the handler are both able to access table metadata from the same source, its TABLE structure).

Each format begins with a NULL bitmap of one bit per nullable column. A table with as many as eight nullable columns will have a one-byte bitmap; a table with nine to sixteen nullable columns will have a two-byte bitmap, and so forth. One exception is fixed-width tables, which have an additional starting bit so that a table with eight nullable columns would have a two-byte bitmap.

After the NULL bitmap come the columns, one by one. Each column is of the size indicated in MySQL Data Types. In the server, column data types are defined in the sql/field.cc file. In the fixed length row format, the columns are simply laid out one by one. In a variable-length row, VARCHAR columns are coded as a one or two-byte length, followed by a string of characters. In a variable-length row with BLOB columns, each blob is represented by two parts: first an integer representing the actual size of the BLOB, and then a pointer to the BLOB in memory.

Examples of row conversion (or packing) can be found by starting at rnd_next() in any table handler. In ha_tina.cc, for example, the code in find_current_row() illustrates how the TABLE structure (pointed to by table) and a string object (named buffer) can be used to pack character data from a CSV file. Writing a row back to disk requires the opposite conversion, unpacking from the internal format.

The following example is from the CSV storage engine:

int ha_tina::rnd_next(byte *buf)
{
   DBUG_ENTER("ha_tina::rnd_next");

   statistic_increment(table->in_use->status_var.ha_read_rnd_next_count, &LOCK_status);

   current_position= next_position;
   if (!share->mapped_file)
     DBUG_RETURN(HA_ERR_END_OF_FILE);
   if (HA_ERR_END_OF_FILE == find_current_row(buf) )
     DBUG_RETURN(HA_ERR_END_OF_FILE);

   records++;
   DBUG_RETURN(0);
}

The conversion from the internal row format to CSV row format is performed in the find_current_row() method:

int ha_tina::find_current_row(byte *buf)
{
   byte *mapped_ptr= (byte *)share->mapped_file + current_position;
   byte *end_ptr;
   DBUG_ENTER("ha_tina::find_current_row");

   /* EOF should be counted as new line */
   if ((end_ptr=  find_eoln(share->mapped_file, current_position,
                            share->file_stat.st_size)) == 0)
     DBUG_RETURN(HA_ERR_END_OF_FILE);

   for (Field **field=table->field ; *field ; field++)
   {
     buffer.length(0);
     mapped_ptr++; // Increment past the first quote
     for(;mapped_ptr != end_ptr; mapped_ptr++)
     {
       // Need to convert line feeds!
       if (*mapped_ptr == '"' &&
           (((mapped_ptr[1] == ',') && (mapped_ptr[2] == '"')) ||
            (mapped_ptr == end_ptr -1 )))
       {
         mapped_ptr += Move past the , and the "
         break;
       }
       if (*mapped_ptr == '\\' && mapped_ptr != (end_ptr - 1))
       {
         mapped_ptr++;
         if (*mapped_ptr == 'r')
           buffer.append('\r');
         else if (*mapped_ptr == 'n' )
           buffer.append('\n');
         else if ((*mapped_ptr == '\\') || (*mapped_ptr == '"'))
           buffer.append(*mapped_ptr);
         else  /* This could only happed with an externally created file */
         {
           buffer.append('\\');
           buffer.append(*mapped_ptr);
         }
       }
       else
         buffer.append(*mapped_ptr);
     }
     (*field)->store(buffer.ptr(), buffer.length(), system_charset_info);
   }
   next_position= (end_ptr - share->mapped_file)+1;
   /* Maybe use \N for null? */
   memset(buf, 0, table->s->null_bytes); /* We do not implement nulls! */

   DBUG_RETURN(0);
}

[edit] Closing a Table

When the MySQL server is finished with a table, it will call the [custom-engine.html#custom-engine-api-reference-close close()] method to close file pointers and release any other resources.

Storage engines that use the shared access methods seen in the CSV engine and other example engines must remove themselves from the shared structure:

int ha_tina::close(void)
 {
   DBUG_ENTER("ha_tina::close");
   DBUG_RETURN(free_share(share));
 }

Storage engines using their own share management systems should use whatever methods are needed to remove the handler instance from the share for the table opened in their handler.

[edit] Adding Support for INSERT to a Storage Engine

Once you have read support in your storage engine, the next feature to implement is support for INSERT statements. With INSERT support in place, your storage engine can handle WORM (write once, read many) applications such as logging and archiving for later analysis.

All INSERT operations are handled through the [custom-engine.html#custom-engine-api-reference-write_row write_row()] method:

int ha_foo::write_row(byte *buf)

The *buf parameter contains the row to be inserted in the internal MySQL format. A basic storage engine could simply advance to the end of the data file and append the contents of the buffer directly (this would also make reading rows easier as you could read the row and pass it directly into the buffer parameter of the rnd_next() method.

The process for writing a row is the opposite of the process for reading one: take the data from the MySQL internal row format and write it to the data file. The following example is from the MyISAM storage engine:

int ha_myisam::write_row(byte * buf)
{
  statistic_increment(table->in_use->status_var.ha_write_count,&LOCK_status);

  /* If we have a timestamp column, update it to the current time */
  if (table->timestamp_field_type & TIMESTAMP_AUTO_SET_ON_INSERT)
    table->timestamp_field->set_time();

  /*
    If we have an auto_increment column and we are writing a changed row
    or a new row, then update the auto_increment value in the record.
  */
  if (table->next_number_field && buf == table->record[0])
    update_auto_increment();
  return mi_write(file,buf);
}

Three items of note in the preceding example include the updating of table statistics for writes, the setting of the timestamp prior to writing the row, and the updating of AUTO_INCREMENT values.

[edit] Adding Support for UPDATE to a Storage Engine

The MySQL server executes UPDATE statements by performing a (table/index/range/etc.) scan until it locates a row matching the WHERE clause of the UPDATE statement, then calling the [custom-engine.html#custom-engine-api-reference-update_row update_row() method:]

int ha_foo::update_row(const byte *old_data, byte *new_data)

The *old_data parameter contains the data that existed in the row prior to the update, while the *new_data parameter contains the new contents of the row (in the MySQL internal row format).

Performing an update will depend on row format and storage implementation. Some storage engines will replace data in-place, while other implementations delete the existing row and append the new row at the end of the data file.

Non-indexed storage engines can typically ignore the contents of the *old_data parameter and just deal with the *new_data buffer. Transactional engines may need to compare the buffers to determine what changes have been made for a later rollback.

If the table being updated contains timestamp columns, the updating of the timestamp will have to be managed in the update_row() call. The following example is from the CSV engine:

int ha_tina::update_row(const byte * old_data, byte * new_data)
{
   int size;
   DBUG_ENTER("ha_tina::update_row");

   statistic_increment(table->in_use->status_var.ha_read_rnd_next_count,
                      &LOCK_status);

   if (table->timestamp_field_type & TIMESTAMP_AUTO_SET_ON_UPDATE)
     table->timestamp_field->set_time();

   size= encode_quote(new_data);

   if (chain_append())
     DBUG_RETURN(-1);

   if (my_write(share->data_file, buffer.ptr(), size, MYF(MY_WME | MY_NABP)))
     DBUG_RETURN(-1);
   DBUG_RETURN(0);
}

Note the setting of the timestamp in the previous example.

[edit] Adding Support for DELETE to a Storage Engine

The MySQL server executes DELETE statements using the same approach as for UPDATE statements: It advances to the row to be deleted using the rnd_next() method and then calls the [custom-engine.html#custom-engine-api-reference-delete_row delete_row()] method to delete the row:

int ha_foo::delete_row(const byte *buf)

The *buf parameter contains the contents of the row to be deleted. For non-indexed storage engines the parameter can be ignored, but transactional storage engines may need to store the deleted data for rollback purposes.

The following example is from the CSV storage engine:

int ha_tina::delete_row(const byte * buf)
{
   DBUG_ENTER("ha_tina::delete_row");
   statistic_increment(table->in_use->status_var.ha_delete_count,
                       &LOCK_status);

   if (chain_append())
     DBUG_RETURN(-1);

   --records;

   DBUG_RETURN(0);
}

The steps of note in the preceding example are the update of the delete_count statistic and the record count.

[edit] Supporting Non-Sequential Reads

In addition to table scanning, storage engines can implement methods for non-sequential reading. (Note: this is not "can" but rather a "must" because certain operations rely on proper implementation of position() and rnd_pos() calls. Two examples of such operations are multi-table UPDATE and SELECT .. table.blob_column ORDER BY something). The MySQL server uses these methods for certain sort operations.

[edit] Implementing the position() Method

The [custom-engine.html#custom-engine-api-reference-position position()] method is called after every call to rnd_next() if the data needs to be reordered:

void ha_foo::position(const byte *record)

The contents of *record are up to you whatever value you provide will be returned in a later call to retrieve the row. Most storage engines will store some form of offset or primary key value.

[edit] Implementing the rnd_pos() Method

The [custom-engine.html#custom-engine-api-reference-rnd_pos rnd_pos()] method behaves in a similar fashion to the rnd_next() method but takes an additional parameter:

int ha_foo::rnd_pos(byte * buf, byte *pos)

The *pos parameter contains positioning information previously recorded using the position() method.

A storage engine must locate the row specified by the position and return it through *buf in the internal MySQL row format.

[edit] Supporting Indexing

Once basic read/write operations are implemented in a storage engine, the next stage is to add support for indexing. Without indexing, a storage engine's performance is quite limited.

This section documents the methods that must be implemented to add support for indexing to a storage engine.

[edit] Indexing Overview

Adding index support to a storage engine revolves around two tasks: providing information to the optimizer and implementing index-related methods. The information provided to the optimizer helps the optimizer to make better decisions about which index to use or even to skip using an index and instead perform a table scan.

The indexing methods either read rows that match a key, scan a set of rows by index order, or read information directly from the index.

The following example shows the method calls made during an UPDATE query that uses an index, such as UPDATE foo SET ts = now() WHERE id = 1:

ha_foo::index_init
ha_foo::index_read
ha_foo::index_read_idx
ha_foo::rnd_next
ha_foo::update_row

In addition to index reading methods, your storage engine must support the creation of new indexes and be able to keep table indexes up to date as rows are added, modified, and removed from tables.

[edit] Getting Index Information During CREATE TABLE Operations

It is preferable for storage engines that support indexing to read the index information provided during a CREATE TABLE operation and store it for future use. The reasoning behind this is that the index information is most readily available during table and index creation and is not as easily retrieved afterward.

The table index information is contained within the key_info structure of the TABLE argument of the create() method.

Within the key_info structure there is a flag that defines index behavior:

#define HA_NOSAME             1  /* Set if not duplicated records   */
 #define HA_PACK_KEY           2  /* Pack string key to previous key */
 #define HA_AUTO_KEY           16
 #define HA_BINARY_PACK_KEY    32 /* Packing of all keys to prev key */
 #define HA_FULLTEXT          128 /* For full-text search            */
 #define HA_UNIQUE_CHECK      256 /* Check the key for uniqueness    */
 #define HA_SPATIAL          1024 /* For spatial search              */
 #define HA_NULL_ARE_EQUAL   2048 /* NULL in key are cmp as equal    */
 #define HA_GENERATED_KEY    8192 /* Automatically generated key     */
 

In addition to the flag, there is an enumerator named algorithm that specifies the desired index type:

enum ha_key_alg {
  HA_KEY_ALG_UNDEF=     0,  /* Not specified (old file)     */
  HA_KEY_ALG_BTREE=     1,  /* B-tree, default one          */
  HA_KEY_ALG_RTREE=     2,  /* R-tree, for spatial searches */
  HA_KEY_ALG_HASH=      3,  /* HASH keys (HEAP tables)      */
  HA_KEY_ALG_FULLTEXT=  4   /* FULLTEXT (MyISAM tables)     */
};

In addition to the flag and algorithm, there is an array of key_part elements that describe the individual parts of a composite key.

The key parts define the field associated with the key part, whether the key should be packed, and the data type and length of the index part. See ha_myisam.cc for an example of how this information is parsed.

As an alternative, a storage engine can instead read index information from the TABLE structure of the handler during each operation.

[edit] Creating Index Keys

As part of every table-write operation (INSERT, UPDATE, DELETE), the storage engine is required to update its internal index information.

The method used to update indexes will vary from storage engine to storage engine, depending on the method used to store the index.

In general, the storage engine will have to use row information passed in methods such as [custom-engine.html#custom-engine-api-reference-write_row write_row()], [custom-engine.html#custom-engine-api-reference-delete_row delete_row()], and [custom-engine.html#custom-engine-api-reference-update_row update_row()] in combination with index information for the table to determine what index data needs to be modified, and make the needed changes.

The method of associating an index with its row will depend on your storage approach. Current storage engines store the row offset.

[edit] Parsing Key Information

Many of the index methods pass a byte array named *key that identifies the index entry to be read in a standard format. Your storage engine will need to extract the information stored in the key and translate it into its internal index format to identify the row associated with the index.

The information in the key is obtained by iterating through the key, which is formatted the same as the definition in table->key_info[index]->key_part[part_num].

Along with the key, handler methods pass a keypart_map parameter to indicate which parts of the key are present in the key parameter. keypart_map is a ulonglong bitmap with one bit per key part: 1 for keypart[0], 2 for keypart[1], 4 for keypart[2], and so forth. If a bit in keypart_map is set, the value for this key part is present in the key buffer. Bits following the bit for the last key part don't matter,so ~0 can be used for all keyparts. Currently, only key prefixes are supported. That is, assert((keypart_map + 1) & keypart_map == 0).

A keypart_map is part of the key_range structure used by records_in_range(), and a keypart_map value is passed directly to the index_read(), index_read_idx(), and index_read_last() methods.

Older handlers have a key_len parameter instead of keypart_map. The key_len value is a uint that indicates the prefix length when matching by prefix.

[edit] Providing Index Information to the Optimizer

In order for indexing to be used effectively, storage engines need to provide the optimizer with information about the table and its indexes. This information is used to choose whether to use an index, and if so, which index to use.

[edit] Implementing the info() Method

The optimizer requests an update of table information by calling the [custom-engine.html#custom-engine-api-reference-info handler::info()] method. The info() method does not have a return value, instead it is expected that the storage engine will set a variety of public variables that the server will then read as needed. These values are also used to populate certain SHOW outputs such as SHOW TABLE STATUS and for queries of the INFORMATION_SCHEMA.

All variables are optional but should be filled if possible:

  • records - The number of rows in the table. If you cannot provide an accurate number quickly you should set the value to be greater than 1 so that the optimizer does not perform optimizations for zero or one row tables.
  • deleted - Number of deleted rows in table. Used to identify table fragmentation, where applicable.
  • data_file_length - Size of the data file, in bytes. Helps optimizer calculate the cost of reads.
  • index_file_length - Size of the index file, in bytes. Helps optimizer calculate the cost of reads.
  • mean_rec_length - Average length of a single row, in bytes.
  • scan_time - Cost in I/O seeks to perform a full table scan.
  • delete_length -
  • check_time -

When calculating values, speed is more important than accuracy, as there is no sense in taking a long time to give the optimizer clues as to what approach may be the fastest. Estimates within an order of magnitude are usually good enough.

[edit] Implementing the records_in_range Method

The [custom-engine.html#custom-engine-api-reference-records_in_range records_in_range()] method is called by the optimizer to assist in choosing which index on a table to use for a query or join. It is defined as follows:

ha_rows ha_foo::records_in_range(uint inx, key_range *min_key, key_range *max_key)

The inx parameter is the index to be checked. The *min_key and *max_key parameters are key_range structures that indicate the low and high ends of the range. The key_range structure looks like this:

typedef struct st_key_range
{
  const byte *key;
  uint length;
  key_part_map keypart_map;
  enum ha_rkey_function flag;
} key_range;

key_range members are used as follows:

  • key is a pointer to the key buffer.
  • length is the key length.
  • keypart_map is a bitmap that indicates which key parts are passed in key (as described in #Parsing Key Information).
  • flag indicates whether to include the key in the range. Its value differs for min_key and max_key, as described following.

min_key.flag can have one of the following values:

  • HA_READ_KEY_EXACT - Include the key in the range
  • HA_READ_AFTER_KEY - Don't include key in range

max_key.flag can have one of the following values:

  • HA_READ_BEFORE_KEY - Don't include key in range
  • HA_READ_AFTER_KEY - Include all 'end_key' values in the range

The following return values are allowed:

  • 0 - There are no matching keys in the given range
  • number > 0 - There are approximately number matching rows in the range
  • HA_POS_ERROR - Something is wrong with the index tree

When calculating values, speed is more important than accuracy.

[edit] Preparing for Index Use with index_init()

The [custom-engine.html#custom-engine-api-reference-index_init index_init()] method is called before an index is used to allow the storage engine to perform any necessary preparation or optimization:

int ha_foo::index_init(uint keynr, bool sorted)

Most storage engines do not need to make special preparations, in which case a default implementation will be used if the method is not explicitly implemented in the storage engine:

int handler::index_init(uint idx) { active_index=idx; return 0; }

[edit] Cleaning up with index_end()

The [custom-engine.html#custom-engine-api-reference-index_end index_end()] method is a counterpart to the index_init() method. The purpose of the index_end() method is to clean up any preparations made by the index_init() method.

If a storage engine does not implement index_init() it does not need to implement index_end().

[edit] Implementing the index_read() Method

The [custom-engine.html#custom-engine-api-reference-index_read index_read()] method is used to retrieve a row based on a key:

int ha_foo::index_read(byte * buf, const byte * key,
                       ulonglong keypart_map,
                       enum ha_rkey_function find_flag)

The *buf parameter is a byte array that the storage engine populates with the row that matches the index key specified in *key. The keypart_map parameter is a bitmap that indicates which parts of the key are present in the key parameter. The find_flag parameter is an enumerator that dictates the search behavior to be used, as discussed in #Parsing Key Information.

The index to be used is previously defined in the [custom-engine.html#custom-engine-index-init index_init()] call and is stored in the active_index handler variable.

The following values are allowed for find_flag:

HA_READ_AFTER_KEY
HA_READ_BEFORE_KEY
HA_READ_KEY_EXACT
HA_READ_KEY_OR_NEXT
HA_READ_KEY_OR_PREV
HA_READ_PREFIX
HA_READ_PREFIX_LAST
HA_READ_PREFIX_LAST_OR_PREV

Storage engines must convert the *key parameter to a storage engine-specific format, use it to find the matching row according to the find_flag, and then populate *buf with the matching row in the MySQL internal row format. For more information on the internal row format, see #Implementing the rnd_next() Method.

In addition to returning a matching row, the storage engine must also set a cursor to support sequential index reads.

If the *key parameter is null, the storage engine should read the first key in the index.

[edit] Implementing the index_read_idx() Method

The [custom-engine.html#custom-engine-api-reference-index_read_idx index_read_idx()] method is identical to [custom-engine.html#custom-engine-index-read index_read()] with the exception that index_read_idx() accepts an additional keynr parameter:

int ha_foo::index_read_idx(byte * buf, uint keynr, const byte * key,
                           ulonglong keypart_map,
                           enum ha_rkey_function find_flag)

The keynr parameter specifies the index to be read, as opposed to index_read(), where the index is already set.

As with the index_read() method, the storage engine must return the row that matches the key according to the find_flag and set a cursor for future reads.

[edit] Implementing the index_read_last() Method

The [custom-engine.html#custom-engine-api-reference-index_read_last index_read_last()] method works like [custom-engine.html#custom-engine-index-read index_read()] but finds the last row with the current key value or prefix:

int ha_foo::index_read_last(byte * buf, const byte * key,
                            key_part_map keypart_map)

index_read_last() is used when optimizing away the ORDER BY clause for statements such as this:

SELECT * FROM t1 WHERE a=1 ORDER BY a DESC,b DESC;

[edit] Implementing the index_next() Method

The [custom-engine.html#custom-engine-api-reference-index_next index_next()] method is used for index scanning:

int ha_foo::index_next(byte * buf)

The *buf parameter is populated with the row that corresponds to the next matching key value according to the internal cursor set by the storage engine during operations such as index_read() and index_first().

[edit] Implementing the index_prev() Method

The [custom-engine.html#custom-engine-api-reference-index_prev index_prev()] method is used for reverse index scanning:

        int ha_foo::index_prev(byte * buf)

The *buf parameter is populated with the row that corresponds to the previous matching key value according to the internal cursor set by the storage engine during operations such as index_read() and index_last().

[edit] Implementing the index_first() Method

The [custom-engine.html#custom-engine-api-reference-index_first index_first()] method is used for index scanning:

        int ha_foo::index_first(byte * buf)

The *buf parameter is populated with the row that corresponds to the first key value in the index.

[edit] Implementing the index_last() Method

The [custom-engine.html#custom-engine-api-reference-index_last index_last()] method is used for reverse index scanning:

        int ha_foo::index_last(byte * buf)

The *buf parameter is populated with the row that corresponds to the last key value in the index.

[edit] Supporting Transactions

This section documents the methods that must be implemented to add support for transactions to a storage engine.

Please note that transaction management can be complicated and involve methods such as row versioning and redo logs, which is beyond the scope of this document. Instead coverage is limited to a description of required methods and not their implementation. For examples of implementation, please see ha_innodb.cc.

[edit] Transaction Overview

Transactions are not explicitly started on the storage engine level, but are instead implicitly started through calls to either start_stmt() or external_lock(). If the preceding methods are called and a transaction already exists the transaction is not replaced.

The storage engine stores transaction information in per-connection memory and also registers the transaction in the MySQL server to allow the server to later issue COMMIT and ROLLBACK operations.

As operations are performed the storage engine will have to implement some form of versioning or logging to permit a rollback of all operations executed within the transaction.

After work is completed, the MySQL server will call either the commit() method or the rollback() method defined in the storage engine's handlerton.

[edit] Starting a Transaction

A transaction is started by the storage engine in response to a call to either the start_stmt() or external_lock() methods.

If there is no active transaction, the storage engine must start a new transaction and register the transaction with the MySQL server so that ROLLBACK or COMMIT can later be called.

[edit] Starting a Transaction from a start_stmt() Call

The first method call that can start a transaction is the [custom-engine.html#custom-engine-transactions-starting-start-stmt start_stmt()] method.

The following example shows how a storage engine could register a transaction:

int my_handler::start_stmt(THD *thd, thr_lock_type lock_type)
{
  int error= 0;
  my_txn *txn= (my_txn *) thd->ha_data[my_handler_hton.slot];

  if (txn == NULL)
  {
    thd->ha_data[my_handler_hton.slot]= txn= new my_txn;
  }
  if (txn->stmt == NULL && !(error= txn->tx_begin()))
  {
    txn->stmt= txn->new_savepoint();
    trans_register_ha(thd, FALSE, &my_handler_hton);
  }
  return error;
}

THD is the current client connection. It holds state relevant data for the current client, such as identity, network connection and other per-connection data.

thd->ha_data[my_handler_hton.slot] is a pointer in thd to the connection-specific data of this storage engine. In this example we use it to store the transaction context.

An additional example of implementing start_stmt() can be found in ha_innodb.cc.

[edit] Starting a Transaction from a external_lock() Method

MySQL calls [custom-engine.html#custom-engine-api-reference-external_lock handler::external_lock()] for every table it is going to use at the beginning of every statement. Thus, if a table is touched for the first time, it implicitly starts a transaction.

Note that because of pre-locking, all tables that can be potentially used between the beginning and the end of a statement are locked before the statement execution begins and handler::external_lock() is called for all these tables. That is, if an INSERT fires a trigger, which calls a stored procedure, that invokes a stored method, and so forth, all tables used in the trigger, stored procedure, method, etc., are locked in the beginning of the INSERT. Additionally, if there's a construct like

IF
.. use one table
ELSE
.. use another table

both tables will be locked.

Also, if a user calls LOCK TABLES, MySQL will call handler::external_lock only once. In this case, MySQL will call handler::start_stmt() at the beginning of the statement.

The following example shows how a storage engine can start a transaction and take locking requests into account:

int my_handler::external_lock(THD *thd, int lock_type)
{
  int error= 0;
  my_txn *txn= (my_txn *) thd->ha_data[my_handler_hton.slot];

  if (txn == NULL)
  {
    thd->ha_data[my_handler_hton.slot]= txn= new my_txn;
  }

  if (lock_type != F_UNLCK)
  {
    bool all_tx= 0;
    if (txn->lock_count == 0)
    {
      txn->lock_count= 1;
      txn->tx_isolation= thd->variables.tx_isolation;

      all_tx= test(thd->options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN | OPTION_TABLE_LOCK));
    }

    if (all_tx)
    {
      txn->tx_begin();
      trans_register_ha(thd, TRUE, &my_handler_hton);
    }
    else
    if (txn->stmt == 0)
    {
      txn->stmt= txn->new_savepoint();
      trans_register_ha(thd, FALSE, &my_handler_hton);
    }
  }
  else
  {
    if (txn->stmt != NULL)
    {
      /* Commit the transaction if we're in auto-commit mode */
      my_handler_commit(thd, FALSE);

      delete txn->stmt; // delete savepoint
      txn->stmt= NULL;
    }
  }

  return error;
}

Every storage engine must call trans_register_ha() every time it starts a transaction. The trans_register_ha() method registers a transaction with the MySQL server to allow for future COMMIT and ROLLBACK calls.

An additional example of implementing external_lock() can be found in ha_innodb.cc.

[edit] Implementing ROLLBACK

Of the two major transactional operations, ROLLBACK is the more complicated to implement. All operations that occurred during the transaction must be reversed so that all rows are unchanged from before the transaction began.

To support ROLLBACK, create a method that matches this definition:

int  (*rollback)(THD *thd, bool all);

The method name is then listed in the rollback (thirteenth) entry of [custom-engine.html#custom-engine-handlerton the handlerton].

The THD parameter is used to identify the transaction that needs to be rolled back, while the bool all parameter indicates whether the entire transaction should be rolled back or just the last statement.

Details of implementing a ROLLBACK operation will vary by storage engine. Examples can be found in ha_innodb.cc.

[edit] Implementing COMMIT

During a commit operation, all changes made during a transaction are made permanent and a rollback operation is not possible after that. Depending on the transaction isolation used, this may be the first time such changes are visible to other threads.

To support COMMIT, create a method that matches this definition:

        int  (*commit)(THD *thd, bool all);

The method name is then listed in the commit (twelfth) entry of [custom-engine.html#custom-engine-handlerton the handlerton].

The THD parameter is used to identify the transaction that needs to be committed, while the bool all parameter indicates if this is a full transaction commit or just the end of a statement that is part of the transaction.

Details of implementing a COMMIT operation will vary by storage engine. Examples can be found in ha_innodb.cc.

If the server is in auto-commit mode, the storage engine should automatically commit all read-only statements such as SELECT.

In a storage engine, "auto-committing" works by counting locks. Increment the count for every call to external_lock(), decrement when external_lock() is called with an argument of F_UNLCK. When the count drops to zero, trigger a commit.

[edit] Adding Support for Savepoints

First, the implementor should know how many bytes are required to store savepoint information. This should be a fixed size, preferably not large as the MySQL server will allocate space to store the savepoint for all storage engines with each named savepoint.

The implementor should store the data in the space preallocated by mysqld - and use the contents from the preallocated space for rollback or release savepoint operations.

When a COMMIT or ROLLBACK operation occurs (with bool all set to true), all savepoints are assumed to be released. If the storage engine allocates resources for savepoints, it should free them.

The following handlerton elements need to be implemented to support savepoints (elements 7,9,10,11):

uint savepoint_offset;
int  (*savepoint_set)(THD *thd, void *sv);
int  (*savepoint_rollback)(THD *thd, void *sv);
int  (*savepoint_release)(THD *thd, void *sv);

[edit] Specifying the Savepoint Offset

The seventh element of the handlerton is the savepoint_offset:

uint savepoint_offset;

The savepoint_offset must be initialized statically to the size of the needed memory to store per-savepoint information.

[edit] Implementing the savepoint_set Method

The savepoint_set() method is called whenever a user issues the SAVEPOINT statement:

int  (*savepoint_set)(THD *thd, void *sv);

The *sv parameter points to an uninitialized storage area of the size defined by savepoint_offset.

When savepoint_set() is called, the storage engine needs to store savepoint information into sv so that the server can later roll back the transaction to the savepoint or release the savepoint resources.

[edit] Implementing the savepoint_rollback() Method

The savepoint_rollback() method is called whenever a user issues the ROLLBACK TO SAVEPOINT statement:

int  (*savepoint_rollback) (THD *thd, void *sv);

The *sv parameter points to the storage area that was previously passed to the savepoint_set() method.

[edit] Implementing the savepoint_release() Method

The savepoint_release() method is called whenever a user issues the RELEASE SAVEPOINT statement:

int  (*savepoint_release) (THD *thd, void *sv);

The *sv parameter points to the storage area that was previously passed to the savepoint_set() method.

[edit] API Reference

[edit] bas_ext

[edit] Purpose

Defines the file extensions used by the storage engine.

[edit] Synopsis

virtual const char ** bas_ext ( );
;

[edit] Description

This is the bas_ext method. It is called to provide the MySQL server with a list of file extensions used by the storage engine. The list returned is a null-terminated string array.

By providing a list of extensions, storage engines can in many cases omit the [custom-engine.html#custom-engine-api-reference-delete_table delete_table()] method as the MySQL server will close all references to the table and delete all files with the specified extension.

[edit] Parameters

There are no parameters for this method.

[edit] Return Values

  • Return value is a null-terminated string array of storage engine extensions. The following is an example from the CSV engine:
static const char *ha_tina_exts[] =
 {
   ".CSV",
   NullS
 };

[edit] Usage

static const char *ha_tina_exts[] =
 {
   ".CSV",
   NullS
 };

const char **ha_tina::bas_ext() const
 {
   return ha_tina_exts;
 }

[