CREATE TABLE
Operationsindex_init()index_end()index_read() Methodindex_read_idx() Methodindex_read_last() Methodindex_next() Methodindex_prev() Methodindex_first() Methodindex_last() MethodOnce basic read/write operations are implemented in a storage engine, the next stage is to add support for indexing. Without indexing, a storage engine's performance is quite limited.
This section documents the methods that must be implemented to add support for indexing to a storage engine.
Adding index support to a storage engine revolves around two tasks: providing information to the optimizer and implementing index-related methods. The information provided to the optimizer helps the optimizer to make better decisions about which index to use or even to skip using an index and instead perform a table scan.
The indexing methods either read rows that match a key, scan a set of rows by index order, or read information directly from the index.
The following example shows the method calls made during an
UPDATE query that uses an index, such as
UPDATE foo SET ts = now() WHERE id = 1:
ha_foo::index_init ha_foo::index_read ha_foo::index_read_idx ha_foo::rnd_next ha_foo::update_row
In addition to index reading methods, your storage engine must support the creation of new indexes and be able to keep table indexes up to date as rows are added, modified, and removed from tables.
It is preferable for storage engines that support indexing to
read the index information provided during a CREATE
TABLE operation and store it for future use. The
reasoning behind this is that the index information is most
readily available during table and index creation and is not as
easily retrieved afterward.
The table index information is contained within the
key_info structure of the
TABLE argument of the
create() method.
Within the key_info structure there is a
flag that defines index behavior:
#define HA_NOSAME 1 /* Set if not duplicated records */ #define HA_PACK_KEY 2 /* Pack string key to previous key */ #define HA_AUTO_KEY 16 #define HA_BINARY_PACK_KEY 32 /* Packing of all keys to prev key */ #define HA_FULLTEXT 128 /* For full-text search */ #define HA_UNIQUE_CHECK 256 /* Check the key for uniqueness */ #define HA_SPATIAL 1024 /* For spatial search */ #define HA_NULL_ARE_EQUAL 2048 /* NULL in key are cmp as equal */ #define HA_GENERATED_KEY 8192 /* Automatically generated key */
In addition to the flag, there is an
enumerator named algorithm that specifies the
desired index type:
enum ha_key_alg {
HA_KEY_ALG_UNDEF= 0, /* Not specified (old file) */
HA_KEY_ALG_BTREE= 1, /* B-tree, default one */
HA_KEY_ALG_RTREE= 2, /* R-tree, for spatial searches */
HA_KEY_ALG_HASH= 3, /* HASH keys (HEAP tables) */
HA_KEY_ALG_FULLTEXT= 4 /* FULLTEXT (MyISAM tables) */
};
In addition to the flag and
algorithm, there is an array of
key_part elements that describe the
individual parts of a composite key.
The key parts define the field associated with the key part,
whether the key should be packed, and the data type and length
of the index part. See ha_myisam.cc for an
example of how this information is parsed.
As an alternative, a storage engine can instead read index
information from the TABLE structure of the
handler during each operation.
As part of every table-write operation
(INSERT, UPDATE,
DELETE), the storage engine is required to
update its internal index information.
The method used to update indexes will vary from storage engine to storage engine, depending on the method used to store the index.
In general, the storage engine will have to use row information
passed in methods such as
[custom-engine.html#custom-engine-api-reference-write_row
write_row()],
[custom-engine.html#custom-engine-api-reference-delete_row
delete_row()], and
[custom-engine.html#custom-engine-api-reference-update_row
update_row()] in combination with index
information for the table to determine what index data needs to
be modified, and make the needed changes.
The method of associating an index with its row will depend on your storage approach. Current storage engines store the row offset.
Many of the index methods pass a byte array named
*key that identifies the index entry to be
read in a standard format. Your storage engine will need to
extract the information stored in the key and translate it into
its internal index format to identify the row associated with
the index.
The information in the key is obtained by iterating through the
key, which is formatted the same as the definition in
table->key_info[index]->key_part[part_num].
Along with the key, handler methods pass a
keypart_map parameter to indicate which parts
of the key are present in the key parameter.
keypart_map is a ulonglong
bitmap with one bit per key part: 1 for
keypart[0], 2 for
keypart[1], 4 for
keypart[2], and so forth. If a bit in
keypart_map is set, the value for this key
part is present in the key buffer. Bits following the bit for
the last key part don't matter,so ~0 can be used for all
keyparts. Currently, only key prefixes are supported. That is,
assert((keypart_map + 1) & keypart_map ==
0).
A keypart_map is part of the
key_range structure used by
records_in_range(), and a
keypart_map value is passed directly to the
index_read(),
index_read_idx(), and
index_read_last() methods.
Older handlers have a key_len parameter
instead of keypart_map. The
key_len value is a uint
that indicates the prefix length when matching by prefix.
In order for indexing to be used effectively, storage engines need to provide the optimizer with information about the table and its indexes. This information is used to choose whether to use an index, and if so, which index to use.
The optimizer requests an update of table information by
calling the
[custom-engine.html#custom-engine-api-reference-info
handler::info()] method. The
info() method does not have a return value,
instead it is expected that the storage engine will set a
variety of public variables that the server will then read as
needed. These values are also used to populate certain
SHOW outputs such as SHOW TABLE
STATUS and for queries of the
INFORMATION_SCHEMA.
All variables are optional but should be filled if possible:
records - The number of rows in the
table. If you cannot provide an accurate number quickly
you should set the value to be greater than 1 so that the
optimizer does not perform optimizations for zero or one
row tables.
deleted - Number of deleted rows in
table. Used to identify table fragmentation, where
applicable.
data_file_length - Size of the data
file, in bytes. Helps optimizer calculate the cost of
reads.
index_file_length - Size of the index
file, in bytes. Helps optimizer calculate the cost of
reads.
mean_rec_length - Average length of a
single row, in bytes.
scan_time - Cost in I/O seeks to
perform a full table scan.
delete_length -
check_time -
When calculating values, speed is more important than accuracy, as there is no sense in taking a long time to give the optimizer clues as to what approach may be the fastest. Estimates within an order of magnitude are usually good enough.
The
[custom-engine.html#custom-engine-api-reference-records_in_range
records_in_range()] method is called by the
optimizer to assist in choosing which index on a table to use
for a query or join. It is defined as follows:
ha_rows ha_foo::records_in_range(uint inx, key_range *min_key, key_range *max_key)
The inx parameter is the index to be
checked. The *min_key and
*max_key parameters are
key_range structures that indicate the low
and high ends of the range. The key_range
structure looks like this:
typedef struct st_key_range
{
const byte *key;
uint length;
key_part_map keypart_map;
enum ha_rkey_function flag;
} key_range;
key_range members are used as follows:
key is a pointer to the key buffer.
length is the key length.
keypart_map is a bitmap that indicates
which key parts are passed in key (as
described in
Parsing Key
Information).
flag indicates whether to include the
key in the range. Its value differs for
min_key and max_key,
as described following.
min_key.flag can have one of the following
values:
HA_READ_KEY_EXACT - Include the key in
the range
HA_READ_AFTER_KEY - Don't include key
in range
max_key.flag can have one of the following
values:
HA_READ_BEFORE_KEY - Don't include key
in range
HA_READ_AFTER_KEY - Include all
'end_key' values in the range
The following return values are allowed:
0 - There are no matching keys in the
given range
number > 0 - There are approximately number matching rows in the range
HA_POS_ERROR - Something is wrong with
the index tree
When calculating values, speed is more important than accuracy.
The [custom-engine.html#custom-engine-api-reference-index_init
index_init()] method is called before an
index is used to allow the storage engine to perform any
necessary preparation or optimization:
int ha_foo::index_init(uint keynr, bool sorted)
Most storage engines do not need to make special preparations, in which case a default implementation will be used if the method is not explicitly implemented in the storage engine:
int handler::index_init(uint idx) { active_index=idx; return 0; }
The [custom-engine.html#custom-engine-api-reference-index_end
index_end()] method is a counterpart to the
index_init() method. The purpose of the
index_end() method is to clean up any
preparations made by the index_init() method.
If a storage engine does not implement
index_init() it does not need to implement
index_end().
The [custom-engine.html#custom-engine-api-reference-index_read
index_read()] method is used to retrieve a
row based on a key:
int ha_foo::index_read(byte * buf, const byte * key,
ulonglong keypart_map,
enum ha_rkey_function find_flag)
The *buf parameter is a byte array that the
storage engine populates with the row that matches the index key
specified in *key. The
keypart_map parameter is a bitmap that
indicates which parts of the key are present in the
key parameter. The
find_flag parameter is an enumerator that
dictates the search behavior to be used, as discussed in
Parsing Key
Information.
The index to be used is previously defined in the
[custom-engine.html#custom-engine-index-init
index_init()] call and is stored in the
active_index handler variable.
The following values are allowed for
find_flag:
HA_READ_AFTER_KEY HA_READ_BEFORE_KEY HA_READ_KEY_EXACT HA_READ_KEY_OR_NEXT HA_READ_KEY_OR_PREV HA_READ_PREFIX HA_READ_PREFIX_LAST HA_READ_PREFIX_LAST_OR_PREV
Storage engines must convert the *key
parameter to a storage engine-specific format, use it to find
the matching row according to the find_flag,
and then populate *buf with the matching row
in the MySQL internal row format. For more information on the
internal row format, see
#Implementing
the rnd_next() Method.
In addition to returning a matching row, the storage engine must also set a cursor to support sequential index reads.
If the *key parameter is null, the storage
engine should read the first key in the index.
The
[custom-engine.html#custom-engine-api-reference-index_read_idx
index_read_idx()] method is identical to
[custom-engine.html#custom-engine-index-read
index_read()] with the exception that
index_read_idx() accepts an additional
keynr parameter:
int ha_foo::index_read_idx(byte * buf, uint keynr, const byte * key,
ulonglong keypart_map,
enum ha_rkey_function find_flag)
The keynr parameter specifies the index to be
read, as opposed to index_read(), where the
index is already set.
As with the index_read() method, the storage
engine must return the row that matches the key according to the
find_flag and set a cursor for future reads.
The
[custom-engine.html#custom-engine-api-reference-index_read_last
index_read_last()] method works like
[custom-engine.html#custom-engine-index-read
index_read()] but finds the last row with the
current key value or prefix:
int ha_foo::index_read_last(byte * buf, const byte * key,
key_part_map keypart_map)
index_read_last() is used when optimizing
away the ORDER BY clause for statements such
as this:
SELECT * FROM t1 WHERE a=1 ORDER BY a DESC,b DESC;
The [custom-engine.html#custom-engine-api-reference-index_next
index_next()] method is used for index
scanning:
int ha_foo::index_next(byte * buf)
The *buf parameter is populated with the row
that corresponds to the next matching key value according to the
internal cursor set by the storage engine during operations such
as index_read() and
index_first().
The [custom-engine.html#custom-engine-api-reference-index_prev
index_prev()] method is used for reverse
index scanning:
int ha_foo::index_prev(byte * buf)
The *buf parameter is populated with the row
that corresponds to the previous matching key value according to
the internal cursor set by the storage engine during operations
such as index_read() and
index_last().
The [custom-engine.html#custom-engine-api-reference-index_first
index_first()] method is used for index
scanning:
int ha_foo::index_first(byte * buf)
The *buf parameter is populated with the row
that corresponds to the first key value in the index.
