This section describes how InnoDB tables,
indexes, and their associated metadata is represented at the
physical level. This information is primarily useful for
performance tuning and troubleshooting.
MySQL stores its data dictionary information for tables in
.frm files in database
directories. Unlike other MySQL storage engines,
InnoDB also encodes information about the
table in its own internal data dictionary inside the
tablespace. When MySQL drops a table or a database, it deletes
one or more .frm files as well as the
corresponding entries inside the InnoDB
data dictionary. You cannot move InnoDB
tables between databases simply by moving the
.frm files.
Every InnoDB table has a special index
called the clustered
index where the data for the rows is stored. Typically,
the clustered index is synonymous with the
primary key. To get
the best performance from queries, inserts, and other database
operations, you must understand how InnoDB uses the clustered
index to optimize the most common lookup and DML operations
for each table.
When you define a PRIMARY KEY on your
table, InnoDB uses it as the clustered
index. Define a primary key for each table that you
create. If there is no logical unique and non-null column
or set of columns, add a new
auto-increment
column, whose values are filled in automatically.
If you do not define a PRIMARY KEY for
your table, MySQL locates the first
UNIQUE index where all the key columns
are NOT NULL and
InnoDB uses it as the clustered index.
If the table has no PRIMARY KEY or
suitable UNIQUE index,
InnoDB internally generates a hidden
clustered index on a synthetic column containing row ID
values. The rows are ordered by the ID that
InnoDB assigns to the rows in such a
table. The row ID is a 6-byte field that increases
monotonically as new rows are inserted. Thus, the rows
ordered by the row ID are physically in insertion order.
Accessing a row through the clustered index is fast because
the index search leads directly to the page with all the row
data. If a table is large, the clustered index architecture
often saves a disk I/O operation when compared to storage
organizations that store row data using a different page from
the index record. (For example, MyISAM uses
one file for data rows and another for index records.)
All indexes other than the clustered index are known as
secondary indexes.
In InnoDB, each record in a secondary index
contains the primary key columns for the row, as well as the
columns specified for the secondary index.
InnoDB uses this primary key value to
search for the row in the clustered index.
If the primary key is long, the secondary indexes use more space, so it is advantageous to have a short primary key.
For coding guidelines to take advantage of
InnoDB clustered and secondary indexes, see
Section 8.3.2, “Using Primary Keys”
Section 8.3, “Optimization and Indexes”
Section 8.5, “Optimizing for InnoDB Tables”
Section 8.3.2, “Using Primary Keys”.
A special kind of index, the FULLTEXT
index, helps InnoDB deal with queries and
DML operations involving text-based columns and the words they
contain. These indexes are physically represented as entire
InnoDB tables, which are acted upon by SQL
keywords such as the FULLTEXT clause of the
CREATE INDEX statement, the
MATCH() ... AGAINST syntax in a
SELECT statement, and the
OPTIMIZE TABLE statement. For
usage information, see Section 12.9, “Full-Text Search Functions”.
You can examine FULLTEXT indexes by
querying tables in the INFORMATION_SCHEMA
database. You can see basic index information for
FULLTEXT indexes by querying
INNODB_SYS_INDEXES. Although
InnoDB FULLTEXT indexes
are represented by tables, which show up in
INNODB_SYS_TABLES queries, the
way to monitor the special text-processing aspects of a
FULLTEXT index is to query the tables
INNODB_FT_CONFIG,
INNODB_FT_INDEX_TABLE,
INNODB_FT_INDEX_CACHE,
INNODB_FT_DEFAULT_STOPWORD,
INNODB_FT_DELETED, and
INNODB_FT_BEING_DELETED.
InnoDB FULLTEXT indexes
are updated by the OPTIMIZE
TABLE command, using a special mode controlled by
the configuration options
innodb_ft_num_word_optimize
and
innodb_optimize_fulltext_only.
All InnoDB indexes are
B-trees where the index
records are stored in the leaf pages of the tree. The default
size of an index page is 16KB. When new records are inserted,
InnoDB tries to leave 1/16 of the page free
for future insertions and updates of the index records.
If index records are inserted in a sequential order (ascending
or descending), the resulting index pages are about 15/16
full. If records are inserted in a random order, the pages are
from 1/2 to 15/16 full. If the
fill factor of an
index page drops below 1/2, InnoDB tries to
contract the index tree to free the page.
You can specify the page
size for all InnoDB tablespaces in
a MySQL instance by setting the
innodb_page_size
configuration option before creating the instance. Once the
page size for a MySQL instance is set, you cannot change it.
Supported sizes are 16KB, 8KB, and 4KB, corresponding to the
option values 16k, 8k,
and 4k.
A MySQL instance using a particular
InnoDB page size cannot use data files or
log files from an instance that uses a different page size.
Database applications often insert new rows in the ascending order of the primary key. In this case, due to the layout of the clustered index in the same order as the primary key, insertions into an InnoDB table do not require random reads from a disk.
On the other hand, secondary indexes are usually nonunique,
and insertions into secondary indexes happen in a relatively
random order. In the same way, deletes and updates can affect
data pages that are not adjacent in secondary indexes. This
would cause a lot of random disk I/O operations without a
special mechanism used in InnoDB.
When an index record is inserted, marked for deletion, or
deleted from a nonunique secondary index,
InnoDB checks whether the secondary index
page is in the buffer
pool. If that is the case, InnoDB
applies the change directly to the index page. If the index
page is not found in the buffer pool,
InnoDB records the change in a special
structure known as the
insert buffer. The
insert buffer is kept small so that it fits entirely in the
buffer pool, and changes can be applied very quickly. This
process is known as
change buffering.
(Formerly, it applied only to inserts and was called insert
buffering. The data structure is still called the insert
buffer.)
Periodically, the insert buffer is merged into the secondary index trees in the database. Often, it is possible to merge several changes into the same page of the index tree, saving disk I/O operations. It has been measured that the insert buffer can speed up insertions into a table up to 15 times.
The insert buffer merging may continue to happen
after the transaction has been committed.
In fact, it may continue to happen after a server shutdown and
restart (see Section 14.2.4.6, “Starting InnoDB on a Corrupted Database”).
Insert buffer merging may take many hours when many secondary
indexes must be updated and many rows have been inserted.
During this time, disk I/O will be increased, which can cause
significant slowdown on disk-bound queries. Another
significant background I/O operation is the
purge thread (see
Section 14.2.3.11, “InnoDB Multi-Versioning”).
The feature known as the
adaptive hash
index (AHI) lets InnoDB perform more
like an in-memory database on systems with appropriate
combinations of workload and ample memory for the
buffer pool, without
sacrificing any transactional features or reliability. This
feature is enabled by the
innodb_adaptive_hash_index
option, or turned off by the
--skip-innodb_adaptive_hash_index at server
startup.
Based on the observed pattern of searches, MySQL builds a hash index using a prefix of the index key. The prefix of the key can be any length, and it may be that only some of the values in the B-tree appear in the hash index. Hash indexes are built on demand for those pages of the index that are often accessed.
If a table fits almost entirely in main memory, a hash index
can speed up queries by enabling direct lookup of any element,
turning the index value into a sort of pointer.
InnoDB has a mechanism that monitors index
searches. If InnoDB notices that queries
could benefit from building a hash index, it does so
automatically.
With some workloads, the
speedup from hash index lookups greatly outweighs the extra
work to monitor index lookups and maintain the hash index
structure. Sometimes, the read/write lock that guards access
to the adaptive hash index can become a source of contention
under heavy workloads, such as multiple concurrent joins.
Queries with LIKE operators and
% wildcards also tend not to benefit from
the AHI. For workloads where the adaptive hash index is not
needed, turning it off reduces unnecessary performance
overhead. Because it is difficult to predict in advance
whether this feature is appropriate for a particular system,
consider running benchmarks with it both enabled and disabled,
using a realistic workload. The architectural changes in MySQL
5.6 and higher make more workloads suitable for disabling the
adaptive hash index than in earlier releases, although it is
still enabled by default.
The hash index is always built based on an existing
B-tree index on the table.
InnoDB can build a hash index on a prefix
of any length of the key defined for the B-tree, depending on
the pattern of searches that InnoDB
observes for the B-tree index. A hash index can be partial,
covering only those pages of the index that are often
accessed.
You can monitor the use of the adaptive hash index and the
contention for its use in the SEMAPHORES
section of the output of the
SHOW ENGINE
INNODB STATUS command. If you see many threads
waiting on an RW-latch created in
btr0sea.c, then it might be useful to
disable adaptive hash indexing.
For more information about the performance characteristics of hash indexes, see Section 8.3.8, “Comparison of B-Tree and Hash Indexes”.
The physical row structure for an InnoDB
table depends on the row format specified when the table was
created. By default, InnoDB uses the
Antelope file format and
its COMPACT row format. The
REDUNDANT format is available to retain
compatibility with older versions of MySQL. When you enable
the innodb_file_per_table
setting, you can also make use of the newer Barracuda file
format, with its DYNAMIC and
COMPRESSED row formats, as explained in
Section 5.4.8, “How InnoDB Stores Variable-Length Columns” and
Section 5.4.6, “Working with InnoDB Compressed Tables”.
To check the row format of an InnoDB table,
use SHOW TABLE STATUS.
The COMPACT row format decreases row
storage space by about 20% at the cost of increasing CPU use
for some operations. If your workload is a typical one that is
limited by cache hit rates and disk speed,
COMPACT format is likely to be faster. If
the workload is a rare case that is limited by CPU speed,
COMPACT format might be slower.
Rows in InnoDB tables that use
REDUNDANT row format have the following
characteristics:
Each index record contains a 6-byte header. The header is used to link together consecutive records, and also in row-level locking.
Records in the clustered index contain fields for all user-defined columns. In addition, there is a 6-byte transaction ID field and a 7-byte roll pointer field.
If no primary key was defined for a table, each clustered index record also contains a 6-byte row ID field.
Each secondary index record also contains all the primary key fields defined for the clustered index key that are not in the secondary index.
A record contains a pointer to each field of the record. If the total length of the fields in a record is less than 128 bytes, the pointer is one byte; otherwise, two bytes. The array of these pointers is called the record directory. The area where these pointers point is called the data part of the record.
Internally, InnoDB stores fixed-length
character columns such as
CHAR(10) in a fixed-length
format. InnoDB does not truncate
trailing spaces from
VARCHAR columns.
An SQL NULL value reserves one or two
bytes in the record directory. Besides that, an SQL
NULL value reserves zero bytes in the
data part of the record if stored in a variable length
column. In a fixed-length column, it reserves the fixed
length of the column in the data part of the record.
Reserving the fixed space for NULL
values enables an update of the column from
NULL to a non-NULL
value to be done in place without causing fragmentation of
the index page.
Rows in InnoDB tables that use
COMPACT row format have the following
characteristics:
Each index record contains a 5-byte header that may be preceded by a variable-length header. The header is used to link together consecutive records, and also in row-level locking.
The variable-length part of the record header contains a
bit vector for indicating NULL columns.
If the number of columns in the index that can be
NULL is N,
the bit vector occupies
CEILING(
bytes. (For example, if there are anywhere from 9 to 15
columns that can be N/8)NULL, the bit
vector uses two bytes.) Columns that are
NULL do not occupy space other than the
bit in this vector. The variable-length part of the header
also contains the lengths of variable-length columns. Each
length takes one or two bytes, depending on the maximum
length of the column. If all columns in the index are
NOT NULL and have a fixed length, the
record header has no variable-length part.
For each non-NULL variable-length
field, the record header contains the length of the column
in one or two bytes. Two bytes will only be needed if part
of the column is stored externally in overflow pages or
the maximum length exceeds 255 bytes and the actual length
exceeds 127 bytes. For an externally stored column, the
2-byte length indicates the length of the internally
stored part plus the 20-byte pointer to the externally
stored part. The internal part is 768 bytes, so the length
is 768+20. The 20-byte pointer stores the true length of
the column.
The record header is followed by the data contents of the
non-NULL columns.
Records in the clustered index contain fields for all user-defined columns. In addition, there is a 6-byte transaction ID field and a 7-byte roll pointer field.
If no primary key was defined for a table, each clustered index record also contains a 6-byte row ID field.
Each secondary index record also contains all the primary key fields defined for the clustered index key that are not in the secondary index. If any of these primary key fields are variable length, the record header for each secondary index will have a variable-length part to record their lengths, even if the secondary index is defined on fixed-length columns.
Internally, InnoDB stores fixed-length,
fixed-width character columns such as
CHAR(10) in a fixed-length
format. InnoDB does not truncate
trailing spaces from
VARCHAR columns.
Internally, InnoDB attempts to store
UTF-8
CHAR(
columns in N)N bytes by trimming
trailing spaces. (With REDUNDANT row
format, such columns occupy 3 ×
N bytes.) Reserving the minimum
space N in many cases enables
column updates to be done in place without causing
fragmentation of the index page.

User Comments
Add your own comment.