This section summarizes the major InnoDB features and enhancements for performance and scalability. This information is useful to any DBA or developer who is concerned with performance and scalability. Although some of the enhancements do not require any action on your part, knowing this information can still help you diagnose performance issues more quickly and modernize systems and applications that rely on older, inefficient behavior.
InnoDB has always been highly efficient, and includes several unique architectural elements to assure high performance and scalability. The latest InnoDB storage engine includes new features that take advantage of advances in operating systems and hardware platforms, such as multi-core processors and improved memory allocation systems. In addition, new configuration options let you better control some InnoDB internal subsystems to achieve the best performance with your workload.
Starting with MySQL 5.5 and InnoDB 1.1, the built-in InnoDB
storage engine within MySQL is upgraded to the full feature set
and performance of the former InnoDB Plugin. This change makes
these performance and scalability enhancements available to a much
wider audience than before, and eliminates the separate
installation step of the InnoDB Plugin. After learning about the
InnoDB performance features in this section, continue with
Chapter 8, Optimization to learn the best practices for
overall MySQL performance, and Section 8.5, “Optimizing for InnoDB Tables”
in particular for InnoDB tips and guidelines.
Traditionally, the InnoDB
compression feature was
recommended primarily for read-only or read-mostly
workloads, such as in a
data warehouse
configuration. The rise of SSD
storage devices, which are fast but relatively small and
expensive, makes compression attractive also for
OLTP workloads: high-traffic, interactive web
sites can reduce their storage requirements and their I/O
operations per second (IOPS) by
using compressed tables with applications that do frequent
INSERT,
UPDATE, and
DELETE operations.
New configuration options in MySQL 5.6 let you adjust the way compression works for a particular MySQL instance, with an emphasis on performance and scalability for write-intensive operations:
innodb_compression_level lets
you turn the degree of compression up or down. A higher value
lets you fit more data onto a storage device, at the expense
of more CPU overhead during compression. A lower value lets
you reduce CPU overhead when storage space is not critical, or
you expect the data is not especially compressible.
innodb_compression_failure_threshold_pct
specifies a cutoff point for
compression
failures during updates to a compressed table. When
this threshold is passed, MySQL begins to leave additional
free space within each new compressed page, dynamically
adjusting the amount of free space up to the percentage of
page size specified by
innodb_compression_pad_pct_max
innodb_compression_pad_pct_max
lets you adjust the maximum amount of space reserved within
each page to record changes
to compressed rows, without needing to compress the entire
page again. The higher the value, the more changes can be
recorded without recompressing the page. MySQL uses a variable
amount of free space for the pages within each compressed
table, only when a designated percentage of compression
operations
“fail”
at runtime, requiring an expensive operation to split the
compressed page.
Because working with compressed data sometimes involves keeping
both compressed and uncompressed versions of a page in memory at
the same time, when using compression with an OLTP-style workload,
be prepared to increase the value of the
innodb_buffer_pool_size
configuration option.
For more information on MySQL data compression, see
Section 5.4.6, “Working with InnoDB Compressed Tables”. For the performance aspects,
especially see the section
Section 5.4.6.3, “Tuning Compression for InnoDB Tables”.
When a transaction is known to be read-only,
InnoDB can avoid the overhead associated with
setting up the transaction
ID (TRX_ID field). The transaction ID is
only needed for a
transaction that might
perform write operations or
locking reads such as
SELECT ... FOR UPDATE. Eliminating these
unnecessary transaction IDs reduces the size of internal data
structures that are consulted each time a query or DML statement
constructs a read view.
Currently, InnoDB detects the read-only nature
of the transaction and applies this optimization when any of the
following conditions are met:
The transaction is started with the
START TRANSACTION READ
ONLY statement. In this case, attempting to make any
changes to the database (for InnoDB,
MyISAM, or other types of tables) causes an
error, and the transaction continues in read-only state:
ERROR 1792 (25006): Cannot execute statement in a READ ONLY transaction.
You can still make changes to session-specific temporary tables in a read-only transaction, or issue locking queries for them, because those changes and locks are not visible to any other transaction.
The autocommit setting
is turned on, so that the transaction is guaranteed to be a
single statement, and the single statement making up the
transaction is a “non-locking”
SELECT statement. That is, a
SELECT that does not use a FOR
UPDATE or LOCK IN SHARED MODE
clause.
Thus, for a read-intensive application such as a report generator,
you can tune a sequence of InnoDB queries by
grouping them inside START TRANSACTION READ
ONLY and COMMIT, or by turning on the
autocommit setting before running the
SELECT statements, or simply by avoiding any
DML statements interspersed with
the queries.
Transactions that qualify as auto-commit, non-locking, and
read-only (AC-NL-RO) are kept out of certain internal
InnoDB data structures and are therefore not
listed in SHOW
ENGINE INNODB STATUS output. These transactions are
only visible in the Information Schema.
This feature optionally moves the InnoDB
undo log out of the
system tablespace
into one or more separate
tablespaces. The I/O
patterns for the undo log make these new tablespaces good
candidates to move to SSD storage,
while keeping the system tablespace on hard disk storage. Users
cannot drop the separate tablespaces created to hold
InnoDB undo logs, or the individual
segments inside those
tablespaces.
Because these files handle I/O operations formerly done inside the system tablespace, we broaden the definition of system tablespace to include these new files.
The undo logs are also known as the rollback segments.
This feature involves the following new or renamed configuration options:
innodb_rollback_segments becomes
innodb_undo_logs. The old
name is still available for compatibility.
To use this feature, follow these steps:
Decide on a path on a fast storage device to hold the undo
logs. You will specify that path as the argument to the
innodb_undo_directory option
in your MySQL configuration file or startup script.
Decide on a non-zero starting value for the
innodb_undo_logs option. You
can start with a relatively low value and increase it over
time to examine the effect on performance.
Decide on a non-zero value for the
innodb_undo_tablespaces
option. The multiple undo logs specified by the
innodb_undo_logs value are divided between
this many separate tablespaces (represented by
.ibd
files). This value is fixed for the life of the MySQL
instance, so if you are uncertain about the optimal value,
estimate on the high side.
Set up an entirely new MySQL instance for testing, using the values you chose in the configuration file or in your MySQL startup script. Use a realistic workload with data volume similar to your production servers.
Benchmark the performance of I/O intensive workloads.
Periodically increase the value of
innodb_undo_logs and re-do the performance
tests. Find the value where you stop experiencing gains in I/O
performance.
Deploy a new production instance using the ideal settings for these options. Set it up as a slave server in a replication configuration, or transfer data from an earlier production instance.
Keeping the undo logs in separate files allows the MySQL team to
implement I/O and memory optimizations related to this
transactional data. For example, because the undo data is written
to disk and then rarely used (only in case of crash recovery), it
does not need to be kept in the filesystem memory cache, in turn
allowing a higher percentage of system memory to be devoted to the
InnoDB buffer
pool.
The typical SSD best practice of keeping the
InnoDB system tablespace on a hard drive and
moving the per-table tablespaces to SSD, is assisted by moving the
undo information into separate tablespace files.
The physical tablespace files are named
undo, where
NN is the space ID, including leading
zeros.
Currently, MySQL instances containing separate undo tablespaces cannot be downgraded to earlier releases such as MySQL 5.5 or 5.1.
The benefits of the InnoDB
file-per-table setting
come with the tradeoff that each
.ibd
file is extended as the data inside the table grows. This I/O
operation can be a bottleneck for busy systems with many
InnoDB tables. When all
InnoDB tables are stored inside the
system tablespace,
this extension operation happens less frequently, as space freed
by DELETE or TRUNCATE
operations within one table can be reused by another table.
MySQL 5.6 improves the concurrency of the extension operation, so
that multiple .ibd files can be extended
simultaneously, and this operation does not block read or write
operations performed by other threads.
The code that detects
deadlocks in
InnoDB
transactions has been
modified to use a fixed-size work area rather than a recursive
algorithm. The resulting detection operation is faster as a
result. You do not need to do anything to take advantage of this
enhancement.
Under both the old and new detection mechanisms, you might
encounter a search too deep error that is not a
true deadlock, but requires you to re-try the transaction the same
way as with a deadlock.
You can enable the configuration option
innodb_checksum_algorithm=crc32
configuration setting to change the
checksum algorithm to a
faster one that scans the block 32 bits at a time rather than 8
bits at a time. When the CRC32 algorithm is enabled, data blocks
that are written to disk by InnoDB contain
different values in their checksum fields than before. This
process could be gradual, with a mix of old and new checksum
values within the same table or database.
For maximum downward compatibility, this setting is off by default:
Current versions of MySQL Enterprise Backup (up to 3.8.0) do not support backing up tablespaces that use crc32 checksums.
.ibd files containing crc32 checksums could
cause problems downgrading to MySQL versions prior to 5.6.3.
MySQL 5.6.3 and up recognizes either the new or old checksum
values for the block as correct when reading the block from
disk, ensuring that data blocks are compatible during upgrade
and downgrade regardless of the algorithm setting. If data
written with new checksum values is processed by an level of
MySQL earlier than 5.6.3, it could be reported as corrupted.
When you set up a new MySQL instance, and can be sure that all the
InnoDB data is created using the CRC32 checksum
algorithm, you can use the setting
innodb_checksum_algorithm=strict_crc32,
which can be faster than the crc32 setting
because it does not do the extra checksum calculations to support
both old and new values.
The innodb_checksum_algorithm
option has other values that allow it to replace the
innodb_checksums option.
innodb_checksum_algorithm=none is the same as
innodb_checksums=OFF.
innodb_checksum_algorithm=innodb is the same as
innodb_checksums=ON. To avoid conflicts, remove
references to innodb_checksums from your
configuration file and MySQL startup scripts. The new option also
accepts values strict_none and
strict_innodb, again offering better
performance in situations where all InnoDB data
in an instance is created with the same checksum algorithm.
The following table illustrates the difference between the
none, innodb, and
crc32 option values, and their
strict_ counterparts. none,
innodb, and crc32 write the
specified type checksum value into each data block, but for
compatibility accept any of the other checksum values when
verifying a block during a read operation. The
strict_ form of each parameter only recognizes
one kind of checksum, which makes verification faster but requires
that all InnoDB data files in an instance be
created under the identical
innodb_checksum_algorithm value.
Table 14.3. Allowed Settings for innodb_checksum_algorithm
| Value | Generated checksum (when writing) | Allowed checksums (when reading) |
|---|---|---|
| none | A constant number. | Any of the checksums generated by none,
innodb, or crc32. |
| innodb | A checksum calculated in software, using the original algorithm from
InnoDB. | Any of the checksums generated by none,
innodb, or crc32. |
| crc32 | A checksum calculated using the crc32 algorithm,
possibly done with a hardware assist. | Any of the checksums generated by none,
innodb, or crc32. |
| strict_none | A constant number | Only the checksum generated by none. |
| strict_innodb | A checksum calculated in software, using the original algorithm from
InnoDB. | Only the checksum generated by innodb. |
| strict_crc32 | A checksum calculated using the crc32 algorithm,
possibly done with a hardware assist. | Only the checksum generated by crc32. |
After you restart a busy server, there is typically a
warmup period with steadily
increasing throughput, as disk pages that were in the
InnoDB buffer
pool are brought back into memory as the same data is
queried, updated, and so on. Once the buffer pool holds a similar
set of pages as before the restart, many operations are performed
in memory rather than involving disk I/O, and throughput
stabilizes at a high level.
This feature shortens the warmup period by immediately reloading disk pages that were in the buffer pool before the restart, rather than waiting for DML operations to access the corresponding rows. The I/O requests can be performed in large batches, making the overall I/O faster. The page loading happens in the background, and does not delay the database startup.
In addition to saving the buffer pool state at shutdown and restoring it at startup, you can also save or restore the state at any time. For example, you might save the state of the buffer pool after reaching a stable throughput under a steady workload. You might restore the previous buffer pool state after running reports or maintenance jobs that bring data pages into the buffer pool that are only needed during the time period for those operations, or after some other period with a non-typical workload.
Although the buffer pool itself could be many gigabytes in size,
the data that InnoDB saves on disk to restore
the buffer pool is tiny by comparison: just the tablespace and
page IDs necessary to locate the appropriate pages on disk. This
information is derived from the
information_schema table
innodb_buffer_page_lru.
Because the data is cached in and aged out of the buffer pool the same as with regular database operations, there is no problem if the disk pages were updated recently, or if a DML operation involves data that has not yet been loaded. The loading mechanism skips any requested pages that no longer exist.
This feature involves the configuration variables:
and the status variables:
To save the current state of the InnoDB buffer
pool, issue the statement:
SET innodb_buffer_pool_dump_now=ON;
The underlying mechanism involves a background thread that is dispatched to perform the dump and load operations.
By default, the buffer pool state is saved in a file
ib_buffer_pool in the
InnoDB data directory.
Disk pages from compressed tables are loaded into the buffer pool in their compressed form. Uncompression happens as usual when the page contents are accessed in the course of DML operations. Because decompression is a CPU-intensive process, it is more efficient for concurrency to perform that operation in one of the connection threads rather than the single thread that performs the buffer pool restore operation.
Example 14.1. Examples of Dumping and Restoring the InnoDB Buffer Pool
Trigger a dump of the buffer pool manually:
SET innodb_buffer_pool_dump_now=ON;
Specify that a dump should be taken at shutdown:
SET innodb_buffer_pool_dump_at_shutdown=ON;
Specify that a dump should be loaded at startup:
SET innodb_buffer_pool_load_at_startup=ON;
Trigger a load of the buffer pool manually:
SET innodb_buffer_pool_load_now=ON;
Specify which filename to use for storing the dump to and loading the dump from:
SET innodb_buffer_pool_filename='filename';
Display progress of dump:
SHOW STATUS LIKE 'innodb_buffer_pool_dump_status';
or:
SELECT variable_value FROM information_schema.global_status WHERE variable_name = 'INNODB_BUFFER_POOL_DUMP_STATUS';
Outputs any of: not started, Dumping buffer pool 5/7, page 237/2873, Finished at 110505 12:18:02
Display progress of load:
SHOW STATUS LIKE 'innodb_buffer_pool_load_status';
or:
SELECT variable_value FROM information_schema.global_status WHERE variable_name = 'INNODB_BUFFER_POOL_LOAD_STATUS';
Outputs any of: not started, Loaded 123/22301 pages, Finished at 110505 12:23:24
Abort a buffer pool load:
SET innodb_buffer_pool_load_abort=ON;
The new configuration options
innodb_flush_neighbors and
innodb_lru_scan_depth let you
fine-tune certain aspects of the
flushing process for the
InnoDB buffer
pool. These options primarily help write-intensive
workloads. With heavy
DML activity, flushing can fall
behind if it is not aggressive enough, resulting in excessive
memory use in the buffer pool; or, disk writes due to flushing can
saturate your I/O capacity if that mechanism is too aggressive.
The ideal settings depend on your workload, data access patterns,
and storage configuration (for example, whether data is stored on
HDD or
SSD devices).
For systems with constant heavy
workloads, or workloads that
fluctuate widely, several new configuration options let you
fine-tune the flushing behavior
for InnoDB tables:
innodb_adaptive_flushing_lwm,
innodb_max_dirty_pages_pct_lwm,
innodb_io_capacity_max, and
innodb_flushing_avg_loops. These
options feed into an improved formula used by the
innodb_adaptive_flushing option.
The existing
innodb_adaptive_flushing,
innodb_io_capacity and
innodb_max_dirty_pages_pct
options work as before, except that they are limited or extended
by other options:
innodb_adaptive_flushing_lwm,
innodb_io_capacity_max and
innodb_max_dirty_pages_pct_lwm:
The InnoDB
adaptive
flushing mechanism is not appropriate in all cases. It
gives the most benefit when the
redo log is in danger of
filling up. The
innodb_adaptive_flushing_lwm
option specifies a percentage of redo log capacity; when that
threshold is crossed, InnoDB turns on
adaptive flushing even if not specified by the
innodb_adaptive_flushing
option.
If flushing activity falls far behind,
InnoDB can flush more aggressively than
specified by
innodb_io_capacity.
innodb_io_capacity_max
represents an upper limit on the I/O capacity used in such
emergency situations, so that the spike in I/O does not
consume all the capacity of the server.
When the
innodb_max_dirty_pages_pct
threshold is crossed, InnoDB can begin
aggressively flushing pages to disk. The
innodb_max_dirty_pages_pct_lwm
option specifies a higher value at which
InnoDB begins gently flushing pages,
ideally preventing the percentage of dirty pages from reaching
innodb_max_dirty_pages_pct. A
value of
innodb_max_dirty_pages_pct_lwm=0
disables this “preflushing” behavior.
All of these options are most applicable for servers running heavy
workloads for long periods of time, when there is rarely enough
idle time to catch up with changes waiting to be written to disk.
The innodb_flushing_avg_loops
lets you distinguish between a server that is running at full
capacity 24x7 and one that experiences periodic spikes in
workload. For a server with a consistently high workload, keep
this value high so that the adaptive flushing algorithm responds
immediately to changes in the I/O rate. For a server that
experiences peaks and troughs in its workload, keep this value low
so that InnoDB does not overreact to sudden
spikes in DML activity.
Plan stability is a desirable goal for your biggest and most important queries. InnoDB has always computed statistics for each InnoDB table to help the optimizer find the most efficient query execution plan. Now you can make these statistics persistent, so that the index usage and join order for a particular query is less likely to change.
This feature is on by default, enabled by the configuration option
innodb_stats_persistent.
You control how much sampling is done to collect the statistics by
setting the configuration options
innodb_stats_persistent_sample_pages
and
innodb_stats_transient_sample_pages.
The configuration option
innodb_stats_auto_recalc
determines whether the statistics are calculated automatically
whenever a table undergoes substantial changes (to more than 10%
of the rows). If that setting is disabled, ensure the accuracy of
optimizer statistics by issuing the ANALYZE
TABLE statement for each applicable table after creating
an index or making substantial changes to indexed columns. You
might run this statement in your setup scripts after
representative data has been loaded into the table, and run it
periodically after DML operations significantly change the
contents of indexed columns, or on a schedule at times of low
activity.
To ensure statistics are gathered when a new index is created,
either enable the
innodb_stats_auto_recalc
option, or run ANALYZE TABLE
after creating each new index when the persistent statistics
mode is enabled.
You can also set
innodb_stats_persistent,
innodb_stats_auto_recalc, and
innodb_stats_sample_pages options at the
session level before creating a table, or use the
STATS_PERSISTENT,
STATS_AUTO_RECALC, and
STATS_SAMPLE_PAGES clauses on the
CREATE TABLE and
ALTER TABLE statements, to override
the system-wide setting and configure persistent statistics for
individual tables.
Formerly, these statistics were cleared on each server restart and after some other operations, and recomputed when the table was next accessed. The statistics are computed using a random sampling technique that could produce different estimates the next time, leading to different choices in the execution plan and thus variations in query performance.
To revert to the previous method of collecting statistics that are
periodically erased, run the command ALTER TABLE
.
tbl_name STATS_PERSISTENT=0
The persistent statistics feature relies on the internally managed
tables in the mysql database, named
innodb_table_stats and
innodb_index_stats. These tables are set up
automatically in all install, upgrade, and build-from-source
procedures.
The innodb_table_stats and
innodb_index_stats tables both include a
last_update column showing when index
statistics were last updated, as shown in the following example:
mysql> select * from INNODB_TABLE_STATS \G
*************************** 1. row ***************************
database_name: sakila
table_name: actor
last_update: 2013-05-28 16:16:44
n_rows: 200
clustered_index_size: 1
sum_of_other_index_sizes: 1
...mysql> select * from INNODB_INDEX_STATS \G
*************************** 1. row ***************************
database_name: sakila
table_name: actor
index_name: PRIMARY
last_update: 2013-05-28 16:16:44
stat_name: n_diff_pfx01
stat_value: 200
sample_size: 1
...
If you manually update the statistics in the tables during
troubleshooting or tuning, issue the command FLUSH TABLE
to make MySQL reload
the updated statistics.
tbl_name
In MySQL and InnoDB, multiple threads of execution access shared data structures. InnoDB synchronizes these accesses with its own implementation of mutexes and read/write locks. InnoDB has historically protected the internal state of a read/write lock with an InnoDB mutex. On Unix and Linux platforms, the internal state of an InnoDB mutex is protected by a Pthreads mutex, as in IEEE Std 1003.1c (POSIX.1c).
On many platforms, there is a more efficient way to implement mutexes and read/write locks. Atomic operations can often be used to synchronize the actions of multiple threads more efficiently than Pthreads. Each operation to acquire or release a lock can be done in fewer CPU instructions, and thus result in less wasted time when threads are contending for access to shared data structures. This in turn means greater scalability on multi-core platforms.
InnoDB implements mutexes and read/write locks with the
built-in
functions provided by the GNU Compiler Collection (GCC) for atomic
memory access instead of using the Pthreads approach
previously used. More specifically, an InnoDB that is compiled
with GCC version 4.1.2 or later uses the atomic builtins instead
of a pthread_mutex_t to implement InnoDB
mutexes and read/write locks.
On 32-bit Microsoft Windows, InnoDB has implemented mutexes (but not read/write locks) with hand-written assembler instructions. Beginning with Microsoft Windows 2000, functions for Interlocked Variable Access are available that are similar to the built-in functions provided by GCC. On Windows 2000 and higher, InnoDB makes use of the Interlocked functions. Unlike the old hand-written assembler code, the new implementation supports read/write locks and 64-bit platforms.
Solaris 10 introduced library functions for atomic operations, and InnoDB uses these functions by default. When MySQL is compiled on Solaris 10 with a compiler that does not support the built-in functions provided by the GNU Compiler Collection (GCC) for atomic memory access, InnoDB uses the library functions.
This change improves the scalability of InnoDB on multi-core systems. This feature is enabled out-of-the-box on the platforms where it is supported. You do not have to set any parameter or option to take advantage of the improved performance. On platforms where the GCC, Windows, or Solaris functions for atomic memory access are not available, InnoDB uses the traditional Pthreads method of implementing mutexes and read/write locks.
When MySQL starts, InnoDB writes a message to the log file
indicating whether atomic memory access is used for mutexes, for
mutexes and read/write locks, or neither. If suitable tools are
used to build InnoDB and the target CPU supports the atomic
operations required, InnoDB uses the built-in functions for
mutexing. If, in addition, the compare-and-swap operation can be
used on thread identifiers (pthread_t), then
InnoDB uses the instructions for read-write locks as well.
Note: If you are building from source, ensure that the build process properly takes advantage of your platform capabilities.
For more information about the performance implications of locking, see Section 8.10, “Optimizing Locking Operations”.
When InnoDB was developed, the memory allocators supplied with
operating systems and run-time libraries were often lacking in
performance and scalability. At that time, there were no memory
allocator libraries tuned for multi-core CPUs. Therefore, InnoDB
implemented its own memory allocator in the mem
subsystem. This allocator is guarded by a single mutex, which may
become a bottleneck. InnoDB
also implements a wrapper interface around the system allocator
(malloc and free) that is
likewise guarded by a single mutex.
Today, as multi-core systems have become more widely available,
and as operating systems have matured, significant improvements
have been made in the memory allocators provided with operating
systems. New memory allocators perform better and are more
scalable than they were in the past. The leading high-performance
memory allocators include Hoard,
libumem, mtmalloc,
ptmalloc, tbbmalloc, and
TCMalloc. Most workloads, especially those
where memory is frequently allocated and released (such as
multi-table joins), benefit from using a more highly tuned memory
allocator as opposed to the internal, InnoDB-specific memory
allocator.
You can control whether InnoDB uses its own memory allocator or an
allocator of the operating system, by setting the value of the
system configuration parameter
innodb_use_sys_malloc in the
MySQL option file (my.cnf or
my.ini). If set to ON or
1 (the default), InnoDB uses the
malloc and free functions of
the underlying system rather than manage memory pools itself. This
parameter is not dynamic, and takes effect only when the system is
started. To continue to use the InnoDB memory allocator, set
innodb_use_sys_malloc to
0.
When the InnoDB memory allocator is disabled, InnoDB ignores the
value of the parameter
innodb_additional_mem_pool_size.
The InnoDB memory allocator uses an additional memory pool for
satisfying allocation requests without having to fall back to
the system memory allocator. When the InnoDB memory allocator is
disabled, all such allocation requests are fulfilled by the
system memory allocator.
On Unix-like systems that use dynamic linking, replacing the
memory allocator may be as easy as making the environment
variable LD_PRELOAD or
LD_LIBRARY_PATH point to the dynamic library
that implements the allocator. On other systems, some relinking
may be necessary. Please refer to the documentation of the
memory allocator library of your choice.
Since InnoDB cannot track all memory use when the system memory
allocator is used
(innodb_use_sys_malloc is
ON), the section “BUFFER POOL AND
MEMORY” in the output of the SHOW ENGINE INNODB
STATUS command only includes the buffer pool
statistics in the “Total memory allocated”. Any
memory allocated using the mem subsystem or
using ut_malloc is excluded.
For more information about the performance implications of InnoDB memory usage, see Section 8.9, “Buffering and Caching”.
When INSERT, UPDATE, and
DELETE operations are done to a table, often
the values of indexed columns (particularly the values of
secondary keys) are not in sorted order, requiring substantial I/O
to bring secondary indexes up to date. InnoDB has an
insert buffer that
caches changes to secondary index entries when the relevant
page is not in the
buffer pool, thus
avoiding I/O operations by not reading in the page from the disk.
The buffered changes are merged when the page is loaded to the
buffer pool, and the updated page is later flushed to disk using
the normal mechanism. The InnoDB main thread merges buffered
changes when the server is nearly idle, and during a
slow shutdown.
Because it can result in fewer disk reads and writes, this feature is most valuable for workloads that are I/O-bound, for example applications with a high volume of DML operations such as bulk inserts.
However, the insert buffer occupies a part of the buffer pool, reducing the memory available to cache data pages. If the working set almost fits in the buffer pool, or if your tables have relatively few secondary indexes, it may be useful to disable insert buffering. If the working set entirely fits in the buffer pool, insert buffering does not impose any extra overhead, because it only applies to pages that are not in the buffer pool.
You can control the extent to which InnoDB performs insert
buffering with the system configuration parameter
innodb_change_buffering. You can
turn on and off buffering for inserts, delete operations (when
index records are initially marked for deletion) and purge
operations (when index records are physically deleted). An update
operation is represented as a combination of an insert and a
delete. In MySQL 5.5 and higher, the default value is changed from
inserts to all.
The allowed values of
innodb_change_buffering
are:
all
The default value: buffer inserts, delete-marking operations, and purges.
none
Do not buffer any operations.
inserts
Buffer insert operations.
deletes
Buffer delete-marking operations.
changes
Buffer both inserts and delete-marking.
purges
Buffer the physical deletion operations that happen in the background.
You can set the value of this parameter in the MySQL option file
(my.cnf or my.ini) or change
it dynamically with the SET GLOBAL command,
which requires the SUPER privilege. Changing
the setting affects the buffering of new operations; the merging
of already buffered entries is not affected.
For more information about speeding up INSERT,
UPDATE, and DELETE
statements, see Section 8.2.2, “Optimizing DML Statements”.
If a table fits almost entirely in main memory, the fastest way to
perform queries on it is to use
hash indexes rather
than B-tree lookups. MySQL
monitors searches on each index defined for an InnoDB table. If it
notices that certain index values are being accessed frequently,
it automatically builds an in-memory hash table for that index.
See Section 14.2.3.12.6, “Adaptive Hash Indexes” for background
information and usage guidelines for the
adaptive hash
index feature and the
innodb_adaptive_hash_index
configuration option.
InnoDB uses operating system threads to process requests from user transactions. (Transactions may issue many requests to InnoDB before they commit or roll back.) On modern operating systems and servers with multi-core processors, where context switching is efficient, most workloads run well without any limit on the number of concurrent threads. Scalability improvements in MySQL 5.5 and up reduce the need to limit the number of concurrently executing threads inside InnoDB.
In situations where it is helpful to minimize context switching between threads, InnoDB can use a number of techniques to limit the number of concurrently executing operating system threads (and thus the number of requests that are processed at any one time). When InnoDB receives a new request from a user session, if the number of threads concurrently executing is at a pre-defined limit, the new request sleeps for a short time before it tries again. A request that cannot be rescheduled after the sleep is put in a first-in/first-out queue and eventually is processed. Threads waiting for locks are not counted in the number of concurrently executing threads.
You can limit the number of concurrent threads by setting the
configuration parameter
innodb_thread_concurrency.
Once the number of executing threads reaches this limit,
additional threads sleep for a number of microseconds, set by the
configuration parameter
innodb_thread_sleep_delay,
before being placed into the queue.
Previously, it required experimentation to find the optimal value
for innodb_thread_sleep_delay, and the optimal
value could change depending on the workload. In MySQL 5.6.3 and
higher, you can set the configuration option
innodb_adaptive_max_sleep_delay
to the highest value you would allow for
innodb_thread_sleep_delay, and InnoDB
automatically adjusts innodb_thread_sleep_delay
up or down depending on the current thread-scheduling activity.
This dynamic adjustment helps the thread scheduling mechanism to
work smoothly during times when the system is lightly loaded and
when it is operating near full capacity.
The default value for
innodb_thread_concurrency
and the implied default limit on the number of concurrent threads
has been changed in various releases of MySQL and InnoDB.
Currently, the default value of
innodb_thread_concurrency is
0, so that by default there is no limit on the
number of concurrently executing threads, as shown in
Table 14.4, “Changes to innodb_thread_concurrency”.
Table 14.4. Changes to innodb_thread_concurrency
| InnoDB Version | MySQL Version | Default value | Default limit of concurrent threads | Value to allow unlimited threads |
|---|---|---|---|---|
| Built-in | Earlier than 5.1.11 | 20 | No limit | 20 or higher |
| Built-in | 5.1.11 and newer | 8 | 8 | 0 |
| InnoDB before 1.0.3 | (corresponding to Plugin) | 8 | 8 | 0 |
| InnoDB 1.0.3 and newer | (corresponding to Plugin) | 0 | No limit | 0 |
Note that InnoDB causes threads to sleep only when the number of
concurrent threads is limited. When there is no limit on the
number of threads, all contend equally to be scheduled. That is,
if innodb_thread_concurrency is
0, the value of
innodb_thread_sleep_delay is
ignored.
When there is a limit on the number of threads, InnoDB reduces
context switching overhead by permitting multiple requests made
during the execution of a single SQL statement to enter InnoDB
without observing the limit set by
innodb_thread_concurrency.
Since an SQL statement (such as a join) may comprise multiple row
operations within InnoDB, InnoDB assigns “tickets”
that allow a thread to be scheduled repeatedly with minimal
overhead.
When a new SQL statement starts, a thread has no tickets, and it
must observe
innodb_thread_concurrency.
Once the thread is entitled to enter InnoDB, it is assigned a
number of tickets that it can use for subsequently entering
InnoDB. If the tickets run out,
innodb_thread_concurrency
is observed again and further tickets are assigned. The number of
tickets to assign is specified by the global option
innodb_concurrency_tickets,
which is 500 by default. A thread that is waiting for a lock is
given one ticket once the lock becomes available.
The correct values of these variables depend on your environment and workload. Try a range of different values to determine what value works for your applications. Before limiting the number of concurrently executing threads, review configuration options that may improve the performance of InnoDB on multi-core and multi-processor computers, such as innodb_use_sys_malloc and innodb_adaptive_hash_index.
For general performance information about MySQL thread handling, see Section 8.11.5.1, “How MySQL Uses Threads for Client Connections”.
A read-ahead request is an I/O request to prefetch multiple pages in the buffer pool asynchronously, in anticipation that these pages will be needed soon. The requests bring in all the pages in one extent. InnoDB uses two read-ahead algorithms to improve I/O performance:
Linear read-ahead is a
technique that predicts what pages might be needed soon based on
pages in the buffer pool being accessed sequentially. You control
when InnoDB performs a read-ahead operation by adjusting the
number of sequential page accesses required to trigger an
asynchronous read request, using the configuration parameter
innodb_read_ahead_threshold.
Before this parameter was added, InnoDB would only calculate
whether to issue an asynchronous prefetch request for the entire
next extent when it read in the last page of the current extent.
The new configuration parameter
innodb_read_ahead_threshold
controls how sensitive InnoDB is in detecting patterns of
sequential page access. If the number of pages read sequentially
from an extent is greater than or equal to
innodb_read_ahead_threshold,
InnoDB initiates an asynchronous read-ahead operation of the
entire following extent. It can be set to any value from 0-64. The
default value is 56. The higher the value, the more strict the
access pattern check. For example, if you set the value to 48,
InnoDB triggers a linear read-ahead request only when 48 pages in
the current extent have been accessed sequentially. If the value
is 8, InnoDB would trigger an asynchronous read-ahead even if as
few as 8 pages in the extent were accessed sequentially. You can
set the value of this parameter in the MySQL
configuration file,
or change it dynamically with the SET GLOBAL
command, which requires the SUPER privilege.
Random read-ahead is a
technique that predicts when pages might be needed soon based on
pages already in the buffer pool, regardless of the order in which
those pages were read. If 13 consecutive pages from the same
extent are found in the buffer pool, InnoDB asynchronously issues
a request to prefetch the remaining pages of the extent. This
feature was initially turned off in MySQL 5.5. It is available
once again starting in MySQL 5.1.59 and 5.5.16 and higher, turned
off by default. To enable this feature, set the configuration
variable
innodb_random_read_ahead.
The SHOW ENGINE INNODB STATUS command displays
statistics to help you evaluate the effectiveness of the
read-ahead algorithm. With the return of random read-ahead in
MySQL 5.6, the SHOW ENGINE INNODB STATUS
command once again includes
Innodb_buffer_pool_read_ahead_rnd.
Innodb_buffer_pool_read_ahead keeps its current
name. (In earlier releases, it was listed as
Innodb_buffer_pool_read_ahead_seq.) See
Section 14.2.5.11, “More Read-Ahead Statistics” for more
information.
For more information about I/O performance, see
Section 8.5.7, “Optimizing InnoDB Disk I/O” and
Section 8.11.3, “Optimizing Disk I/O”.
InnoDB uses background threads
to service various types of I/O requests. You can configure the
number of background threads that service read and write I/O on
data pages, using the configuration parameters
innodb_read_io_threads and
innodb_write_io_threads. These
parameters signify the number of background threads used for read
and write requests respectively. They are effective on all
supported platforms. You can set the value of these parameters in
the MySQL option file (my.cnf or
my.ini); you cannot change them dynamically.
The default value for these parameters is 4 and
the permissible values range from 1-64.
The purpose of this change is to make InnoDB more scalable on high
end systems. Each background thread can handle up to 256 pending
I/O requests. A major source of background I/O is the
read-ahead requests. InnoDB
tries to balance the load of incoming requests in such way that
most of the background threads share work equally. InnoDB also
attempts to allocate read requests from the same extent to the
same thread to increase the chances of coalescing the requests
together. If you have a high end I/O subsystem and you see more
than 64 ×
innodb_read_io_threads pending
read requests in SHOW ENGINE INNODB STATUS, you
might gain by increasing the value of
innodb_read_io_threads.
For more information about InnoDB I/O performance, see
Section 8.5.7, “Optimizing InnoDB Disk I/O”.
Starting in InnoDB 1.1 with MySQL 5.5, the
asynchronous I/O
capability that InnoDB has had on Windows systems is now available
on Linux systems. (Other Unix-like systems continue to use
synchronous I/O calls.) This feature improves the scalability of
heavily I/O-bound systems, which typically show many pending
reads/writes in the output of the command SHOW ENGINE
INNODB STATUS\G.
Running with a large number of InnoDB I/O
threads, and especially running multiple such instances on the
same server machine, can exceed capacity limits on Linux systems.
In this case, you can fix the error:
EAGAIN: The specified maxevents exceeds the user's limit of available events.
by writing a higher limit to
/proc/sys/fs/aio-max-nr.
In general, if a problem with the asynchronous I/O subsystem in
the OS prevents InnoDB from starting, set the option
innodb_use_native_aio=0 in the
configuration file. This new configuration option applies to Linux
systems only, and cannot be changed once the server is running.
For more information about InnoDB I/O performance, see
Section 8.5.7, “Optimizing InnoDB Disk I/O”.
InnoDB, like any other ACID-compliant database engine, flushes the redo log of a transaction before it is committed. Historically, InnoDB used group commit functionality to group multiple such flush requests together to avoid one flush for each commit. With group commit, InnoDB issues a single write to the log file to perform the commit action for multiple user transactions that commit at about the same time, significantly improving throughput.
Group commit in InnoDB worked until MySQL 4.x, and works once again with MySQL 5.1 with the InnoDB Plugin, and MySQL 5.5 and higher. The introduction of support for the distributed transactions and Two Phase Commit (2PC) in MySQL 5.0 interfered with the InnoDB group commit functionality. This issue is now resolved.
The group commit functionality inside InnoDB works with the Two
Phase Commit protocol in MySQL. Re-enabling of the group commit
functionality fully ensures that the ordering of commit in the
MySQL binlog and the InnoDB logfile is the same as it was before.
It means it is totally safe to use the
MySQL Enterprise Backup product with InnoDB 1.0.4 (that
is, the InnoDB Plugin with MySQL 5.1) and above. When the binlog
is enabled, you typically also set the configuration option
sync_binlog=0, because group commit for the
binary log is only supported if it is set to 0.
Group commit is transparent; you do not need to do anything to take advantage of this significant performance improvement.
For more information about performance of
COMMIT and other transactional operations, see
Section 8.5.2, “Optimizing InnoDB Transaction Management”.
The master thread in InnoDB is a thread that performs various tasks in the background. Most of these tasks are I/O related, such as flushing dirty pages from the buffer pool or writing changes from the insert buffer to the appropriate secondary indexes. The master thread attempts to perform these tasks in a way that does not adversely affect the normal working of the server. It tries to estimate the free I/O bandwidth available and tune its activities to take advantage of this free capacity. Historically, InnoDB has used a hard coded value of 100 IOPs (input/output operations per second) as the total I/O capacity of the server.
The parameter innodb_io_capacity
indicates the overall I/O capacity available to InnoDB, per buffer
pool instance. This parameter should be set to approximately the
number of I/O operations that the system can perform per second.
The value depends on your system configuration. When
innodb_io_capacity is set, the
master threads estimates the I/O bandwidth available for
background tasks based on the set value. Setting the value to
100 reverts to the old behavior.
You can set the value of
innodb_io_capacity to any
number 100 or greater. The default value is
200, reflecting that the performance of typical
modern I/O devices is higher than in the early days of MySQL.
Typically, values around the previous default of 100 are
appropriate for consumer-level storage devices, such as hard
drives up to 7200 RPMs. Faster hard drives, RAID configurations,
and SSDs benefit from higher values.
You can set the value of this parameter in the MySQL option file
(my.cnf or my.ini) or change
it dynamically with the SET GLOBAL command,
which requires the SUPER privilege.
Formerly, the InnoDB master thread also
performed any needed purge
operations. In MySQL 5.6.5 and higher, those I/O operations are
moved to other background threads, whose number is controlled by
the innodb_purge_threads
configuration option.
For more information about InnoDB I/O performance, see
Section 8.5.7, “Optimizing InnoDB Disk I/O”.
InnoDB performs certain tasks in the background, including
flushing of
dirty pages (those
pages that have been changed but are not yet written to the
database files) from the
buffer pool, a task
performed by the master
thread. Currently, InnoDB aggressively flushes buffer pool
pages if the percentage of dirty pages in the buffer pool exceeds
innodb_max_dirty_pages_pct.
InnoDB uses a new algorithm to estimate the required rate of flushing, based on the speed of redo log generation and the current rate of flushing. The intent is to smooth overall performance by ensuring that buffer flush activity keeps up with the need to keep the buffer pool “clean”. Automatically adjusting the rate of flushing can help to avoid sudden dips in throughput, when excessive buffer pool flushing limits the I/O capacity available for ordinary read and write activity.
InnoDB uses its log files in a circular fashion. Before reusing a
portion of a log file, InnoDB flushes to disk all dirty buffer
pool pages whose redo entries are contained in that portion of the
log file, a process known as a
sharp checkpoint. If
a workload is write-intensive, it generates a lot of redo
information, all written to the log file. If all available space
in the log files is used up, a sharp checkpoint occurs, causing a
temporary reduction in throughput. This situation can happen even
though innodb_max_dirty_pages_pct
is not reached.
InnoDB uses a heuristic-based algorithm to avoid such a scenario, by measuring the number of dirty pages in the buffer pool and the rate at which redo is being generated. Based on these numbers, InnoDB decides how many dirty pages to flush from the buffer pool each second. This self-adapting algorithm is able to deal with sudden changes in the workload.
Internal benchmarking has also shown that this algorithm not only maintains throughput over time, but can also improve overall throughput significantly.
Because adaptive flushing is a new feature that can significantly
affect the I/O pattern of a workload, a new configuration
parameter lets you turn off this feature. The default value of the
boolean parameter
innodb_adaptive_flushing is
TRUE, enabling the new algorithm. You can set
the value of this parameter in the MySQL option file
(my.cnf or my.ini) or change
it dynamically with the SET GLOBAL command,
which requires the SUPER privilege.
For more information about InnoDB I/O performance, see
Section 8.5.7, “Optimizing InnoDB Disk I/O”.
Synchronization inside InnoDB frequently involves the use of
spin loops: while waiting,
InnoDB executes a tight loop of instructions repeatedly to avoid
having the InnoDB process and
threads be rescheduled by the
operating system. If the spin loops are executed too quickly,
system resources are wasted, imposing a performance penalty on
transaction throughput. Most modern processors implement the
PAUSE instruction for use in spin loops, so the
processor can be more efficient.
InnoDB uses a PAUSE instruction in its spin
loops on all platforms where such an instruction is available.
This technique increases overall performance with CPU-bound
workloads, and has the added benefit of minimizing power
consumption during the execution of the spin loops.
You do not have to do anything to take advantage of this performance improvement.
For performance considerations for InnoDB locking operations, see Section 8.10, “Optimizing Locking Operations”.
Many InnoDB mutexes and rw-locks are reserved for a short time. On a multi-core system, it can be more efficient for a thread to continuously check if it can acquire a mutex or rw-lock for a while before sleeping. If the mutex or rw-lock becomes available during this polling period, the thread can continue immediately, in the same time slice. However, too-frequent polling by multiple threads of a shared object can cause “cache ping pong”, different processors invalidating portions of each others' cache. InnoDB minimizes this issue by waiting a random time between subsequent polls. The delay is implemented as a busy loop.
You can control the maximum delay between testing a mutex or
rw-lock using the parameter
innodb_spin_wait_delay. The
duration of the delay loop depends on the C compiler and the
target processor. (In the 100MHz Pentium era, the unit of delay
was one microsecond.) On a system where all processor cores share
a fast cache memory, you might reduce the maximum delay or disable
the busy loop altogether by setting
innodb_spin_wait_delay=0. On a system with
multiple processor chips, the effect of cache invalidation can be
more significant and you might increase the maximum delay.
The default value of
innodb_spin_wait_delay is
6. The spin wait delay is a dynamic global
parameter that you can specify in the MySQL option file
(my.cnf or my.ini) or change
at runtime with the command SET GLOBAL
innodb_spin_wait_delay=,
where delay is the
desired maximum delay. Changing the setting requires the
delaySUPER privilege.
For performance considerations for InnoDB locking operations, see Section 8.10, “Optimizing Locking Operations”.
Rather than using a strictly LRU algorithm, InnoDB uses a technique to minimize the amount of data that is brought into the buffer pool and never accessed again. The goal is to make sure that frequently accessed (“hot”) pages remain in the buffer pool, even as read-ahead and full table scans bring in new blocks that might or might not be accessed afterward.
Newly read blocks are inserted into the middle of the list
representing the buffer pool. of the LRU list. All newly read
pages are inserted at a location that by default is
3/8 from the tail of the LRU list. The pages
are moved to the front of the list (the most-recently used end)
when they are accessed in the buffer pool for the first time. Thus
pages that are never accessed never make it to the front portion
of the LRU list, and “age out” sooner than with a
strict LRU approach. This arrangement divides the LRU list into
two segments, where the pages downstream of the insertion point
are considered “old” and are desirable victims for
LRU eviction.
For an explanation of the inner workings of the InnoDB buffer pool
and the specifics of its LRU replacement algorithm, see
Section 8.9.1, “The InnoDB Buffer Pool”.
You can control the insertion point in the LRU list, and choose
whether InnoDB applies the same optimization to blocks brought
into the buffer pool by table or index scans. The configuration
parameter
innodb_old_blocks_pct
controls the percentage of “old” blocks in the LRU
list. The default value of
innodb_old_blocks_pct is
37, corresponding to the original fixed ratio
of 3/8. The value range is 5 (new pages in the
buffer pool age out very quickly) to 95 (only
5% of the buffer pool is reserved for hot pages, making the
algorithm close to the familiar LRU strategy).
The optimization that keeps the buffer pool from being churned by
read-ahead can avoid similar problems due to table or index scans.
In these scans, a data page is typically accessed a few times in
quick succession and is never touched again. The configuration
parameter innodb_old_blocks_time
specifies the time window (in milliseconds) after the first access
to a page during which it can be accessed without being moved to
the front (most-recently used end) of the LRU list. The default
value of innodb_old_blocks_time
is 0, corresponding to the original behavior of
moving a page to the most-recently used end of the buffer pool
list when it is first accessed in the buffer pool. Increasing this
value makes more and more blocks likely to age out faster from the
buffer pool.
Both innodb_old_blocks_pct and
innodb_old_blocks_time are
dynamic, global and can be specified in the MySQL option file
(my.cnf or my.ini) or
changed at runtime with the SET GLOBAL command.
Changing the setting requires the SUPER
privilege.
To help you gauge the effect of setting these parameters, the
SHOW ENGINE INNODB STATUS command reports
additional statistics. The BUFFER POOL AND
MEMORY section looks like:
Total memory allocated 1107296256; in additional pool allocated 0 Dictionary memory allocated 80360 Buffer pool size 65535 Free buffers 0 Database pages 63920 Old database pages 23600 Modified db pages 34969 Pending reads 32 Pending writes: LRU 0, flush list 0, single page 0 Pages made young 414946, not young 2930673 1274.75 youngs/s, 16521.90 non-youngs/s Pages read 486005, created 3178, written 160585 2132.37 reads/s, 3.40 creates/s, 323.74 writes/s Buffer pool hit rate 950 / 1000, young-making rate 30 / 1000 not 392 / 1000 Pages read ahead 1510.10/s, evicted without access 0.00/s LRU len: 63920, unzip_LRU len: 0 I/O sum[43690]:cur[221], unzip sum[0]:cur[0]
Old database pages is the number of pages
in the “old” segment of the LRU list.
Pages made young and not
young is the total number of “old”
pages that have been made young or not respectively.
youngs/s and non-young/s
is the rate at which page accesses to the “old”
pages have resulted in making such pages young or otherwise
respectively since the last invocation of the command.
young-making rate and
not provides the same rate but in terms of
overall buffer pool accesses instead of accesses just to the
“old” pages.
Per second averages provided in InnoDB
Monitor output are based on the elapsed time between the current
time and the last time InnoDB Monitor output
was printed.
Because the effects of these parameters can vary widely based on your hardware configuration, your data, and the details of your workload, always benchmark to verify the effectiveness before changing these settings in any performance-critical or production environment.
In mixed workloads where most of the activity is OLTP type with
periodic batch reporting queries which result in large scans,
setting the value of
innodb_old_blocks_time
during the batch runs can help keep the working set of the normal
workload in the buffer pool.
When scanning large tables that cannot fit entirely in the buffer
pool, setting
innodb_old_blocks_pct to a
small value keeps the data that is only read once from consuming a
significant portion of the buffer pool. For example, setting
innodb_old_blocks_pct=5 restricts this data
that is only read once to 5% of the buffer pool.
When scanning small tables that do fit into memory, there is less
overhead for moving pages around within the buffer pool, so you
can leave innodb_old_blocks_pct
at its default value, or even higher, such as
innodb_old_blocks_pct=50.
The effect of the
innodb_old_blocks_time
parameter is harder to predict than the
innodb_old_blocks_pct
parameter, is relatively small, and varies more with the workload.
To arrive at an optimal value, conduct your own benchmarks if the
performance improvement from adjusting
innodb_old_blocks_pct is
not sufficient.
For more information about the InnoDB buffer pool, see
Section 8.9.1, “The InnoDB Buffer Pool”.
A number of optimizations speed up certain steps of the recovery that happens on the next startup after a crash. In particular, scanning the redo log and applying the redo log are faster than in MySQL 5.1 and earlier, due to improved algorithms for memory management. You do not need to take any actions to take advantage of this performance enhancement. If you kept the size of your redo log files artificially low because recovery took a long time, you can consider increasing the file size.
For more information about InnoDB recovery, see
Section 14.2.3.13, “The InnoDB Recovery Process”.
Starting with InnoDB 1.1 with MySQL 5.5, you can profile certain internal InnoDB operations using the MySQL Performance Schema feature. This type of tuning is primarily for expert users, those who push the limits of MySQL performance, read the MySQL source code, and evaluate optimization strategies to overcome performance bottlenecks. DBAs can also use this feature for capacity planning, to see whether their typical workload encounters any performance bottlenecks with a particular combination of CPU, RAM, and disk storage; and if so, to judge whether performance can be improved by increasing the capacity of some part of the system.
To use this feature to examine InnoDB performance:
You must be running MySQL 5.5 or higher. You must build the
database server from source, enabling the Performance Schema
feature by building with the
--with-perfschema option. Since the
Performance Schema feature introduces some performance
overhead, you should use it on a test or development system
rather than on a production system.
You must be running InnoDB 1.1 or higher.
You must be generally familiar with how to use the
Performance Schema
feature, for example to query tables in the
performance_schema database.
Examine the following kinds of InnoDB objects by querying the
appropriate performance_schema tables. The
items associated with InnoDB all contain the substring
innodb in the EVENT_NAME
column.
For the definitions of the *_instances
tables, see
Section 21.9.2, “Performance Schema Instance Tables”. For the
definitions of the *_summary_* tables, see
Section 21.9.8, “Performance Schema Summary Tables”. For the
definition of the thread table, see
Section 21.9.9, “Performance Schema Miscellaneous Tables”. For
the definition of the *_current_* and
*_history_* tables, see
Section 21.9.3, “Performance Schema Wait Event Tables”.
Mutexes in the
mutex_instances table. (Mutexes and
RW-locks related to the InnoDB buffer
pool are not included in this coverage; the same applies
to the output of the SHOW ENGINE INNODB
MUTEX command.)
RW-locks in the
rwlock_instances table.
RW-locks in the rwlock_instances table.
File I/O operations in the
file_instances,
file_summary_by_event_name, and
file_summary_by_instance tables.
Threads in the
PROCESSLIST table.
During performance testing, examine the performance data in
the events_waits_current and
events_waits_history_long tables. If you
are interested especially in InnoDB-related objects, use the
clause WHERE EVENT_NAME LIKE '%innodb%' to
see just those entries; otherwise, examine the performance
statistics for the overall MySQL server.
You must be running MySQL 5.5, with the Performance Schema
enabled by building with the
--with-perfschema build option.
For more information about the MySQL Performance Schema, see Chapter 21, MySQL Performance Schema.
This performance enhancement is primarily useful for people with a
large buffer pool size,
typically in the multi-gigabyte range. To take advantage of this
speedup, you must set the new
innodb_buffer_pool_instances
configuration option, and you might also adjust the
innodb_buffer_pool_size value.
When the InnoDB buffer pool is large, many data requests can be satisfied by retrieving from memory. You might encounter bottlenecks from multiple threads trying to access the buffer pool at once. Starting in InnoDB 1.1 and MySQL 5.5, you can enable multiple buffer pools to minimize this contention. Each page that is stored in or read from the buffer pool is assigned to one of the buffer pools randomly, using a hashing function. Each buffer pool manages its own free lists, flush lists, LRUs, and all other data structures connected to a buffer pool, and is protected by its own buffer pool mutex.
To enable this feature, set the
innodb_buffer_pool_instances configuration
option to a value greater than 1 (the default) up to 64 (the
maximum). This option takes effect only when you set the
innodb_buffer_pool_size to a size of 1 gigabyte
or more. The total size you specify is divided among all the
buffer pools. For best efficiency, specify a combination of
innodb_buffer_pool_instances and
innodb_buffer_pool_size so that
each buffer pool instance is at least 1 gigabyte.
For more information about the InnoDB buffer pool, see
Section 8.9.1, “The InnoDB Buffer Pool”.
Starting in InnoDB 1.1 with MySQL 5.5, the limit on concurrent transactions is greatly expanded, removing a bottleneck with the InnoDB rollback segment that affected high-capacity systems. The limit applies to concurrent transactions that change any data; read-only transactions do not count against that maximum.
The single rollback segment is now divided into 128 segments, each of which can support up to 1023 transactions that perform writes, for a total of approximately 128K concurrent transactions. The original transaction limit was 1023.
Each transaction is assigned to one of the rollback segments, and remains tied to that rollback segment for the duration. This enhancement improves both scalability (higher number of concurrent transactions) and performance (less contention when different transactions access the rollback segments).
To take advantage of this feature, you do not need to create any new database or tables, or reconfigure anything. You must do a slow shutdown before upgrading from MySQL 5.1 or earlier, or some time afterward. InnoDB makes the required changes inside the system tablespace automatically, the first time you restart after performing a slow shutdown.
If your workload was not constrained by the original limit of 1023
concurrent transactions, you can reduce the number of rollback
segments used within a MySQL instance or within a session by
setting the configuration option
innodb_rollback_segments.
For more information about performance of InnoDB under high
transactional load, see
Section 8.5.2, “Optimizing InnoDB Transaction Management”.
The purge operations (a type of garbage collection) that InnoDB performs automatically is now done in one or more separate threads, rather than as part of the master thread. This change improves scalability, because the main database operations run independently from maintenance work happening in the background.
To control this feature, increase the value of the configuration
option
innodb_purge_threads=.
If DML action is concentrated on a single table or a few tables,
keep the setting low so that the threads do not contend with each
other for access to the busy tables. If DML operations are spread
across many tables, increase the setting. Its maximum is 32.
n
There is another related configuration option,
innodb_purge_batch_size with a default of 20
and maximum of 5000. This option is mainly intended for
experimentation and tuning of purge operations, and should not be
interesting to typical users.
For more information about InnoDB I/O performance, see
Section 8.5.7, “Optimizing InnoDB Disk I/O”.
This is another performance improvement that comes for free, with no user action or configuration needed. The details here are intended for performance experts who delve into the InnoDB source code, or interpret reports with keywords such as “mutex” and “log_sys”.
The mutex known as the log sys
mutex has historically done double duty, controlling access to
internal data structures related to log records and the
LSN, as well as pages in the
buffer pool that are
changed when a
mini-transaction is
committed. Starting in InnoDB 1.1 with MySQL 5.5, these two kinds
of operations are protected by separate mutexes, with a new
log_buf mutex controlling writes to buffer pool
pages due to mini-transactions.
For performance considerations for InnoDB locking operations, see Section 8.10, “Optimizing Locking Operations”.
Starting with InnoDB 1.1 with MySQL 5.5, concurrent access to the buffer pool is faster. Operations involving the flush list, a data structure related to the buffer pool, are now controlled by a separate mutex and do not block access to the buffer pool. You do not need to configure anything to take advantage of this speedup; it is fully automatic.
For more information about the InnoDB buffer pool, see
Section 8.9.1, “The InnoDB Buffer Pool”.
The mutex controlling concurrent
access to the InnoDB kernel is now divided into
separate mutexes and rw-locks
to reduce contention. You do not need to configure anything to
take advantage of this speedup; it is fully automatic.
To ease the memory load on systems with huge numbers of tables,
InnoDB now frees up the memory associated with an opened table,
using an LRU algorithm to select tables that have gone the longest
without being accessed. To reserve more memory to hold metadata
for open InnoDB tables, increase the value of the
table_definition_cache
configuration option. InnoDB treats this value as a “soft
limit”. The actual number of tables with cached metadata
could be higher, because metadata for InnoDB system tables, and
parent and child tables in foreign key relationships, is never
evicted from memory.
Several new features extend the file-per-table mode enabled by the
innodb_file_per_table
configuration option, allowing more flexibility in how the
.ibd files are placed, exported, and
restored. We characterize this as a performance enhancement
because it solves the common customer request to put data from
different tables onto different storage devices, for best
price/performance depending on the access patterns of the data.
For example, tables with high levels of random reads and writes
might be placed on an SSD device,
while less-often-accessed data or data processed with large
batches of sequential I/O might be placed on an
HDD device. See
Section 5.4.1, “Managing InnoDB Tablespaces” for details.
The memcached daemon is frequently used as an
in-memory caching layer in front of a MySQL database server. Now
MySQL allows direct access to InnoDB tables
using the familiar memcached protocol and
client libraries. Instead of formulating queries in SQL, you can
perform simple get, set, and increment operations that avoid the
performance overhead of SQL parsing and constructing a query
optimization plan. You can also access the underlying
InnoDB tables through SQL to load data,
generate reports, or perform multi-step transactional
computations.
This technique allows the data to be stored in MySQL for reliability and consistency, while coding application logic that uses the database as a fast key-value store.
This feature combines the best of both worlds:
Data that is written using the memcached protocol is transparently written to an InnoDB table, without going through the MySQL SQL layer. You can control the frequency of writes to achieve higher raw performance when updating non-critical data.
Data that is requested data through the memcached protocol is transparently queried from an InnoDB table, without going through the MySQL SQL layer.
Subsequent requests for the same data will be served from the
InnoDB buffer pool. The buffer pool handles
the in-memory caching. You can tune the performance of
data-intensive operations using the familiar
InnoDB configuration options.
InnoDB can handle composing and decomposing multiple column
values into a single memcached item value,
reducing the amount of string parsing and concatenation
required in your application. For example, you might store a
string value 2|4|6|8 in the
memcached cache, and InnoDB splits that
value based on a separator character, then stores the result
into four numeric columns.
For details on using this NoSQL-style interface to MySQL, see Section 14.2.9, “InnoDB Integration with memcached”. For additional background on memcached and considerations for writing applications for its API, see Section 15.6, “Using MySQL with memcached”.
This feature is a continuation of the “Fast Index Creation” feature introduced in Fast Index Creation in the InnoDB Storage Engine. Now you can perform other kinds of DDL operations on InnoDB tables online: that is, with minimal delay for operations on that table, without rebuilding the entire table, or both. This enhancement improves responsiveness and availability in busy production environments, where making a table unavailable for minutes or hours whenever its column definitions change is not practical.
For full details, see Section 5.5, “Online DDL for InnoDB Tables”.
The DDL operations enhanced by this feature are these variations
on the ALTER TABLE statement:
Create secondary
indexes: CREATE INDEX
or
name ON
table
(col_list)ALTER TABLE . (Creating a
primary key or a table ADD
INDEX name
(col_list)FULLTEXT index still
requires locking the table.)
Drop secondary
indexes: DROP INDEX
or name ON
table;ALTER
TABLE
table DROP INDEX
name
Creating and dropping secondary indexes on
InnoDB tables has avoided the table-copying
behavior since the days of MySQL 5.1 with the
InnoDB Plugin. Now, the table remains
available for read and write operations while the index is
being created or dropped. The CREATE
TABLE or DROP TABLE
statement only finishes after all transactions that are
modifying the table are completed, so that the initial state
of the index reflects the most recent contents of the table.
Previously, modifying the table while an index was being created or dropped typically resulted in a deadlock that cancelled the insert, update, or delete statement on the table.
Changing the
auto-increment
value for a column: ALTER TABLE
table
AUTO_INCREMENT=next_value;
Especially in a distributed system using replication or sharding, you sometimes reset the auto-increment counter for a table to a specific value. The next row inserted into the table uses the specified value for its auto-increment column. You might also use this technique in a data warehousing environment where you periodically empty all the tables and reload them, and you can restart the auto-increment sequence from 1.
Adding or dropping a foreign key constraint:
ALTER TABLEtbl1ADD CONSTRAINTfk_nameFOREIGN KEYindex(col1) REFERENCEStbl2(col2)referential_actions; ALTER TABLEtblDROP FOREIGN KEYfk_name;
Dropping a foreign key can be performed online with the
foreign_key_checks option
enabled or disabled. Creating a foreign key online requires
foreign_key_checks to be
disabled.
If you do not know the names of the foreign key constraints on
a particular table, issue the following statement and find the
constraint name in the CONSTRAINT clause
for each foreign key:
show create table table\G
Or, query the
information_schema.table_constraints
table and use the constraint_name and
constraint_type columns to identify the
foreign key names.
As a consequence of this enhancement, you can now also drop a foreign key and its associated index in a single statement, which previously required separate statements in a strict order:
ALTER TABLEtableDROP FOREIGN KEYconstraint, DROP INDEXindex;
Renaming a column: ALTER TABLE
tbl CHANGE
old_col_name
new_col_name
datatype
When you keep the same data type and only change the column name, this operation can always be performed online. As part of this enhancement, you can now rename a column that is part of a foreign key constraint, which was not allowed before.
Some other ALTER TABLE
operations are non-blocking, and are faster than before
because the table-copying operation is optimized, even though
a table copy is still required:
Changing the ROW_FORMAT or
KEY_BLOCK_SIZE properties for a table.
Changing the nullable status for a column.
Adding, dropping, or reordering columns.
As your database schema evolves with new columns, data types,
constraints, indexes, and so on, keep your
CREATE TABLE statements up to
date with the latest table definitions. Even with the
performance improvements of online DDL, it is more efficient to
create stable database structures at the beginning, rather than
creating part of the schema and then issuing
ALTER TABLE statements afterward.
The main exception to this guideline is for secondary indexes on tables with large numbers of rows. It is typically most efficient to create the table with all details specified except the secondary indexes, load the data, then create the secondary indexes.
Whatever sequence of CREATE
TABLE, CREATE INDEX,
ALTER TABLE, and similar
statements went into putting a table together, you can capture
the SQL needed to reconstruct the current form of the table by
issuing the statement SHOW CREATE TABLE
(uppercase
table\G\G required for tidy formatting). This output
shows clauses such as numeric precision, NOT
NULL, and CHARACTER SET that are
sometimes added behind the scenes, and you might otherwise leave
out when cloning the table on a new system or setting up foreign
key columns with identical type.

User Comments
Add your own comment.