InnoDB is a multi-version storage engine. It
keeps information about old versions of changed rows to support
transactional features such as concurrency and rollback. This
information is stored in undo tablespaces in a data structure called
a rollback segment. See Section 18.104.22.168, “Undo Tablespaces”.
InnoDB uses the information in the rollback
segment to perform the undo operations needed in a transaction
rollback. It also uses the information to build earlier versions of
a row for a consistent read. See
Section 22.214.171.124, “Consistent Nonlocking Reads”.
InnoDB adds three fields to each row
stored in the database:
DB_TRX_IDfield indicates the transaction identifier for the last transaction that inserted or updated the row. Also, a deletion is treated internally as an update where a special bit in the row is set to mark it as deleted.
DB_ROLL_PTRfield called the roll pointer. The roll pointer points to an undo log record written to the rollback segment. If the row was updated, the undo log record contains the information necessary to rebuild the content of the row before it was updated.
DB_ROW_IDfield contains a row ID that increases monotonically as new rows are inserted. If
InnoDBgenerates a clustered index automatically, the index contains row ID values. Otherwise, the
DB_ROW_IDcolumn does not appear in any index.
Undo logs in the rollback segment are divided into insert and update
undo logs. Insert undo logs are needed only in transaction rollback
and can be discarded as soon as the transaction commits. Update undo
logs are used also in consistent reads, but they can be discarded
only after there is no transaction present for which
InnoDB has assigned a snapshot that in a
consistent read could require the information in the update undo log
to build an earlier version of a database row. For additional
information about undo logs, see Section 15.6.6, “Undo Logs”.
It is recommend that you commit transactions regularly, including
transactions that issue only consistent reads. Otherwise,
InnoDB cannot discard data from the update undo
logs, and the rollback segment may grow too big, filling up the undo
tablespace in which it resides. For information about managing undo
tablespaces, see Section 126.96.36.199, “Undo Tablespaces”.
The physical size of an undo log record in the rollback segment is typically smaller than the corresponding inserted or updated row. You can use this information to calculate the space needed for your rollback segment.
InnoDB multi-versioning scheme, a row is
not physically removed from the database immediately when you delete
it with an SQL statement.
InnoDB only physically
removes the corresponding row and its index records when it discards
the update undo log record written for the deletion. This removal
operation is called a purge, and it is quite fast, usually taking
the same order of time as the SQL statement that did the deletion.
If you insert and delete rows in smallish batches at about the same
rate in the table, the purge thread can start to lag behind and the
table can grow bigger and bigger because of all the
“dead” rows, making everything disk-bound and very
slow. In such cases, throttle new row operations, and allocate more
resources to the purge thread by tuning the
variable. For more information, see
Section 15.8.9, “Purge Configuration”.
InnoDB multiversion concurrency control (MVCC)
treats secondary indexes differently than clustered indexes.
Records in a clustered index are updated in-place, and their
hidden system columns point undo log entries from which earlier
versions of records can be reconstructed. Unlike clustered index
records, secondary index records do not contain hidden system
columns nor are they updated in-place.
When a secondary index column is updated, old secondary index
records are delete-marked, new records are inserted, and
delete-marked records are eventually purged. When a secondary
index record is delete-marked or the secondary index page is
updated by a newer transaction,
InnoDB looks up
the database record in the clustered index. In the clustered
index, the record's
DB_TRX_ID is checked, and
the correct version of the record is retrieved from the undo log
if the record was modified after the reading transaction was
If a secondary index record is marked for deletion or the
secondary index page is updated by a newer transaction, the
technique is not used. Instead of returning values from the index
InnoDB looks up the record in the
However, if the
condition pushdown (ICP) optimization is enabled, and parts
WHERE condition can be evaluated using
only fields from the index, the MySQL server still pushes this
part of the
WHERE condition down to the storage
engine where it is evaluated using the index. If no matching
records are found, the clustered index lookup is avoided. If
matching records are found, even among delete-marked records,
InnoDB looks up the record in the clustered