InnoDB
is a multi-version storage engine. It
keeps information about old versions of changed rows to support
transactional features such as concurrency and rollback. This
information is stored in undo tablespaces in a data structure called
a rollback segment. See Section 17.6.3.4, “Undo Tablespaces”.
InnoDB
uses the information in the rollback
segment to perform the undo operations needed in a transaction
rollback. It also uses the information to build earlier versions of
a row for a consistent read. See
Section 17.7.2.3, “Consistent Nonlocking Reads”.
Internally, InnoDB
adds three fields to each row
stored in the database:
A 6-byte
DB_TRX_ID
field indicates the transaction identifier for the last transaction that inserted or updated the row. Also, a deletion is treated internally as an update where a special bit in the row is set to mark it as deleted.A 7-byte
DB_ROLL_PTR
field called the roll pointer. The roll pointer points to an undo log record written to the rollback segment. If the row was updated, the undo log record contains the information necessary to rebuild the content of the row before it was updated.A 6-byte
DB_ROW_ID
field contains a row ID that increases monotonically as new rows are inserted. IfInnoDB
generates a clustered index automatically, the index contains row ID values. Otherwise, theDB_ROW_ID
column does not appear in any index.
Undo logs in the rollback segment are divided into insert and update
undo logs. Insert undo logs are needed only in transaction rollback
and can be discarded as soon as the transaction commits. Update undo
logs are used also in consistent reads, but they can be discarded
only after there is no transaction present for which
InnoDB
has assigned a snapshot that in a
consistent read could require the information in the update undo log
to build an earlier version of a database row. For additional
information about undo logs, see Section 17.6.6, “Undo Logs”.
It is recommend that you commit transactions regularly, including
transactions that issue only consistent reads. Otherwise,
InnoDB
cannot discard data from the update undo
logs, and the rollback segment may grow too big, filling up the undo
tablespace in which it resides. For information about managing undo
tablespaces, see Section 17.6.3.4, “Undo Tablespaces”.
The physical size of an undo log record in the rollback segment is typically smaller than the corresponding inserted or updated row. You can use this information to calculate the space needed for your rollback segment.
In the InnoDB
multi-versioning scheme, a row is
not physically removed from the database immediately when you delete
it with an SQL statement. InnoDB
only physically
removes the corresponding row and its index records when it discards
the update undo log record written for the deletion. This removal
operation is called a purge, and it is quite fast, usually taking
the same order of time as the SQL statement that did the deletion.
If you insert and delete rows in smallish batches at about the same
rate in the table, the purge thread can start to lag behind and the
table can grow bigger and bigger because of all the
“dead” rows, making everything disk-bound and very
slow. In such cases, throttle new row operations, and allocate more
resources to the purge thread by tuning the
innodb_max_purge_lag
system
variable. For more information, see
Section 17.8.9, “Purge Configuration”.
InnoDB
multiversion concurrency control (MVCC)
treats secondary indexes differently than clustered indexes.
Records in a clustered index are updated in-place, and their
hidden system columns point undo log entries from which earlier
versions of records can be reconstructed. Unlike clustered index
records, secondary index records do not contain hidden system
columns nor are they updated in-place.
When a secondary index column is updated, old secondary index
records are delete-marked, new records are inserted, and
delete-marked records are eventually purged. When a secondary
index record is delete-marked or the secondary index page is
updated by a newer transaction, InnoDB
looks up
the database record in the clustered index. In the clustered
index, the record's DB_TRX_ID
is checked, and
the correct version of the record is retrieved from the undo log
if the record was modified after the reading transaction was
initiated.
If a secondary index record is marked for deletion or the
secondary index page is updated by a newer transaction, the
covering index
technique is not used. Instead of returning values from the index
structure, InnoDB
looks up the record in the
clustered index.
However, if the
index
condition pushdown (ICP) optimization is enabled, and parts
of the WHERE
condition can be evaluated using
only fields from the index, the MySQL server still pushes this
part of the WHERE
condition down to the storage
engine where it is evaluated using the index. If no matching
records are found, the clustered index lookup is avoided. If
matching records are found, even among delete-marked records,
InnoDB
looks up the record in the clustered
index.