InnoDB is a multi-versioned storage engine: it
keeps information about old versions of changed rows, to support
transactional features such as concurrency and rollback. This
information is stored in the tablespace in a data structure called
a rollback segment
(after an analogous data structure in Oracle).
InnoDB uses the information in the rollback
segment to perform the undo operations needed in a transaction
rollback. It also uses the information to build earlier versions
of a row for a consistent read.
InnoDB adds three fields to each
row stored in the database. A 6-byte
field indicates the transaction identifier for the last
transaction that inserted or updated the row. Also, a deletion is
treated internally as an update where a special bit in the row is
set to mark it as deleted. Each row also contains a 7-byte
DB_ROLL_PTR field called the roll pointer. The
roll pointer points to an undo log record written to the rollback
segment. If the row was updated, the undo log record contains the
information necessary to rebuild the content of the row before it
was updated. A 6-byte
DB_ROW_ID field contains
a row ID that increases monotonically as new rows are inserted. If
InnoDB generates a clustered index
automatically, the index contains row ID values. Otherwise, the
DB_ROW_ID column does not appear in any index.
Undo logs in the rollback segment are divided into insert and
update undo logs. Insert undo logs are needed only in transaction
rollback and can be discarded as soon as the transaction commits.
Update undo logs are used also in consistent reads, but they can
be discarded only after there is no transaction present for which
InnoDB has assigned a snapshot that in a
consistent read could need the information in the update undo log
to build an earlier version of a database row.
Commit your transactions regularly, including those transactions
that issue only consistent reads. Otherwise,
InnoDB cannot discard data from the update undo
logs, and the rollback segment may grow too big, filling up your
The physical size of an undo log record in the rollback segment is typically smaller than the corresponding inserted or updated row. You can use this information to calculate the space needed for your rollback segment.
InnoDB multi-versioning scheme, a row is
not physically removed from the database immediately when you
delete it with an SQL statement.
physically removes the corresponding row and its index records
when it discards the update undo log record written for the
deletion. This removal operation is called a
purge, and it is quite fast,
usually taking the same order of time as the SQL statement that
did the deletion.
If you insert and delete rows in smallish batches at about the
same rate in the table, the purge thread can start to lag behind
and the table can grow bigger and bigger because of all the
“dead” rows, making everything disk-bound and very
slow. In such a case, throttle new row operations, and allocate
more resources to the purge thread by tuning the
variable. See Section 14.6.7, “InnoDB Startup Options and System Variables” for more
InnoDB multiversion concurrency control
(MVCC) treats secondary indexes differently than clustered
indexes. Records in a clustered index are updated in-place, and
their hidden system columns point undo log entries from which
earlier versions of records can be reconstructed. Unlike
clustered index records, secondary index records do not contain
hidden system columns nor are they updated in-place.
When a secondary index column is updated, old secondary index
records are delete-marked, new records are inserted, and
delete-marked records are eventually purged. When a secondary
index record is delete-marked or the secondary index page is
updated by a newer transaction,
up the database record in the clustered index. In the clustered
index, the record's
DB_TRX_ID is checked, and
the correct version of the record is retrieved from the undo log
if the record was modified after the reading transaction was
If a secondary index record is marked for deletion or the
secondary index page is updated by a newer transaction, the
technique is not used. Instead of returning values from the
InnoDB looks up the record
in the clustered index.