In databases, rows are persisted on disk and for every row, columns' value are stored on disk. Now, if a new column is to be added or an old column is to be dropped from a table, the contents of existing rows should be changed to reflect the correct values of columns present in the table. And, as the number of rows in the table increases, time taken to modify all existing rows to reflect the table definition change because of ADD/DROP columns also increases.
InnoDB, the default storage engine for MySQL, is not an exception here. And that's why one of the most asked feature requests from MySQL users is to have capability to ADD/DROP new/existing column(s) to/from an InnoDB Table, instantly.
The root cause of problem statement is clear : "the need of modifying all existing records in the table for ADD/DROP COLUMN". Therefore, if we could come up with a mechanism in which this "need" is omitted and we could achieve ADD/DROP COLUMN by just modifying the metadata. In other words, "don't touch any row but update the metadata only" is the core idea we came up with to solve it.
Acknowledging users requests, an early implementation of ADD COLUMN with ALGORITHM=INSTANT was added in MySQL 8.0.12. This feature enables users to ADD new columns "INSTANTLY" to a table irrespective of table size.
The following depicts how "ALTER TABLE ... ADD COLUMN" had behaved in MySQL 8.0.28 (before MySQL 8.0.29 introduced the optimization, described in this article):
Although, we had few limitations in this early implementation.
- With ALGORITHM=INSTANT, new column can be added only as the last column of table.
- ALGORITHM=INSTANT, is not supported for DROP COLUMN and it would still need a table rebuild.
We went back on whiteboard discussion to design something to make even the DROP COLUMN "instant". During this, we also had the limitations of earlier INSTANT ADD implementation in mind. Finally we came up with a new design which enables user to DROP column(s) from "any position" from a table with ALGORITHM=INSTANT. And this design can be easily consumed by the ADD COLUMN, ALGORITHM=INSTANT as well. So goal is achieved!
This design is implemented in 8.0.29. With this new implementation, one can :
- ADD new column(s) at "any position" to a table with ALGORITHM=INSTANT
- DROP existing column(s) from "any position" from a table with ALGORITHM=INSTANT
This design also follow the same idea i.e. "don't touch any row but update the metadata only". Therefore, ADD/DROP COLUMN operations aren't dependent on the table size anymore. In other words, the time taken to ADD/DROP new/old column(s) to/from a table with 1 row would be the same as time taken to ADD/DROP new/old column(s) to/from a table having 100M rows.
ALTER TABLE <table_name> ADD COLUMN <column_name> <column_type> [DEFAULT default_value] [FIRST]/[AFTER column_name], ALGORITHM=INSTANT;
ALTER TABLE <table_name> DROP COLUMN <column_name>, ALGORITHM=INSTANT;
ALGORITHM=INSTANT is optional here as, by default, all ADD/DROP columns are done with ALGORITHM=INSTANT.
Multiple columns can be ADD/DROP in a single ALTER TABLE STATEMENT.
How Does it work?
A new concept of row version is introduced in table metadata. The way this feature works is, a row is stamped with a "row version" which was the current row version in table metadata when this row was inserted. If table hasn't gone through any ALTER TABLE ... ADD/DROP COLUMN, all rows in the table are considered at row version 0. In the record header, there are 4 bits of metadata, "info-bits", in which one bit was unused. I've used this bit to indicate that the record has a row version. By default, this bit is always unset. And if this bit is set, then the record version number is stored in the record header. With this, it is easy to do "ALTER TABLE ... ADD/DROP COLUMN" without doing a table rebuild. And also an existing bit is used, INSTANT ADD/DROP DDL works fine on upgraded tables as well. I think the explanation of these low level details is worth a separate blog and I'll be explaining them in my follow-up blog. Stay tuned!
For every ALTER TABLE ... ADD/DROP COLUMN statement, a new row version is created. And any new row inserted after this ALTER TABLE statement will be stamped with this new row version. So one can guess that in a table, which has gone through multiple ALTER TABLE ... ADD/DROP COLUMN, there could be multiple rows with different row versions. And once the rows are fetched from this table, they are transformed from their stamped row version to the latest row version in the table metadata. This current row version is table metadata whereas the row version, a record belongs to, is row metadata.
NOTE: The row version is bumped only for an ALTER TABLE statement and not for each COLUMN being added/dropped. i.e. if in an ALTER TABLE ... ADD/DROP COLUMN statement, n columns are added and m columns are dropped, the row version is bumped by 1 only.
This transformation and keeping this row versions takes some maintenance behind the scene, thus there is a limit added to the row versions. It is 64 as of now. It means, on a table, there could be at max 64 ALTER TABLE operations to ADD/DROP columns. For any further ALTER TABLE ... ADD/DROP, goes for old way i.e. table rebuild. We started with the expectation that 64 might be a sufficient number as ADD/DROP columns might not be very regular operation in real world scenarios. This is where we are looking for feedback too. If this cap of 64 turns out to be smaller, it may be increased in later MySQL versions. NOTE : This is not configurable.
Any attempt to do ADD/DROP with ALGORITHM=INSTANT after the limit is reached, will result in following error :
NOTE : Above error is thrown only if ALGORITHM=INSTANT is used explicitly in ALTER TABLE statement. Otherwise, till version=64, INSTANT algorithm is used implicitly and after that it falls back to ALGORITHM=INPLACE implicitly.
A new column TOTAL_ROW_VERSIONS, is added to INFORMATION_SCHEMA.INNODB_TABLES showing the current row version for a table
Table rebuild (OPTIMIZE TABLE, ALTER TABLE) and TRUNCATE TABLE
There are other ALTER TABLE operations (eg : OPTIMIZE TABLE) which cause table to be rebuilt. Once the table is rebuild, INSTANT Metadata is cleared. i.e. table will be as good as having no INSTANT ADD/DROP COLUMN done.
The same is valid for TRUNCATE TABLE as well, as there are no rows left in the table so it is as good as a newly created table.
Is it really INSTANT?
Here is what I tried on my system :
So it actually is INSTANT
ADD/DROP column to/from a table with ALGORITHM=INSTANT is :
- not supported for table having FTS indexes.
- not supported for table having row_format=compressed.
- not supported for temporary tables
Here is the MySQL doc which talks about ALGORITHM=INSTANT.
NOTE : For this INSTANT ADD/DROP feature, we recommend using MySQL 8.0.32 or later releases.
Keep watching this space. I will be coming up with another blog which talks about more of internal details (row format changes, bits and bytes, how rows are transformed etc.)
Thanks for using MySQL!