WL#11250: Support Instant Add Column
Affects: Server-8.0
—
Status: Complete
Porting Tecent's patch on Instant Add Column. To summarize the idea: 1. It introduced a new info bits on the record to signify the row has new rows added instantly 2. If the bit is set, then there will be one or two additional bytes in the row header to signify the extra columns added 3. A new system table is added to record the "default value" for each added column 4. The rows before instant add column would always have the old number of columns for its life time (until deleted). Such number of "core" rows will be stored in se_private_data of the mysql.tables 5. For "old" rows, the default value will be looked up from the new system tables and appended before return to server. 6. For the duraton of instant adding column, MDL will be placed and maintained on the table. In this way, we can instantly add a column with full back version compatibility with minimum code change. Some restrictions of this approach: 1. Columns can only be added last, it cannot be added in between existing columns 2. Instantly added columns cannot be PK, unless a table rebuild 3. Instant add column does not and will not support compressed table format
FR1: To support this instant ADD COLUMN, there is one syntax change for ALTER TABLE. One option for the ALGORITHM will be introduced, called INSTANT: ALTER TABLE ... ALGORITHM = INSTANT; This INSTANT key word will support instant ADD COLUMN in this worklog, along with some no-op ALTER TABLE(no change in the table). If user specifies INSTANT for other ALTER TABLE(s) than the previous ones, server will return an error like HA_ALTER_INPLACE_NOT_SUPPORTED to user without doing anything to the table. Furthermore, the ALGORITHM = INSTANT would not work with LOCK clause, because this is meaningless. So if INSTANT is specified, and LOCK=NONE/SHARED/EXCLUSIVE is also specified, ER_WRONG_USAGE would be issued. However LOCK=DEFAULT is fine. FR2: Support Instant ALTER TABLE ADD COLUMN FR2.0: This only involves ADD COLUMN, and adding new columns at last(opposite to our existing key word FIRST) FR2.1: Adding columns instantly should work automatically when ALGORITHM=DEFAULT/INSTANT FR2.2: Support adding multiple new columns in one ALTER TABLE clause FR2.3: Support setting DEFAULT value for new column(s) during instant ADD COLUMN FR2.4: Support adding virtual columns along with regular columns in one clause FR2.5: Support ROW_FORMAT=DYNAMIC/COMPACT/REDUNDANT For example, for a table t1, following should be supported as instant ADD COLUMN, if ALGORITHM is INSTANT: a. ALTER TABLE t1 ADD COLUMN ... [,ALGORITHM=DEFAULT/INSTANT]; b. ALTER TABLE t1 ADD COLUMN ..., ADD COLUMN ..., ...; c. ALTER TABLE t1 ADD COLUMN ... DEFAULT, ADD COLUMN ... DEFAULT, ...; d. ALTER TABLE t1 ADD COLUMN ..., ADD COLUMN ... VIRTUAL, ADD COLUMN ...; e. ALTER TABLE t1 ADD COLUMN ... AFTER (last column) ...; But if other operations or ALGORITHM are involved, then the behavior for them are not changed. For example, following are not supported: f. ALTER TABLE t1 ADD COLUMN ..., ALGORITHM = INPLACE/COPY; g. ALTER TABLE t1 ADD COLUMN ..., ADD KEY ..., DROP KEY ...; h. ALTER TABLE t1 ADD COLUMN ..., DROP COLUMN ...; i. ALTER TABLE t1 ADD COLUMN FIRST[AFTER ...(but not last column)]...; j. ALTER TABLE t1 ADD COLUMN ..., FORCE; k. etc. FR3: Query of tables with added columns should return correct value FR4: DMLs on table with newly added columns should work FR5: Both crash recovery and rollback of the tables with added columns should work without problems FR6: Adding and cascading update/delete of Foreign Keys on newly added columns instantly should work FR7: IMPORT/EXPORT on a table with instantly added columns should work. There is also one change for generic IMPORT that after the IMPORT, the clustered index of the table would be checked, if the data is corrupted, the IMPORT would fail. FR8: There should be some relevant information in I_S.innodb_tables and I_S.innodb_columns for user to check. FR8.1 I_S.innodb_tables.instant_cols, which means the number of columns before first instant ADD COLUMN. Please note that this instant ADD COLUMN means stored columns, adding a virtual column doesn't introduce this number. FR8.2 I_S.innodb_columns.has_default/default_value, once an instant ADD COLUMN is done, the default value of the columns added instantly would be stored here. Please note that future update of the default value of these columns won't change the values here. FR9: There should be no requirement for data file upgrade FR10: Tables with fulltext index are not supported by this worklog FR11: Temporary tables are always working in ALGORITHM=COPY way, so it's not included in this worklog FR12: All tables in DD tablespace are not supported by this worklog, since there is little chance that user will change these tables or create tables in this tablespace FR13: After instant ADD COLUMN, internal record formats could be different, since some are not complete. However, after some kind of table rebuild or truncation, the record formats should become complete. Theses operations include: a) All ALTER TABLE which require table rebuild, like DROP COLUMN, etc. b) TRUNCATE TABLE c) If it's partitioned table, then all new partitions(ALTER TABLE ... PARTITION) and all truncated partitions d) OPTIMIZE TABLE, etc. FR14: For EXCHANGE PARTITION, to simplify the logic, if either the partition or the table to be swapped is instant, then the operation would be rejected with error ER_PARTITION_EXCHANGE_DIFFERENT_OPTION. FR15: There are some operations which may need to check data during altering, for example: adding a new geometry column defined as NOT NULL, etc. Because currently the data checking has to be done during the altering(copying data), so it's not possible to have them as INSTNAT.
0. Definitions ============== COLUMN: This is used when we say ADD COLUMN in SQL. FIELD: This is used when we say a column in a row. That is for every single row, there is a physical record in clustered index, and a column in the row maps to a field in the record in our code. instant ADD COLUMN: This means all kinds of ADD COLUMN operations supported by this worklog which can be done instantly. 1. General ========== 1.1. Problems of current ADD COLUMN ----------------------------------- Currently, InnoDB supports two types of ALTER TABLE ADD COLUMN, no matter what kind of columns and how many columns are added in one clause. There are ALGORITHM = INPLACE and ALGORITHM = COPY. Although it's called INPLACE algorithm which is online DDL indeed, however, a table rebuild is required to finish it. The reason is that the internal record format is not flexible enough for new fields(columns). For example, if it's a new style record, length of variable- length field and SQL-null flags would be stored in the record header. So to add every new column, the record header may change too, which applies to all exiting rows. So it looks like a simplest way is not reorganize every single record of the table, but read the old records and add the new fields, then insert them into a new table. Users have a strong request that some DDL like current ADD COLUMN should be done in a very short duration since it looks like only a metadata change. In this way, they don't need to wait a long time for the table coming to a final shape and lots of IO can be saved. 1.2. Tencent's contribution --------------------------- To fix this issue, a team from Tencent company figured out a solution to add columns instantly, without modifying records of the table. The rough idea has been already mentioned in the HLD section. The benefits of their contribution are: a) ALTER TABLE .. ADD COLUMN can be done instantly, without too much IO and waiting time b) Row format is compatible with existing data files 2. Operations which can be instant ================================== As mentioned in FR2, this worklog mainly supports the instant ADD COLUMN operations. Multiple columns, either regular columns or virtual columns, with or without default values are all supported. Refer to FR2 to see which kind of SQL is supported and which is not. 2.1 ALGORITHM and LOCK ---------------------- Basically, there are INPLACE and COPY algorithm, both of which would be kept as is. Only when user specifies the INSTANT algorithm, current ADD COLUMN could be done instantly with only some metadata changes, rather than an internal table rebuilding. So after this worklog, as long as the algorithm is ignored, or specified as DEFAULT/INSTANT, InnoDB will try an instant ADD COLUMN is possible. If users want to still do original online ADD COLUMN with table rebuilding, they can use the INPLACE algorithm or FORCE keyword, like described in FR2. There is no any change to the LOCK for this worklog. We can always specify NONE/SHARED/EXCLUSIVE for the LOCK. 2.2 Row format supported ------------------------ The row formats of DYNAMIC/COMPACT/REDUNDANT would be supported in this worklog. COMPRESSED is no need to supported. 3. Algorithm ============ To make an ADD COLUMN metadata change only, it is necessary to mark the record accordingly, so that later scan can know which record is fit to which table definition. One solution is to give every record or page a version number, meaning that the record or records in a page are defined by table structure version X, so to parse a record, a proper table structure can be picked accordingly. But this is more complex and lots of work need to be done. A simpler solution is how this worklog works. Instead of giving record/page a version, a bit in record would be set if there was any instant ADD COLUMN happened before. With this bit set, it's possible to distinguish a record is inserted before any instant ADD COLUMN or not. At the meantime, it's necessary to remember the existing column number when first instant ADD COLUMN happens. Let's suppose there is a table with one row: CREATE TABLE t1 (a INT NOT NULL AUTO_INCREMENT PRIMARY KEY, b INT); INSERT INTO t1 VALUES(0, 1); -- Row 1 When executing an instant ADD COLUMN: ALTER TABLE t1 ADD COLUMN c INT DEFAULT 10; The column numbers in old table t1 would be remembered, so the number is 2, which is invariable. Then another INSERT: INSERT INTO t1 VALUES(0, 2, 20); -- Row 2 When inserting row two, one bit should be set to identify that it's a row after any instant ADD COLUMN. The bit in row one is not set. So when parsing row 1, only the first 2(as remembered) columns would be used. If row 2, all columns would be used. To make it possible to parse a row after another new instant ADD COLUMN, a variable-length number of columns information should be added to the physical record if the record has the instant mark set. So in row 2, it's also necessary to remember current fields number, it's 3 now. Let's say there is another ALTER TABLE and a new INSERT: ALTER TABLE t1 ADD COLUMN d INT; INSERT INTO t1 VALUES(0, 3, 20, 10); -- Row 3 In row 3, the number of current fields would be remembered, it's 4 now. So to parse row 2, 3 columns are used, while to parse row 3, 4 columns are used. Default value of the instantly added columns should be remembered too. So in above case, for row 1, default values of column c and d would be filled in for it; for row 2, default value of column d would be filled in; finally, for row 3, no need for default values. Since there is only one bit to indicate the column change, so the order of columns are assumed to be not changed after instant ADD COLUMN. For short, the newly instantly added columns can only appended at last of the table. Also, because there is no version concept here, there is no strong requirement to convert the old record format to the new one either in background or by some manual command. So physical records would always be left as is. Only when there is a table rebuild operation, the old table would be destroyed and new table would be created, then all records in the new table are according to the latest table structure. And in this way, the number of current fields information don't need to be stored in record. So, in a word, instant ADD COLUMN can be applied to a table repeatedly, and records in the table may have different number of fields stored. Once there is a table rebuild, all the record format would be in a unified and latest format. 4. Row format compatibility =========================== There are both old style row format and new style row format in use. 4.1 New style row format ------------------------ New style row format includes DYNAMIC and COMPACT format. Different rows can be of different length, depending on the nullable fields and variable-length fields. Current physical record consists of a record header and record data fields. The detailed record header which is in variable-length looks like: +--------------------------------+----------------+---------------+ | Non-null variable-length array | SQL-null flags | Extra 5 bytes | +--------------------------------+----------------+---------------+ Then the detailed 5 extra bytes look like: +-----------+---------------+----------+-------------+-----------------+ | Info bits | Records owned | Heap No. | Record type | Next record ptr | +-----------+---------------+----------+-------------+-----------------+ The lengths of above information are: a) Info bits: 4 bits b) Record owned: 4 bits c) Heap No. : 13 bits d) Record type: 3 bits e) Next record ptr: 2 bytes So the total length is 5 bytes. 4.2 How to mark instant bit? ---------------------------- Let's first consider the new style row format. As it is known, there are un-used bits in both 'Info bits' and 'Record type'. Two bits in 'Info bits' can be used further while 4 kinds of values can be used in 'Record type'. So one bit in 'Info bits' would be used to indicate the instant mark. Also, to remember current field numbers, a variable-length number of fields would be added between the SQL-null flags and Extra 5 bytes. So the record header now becomes +--------------------------+----------------+---------------+---------------+ | Non-null variable-length | | | | | array | SQL-null flags | fields number | Extra 5 bytes | +--------------------------+----------------+---------------+---------------+ 4.3 Old style row format ------------------------ Only REDUNDANT row format needs to be considered here. And since there is already firlds number in the physical record, there should be no physical format change for redundant row format. It's possible to parse the record according to the fields number. 5. Integration with DD ====================== What should be remembered in DD are: a) Column numbers before first instant ADD COLUMN b) Default values for instant added columns Please note that if the default value gets changed later, it's still a must to remember the original default value, because records without this field should be filled in with the original default value. It's obviously that a) should be stored in dd::Table::se_private_data, as an integer. Furthermore, if this is a partitioned table, a) for different partitions may differ, so this should be also stored in dd::Partition::se_private_data. About b), considering the new DD, the best place to store it is dd::Column::se_private_data. The benefit is that the default value becomes an attribute of the dd::Table, also every dd::Column can manage its default value separately. These values would be remembered during ALTER TABLE, and loaded into memory during opening table. 6. DMLs ======= 6.1 INSERT ---------- As mentioned above, INSERT after one instant ADD COLUMN would affect the record format, the number of fields currently would be remembered in the record. If no instant ADD COLUMN, insert doesn't care about the number of current fields. 6.2 DELETE ---------- There is no any change. 6.3 UPDATE ---------- As long as the instantly added column gets changed, it must not use the inplace update, instead, it should delete and re-insert the new record. Furthermore, to handle a rollback problem easily(see next section), it has to check if the trailing fields of the updated record can be ignored or not. If the updated record has the same default values at last, the default values won't be stored in the row. 6.4 SELECT ---------- Default values of instantly added columns should be read and filled into the result whenever a record is read 7. Rollback =========== Because some fields could be not stored in the physical record, update to this record may result in the row too big problem if update is done in the old way, expanding all default fields in the new record. One example is: a) create table t2( id int primary key, c1 varchar(4000), c2 varchar(4000), c3 varchar(1000)) engine=innodb row_format=compact; b) insert into t2 values(1, repeat('a', 4000), repeat('b', 4000), repeat('c', 1)); c) alter table t2 add column d1 varchar(500) not null default repeat('d', 500); d) begin; e1) update t2 set c1 = repeat('x', 200) where id = 1; or e2) update t2 set c4 = 'x' where id = 1; f) rollback; In this example, original row doesn't have any field stored externally, if it's instant ADD COLUMN, the row would not be changed at all, that is the default value of 500 chars are not in the record. Case e1): The update new row fit in the page too. However, when rollback, the original row now has to include the default value of 500 chars. Then the record is too long to fit in one page, some fields have to be stored externally. The worse is if it's REDUNDANT format with 4K page size, the expanded record would become too big to be inserted into the page, thus a rollback failure with crash. Case e2): The update new row also fit in the page. However, the rollback will now have to expand the record with 500 chars, which makes the row too big to be rolled back. So it needs to keep the default values as not stored in record when possible. That is in both cases, new record would not include 500 chars for c4 column, just mark it as default value in the record. Shall it just keep the default values as original record? No. In e2) case, the updated record doesn't have c4 as default value, if we don't check c4 when rollback, it's impossible to roll back it too. So all possible default values(after first INSTANT ADD COLUMN) should be checked. Why this doesn't happen before instant ADD COLUMN? Because the fields of the row have already stored externally after the ADD COLUMN. During rollback, the existing externally stored value can be used to represent the old value before update, thus, no new external fields are necessary. 8. Recovery =========== For redo recovery, InnoDB will parse the record logged, so it's a must to know the instant ADD COLUMN information. The only needed information is that if the table is after one instant ADD COLUMN and the column number before its first instant ADD COLUMN. So basically when logging the index information, the number of columns before first instant ADD COLUMN would be logged to indicate both above information for new style records. If it's old style, no extra logs are necessary. 9. Replication ============== As long as the DMLs can work correctly, there should be no impact on replication at all. 10. IMPORT/EXPORT ================ Since the related information as mentioned in 5. are already all stored in dd::Table, dd::Partition and dd::Column(mysql.tables, mysql.partition_tables and mysql.columns), so it would be out of box for the table to work after exporting/importing with SDI. However, this has to wait for WL#9761 for SDI EXPORT/IMPORT to work. So in this worklog, it's necessary to write the DD related information into the .cfg file, and read it during IMPORT. The serialization can be done straightly by writing the metadata out along with other table/column metadata. If the .cfg file is missing, there is no way to know the number of fields on first instant add column, so in most cases, it's impossible to use the IBD file. To handle the .cfg missing problem, after IMPORT, a check would be done on clustered index, to verify all the physical records. A table with instant columns would be always reported having corrupted records in clustered index. So in this case, the IMPORT would fail. This applies to generic IMPORT too, it's no need to import a corrupted table. Once WL#9761 comes in, .cfg missing problem would not happen so there would be no this issue. 11. Displayed in I_S.innodb* =========================== It would be nice to provide the instant ADD COLUMN information to users through I_S views. Since there are two types of DD metadata related, one is in dd::Table::se_private_data and dd::Partition::se_private_data, others are in dd::Column::se_private_data, it's naturally to show these metadata in I_S.innodb_tables and I_S.innodb_columns accordingly. So the number of columns before first instant ADD COLUMN would be displayed in I_S.innodb_tables for every table, and every default value of columns would be displayed in I_S.innodb_columns. The default value would only be displayed in a binary format which is used in InnoDB internally. 12. Side effects ================ a) Since ADD COLUMN may be done instantly, so we may not expect an instant ADD COLUMN will fix a corrupted index b) Since the INSTANT ADD COLUMN would not copy a new table from old one row by row, there is no chance for it to check if the rows from old table along with new added columns would become too big or not. So after INSTANT ADD COLUMN, some rows may in fact too big for the table, especially for REDUNDANT format, it would be detected when the record gets updated.
1. In-memory table objects ========================== To remember the instant status of a table, new member variables were introduced: In dict_table_t: + /** Number of non-virtual columns before first instant ADD COLUMN, + including the system columns like n_cols. */ + unsigned n_instant_cols:10; In dict_index_t: + unsigned n_instant_nullable:10; + /*!< number of nullable fields before first + instant ADD COLUMN applied to this table. + This is valid only when is_instant() is true */ A new structure to remember the default value in memory: +/** Data structure for default value of a column in a table */ +struct dict_col_default_t { + /** Pointer to the column itself */ + dict_col_t* col; + /** Default value in bytes */ + byte* value; + /** Length of default value */ + size_t len; +}; Also a default value pointer in dict_col_t, if this column was instantly added. + /** Default value when this column was added instantly. + If this is not a instantly added column then this is nullptr. */ + dict_col_default_t* instant_default; There are also some related functions to get instant information etc. from table, indexes and columns. 2. Records related ================== Remember that only the record on clustered index could be affected by this worklog, secondary index records don't. 2.1 Instant bit and default value --------------------------------- To set a bit in info bits to indicate this is a record after instant ADD COLUMN, this bit is introduced: +/* The 0x40UL can also be used in the future */ +/* The instant ADD COLUMN flag. When it is set to 1, it means this record +was inserted/updated after an instant ADD COLUMN. */ +#define REC_INFO_INSTANT_FLAG 0x80UL To indicate a field has a default value rather than an inlined value in the record, this bit is introduced for the offsets array: +/* Default value flag in offsets returned by rec_get_offsets() */ +#define REC_OFFS_DEFAULT ((ulint) 1 << 29) All instantly add columns will have this bit set in offsets array. If a record resides on a clustered index whose table has undergone an instant ADD COLUMN, then it should be parsed specially. So there should be a check to see if this index/table is instant affected, if so, those instantly added columns set with REC_OFFS_DEFAULT will ask for default values remembered in dict_col_t::instant_default. 2.2 Parse --------- Since the physical record may have less fields than the ones defined on clustered index, if the offsets array is initialized according to the physical record only, then the offsets fields will also mismatch with fields on clustered index. To make logic and coding simple, the offsets array is always created according to the real number of fields on index. There would be two different physical record format with three meanings, please refer to rec_init_null_and_len_comp() to know how to handle all these cases. In a word, the checking looks like: if the index is not instant /* This is a complete record without instant ADD COLUMN, parse it as is */ else if the record has instant bit set /* This is a record inserted after an instant ADD COLUMN, so it should be able to know the number of fields by parsing the length info in the record */ else /* This is a record inserted before one instant ADD COLUMN, the number of fields is remembered in the table metadata */ 3. Identify an instant ADD COLUMN ================================= Since the keyword INSTANT would be introduced for current instant ADD COLUMN, in ha_innodb::check_if_supported_inplace_alter() and ha_innopart::check_if_supported_inplace_alter(), if we know the ALGORITHM is INSTANT, all relevant tables would be checked if instant ADD COLUMN is applicable, like it's not a table with fulltext index, it's not in COMPRESSED format etc. Once all are fine, a flag would be set in above variable, to indicate this is instant ADD COLUMN, and no rebuild later. 4. Metadata in DD ================= Basically there are four new SE private data introduced: a) dd::Table::se_private_data::instant_col, to indicate how many columns exist before first instant ADD COLUMN in table level b) dd::Partition::se_private_data::instant_col, to indicate similarly to a), however, different partitions may have different numbers, as long as all are bigger than or equal to the one in table level c) dd::Column::se_private_data::default_null, to indicate the default value is NULL d) dd::Column::se_private_data::default, to indicate the default value if it's not NULL All are easy to understand, except b). Let's say there is a partitioned table with two partitions, and also an instant ADD COLUMN has happened just now. If partition two is truncated, the new partition doesn't have to keep the instant ADD COLUMN information and make the records in it more complicated. It should only works like a fresh new table/partition with all complete new records in the table. This is same to other partition related operations which will create new partitions. So a) and b) are both remembered, b) is per partition. And b) should be either nothing, or always bigger than a) Since b) should be always bigger than a) if exists, so to open a partition table, it can only load default values for last b) columns, instead of a) columns. When the whole table is not instant, all a), b), c) and d) would be cleared. If only some partitions are not instant, relevant b) would be cleared with others left. 5. I_S display ============== There would be one more columns in I_S.innodb_tables called INSTANT_COLS, to remember the number of columns when the first instant ADD COLUMN happened on this table. For partitioned table, this number may differ for each partition. There would be two more columns in I_S.innodb_columns called HAS_DEFAULT and DEFAULT_VALUE. HAS_DEFAULT = 1 means that this is a instant column with default value. DEFAULT_VALUE is only valid when HAS_DEFAULT is 1. It shows the internal binary for the default value. If necessary, we may also try to display the original default value, but this requires more work in translating the values. 6. EXPORT/IMPORT ================ For non-partitioned table, basically the tablespace can be imported to replace the existing one. For partitioned table, every partition may have different instant columns recorded, so before one partition tablespace can be imported, it's a must to check if the default values remembered in the .cfg file match the default values in the table existing in the running server. If not match, saying different default values for one instant column, the tablespace can't be imported. Once a tablespace gets imported, its default values should be imported too.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.