WL#8114: Don't store virtual generated columns in database
Status: Complete
Currently virtual generated columns still occupy space as if they were stored. Effectively server is responsible to calculate the value, but storage engine stores trash in place of those fields. With this WL, virtual columns will no longer take any space in actual data rows. Thus making such column much more flexible (add/drop does not need to rebuild table) and space efficient (do not take space in actual data row any more). For example, if we have a table with 5 regular columns and 2 virtual columns. On disk, the data record (cluster index record) will only have those 5 regular columns (along with any necessary hidden system columns), and those 2 virtual columns are never stored in those data rows. The virtual column will still be represented in the InnoDB metadata. It is treated as a column with prtype includes DATA_VIRTUAL bit. When adding a virtual column, the column will register to Sys_columns as if for any other columns. And the number of virtual column is also encoded in the N_COLS field of Sys_tables. In addition, a new system table, SYS_VIRTUAL, is added to register the base column information for each virtual columns. For corresponding in-memory metadata structure, the virtual column info is separated from that of regular columns. The regular column will still hang off the dict_table_t::cols, while the new virtual columns will be hang off a new list with dict_table_t::v_cols. In this way, the repercussion of new virtual columns on existing access methods is limited. A new virtual column structure (dict_v_col_t) is created, which has not only regular column information, but also contains information from SYS_VIRTUAL, which records the "base columns" information for the virtual column. When we fill the result through the row_prebuilt_t::mysql_template[], the virtual column could be skipped, since they do not exist in our data record. So, the optimizer will fill in the data in TABLE::record Since the virtual column is never stored in real rows, so alter table add or drop virtual columns are reduced to a few system table modification, and it is mostly effective instantly. So in summary, with this worklog, the virtual column data will never stored in data rows. But its metadata is recorded in various system tables, to show its presence. And add/drop such columns are instant without table rebuild
1) Virtual columns are represented correctly in system tables, mainly SYS_COLUMNS. 2) Virtual columns' "base columns" (columns needed to generate virtual columns) are represented correctly in system table SYS_VIRTUAL. 3) In this release, we do not support virtual column based on virtual columns. The virtual column will be based only on regular materialized columns. 4) Alter table add/drop virtual columns will be an instant operation, does not require table rebuild. Since this is an instant operation, we will limit the alter table add/drop virtual columns as a single operation itself, and not to mixed with other DDL operations in inplace alter interface, so to reduce the complexity of testing. This restriction will be removed in another WL (Combine alter add/drop virtual columns with other inplace alter operations). 5) Virtual columns are not stored on disk rows, so the size of table is as if there is no such column. 6) Metadata can be loaded successfully after server crash or reboot 7) DMLs runs correctly on the table with virtual column 8) Base columns can be very large BLOB, so does the virtual columns. 9) Base columns can be NULLable, so does the virtual columns. 10) Virtual column can't be used as an AUTO_INCREMENT column. 11) Virtual column can't be used in FOREIGN KEY constraints. 12) Virtual column cannot be indexed (until WL#8149) 13) NO lock changes in this worklog
This worklog would involve change in following areas: 1) Metadata change: The column will be registered with System tables (Sys_columns, to be specific). It will look as a normal column in Sys_columns other than it will be marked as "virtual column" in its PRTYPE (DATA_VIRTUAL), in addition to its inherited type properties (char, int etc.). We will not put additional limit on number of virtual columns can be added, but it is under the overall limitation of number of columns can be supported for a table, which is 1017 (http://dev.mysql.com/doc/refman/5.6/en/innodb-restrictions.html): #define REC_MAX_N_FIELDS (1024 - 1) #define REC_MAX_N_USER_FIELDS (REC_MAX_N_FIELDS - DATA_N_SYS_COLS * 2) As mentioned in HLD, for in memory representation, we will track virtual column info separately with that of regular columns. So that the existing system continues to function as if those virtual columns are not present. 2) In Row Storage: To an InnoDB row, these virtual columns are "non-present". So for in-row record, only regular materialized columns are present, and virtual columns are not visible as if they do not exist. This is achieved by completely separate the virtual and non-virtual column in InnoDB in memory metadata (notablely, dict_table_t and dict_col_t). An alternative approach would be stored it similar as a null variable length column in the row. This will not incur additional space in the data part of the row, but maybe have 1 or 2 bytes in the row offset. So the extra cost of having it would be minimum, but we also preserve the column identity in the row. But this still have extra cost, and it cannot support instant alter table add / drop column. So this approach is far less ideal than the current one. Since the non-materialized column does not store any value in row, so it cannot be part of primary key and its presence will not make any difference in terms of sort order of primary index. 3) Logging: Without indexing, the virtual column value will never comes to actual rows, and storage engine / data row has no record of such values. So there is no need for any logging. 4) Search and MVCC: Since the value is not stored in InnoDB, all the virtual column value will be computed with the "base" columns. And the MVCC would be supported through values obtained from these "base" columns. 5) Check table will ignore virtual column without index 6) Alter table add/drop virtual column will not require table rebuild. They will be simple and instant update to system tables. Each operation will require a update of SYS_TABLES (for num cols), SYS_COLUMNS (adding, dropping cols) and SYS_VIRTUAL. In the first phase, we will support alter table add/drop virtual columns all by their own, not to combine with other table altering commands. 7) A new system table SYS_VIRTUAL will be added to track virtual columns' base columns. Its corresponding Information Schema table innodb_sys_virtual will be added to view the content of the system table. The detail column defines will be shown in LLD. But in general, SYS_VIRTUAL provides necessary information about virtual column mappings (which columns are needed to compute the value): CREATE TABLE `t` ( `a` int(11) DEFAULT NULL, `b` int(11) DEFAULT NULL, `c` int(11) GENERATED ALWAYS AS (a+b) VIRTUAL, `h` varchar(10) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1; mysql> select * from information_schema.innodb_sys_virtual; +----------+-------+----------+ | TABLE_ID | POS | BASE_POS | +----------+-------+----------+ | 38 | 65538 | 0 | | 38 | 65538 | 1 | +----------+-------+----------+ 2 rows in set (3.00 sec) Their corresponding SYS_COLUMNS view is following: mysql> select * from information_schema.innodb_sys_columns where TABLE_ID in (select TABLE_ID from information_schema.innodb_sys_tables where name like "test/t"); +----------+------+-------+-------+--------+-----+ | TABLE_ID | NAME | POS | MTYPE | PRTYPE | LEN | +----------+------+-------+-------+--------+-----+ | 38 | a | 0 | 6 | 1027 | 4 | | 38 | b | 1 | 6 | 1027 | 4 | | 38 | h | 2 | 1 | 524303 | 10 | | 38 | c | 65538 | 6 | 9219 | 4 | +----------+------+-------+-------+--------+-----+ 4 rows in set (0.01 sec) The POS fields of both tables are a large number is because it encodes 2 positions (the position in original MySQL table and its virtual column sequence number in InnoDB), detail can see LLD (5). In above case, column c is a virtual column that generated by normal column 0 (`a`) and column 1 (`b`) of the table. If there is a constant virtual column, it does not have any base column, so it will not enter into SYS_VIRTUAL: CREATE TABLE `t1` ( `a` int(11) DEFAULT NULL, `b` int(11) GENERATED ALWAYS AS (a+1) VIRTUAL, `c` int(11) GENERATED ALWAYS AS (5) VIRTUAL ) ENGINE=InnoDB DEFAULT CHARSET=latin1; mysql> select * from information_schema.innodb_sys_virtual; +----------+-------+----------+ | TABLE_ID | POS | BASE_POS | +----------+-------+----------+ | 39 | 65537 | 0 | +----------+-------+----------+ 1 row in set (0.00 sec) But the constant column will be in SYS_COLUMNS: mysql> select * from information_schema.innodb_sys_columns where TABLE_ID in (select TABLE_ID from information_schema.innodb_sys_tables where name like "test/t1"); +----------+------+--------+-------+--------+-----+ | TABLE_ID | NAME | POS | MTYPE | PRTYPE | LEN | +----------+------+--------+-------+--------+-----+ | 39 | a | 0 | 6 | 1027 | 4 | | 39 | b | 65537 | 6 | 9219 | 4 | | 39 | c | 131074 | 6 | 9219 | 4 | +----------+------+--------+-------+--------+-----+ 3 rows in set (0.01 sec) You can see column `c` is also a virtual column, but it does not have any base columns (it is a constant). So it will have no entry in SYS_VIRTUAL but lives in SYS_COLUMNS.
As described, the main change to support non-materialized virtual columns is to add such column info in InnoDB system table and in memory structures. Following changes are made for this purpose: 1) A new virtual column structure is added: /** Data structure for a virtual column in a table */ struct dict_v_col_t{ dict_col_t m_col; /* column information */ dict_col_t** base_col; /* array of base column ptr, pointing to base columns */ ulint num_base; /* number of base column */ ulint v_pos; /* original column position */ }; The base column information is filled by function innodb_base_col_setup() when we create a table with virtual column or alter table add virtual column. 2) In dict_table_t, following members are added to track virtual columns: struct dict_table_t { /** Number of virtual columns defined so far. */ unsigned n_v_def:10; /** Number of virtual columns. */ unsigned n_v_cols:10; /** Array of virtual column descriptions. */ dict_v_col_t* v_cols; .... } 3) Following prtype is added: #define DATA_VIRTUAL 8192 /* Virtual column */ 4) The number of virtual column is embedded in Sys_tables' N_COLS column with following formula: static_cast(table->n_def) + (static_cast (table->n_v_def) << 16) We could do so because the max number of columns for MySQL table is 4096, and 1017 for InnoDB(http://dev.mysql.com/doc/refman/5.6/en/innodb-restrictions.html). 5) A virtual column comes with 2 positions info: a) Its position in original (MySQL) table definition b) Its sequence number of virtual columns in InnoDB (the "nth" virtual column a) is represented by dict_v_col_t::m_col::ind b) is represented by dict_v_col_t::v_pos To make them persistent, they are both encoded in SYS_COLUMN's POS field, by following formula: (dict_v_col_t::v_pos + 1) << 16 + dict_v_col_t::m_col::ind We could do so again due to the number of columns in InnoDB is limited to 1017. Encoding and decoding functions are created. An example of innodb_sys_columns output would be following: CREATE TABLE `t` ( `a` int(11) DEFAULT NULL, `c` char(10) DEFAULT NULL, `d` char(20) DEFAULT NULL, `e` char(10) GENERATED ALWAYS AS (c) VIRTUAL ) ENGINE=InnoDB DEFAULT CHARSET=latin1; mysql> select * from information_schema.innodb_sys_columns where TABLE_ID in (select TABLE_ID from information_schema.innodb_sys_tables where name like "test/t"); +----------+------+-------+-------+--------+-----+ | TABLE_ID | NAME | POS | MTYPE | PRTYPE | LEN | +----------+------+-------+-------+--------+-----+ | 35 | a | 0 | 6 | 1027 | 4 | | 35 | c | 1 | 2 | 524542 | 10 | | 35 | d | 2 | 2 | 524542 | 20 | | 35 | e | 65539 | 2 | 532734 | 10 | +----------+------+-------+-------+--------+-----+ 4 rows in set (0.02 sec) 6) A new system table SYS_VIRTUAL is added. This system will be created if it is not present when server boots up (see dict_create_or_check_sys_virtual()) . /* The columns in SYS_VIRTUAL */ enum dict_col_sys_virtual_enum { DICT_COL__SYS_VIRTUAL__TABLE_ID = 0, DICT_COL__SYS_VIRTUAL__POS = 1, DICT_COL__SYS_VIRTUAL__BASE_POS = 2, DICT_NUM_COLS__SYS_VIRTUAL = 3 }; /* The field numbers in the SYS_VIRTUAL clustered index */ enum dict_fld_sys_virtual_enum { DICT_FLD__SYS_VIRTUAL__TABLE_ID = 0, DICT_FLD__SYS_VIRTUAL__POS = 1, DICT_FLD__SYS_VIRTUAL__BASE_POS = 2, DICT_FLD__SYS_VIRTUAL__DB_TRX_ID = 3, DICT_FLD__SYS_VIRTUAL__DB_ROLL_PTR = 4, DICT_NUM_FIELDS__SYS_VIRTUAL = 5 }; Adding a virtual column will involve adding a row in SYS_COLUMNS table, a number of rows, one for each base column, into SYS_VIRTUAL column. And also update the SYS_TABLES to mark how many virtual columns in N_COLS field. Some example of SYS_VIRTUAL data is shown below: CREATE TABLE `t` ( `a` int(11) DEFAULT NULL, `b` int(11) DEFAULT NULL, `c` int(11) GENERATED ALWAYS AS (a+b) VIRTUAL, `h` varchar(10) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1; mysql> select * from information_schema.innodb_sys_virtual; 2 rows in set (0.00 sec) +----------+-------+----------+ | TABLE_ID | POS | BASE_POS | +----------+-------+----------+ | 36 | 65538 | 0 | | 36 | 65538 | 1 | +----------+-------+----------+ 2 rows in set (2.16 sec) The POS field matches that of corresponding row in SYS_COLUMNS: +----------+------+-------+-------+--------+-----+ | TABLE_ID | NAME | POS | MTYPE | PRTYPE | LEN | +----------+------+-------+-------+--------+-----+ | 36 | a | 0 | 6 | 1027 | 4 | | 36 | b | 1 | 6 | 1027 | 4 | | 36 | h | 2 | 1 | 524303 | 10 | | 36 | c | 65538 | 6 | 9219 | 4 | +----------+------+-------+-------+--------+-----+ 4 rows in set (0.00 sec) 65538 equals (0 + 1) << 16 + 2. The "(0+ 1) << 16" means it is the first virtual column. The "2" means it is the 3rd column in original MySQL table. If there is a constant virtual column, it does not have any base column, so it will not enter into SYS_VIRTUAL table: CREATE TABLE `t1` ( `a` int(11) DEFAULT NULL, `b` int(11) GENERATED ALWAYS AS (a+1) VIRTUAL, `c` int(11) GENERATED ALWAYS AS (5) VIRTUAL ) ENGINE=InnoDB DEFAULT CHARSET=latin1; mysql> select * from information_schema.innodb_sys_virtual; +----------+-------+----------+ | TABLE_ID | POS | BASE_POS | +----------+-------+----------+ | 37 | 65537 | 0 | +----------+-------+----------+ 1 row in set (1.69 sec) But the constant virtual column is still represented in SYS_COLUMNS: mysql> select * from information_schema.innodb_sys_columns where TABLE_ID in (select TABLE_ID from information_schema.innodb_sys_tables where name like "test/t1"); +----------+------+--------+-------+--------+-----+ | TABLE_ID | NAME | POS | MTYPE | PRTYPE | LEN | +----------+------+--------+-------+--------+-----+ | 37 | a | 0 | 6 | 1027 | 4 | | 37 | b | 65537 | 6 | 9219 | 4 | | 37 | c | 131074 | 6 | 9219 | 4 | +----------+------+--------+-------+--------+-----+ 3 rows in set (0.01 sec) Following functions are added to enter a SYS_VIRTUAL row into the table: dict_build_v_col_def_step dict_create_sys_virtual_tuple and during server boot time, the "base column info" is loaded by calling dict_load_virtual_low() and dict_load_virtual_one_col()
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.