WL#8149: B-tree Index Support on non-materialized virtual columns
Status: Complete
This worklog implements creating secondary indexes on non-materialized virtual columns (let's simplify it as "virtual index") as well as using such indexes for fast computed-value retrieving and searching. The non-materialized virtual column support is described in WL#8114 and WL#411, which design the virtual column in such way that they are not present in actual InnoDB index records, but their metadata is registered with InnoDB system tables and metadata caches. Virtual column provides flexibility and space saving on the table, more importantly, adding/droping such columns does not require table rebuild. Making it a much better choice for storing and processing non-relational data such as JSON. However, since they are not materialized, the scan/search could be very slow. With this worklog, the virtual column value will be materialized in the secondary index, making it much easier for value search and processing. This worklog implements following major functionalities: 1) Create index on non-materialized virtual columns. User can create secondary index on one or more virtual columns. And since this is a secondary index, and also the columns are not materialized, adding/dropping columns and creating index does not require a table rebuild, which is way better than the materialized virtual column. (WL#8114 covers ADD/DROP COLUMN for virtual columns.) 2) User is able to search qualified rows using such "virtual indexes", both in terms of non-covered and covered scan. When searching such an index, no computation is needed, because the value is materialized in the index record and in undo log records. Also this should support all MVCC look ups by following up cluster index and its "earlier versions". 3) Such "virtual index" gets updated correctly for DMLs, as well as any transactional commits and rollbacks. 4) Such "virtual index" maintains its coherence during all crash recovery with sufficient operational undo and redo logging. 5) Such "virtual index" should support all isolation level searches and locking like using any other "normal" index 6) Such "virtual index" should be able to work with large BLOB/TEXT data in their "base" column. The index key size is goes with their original limit (767 bytes for Antelope, 3072 for Barracuda format) In summary, the "virtual index" is an essential means to efficiently use virtual columns, without levy heavy computation and table scan for value searching and other DML actions.
1. Can create/drop secondary index on one or more virtual columns 2. Primary index cannot contain any virtual columns 3. Does not support create index on a mixture of virtual columns and non-virtual columns. 4. Support covered and non-covered scan with such virtual index 5. There is no locking change in this feature. All isolation level search should be supported. So does fetching correct value with MVCC 6. When there are DMLs on the table, the index gets update correctly 7. Support computed column whose base columns are large BLOB, externally stored object 8. Support prefix index on large computed column (can be BLOB). 9. Support run time rollback on DMLs with virtual indexes. Virtual column values are undo and redo logged 10. Support crash recovery on DMLs with virtual indexes 11. Support check table on virtual indexes 12. Support null base column or null virtual column and index on them. 13. Support unique virtual indexes 14. Purge on virtual index should work as expected 15. Does not support spatial index or Fulltext index on Virtual columns (first phase). This will be blocked in the server level for now. 16. Support change buffering on virtual column index 17. Truncate table should behave the same as it is with normal table/index. No change is needed. 18. Virtual index does not qualify to be the index for foreign key. This should be re-evaluated when we support indexing on mixed virtual/non-virtual index. 19. There is no index row format change. The (virtual) index row looks exactly same as those index on normal columns. So does the undo and redo log format. 20. In this version, we will not support index on virtual column whose base column could be a foreign constraint cascading update/delete candidate,that is, anything that with "ON DELETE or ON UPDATE" with "CASCADE | SET NULL" clause This is because the foreign key cascading update is currently performed in InnoDB without server action, which is different from normal update operation.
WL#411: Generated columns
WL#8114: Don't store virtual generated columns in database
WL#8227: Support SEs to create index on virtual generated columns
WL#8114: Don't store virtual generated columns in database
WL#8227: Support SEs to create index on virtual generated columns
For the special index on non-materialized virtual columns, there are following properties associated with it: 1. The index will be secondary index only (virtual column cannot be part of primary key anyway). And create index syntax remains unchanged. 2. The index can only index on virtual columns, not a mixture of virtual and normal columns in this worklog. This feature will be addressed in a future WL (Remove limitations on indexing virtual columns). 3. Support unique index 4. Fulltext and Spatial index will not be in this worklog's scope. It will be addressed in a future WL(Remove limitations on indexing virtual columns). 5. As described in WL#8114, metadata/system table, the index metadata will need to record not only the virtual column, but also its "base" columns (used to compute the virtual columns). The direct impact on this worklog is that if there are any updates on the "base" columns, the indexed virtual column(s) will need to be updated as well. 6. Metadata change a) The index metadata looks the same as normal index, except the SYS_INDEXES.TYPE has the DICT_VIRTUAL bit. b) A new dict_table_t member, v_col_names is added. It is needed when we create index and match up the indexed field 7. For Index Search, it will behave the same as normal index search. It also supports MVCC search. In the case that we return early version virtual column values, they are constructed from undo logged column info. 8. For DMLs, for all insert, delete and updates, the computed value needs to be "materialized" and updated in the index, and logged in the undo log. We need to emphasize the importance of the function to be deterministic. Otherwise, the index can be easily corrupted. The determinism needs to be established in server 9. Logging: as discussed, if there is index on the virtual columns, the operation on the columns will be logged (in the redo/undo log). The undo logged value will be used for Purging, run time rollback, MVCC, implicit locking, and crash recovery. 10. Create index: will not support LOCK=NONE in the first phase, to keep everything simple. This will be addressed in a future WL (Support online create index on Computed/Generated Columns). But it can be created using the WL#5526 ALGORITHM=INPLACE. Index value will be computed at the time of the clustered index scan, and along with Primary key, passed to the sorter. It can still use the bulk copy interface (WL#7277). 11. The same limits on maximum index record length apply to both virtual and normal indexes. If the computed column is a very large row, the virtual index can be defined on a prefix of the computed column, just like we support column prefixes in normal indexes. 12. The callback function to obtain the virtual value could be called by purge threads which are InnoDB specific background threads (Please see detail in LLD).
InnoDB related change: 1) Base column metadata: Please refer to WL#8114 on how we register the base columns' info for virtual columns. s 2) Template building: In order to calculate the Virtual Column value, we need to construct a filled MySQL row (with base column value) sending to server in a callback function. So we will need to know the row mapping info between MySQL and InnoDB. To support that, we added a virtual column template attached to the table share structure, and accessible by the dict_table_t structure. /* Structure defines template related to virtual columns and their base columns */ struct innodb_col_templ_t { ulint n_col; /* num of regular col */ ulint n_v_col; /* num of virtual col */ mysql_row_templ_t** vtempl; /* array of templates for virtual col and their base col */ char db_name[MAX_DATABASE_NAME_LEN]; /* table's database name */ char tb_name[MAX_TABLE_NAME_LEN]; /* table name */ ulint rec_len; /* MySQL record length */ const byte* default_rec; /* default rec if it is NULL */ }; The db_name and tb_name is needed for the callback function supplied by Optimizer to the InnoDB purge subsystem (see section 12 below). /** InnoDB table share */ typedef struct st_innobase_share { ...... innodb_col_templ_t s_templ; /*!< table virtual column template info */ } INNOBASE_SHARE; Following 2 functions are called when the table is opened first time to create above templates: innobase_build_v_templ innobase_vcol_build_templ The main consumer of this structure is the callback function to compute the virtual column value, which is described next. 3) A new function (innobase_get_computed_value) is created to call the callback function (handler::my_eval_gcolumn_expr) provided by server to compute the virtual values. This function will use above template info to create a MySQL row (TABLE::record[0]) with base column values filled and send to server, and server will fill in the computed column data, which then send back to us. We shall parse out the virtual column value then. 4) index type of DICT_VIRTUAL Currently, we only support index on either all virtual columns or all normal columns. We do not support indexing on mixed columns. For those index on virtual columns, DICT_VIRTUAL is marked on index->type. Once this limitation is removed, this bit will be used to indicate any index contains one or more virtual fields. 6) For internal structure dtuple_t, we will also need to add additional virtual fields, so that virtual values can be filled (for example, carry data from undolog): struct dtuple_t { .... + ulint n_v_fields; /*!< number of virtual fields */ + dfield_t* v_fields; /*!< fields on virtual column */ .... } Such dtuples are created with additional virtual column info supplied to dtuple_create_with_vcol(). 7) Select In WL#8114, if there is no virtual index. InnoDB does not need to process any virtual columns. However, in this worklog, there are a couple of case that InnoDB fills in the virtual column values: a) it is a covered scan on virtual index and no need to access the cluster index (this is the case that in row_sel_store_mysql_rec the index parameter passed in is the same virtual index in prebuilt) b) if cluster index needs to be consulted (due to MVCC etc.), and if the prebuilt->read_just_key is set, then the virtual column values can be copied from the undo log records related to earlier versions of the clustered index record. This is done due to the fact that server is expecting only virtual column is returned (and no base columns returned to allow them to do a computation). In all other cases, the virtual column will be filled by server layer. 8) Undo Logging Insert: If there are any virtual columns that are part of index (col->m_col.ord_part), then it is logged along with any other indexed columns. If it is a large column, maximum DICT_MAX_FIELD_LEN_BY_FORMAT bytes are logged, which is sufficient for building an index field. The undo log is generated by trx_undo_report_insert_virtual(). Update and delete: In addition to log the column if it is a ord_part, the update also log the update vector. So that during rollback or crash recovery, the update vector can be reconstructed with asking for the callbacks. In all those case, the column pos is added REC_MAX_N_FIELDS(1023) to signify this is a virtual column. 9) for insertion, the row is already computed by MySQL, so just need to parse out the input virtual column using row_mysql_convert_row_to_innobase() called by row_get_prebuilt_insert_row(). The change is logged in trx_undo_page_report_insert() for undo/recovery purpose. 10) for modification/update, the new row (including the virtual columns) are supplied by MySQL. So the update vector can be generated for virtual columns can be calculated in calc_row_difference() along with other columns. Then such update vector on virtual columns are logged in trx_undo_page_report_modify(). To avoid calculation during crash recovery, its before and after image are both logged. In addition, any indexed virtual columns are also logged after the update vector, for purging purpose, as well some of them could be non-changing part of a index, so need to be restored during rollback/crash recovery too. 11) for delete, it is logged similarly in trx_undo_page_report_modify(), except there is no update vector needs to be logged. It will log whatever indexed columns. So this can be used for purge threads to build the index entries to purge the record. 12) A special case for purge. As mentioned above, for deleted row, all of its indexed columns will be logged. So to sufficient to build the index fields. However, there is a verification function row_vers_old_has_index_entry() that does the comparison of fields in purge records with that of current cluster index. In this case, the "current" virtual columns might need to be computed from the "current" cluster index (also_curr). Unfortunately, in such case, the table might not be opened. In addition, there currently is no MySQL THD created for InnoDB purge threads. So it cannot simply invoke the callback function to compute the virtual value. This issue is being addressed by Optimizer and runtime. We will try to reduce the need for callback in such case. So only in certain "infrequent" scenario, this is needed. To achieve this, we will need to do some "additional" logging in the undo log for updates. The detail information is as following: 1. The need for comparing to current clustered index record (PR) in row_vers_old_has_index_entry() is only needed by purge, when purging a secondary index key (in this case, the virtual index key). 2. Such need comes only when the Primary Key pointed by secondary index key is not deleted. So if it is a simple delete (without any following "insert by modify" that undo the delete-marking of the rows), then this comparison (thus callback) is not needed. Or if it is update by delete + reinsert, the row could also be marked as delete, and comparison is not needed. The more frequent scenario is when there is update by modify (which update the PR key directly on the row) or delete and later insert by modify (undo delete marking on PR), then it creates possible scenario that multiple secondary keys (all delete marked except one) point to a single undelete-marked Primary Key. In such case, when we purging those delete marked secondary keys, the comparison is needed, thus the callback. 3. We could mitigate the need for callback even in late case described above. The key is being primary key has a linked list of its old history (or specifically, the roll_ptr points to the undo log records). Theoretically, we could log additional information in undo log to help backtrack the current virtual column value in PR. To do so, we will log no only the "before image" of the updated columns (for undo) in undo log, but also the "after image" (the updated value) for those columns in undo log. With such "after image", we could in some way to backtrack the current virtual column image if those image are available. 4. However, we cannot completely eliminate the need for callback. Because above undo log chain would not be always available. The secondary index could point to a "brand new" Primary key without any history (when trx_undo_roll_ptr_is_insert(roll_ptr) is true). This could happen in following scenario: 1. CREATE TABLE t(a CHAR(1) PRIMARY KEY, b INT, INDEX(b)); 2. INSERT INTO t VALUES ('a',1); COMMIT; -- Clustered index: ('a',1) with insert_undo, no history -- Secondary index: (1,'a') 3. DELETE FROM t; COMMIT; -- Clustered index: delete-marked ('a',1) with update_undo history -- Secondary index: delete-marked (1,'a') 4. INSERT INTO t VALUES ('a',2); COMMIT; -- Clustered index: ('a',2) that was updated from delete-marked ('a',1) -- Secondary index: (2,'a') and delete-marked (1,'a') [until it is purged] 5. DELETE FROM t; COMMIT; -- Clustered index: delete-marked ('a',2) -- Secondary index: delete-marked (2,'a') and delete-marked (1,'a') 6. Assuming that there are no other transactions in the system, the purge view (purge_sys->view) is free to advance to the current maximum transaction. Purge workers are free to process the records from step 3 and 5. But, there is no ordering guarantee! Purge can process the second delete (step 5) first, leading to: -- Clustered index: [empty] -- Secondary index: delete-marked (1,'a') 7. INSERT INTO t values('a', 3); -- Clustered index: ('a',3) with insert_undo, no history -- Secondary index: (3,'a') and delete-marked (1,'a') 8. Purge processes the first delete (step 3). We find the delete-marked (1,'a') in the secondary index. We can also find the non-delete-marked ('a',3) in the clustered index, but there is no undo history. If the secondary index were built on a virtual column (the value b=1 or b=3 is not present in the clustered index record), we would have to evaluate its value to determine whether the record (1,'a') can be safely purged from the secondary index. You can see above scenario is due to there could be multiple delete marked sec keys points to single PK. And it is possible that purge could clean up the PK with one of those purge. And if a new PK with same primary key value comes along, those pending purge sec key would again point to this new PK, who has no history for you to backtrack the VC value. So essentially, the callback is still needed, but might in a infrequent use scenario. In the above scenario, the history that would be needed at step 8 is discarded at step 6 when the undo records are purged out of order. We believe that it is not feasible to try to add synchronization between purge worker threads. Another case where history can go missing in a similar way is on DELETE;COMMIT;INSERT;ROLLBACK;INSERT;COMMIT;purge when the first INSERT replaced a purgeable record. In this case, row_undo_mod_remove_clust_low() would remove the clustered index record when executing the ROLLBACK, and the purge of the DELETE operation would only see the result of the new INSERT. It would have to invoke the callback to determine whether the secondary index record matches the current clustered index record. 13) Building an earlier version of virtual index field As discussed, the old version of virtual columns are logged in undo log during DML, so if an earlier version of the row is needed, trx_undo_prev_version_build() is called with a new parameter that fills a dtuple with the virtual column information (note, it cannot be built into rec_t since the clustered index record does not contain virtual columns). Then this virtual dtuple_t's info is plugged into a row from row_build() before calling row_build_index_entry() to build virtual index fields. 14) A new flag ROW_BUILD_NORMAL_W_VCOL is introduced for row_build_index_entry_low() in the case if we need to build the virtual column by calling the innobase_get_computed_value() callback. This happens in situation such as build a secondary index for virtual index. Another flag ROW_BUILD_FOR_INSERT is introduced to indicate that we are building a cluster index record for insertion. Note we do not support virtual column in PK. So the only case we will build virtual columns for cluster index is during insert, where we will need to write the virtual column values to the undo log. 15) The implicit lock (row_vers_impl_x_locked_low) should also work with slight modification as well. When look for earlier version of the cluster index, trx_undo_prev_version_build() will be called to fetch the virtual column values from the undo log, then build the virtual index entry corresponding to the clustered index record for comparison with the record no the virtual index page.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.