WL#8763: support multi-value functional index for InnoDB
Affects: Server-8.0
Status: Complete
This worklog is to support multi-value functional indexes in InnoDB. Since a functional index in MySQL/InnoDB is implemented through "virtual columns", the multi-value functional index is based on a multi-value "virtual column". A virtual column is never materialized in the table's primary key (the data), so it can only be materialized by a specific index, in this case the multi-value functional index.

For example, suppose we try to index the zip code (index(user_id, zip)) for a user described as follows:

{ user: John, user_id: 1, addr: [ {zip: 94582}, {zip: 94536} ] }

Since this user has multiple homes, multiple zip codes can be generated out of a single document. We will support creating an index on the ZIP code in this case.

Another simple example:

mysql> create table t1(f1 json, f2 int as (cast(`f1` as unsigned array)));
Query OK, 0 rows affected (4.16 sec)

mysql> insert into t1(f1) values (cast('[1,3]' as json));
Query OK, 1 row affected, 2 warnings (0.04 sec)

mysql> insert into t1(f1) values (cast('[1,3,9,8]' as json));
Query OK, 1 row affected, 2 warnings (0.05 sec)

mysql> create index idx1 on t1(f2);
Query OK, 0 rows affected (5.17 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select f2 from t1;
+------+
| f2   |
+------+
|    1 |
|    1 |
|    3 |
|    3 |
|    8 |
|    9 |
+------+
6 rows in set (0.00 sec)

mysql> select * from t1 where f2 = 1;
+--------------+------+
| f1           | f2   |
+--------------+------+
| [1, 3]       |    1 |
| [1, 3, 9, 8] |    1 |
+--------------+------+
2 rows in set, 4 warnings (0.01 sec)

As shown above, multiple f2 values can correspond to a single f1 value.
1. Allow creating a multi-value functional index along with CREATE TABLE.
2. Allow creating a multi-value functional index through the "ALTER TABLE ADD INDEX" and "CREATE INDEX" clauses.
3. Only one multi-value column can be in an index.
4. DML operations will generate the correct number of values, and the correct values, in the index. The index is properly maintained.
5. CHECK TABLE will recognize such an index and will not report table corruption on a mismatch between the number of index records and data records.
6. Replication will properly replicate operations involving the multi-value index.
7. Such an index cannot be used to support a FOREIGN KEY constraint.
8. Multi-value prefix indexes are not supported.
9. The index can only apply to a limited set of data types: DECIMAL, INTEGER, DATETIME, VARCHAR/CHAR.
10. Online creation of a multi-value index is not supported.
Terminology
===========

The words used in the design:

'Multi-value index' is short for 'multi-value functional index'.
'Multi-value column' means the column the multi-value index is built on. The source of this column is basically a normal stored column, while the column itself is a virtual column.

General
=======

Essentially, a multi-value index is a secondary index, which requires a virtual column on the table and a multi-value index built on this virtual column. The index itself is actually a virtual index. The difference from a normal virtual column with a virtual index is that those used to be added separately, without any mandatory rule; to build a multi-value index, however, a virtual column is automatically generated on the data and the virtual index is then built on this virtual column.

As mentioned before, one virtual column entry for a multi-value index may be mapped to several different entries on the multi-value index, or to nothing at all (when no matching data is found in the virtual column entry). Note, however, that there will never be duplicate entries on the multi-value index. That means even if there are several entries for the same virtual column with the same primary key, the multi-value column data must be different for all of those entries.

Currently, only one multi-value column can be defined for one multi-value index. This multi-value column can be in any position of the multi-value index, which means that other columns can be defined for the same multi-value index, and there is no restriction on column order.
For example:

CREATE TABLE t1 (
  j JSON DEFAULT (CAST('["HelloWorld"]' AS JSON)),
  KEY mv_idx_binary ((CAST(j->'$[*]' AS BINARY(10) ARRAY))));

CREATE TABLE t2 (
  j4 json,
  KEY mv_idx_binary (((CAST(j4->'$[*]' AS BINARY(10) ARRAY))),
                     (json_depth(j4)), (json_valid(j4))) USING BTREE);

Here table t1 has one column 'j', and there is a multi-value index defined on the table, which also creates a multi-value (virtual) column for '((CAST(j->'$[*]' AS BINARY(10) ARRAY)))'. Table t2 also has a multi-value index, which now includes not only the multi-value column but also some other normal columns. No more than one multi-value column can be on a single multi-value index.

Data types
==========

The multi-value column data type can only be an array of:
1. DECIMAL
2. SIGNED / UNSIGNED
3. DATETIME
4. CHAR (the underlying storage is actually the same as for VARCHAR)

Capacity
========

To let the server double-check data validity earlier, before diving into the SE (InnoDB), it is useful for an SE which supports multi-value indexes to inform the server of how it supports the multi-value data. The two most important factors are the multi-value data length (or key length, since it is on the index) and the maximum supported number of multi-value data items (the maximum size of the data array).

Currently, the limits in InnoDB are essentially bounded by the undo log size. Since the undo log of an INSERT must be written in a single undo log page, the two factors above are restricted by the maximum undo log page size. However, it is not realistic to have a true estimation of the multi-value data length and the maximum size of the data array before InnoDB knows the real data: because the data length of other dynamic fields and each data array size are not known beforehand, there is no exact estimation for each different row.
One doable solution is to give a rough estimation of the two factors above, which allows users to insert as many multi-value data items as possible; once a later insertion exceeds the undo log page limit, a "row is too long" error is raised. Based on this, it can be assumed that nearly the whole undo page is available for one multi-value data array, except for some hard-coded header content, and the maximum length of the array can be calculated against the multi-value field with the shortest data length in the table.

Index creation
==============

A multi-value index can be created either along with CREATE TABLE or by CREATE INDEX/ALTER TABLE. Both approaches should be atomic. Creating a multi-value index during table creation is easy: just create the index itself, with no data operations. However, there are some differences in behavior when creating a multi-value index on a table that already contains data. The first difference is that this kind of index creation creates not only the multi-value index, but also the virtual column on which the index is based. The second is that, to build the index, records from the clustered index are read, and each may generate more than one multi-value index entry. This 1-to-N mapping requires some special handling during merge buffer creation for sorting index entries, and one merge sort buffer may not be able to accommodate all multi-value index entries from a single clustered index record.

SELECT on Index
===============

SELECT on a multi-value index is supported as on a normal index. However, currently, the only way to leverage a multi-value index to speed up a scan is a query with the 'MEMBER OF' keyword. For example:

CREATE TABLE t1 (
  j4 json,
  KEY mv_idx_binary (((CAST(j4->'$[*]' AS BINARY(10) ARRAY))),
                     (json_depth(j4)), (json_valid(j4))) USING BTREE);

INSERT INTO t1 VALUES ('["foobar"]');

SELECT * FROM t1 WHERE "foobar" MEMBER OF (j4->'$[*]');

Because of the nature of the multi-value index, e.g.
entries for the same clustered index record are scattered across the whole index, the multi-value index cannot support two kinds of scans:

1. Range scan. It is currently not proper to do a range search on the multi-value index.
2. Index-only scan. The entries on the multi-value index are more or less meaningless if the clustered index record is not fetched.

DMLs on Index
=============

All DMLs on a multi-value index are done similarly to a normal index. The only difference is that there can be more than one insertion or update on the index for one clustered index record. There are actually three statuses of multi-value column data:

1. NULL
2. No value on the index
3. Multiple entries on the multi-value index

1 and 3 are easy to understand; 2 is specific to the case where the data array is empty or nothing matches, so there is no need to maintain a corresponding entry on the multi-value index.

One promise from the server is that, regardless of the DML type, as long as the multi-value column data is parsed by the server to form a data array for InnoDB to operate on, the values in the data array are already sorted and contain no duplicates.

For UPDATE, it is free to update data of any status to any different or same status. Note that it is currently useless to build a NULL entry on the multi-value index, since there is no proper SQL syntax to search for it. However, this may be supported in the future, so internally InnoDB still builds NULL entries on the multi-value index.

Another important point for UPDATE is that if the UPDATE changes the data from status 3 above to a new value of status 3, then probably not all multi-value entries on the index need to be updated. Say the old value is '["Hello", "HelloWorld"]' and the new value is '["HelloMySQL", "Hello"]'. Obviously, only "HelloWorld" has to be updated to "HelloMySQL"; "Hello" does not require any update. So it is important to keep the entries which are not touched at all.
Furthermore, even in the case above, and even if the multi-value column is the only column on the multi-value index, once the row's primary key gets changed too, all entries tied to this primary key have to be updated in the multi-value index. Finally, for all DMLs, it is necessary to handle the lock wait scenario just as for DMLs on normal indexes.

Redo and undo log
=================

There is no need to specially handle the redo log for a multi-value column or multi-value index; everything can be done in the normal way, because changes to a multi-value index are actually done one by one, and every change is similar to the one for a normal index.

It is necessary to write a special undo log for the multi-value column data, since it is internally stored in a different way from a normal secondary index record. For the same reason, undo log parsing also needs to take care of multi-value column data.

Rollback and purge
==================

Basically, there is no special handling for either rollback or purge; they just parse the record information from the undo logs and apply it to the multi-value indexes and other indexes. The only difference is, again, that for a multi-value index more than one entry may have to be checked and rolled back (or purged).

Online DDL
==========

Online creation of a multi-value index is not supported, since it includes creating a virtual column and a virtual index, which has been prohibited as an online operation since the virtual column story began. However, online DDL can be done on a table with multi-value index(es). To support this, multi-value column data has to be recorded in the online DDL logs, in the same format as it is recorded in the undo log.

Check table and page validation
===============================

Since there is more than one entry on the multi-value index for one single entry on the clustered index, the expected 1-to-1 mapping between records from the clustered index and the secondary index is now broken for the multi-value index.
This is a special case for the multi-value index, which should be screened out for CHECK TABLE.

For page validation, if entries from the multi-value index are buffered in the change buffer, there are cases where values are actually different but the record comparison function for the change buffer considers them equal. For example, the two entries "abc" and "abc " are different multi-value data, but the change buffer thinks they are equal. So this case should be filtered out and regarded as correct too.
1. Signify the capability of multi-value index

HA_MULTI_VALUED_INDEX is added to m_int_table_flags to show that InnoDB supports multi-value indexes; however, such an index supports neither ordered retrieval nor key-read-only scans:

+  /* Multi-valued keys don't support ordered retrieval, neither they're
+  suitable for keyread only retrieval. */
+  if (table_share->key_info[key].flags & HA_MULTIVALUED_KEY) {
+    flags &= ~(HA_READ_ORDER | HA_KEYREAD_ONLY);
+  }

2. API to specify the multi-value capacity supported by InnoDB

void ha_innobase::mv_key_capacity(uint *num_keys, size_t *keys_length) const;

3. Indexed field length

The indexed table column is usually of JSON type; for the index, however, the values can be of integer or char type. So when we build the template, we need to specifically get the actual field length:

+  if (templ->is_virtual && field->is_array()) {
+    templ->mysql_mvidx_len = static_cast<ulint>(field->key_length());
+    templ->is_mv_v = true;
+  }

This also applies to how it is handled during ALTER TABLE.

3. Parse multi-value out of JSON column

The multi-value data from a JSON data column is obtained via innobase_get_multi_value(). There, the number of values in the JSON data and each individual data value are retrieved. The retrieved data is sorted and contains no duplicates.

4. Multi-value flags

Several flags are introduced to indicate whether a column, field, or index is a multi-value one:

4.1. #define DATA_MULTI_VALUE 16384 /* Multi-value Virtual column */
4.2. #define DICT_MULTI_VALUE 512   /* Multi-value index */

Basically, DATA_MULTI_VALUE is used in prtype to indicate whether a column, field, etc. is a multi-value one, while DICT_MULTI_VALUE is used in dict_index_t::type to indicate whether an index is a multi-value index.

5.
In-memory multi-value data structure

The following structure, multi_value_data, is used to store the multiple values:

/** Structure to hold number of multiple values */
struct multi_value_data {
 public:
  /** points to different value */
  const void **datap;

  /** each individual value length */
  uint32_t *data_len;

  /** convert buffer if the data is an integer */
  uint64_t *conv_buf;

  /** number of values */
  uint32_t num_v;

  /** number of pointers allocated */
  uint32_t num_alc;

  /** Bitset to indicate which data should be handled for current data
  array. This is mainly used for UPDATE case. UPDATE may not need to
  delete all old values and insert all new values, because there could
  be some same values in both old and new data array. If current data
  array is for INSERT and DELETE, this can(should) be nullptr since all
  values in current array should be handled in these two cases. */
  Bitset *bitset;

  ...
  /** Some more member functions to handle multi-value related logic */
};

This whole structure is stored in (pointed to by) dfield_t::data. If there is no value to build on the multi-value index, a flag is introduced as the length of the multi-value column field:

/** The following number as the length of a logical field means that no
attribute value for the multi-value index exists in the JSON doc */
#define UNIV_NO_INDEX_VALUE (UINT32_UNDEFINED - 2)

So, as mentioned in "DMLs on Index" in the HLS, there are three statuses of the multi-value data:

5.1. For the NULL case, the relevant dfield_t object is the same as for normal NULL data.
5.2. For the no-value-on-multi-value-index case, dfield_t::data is nullptr, but the len field is set to UNIV_NO_INDEX_VALUE.
5.3. For the case where there are multiple entries from the multi-value column, dfield_t::data is set to the above multi_value_data structure, and dfield_t::len is set to 0, to indicate that this is multiple-value data.

6.
SELECT and matching

For SELECT, or any matching scenario, InnoDB fetches the clustered index record to check whether the multi-value index entry at hand is valid, not deleted, etc. Since the clustered index record will usually hold the whole multi-value data array, as mentioned in 5.3, InnoDB iterates over the data array of multi_value_data and checks whether any value from the array matches the secondary index entry.

7. DMLs

7.1. INSERT

InnoDB loops through each value of the multi_value_data and inserts it into the multi-value index.

7.2. DELETE

Work similar to INSERT is done.

7.3. UPDATE

This is a bit more complicated. Basically, both old and new values have to be parsed via innobase_get_multi_value(), unless either of them is NULL. If both old and new values are as described in 5.3, then it is quite possible that not all old values have to be deleted and not all new values have to be inserted, because there can be some overlap between the old and new values, which should be kept as is.

To handle this case while keeping performance, the overlap between old and new values is calculated while parsing them. This leverages the fact that the parsed-out data arrays are already sorted, so a merge pass can be done quickly. This is done by innobase_get_multi_value_and_diff(). One bitmap is used to remember whether each value is in the overlap area or not. This bitmap is defined as multi_value_data::bitset, which is only initialized for this scenario; otherwise the pointer is always nullptr. With this support, during the real update, only the old values not in the overlap area are removed one by one, and only the new values outside the overlap area are inserted one by one.

7.4.
Lock wait

Since all DMLs modify the multi-value index several times for one single row, to handle the lock wait scenario the current working position should be remembered, so that after the lock wait the operation can continue from the correct point and not redo work. The position is remembered by flags in the relevant "node" structures.

8. Index building

As mentioned in the HLS, to build the multi-value index entries for one row, row_merge_buf_add() may be called several times, and the merge buffer may also be allocated more than once for one single row.

9. Undo logging and online DDL logging

9.1 Generic multi-value logging and parsing

A helper class for multi_value_data is defined to log the data array and then parse it back from the log:

/** Class to log the multi-value data and read it from the log */
class Multi_value_logger {
 public:
  /** Constructor
  @param[in] mv_data   multi-value data structure to log
  @param[in] field_len multi-value data field length */
  Multi_value_logger(const multi_value_data *mv_data, uint32_t field_len)
      : m_mv_data(mv_data), m_field_len(field_len) {}

  /** Get the log length for the multi-value data
  @param[in] precise true if precise length is needed,
                     false if rough estimation is OK
  @return the total log length for the multi-value data */
  uint32_t get_log_len(bool precise) const;

  /** Log the multi-value data to specified memory
  @param[in,out] ptr the memory to write
  @return next to the end of the multi-value data log */
  byte *log(byte **ptr);

  /** Read the log length for the multi-value data log starting from ptr
  @param[in] ptr log starting from here
  @return the length of this log */
  static uint32_t read_log_len(const byte *ptr);

  /** Read the multi-value data from the ptr
  @param[in]     ptr   log starting from here
  @param[in,out] field multi-value data field to store the array
  @param[in,out] heap  memory heap
  @return next to the end of the multi-value data log */
  static const byte *read(const byte *ptr, dfield_t *field, mem_heap_t *heap);

  ...
};

With this helper class, all content from multi_value_data can be written to any specified log. Note that multi_value_data::bitset is logged only if it is not nullptr.

9.2 Undo logging

For undo logging, multi-value fields are logged via the APIs defined in the above class; the same applies to log parsing. The actual functions for this purpose are trx_undo_store_multi_value() and trx_undo_rec_get_multi_value().

9.3 Online DDL logging

This also requires logging multi-value fields, so it leverages the same APIs from the above helper class.

10. Rollback and purge

Either operation parses the multi-value field from the undo log and then does the proper operations on the multi-value index. They behave the same as normal DMLs and iterate over the multi-value data array, operating on every different entry.

11. Multi-value entry builder

To build and operate on a multi-value data entry for any operation on the multi-value index, it first checks which multi-value field is tied to the current multi-value index; then, if the value is NULL, it is handled as normal; next, it checks whether any value is really built on the index for the current row; finally, if it is a data array, it has to iterate over the array and build a different entry for each value. This is a complex flow, and it is nearly the same for every operation. More importantly, building the different entries for one single row should avoid rebuilding the whole entry each time, since only the multi-value field data needs to be changed while all other fields hold the same data; this can be done simply by replacing the data pointers properly. So two kinds of multi-value entry builders are defined for this purpose. They share a base interface called Multi_value_entry_builder. Two sub-classes are derived from this base interface: one supports most operations and can provide node->row and node->ext, while the other supports INSERT specifically.

12.
Change buffer

As explained in the last section of the HLS, the change buffer has to know whether a field is a multi-value one or not, to screen out the duplicate-key case for page validation. This flag is remembered in the metadata area of each field. Luckily, there is exactly one free bit left for this purpose, which is in the first byte and tested with '& 64'. This change is done in dtype_new_store_for_order_and_null_size() and checked in its counterpart function.

13. Multi-value field duplication

InnoDB used to call dfield_dup() to duplicate normal data. However, if the multi-value field has data as described in 5.3, it has to duplicate the full data array, rather than the single value pointed to by one data pointer.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.