WL#6745: InnoDB GIS: support DML operation for InnoDB R-tree Index
Status: Complete — Priority: Medium
In WL 6455 we added the InnoDB GEOMETRY datatypes support. This WL is to add Rtree index support in InnoDB on the Geometry datatype. (See attachment Gutman84.pdf for what is R-tree.) In the first phase, we will adopt all MyISAM's MRB (minimum bounding box) manipulation functions and only supports 2 dimension GIS datatypes as the current MyISAM does. In the subsequent releases we will try to support indexing on multi-dimension data (See WL #6618 InnoDB GIS: Support multi-dimensoin data in GIS index). For this consideration, this implementaion needs to To prevent the "phantom" reads, we will need to add support of predicate locking mechanism. This is being tracked in WL#6609 InnoDB GIS: Support Predicate Locking for GIS index. This worklog addresses the R-tree split and shrink operations. It is merged to WL 6968 for the complete R-tree testing and release. User Documentation ================== http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-5.html http://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-model.html http://dev.mysql.com/doc/refman/5.7/en/innodb-predicate-locks.html
This is tested a part of R-tree implementation (wl 6968). So please see Worklog 6968 for function and non-functional requirement
This worklog allows InnodDB to support spatial index. InnoDB will create an R-tree that maps to MySQL spatial index. In general, the idea is to bring the R-tree into InnoDB's existing tree implementation, so that the R-tree index will become part of InnoDB seamlessly, without being a separate entity of its own. Due to the similarity between the R-tree and B-tree in terms of being balanced tree, the merge of R-tree into InnoDB tree is fairly straight forward. The benefit will be obvious: this R-tree will be secondary index almost just like other secondary index, support all transactional properties of the InnoDB, include MVCC etc., and we also do not need to provide extra work in terms recovery and other index related activity such as purging etc. (of course, the R-tree has different delete behavior, we will need to follow up). The key part of this is that the R-tree is basically a "minimal bounding box" index. Other than that, it is very similar to B-Trees. Its index records still has pointers to the primary key records that store the real spatial data. So the difference with B-tree is simply 1) the index key record (being MBR). 2) index traverse and manipulations, basically MBR related operations. So to be short, what we will need to do is just pack those MBR traverse/manipulation functions, and adopt/overload them to existing InnoDB tree operations, and then we will have R-tree in InnoDB. This might be different from earlier consideration to extract InnoDB low level buffer page APIs and bring them to existing R-tree. In that alternative, we could lose or make it very difficult to support InnoDB native properties, which are tightly integrated in InnoDB tree code.
1) Add DICT_SPATIAL index type to index->type. We will need a new index type to mark an index as SPATIAL index. So Add DICT_SPATIAL to index type. Such index is identified by HA_SPATIAL in MySQL in info->s->keyinfo[keynr].flag & HA_SPATIAL 2) Add "HA_CAN_RTREEKEYS" bit in ha_innobase::ha_innobase() to enable SPATIAL (TREE) index. Also note keyinfo->keyalg would set to HA_KEY_ALG_RTREE for rtree search. 3) Create index. The R-Tree builds index on multi-dimension. So the merge sort code path with InnoDB FIC is of no use to us (at least in the first phase). So we will use the regular create index (create_index()) path. CREATE TABLE geom3 (g GEOMETRY NOT NULL, SPATIAL INDEX(g)) ; 4) As mentioned earlier, the R-Tree index record are bounding boxes of the geometry objects. So we will need to make bounding box on inserting geometry datatype when inserting into InnoDB Spatial index. To do so, we might need special marked index field type, showing this is a field type of bounding box. Then, we will need to modify code in row_build_index_entry() to handle DICT_SPATIAL index, so instead of adding indexed field to the entry, we create a bounding box (MBR). This will be the "first" index entry The function we can reuse from MySQL/MyISAM is a set of APIs in sp_key.c. For example, sp_mbr_from_wkb() makes an minimum bound in sp_make_key() for MySQL and then insert into MyISAM R-Tree. We can just re-use these functions. 5) Once the MBR is made, we can then use it to insert into the index as index key. The corresponding InnoDB function is row_ins_index_entry_low(). The MyISAM function is rtree_insert_level->rtree_insert_req->rtree_add_key. However, at this level, MyISAM funcations are only for references. They are not needed since we already have the key (MBR). All we will need is to handle the spatial index in InnoDB functions row_ins_index_entry_low(). To insert the key, it needs to position itself (traverse from root to leave. And this is what btr_cur_search_to_nth_level() does. 6) Finally, double check the code that puts the record on the page. btr_cur_optimistic_insert and btr_cur_pessimistic_insert(), and make sure the key records are put on the corresponding index page. 7) We will need to special handle the index split and tree height growth. The corresponding InnoDB functions are btr_root_raise_and_insert() and btr_page_split_and_insert() etc. And the corresponding MyISAM R-Tree APIs are in rt_split.c. In addition, the split could result the original bounding box to be changed (this is different with that of B-tree, in which the original index record will not change). Thus while we insert the new index record (with bounding box) to the parent page, we will need to update the original data as well. 8) Index traverse is used in all operation of index tree. To see the index traversing and key positioning, please refer to worklog 6968 for details. 9) Recovery and logging. The spatial data recovery is handled same as other data, except the index would need to log the special "bounding box" field. So the InnoDB code should be able to handle this new field type. 10) Change buffering, we will disable change Buffering for Spatial index at the beginning. 11) Hash index. No need to support hash index on spatial datatype. 12) Index record update: the update is always done by delete and re-insert 13) Concurrency handling and cursor handling for above tree manipulation are addressed in WL 6968.
Copyright (c) 2000, 2018, Oracle Corporation and/or its affiliates. All rights reserved.