WL#6968: InnoDB GIS: R-tree index support
Status: Complete — Priority: Medium
This is the work for R-tree search and key positioning. It is responsible for all the key positioning for search, DML (delete/insert/update) and purge. We also continue to use the cursor APIs in the current B-tree interfaces, but such cursor positioning is more of an internal notion of being able to position the search position and continue from where we left last time, rather than the "cursor" in the external SQL language. The R-tree search and traverse is different from that of B tree in the sense that a search criteria could be met in multiple leaf pages of different search paths. A querying bounding box can intersect or contain in multiple leaf and non-leaf bounding boxes. In such a way, we will need to enhance/modify/replace the row_search_for_mysql() and btr_cur_search_to_nth_level() to do the correct work. When comes to positioning, spatial data makes almost impossible since naturally there is no ordering for spatial/multi-dimension data. However, we will need to impose artificial binary orders so that we can position/locate our record in a page. By saying that, there is no way we can store/restore any cursor in a page due to the nature how R-tree splits. So various schemes/rules are designed to accommodate that, and will be discussed in HLD User Documentation ================== http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-5.html http://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-model.html http://dev.mysql.com/doc/refman/5.7/en/innodb-predicate-locks.html
Since InnoDB Spatial Index worklogs are inter-related, so they can be tested as a whole, not separately. And this worklog will consolidate worklog 6745 (DML) and worklog 6609 (Predicate lock) and present a full functional GIS spatial index package, so following function requirements are also the basis for testing all three worklogs: 1. Being able to create spatial index on geometry datatype CREATE TABLE geom (g GEOMETRY NOT NULL, SPATIAL INDEX(g)) ENGINE=InnoDB; ALTER TABLE geom ADD SPATIAL INDEX(g); CREATE SPATIAL INDEX sp_index ON geom (g); 2. Being able to drop spatial index 3. Being able to do search based on Geometry Bounding Boxes set @g1 = GeomFromText('Polygon((0 0,0 100,100 100,100 0,0 0))'); SELECT count(*) from t1 WHERE Within(t1.c2, @g1); SELECT name FROM map_test WHERE Contains(GeomFromText('POLYGON((0 0, 0 3, 3 3, 3 0, 0 0))'), loc); The Operator can be: Contain, Within, Intersect, Touches etc. 4. Being able to do DML concurrently on Geometry data with Spatial index. Test index structure coherency, with concurrent insert/delete/update 5. Observe to the transactional property, being able to rollback DML operations 6. Does correct crash recovery 7. Observe the transactional isolation level, especially regarding phantom phenomenon. 8. Be compatible with MyISAM GIS functionalities regarding the tree search 9. Correct handling on errors regarding attempts on indexing on non-geometry datatype 10. Be platform independent. No particular failure regarding endian coding etc. 11. Observe transaction rollback, tree will be shrunk properly. 12. Purge should work fine with R-tree. The tree delete marked entry should be able to purged, and tree reorg-ed (shrink) fine 13. Test on multiple spatial indexes on the same table, and used simultaneously in the queries 14. Test large GIS objects that contains thousands of points 15. Test create spatial index clause with other alter table options
WL#6609: InnoDB GIS: Support Predicate Locking for GIS index
WL#6745: InnoDB GIS: support DML operation for InnoDB R-tree Index
WL#6745: InnoDB GIS: support DML operation for InnoDB R-tree Index
This task involves the key work on how we search the Rtree and position the "key" on particular record for later retrieve/restore. Such key component is used for every aspect of the R-tree search and DML opereations, including search and scan, insert as well as delete and update. The theoretical basis of our algorithm can be found in a paper by "M. Kornacker and D. Banks" on "High-Concurrency Locking in R-Trees". We have following key characteristics for R-tree search and position: 1) The index key contains only MBR(minimum Bounding Rectangle) and primary key fields. The actual data is not stored in the R-tree. 2) On a index page, the records are sorted in their binary sequence. So it is not necessary related to their Minimum Bounding Box coordinates. 3) We will store an 8 byte monotonically increasing global sequence number (Split Sequence Number) in the page header, at FIL_RTREE_SPLIT_SEQ_NUM (26), this used to be FIL_PAGE_FILE_FLUSH_LSN, but not used by r-tree index pages. The maximum SSN value will be stored in the root page FIL_RTREE_SPLIT_SEQ_NUM field. So if crash happens, we will know where to restart the counting 4) There is a key fact that needs to be noted before we continue, that is, we cannot position a key in an R-tree index page. The reason is related to how the R-tree splits page, the splits would divide records on page based on their physical dimension, rather than existing order. So you cannot pre-determine which records goes to next page, which would be left. So we cannot do a cursor position in a page, we either finish qualifying records on a page, or not to start at all. The Search Query: As we discussed above, we cannot store/restore cursor positioned in a page, so even for normal search query, we need to finish scanning a page while we are on it. 1) We created 2 vectors to store result from page scanning, one for non-leaf page (rtr_info->path), and one for leaf page (rtr_info->matches). 2) For non-leaf nodes, the search will push the qualified record info (child page number, tree level) along with current SSN into the "path" vector. 3) For leaf nodes, the search will copy matched records to a shadow buffer page, and push its pointer to the "matches" vector. 4) As a result, when row_search_for_mysql() asks for next matching record, it will not do a restore to previous position, instead it looks for "matches" vector to get the next matching record. 5) After exhausting the records on "matches" vector, it will not get the next page by following the "next" page pointer (like btree). Instead, it will go back to the "path" vector and pop next page to search. 6) All the page that are pushed into "path" will not allow be shrunk away. It is done so by "page" lock it. 7) To get the "next" page, we pop a node from the stack, then fetch the page according to its page no. If we find the page's current SSN does not match that of we stored in the node, there must be a split happened, so it will follow the right link to push all pages with larger SSN to the "path" stack. The Search for Insert 1) For insert positioning, we have a new mode, PAGE_CUR_RTREE_INSERT, which translate to PAGE_CUR_LE for leaf level search 2) In non-leaf level traverse, it will first find any appropriate record with PAGE_CUR_WITHIN mode, if non found, it will calculate the possible area needs to increase if putting in any existing MBR,and choose the MBR with least increased area. 3) If it does need to increase the MBR to fit in, it will not release the latch on that page as it traverse down, since we will update the MBR later on. 4) We will use another vector (parent_path) containing a btr_pcur_t to record the parent pages as the traverse goes down. So in case we will need to fetch parent page for update (split and update MBR), we will not need to call btr_cur_search_to_nth_level() again. Of course, those parent pages that pending to be located again should have latches on them. The search for delete/update 1) For delete operation, we also have a new mode, PAGE_CUR_RTREE_LOCATE for finding the row. This mode translates to PAGE_CUR_WITHIN in non-leaf node, and PAGE_CUR_LE for leaf level search. 2) As in the normal query search, there could be multiple pages in a non-leaf level qualifies the WITHIN predicates. So these will be pushed to "path" vector for future search. We also maintained a "parent_path" vector, this keeps in sync with "path" vector, except it records the "parent" pages, so one level above those records in "path". 3) Different with insert, we do not allow MBR updates in parent when deleting a record as for now. This is mainly for performance concern, otherwise, we will need to scan all rows on a page to find out the MBR for each delete, and potentially propagate all the way to root. However, during Tree shrink, the upper level index entry's MBR would be updated to reflect the new MBR for merged pages 4) We also do remember the savepoint when latching each block. So if we do move to another page, we will release latches on the previous page, so there will not be 2 pages latched on the same level and we go through the "path" 5) The shrink will remain to be similar, however, we do not allow merge pages belong to different parents. In such way, the parent page after merge is the same parent page for the page to be shrunk. And such page should already be latched exclusively. This parent latching beforehand is no different from current scheme. In such way, we could utilize our parent_path vector to quickly locate the parent (instead of searching again from root). This is an advantage over btree. 6) With the "page locks" introduced in the WL 6609, the page that falls into any search paths cannot be merged/shrunk away. However, such page CAN BE deleted, if rows on the page are all deleted. When this happen, all other search paths belong to other queries will be updated, to remove such page from their search path. The operation is protected by special mutexes 7) For pessimistic delete that could involve page merging, the index->lock is X locked, as we could latch pages (parent etc.) out of regular order (up->down and left to right) In summary, there are 3 vectors (path, matches, parent_path) stored in our rtr_info structure to maintained the search results. And it couples with SSN value and "page" lock to deal with concurrent operations. The search still follows root->leaf, left->right latch ordering. Reference paper attached: M. Kornacker and D. Banks. High-Concurrency Locking in R-Trees. In Proc. 21st Int’l 13 Conference on Very Large Databases (VLDB), pages 134–145, September 1995.
Copyright (c) 2000, 2019, Oracle Corporation and/or its affiliates. All rights reserved.