WL#6968: InnoDB GIS: R-tree index support

Status: Complete

Description
Requirements
Dependent Tasks
High Level Architecture

This is the work for R-tree search and key positioning. It is responsible for
all the key positioning for search, DML (delete/insert/update) and purge.

We also continue to use the cursor APIs in the current B-tree interfaces, but
such cursor positioning is more of an internal notion of being able to position
the search position and continue from where we left last time, rather than the
"cursor" in the external SQL language.

The R-tree search and traverse is different from that of B tree in the sense that
a search criteria could be met in multiple leaf pages of different search paths.
A querying bounding box can intersect or contain in multiple leaf and non-leaf
bounding boxes. In such a way, we will need to enhance/modify/replace the
row_search_for_mysql() and btr_cur_search_to_nth_level() to do the correct work.

When comes to positioning, spatial data makes almost impossible since naturally
there is no ordering for spatial/multi-dimension data. However, we will need to
impose artificial binary orders so that we can position/locate our record in a
page. By saying that, there is no way we can store/restore any cursor in a page
due to the nature how R-tree splits. So various schemes/rules are designed to
accommodate that, and will be discussed in HLD

User Documentation
==================

http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-5.html
http://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-model.html
http://dev.mysql.com/doc/refman/5.7/en/innodb-predicate-locks.html

Since InnoDB Spatial Index worklogs are inter-related, so they can be tested as
a whole, not separately. And this worklog will consolidate worklog 6745 (DML)
and worklog 6609 (Predicate lock) and present a full functional GIS spatial
index package, so following function requirements are also the  basis for
testing all three worklogs:

1. Being able to create spatial index on geometry datatype

CREATE TABLE geom (g GEOMETRY NOT NULL, SPATIAL INDEX(g)) ENGINE=InnoDB;

ALTER TABLE geom ADD SPATIAL INDEX(g);

CREATE SPATIAL INDEX sp_index ON geom (g);

2. Being able to drop spatial index

3. Being able to do search based on Geometry Bounding Boxes

set @g1 = GeomFromText('Polygon((0 0,0 100,100 100,100 0,0 0))');
SELECT count(*) from t1 WHERE Within(t1.c2, @g1);

SELECT name FROM map_test WHERE
Contains(GeomFromText('POLYGON((0 0, 0 3, 3 3, 3 0, 0 0))'), loc);

The Operator can be:

Contain, Within, Intersect, Touches etc.

4. Being able to do DML concurrently on Geometry data with Spatial index. Test
index structure coherency, with concurrent insert/delete/update

5. Observe to the transactional property, being able to rollback DML operations

6. Does correct crash recovery

7. Observe the transactional isolation level, especially regarding phantom
phenomenon.

8. Be compatible with MyISAM GIS functionalities regarding the tree search

9. Correct handling on errors regarding attempts on indexing on non-geometry
datatype

10. Be platform independent. No particular failure regarding endian coding etc.

11. Observe transaction rollback, tree will be shrunk properly.

12. Purge should work fine with R-tree. The tree delete marked entry should be
able to purged, and tree reorg-ed (shrink) fine

13. Test on multiple spatial indexes on the same table, and used simultaneously
in the queries

14. Test large GIS objects that contains thousands of points

15. Test create spatial index clause with other alter table options

WL#6609: InnoDB GIS: Support Predicate Locking for GIS index
WL#6745: InnoDB GIS: support DML operation for InnoDB R-tree Index

This task involves the key work on how we search the Rtree and position the
"key" on particular record for later retrieve/restore. Such key component is
used for every aspect of the R-tree search and DML opereations, including search
and scan, insert as well as delete and update. The theoretical basis of our
algorithm can be found in a paper by "M. Kornacker and D. Banks" on
"High-Concurrency Locking in R-Trees".

We have following key characteristics for R-tree search and position:

1) The index key contains only MBR(minimum Bounding Rectangle) and primary key
fields. The actual data is not stored in the R-tree.

2) On a index page, the records are sorted in their binary sequence. So it is
not necessary related to their Minimum Bounding Box coordinates.

3) We will store an 8 byte monotonically increasing global sequence number
(Split Sequence Number) in the page header, at FIL_RTREE_SPLIT_SEQ_NUM (26),
this used to be FIL_PAGE_FILE_FLUSH_LSN, but not used by r-tree index pages. The
maximum SSN value will be stored in the root page FIL_RTREE_SPLIT_SEQ_NUM field.
So if crash happens, we will know where to restart the counting

4) There is a key fact that needs to be noted before we continue, that is, we
cannot position a key in an R-tree index page. The reason is related to how the
R-tree splits page, the splits would divide records on page based on their
physical dimension, rather than existing order. So you cannot pre-determine
which records goes to next page, which would be left. So we cannot do a cursor
position in a page, we either finish qualifying records on a page, or not to
start at all.

The Search Query:

As we discussed above, we cannot store/restore cursor positioned in a page, so
even for normal search query, we need to finish scanning a page while we are on it.

1) We created 2 vectors to store result from page scanning, one for non-leaf
page (rtr_info->path), and one for leaf page (rtr_info->matches).

2) For non-leaf nodes, the search will push the qualified record info (child
page number, tree level) along with current SSN into the "path" vector.

3) For leaf nodes, the search will copy matched records to a shadow buffer page,
and push its pointer to the "matches" vector.

4) As a result, when row_search_for_mysql() asks for next matching record, it
will not do a restore to previous position, instead it looks for "matches"
vector to get the next matching record.

5) After exhausting the records on "matches" vector, it will not get the next
page by following the "next" page pointer (like btree). Instead, it will go back
to the "path" vector and pop next page to search.

6) All the page that are pushed into "path" will not allow be shrunk away. It is
done so by "page" lock it.

7) To get the "next" page, we pop a node from the stack, then fetch the page
according to its page no. If we find the  page's current SSN does not match that
of we stored in the node, there must be a split happened, so it will follow the
right link to push all pages with larger SSN to the "path" stack.

The Search for Insert

1) For insert positioning, we have a new mode, PAGE_CUR_RTREE_INSERT, which
translate to PAGE_CUR_LE for leaf level search

2) In non-leaf level traverse, it will first find any appropriate record with
PAGE_CUR_WITHIN mode, if non found, it will calculate the possible area needs to
increase if putting in any existing MBR,and choose the MBR with least increased
area.

3) If it does need to increase the MBR to fit in, it will not release the latch
on that page as it traverse down, since we will update the MBR later on.

4) We will use another vector (parent_path) containing a btr_pcur_t to record
the parent pages as the traverse goes down. So in case we will need to fetch
parent page for update (split and update MBR), we will not need to call
btr_cur_search_to_nth_level() again. Of course, those parent pages that pending
to be located again should have latches on them.


The search for delete/update

1) For delete operation, we also have a new mode, PAGE_CUR_RTREE_LOCATE for
finding the row. This mode translates to PAGE_CUR_WITHIN in non-leaf node, and
PAGE_CUR_LE for leaf level search.

2) As in the normal query search, there could be multiple pages in a non-leaf
level qualifies the WITHIN predicates. So these will be pushed to "path" vector
for future search. We also maintained a "parent_path" vector, this keeps in sync
with "path" vector, except it records the "parent" pages, so one level above
those records in "path".

3) Different with insert, we do not allow MBR updates in parent when
deleting a record as for now. This is mainly for performance concern, otherwise,
we will need to scan all rows on a page to find out the MBR for each delete, and
potentially propagate all the way to root.

However, during Tree shrink, the upper level index entry's MBR would be updated
to reflect the new MBR for merged pages

4) We also do remember the savepoint when latching each block. So if we do move
to another page, we will release latches on the previous page, so there will not
be 2 pages latched on the same level and we go through the "path" 

5) The shrink will remain to be similar, however, we do not allow merge pages
belong to different parents. In such way, the parent page after merge is the
same parent page for the page to be shrunk. And such page should already be
latched exclusively. This parent latching beforehand is no different from
current scheme.

In such way, we could utilize our parent_path vector to quickly locate the
parent (instead of searching again from root). This is an advantage over btree.

6) With the "page locks" introduced in the WL 6609, the page that falls into any
search paths cannot be merged/shrunk away. However, such page CAN BE deleted, if
rows on the page are all deleted. When this happen, all other search paths
belong to other queries will be updated, to remove such page from their search
path. The operation is protected by special mutexes

7) For pessimistic delete that could involve page merging, the index->lock is X
locked, as we could latch pages (parent etc.) out of regular order (up->down and
left to right)

In summary, there are 3 vectors (path, matches, parent_path) stored in our
rtr_info structure to maintained the search results. And it couples with SSN
value and "page" lock to deal with concurrent operations. The search still
follows root->leaf, left->right latch ordering.


Reference paper attached:
M. Kornacker and D. Banks. High-Concurrency Locking in R-Trees.
In Proc. 21st Int’l 13
Conference on Very Large Databases (VLDB), pages 134–145, September 1995.