WL#6745: InnoDB GIS: support DML operation for InnoDB R-tree Index

Status: Complete

In WL 6455 we added the InnoDB GEOMETRY datatypes support. This WL is to add
Rtree index support in InnoDB on the Geometry datatype. (See attachment 
Gutman84.pdf for what is R-tree.)

In the first phase, we will adopt all MyISAM's MRB (minimum bounding box)
manipulation functions and only supports 2 dimension GIS datatypes as the
current MyISAM does. In the subsequent releases we will try to support indexing
on multi-dimension data (See WL #6618 InnoDB GIS: Support multi-dimensoin data
in GIS index). For this consideration, this implementaion needs to 

To prevent the "phantom" reads, we will need to add support of predicate locking
mechanism. This is being tracked in WL#6609 InnoDB GIS: Support Predicate
Locking for GIS index.

This worklog addresses the R-tree split and shrink operations. It is merged to
WL 6968 for the complete R-tree testing and release.

User Documentation
==================

http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-5.html
http://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-model.html
http://dev.mysql.com/doc/refman/5.7/en/innodb-predicate-locks.html

This is tested a part of R-tree implementation (wl 6968). So please see Worklog
6968 for function and non-functional requirement

This worklog allows InnodDB to support spatial index. InnoDB will create an R-tree 
that maps to MySQL spatial index.

In general, the idea is to bring the R-tree into InnoDB's existing tree 
implementation, so that the R-tree index will become part of InnoDB seamlessly, 
without being a separate entity of its own.

Due to the similarity between the R-tree and B-tree in terms of being balanced 
tree, the merge of R-tree into InnoDB tree is fairly straight forward.

The benefit will be obvious: this R-tree will be secondary index almost just like 
other secondary index, support all transactional properties of the InnoDB, include 
MVCC etc., and we also do not need to provide extra work in terms recovery and 
other index related activity such as purging etc. (of course, the R-tree has 
different delete behavior, we will need to follow up).

The key part of this is that the R-tree is basically a "minimal bounding box" 
index. Other than that, it is very similar to B-Trees. Its index records still has 
pointers to the primary key records that store the real spatial data. So the 
difference with B-tree is simply 
1) the index key record (being MBR). 
2) index traverse and manipulations, basically MBR related operations.

So to be short, what we will need to do is just pack those MBR 
traverse/manipulation functions, and adopt/overload them to existing InnoDB tree 
operations, and then we will have R-tree in InnoDB. This might be different from 
earlier consideration to extract InnoDB low level buffer page APIs and bring them 
to existing R-tree. In that alternative, we could lose or make it very difficult 
to support InnoDB native properties, which are tightly integrated in InnoDB tree 
code.

1) Add DICT_SPATIAL index type to index->type.
We will need a new index type to mark an index as SPATIAL index. So Add 
DICT_SPATIAL to index type. Such index is identified by HA_SPATIAL in 
MySQL in info->s->keyinfo[keynr].flag & HA_SPATIAL


2) Add "HA_CAN_RTREEKEYS" bit in ha_innobase::ha_innobase() to enable 
SPATIAL (TREE) index. Also note keyinfo->keyalg would set to 
HA_KEY_ALG_RTREE for rtree search.

3) Create index. The R-Tree builds index on multi-dimension. So the 
merge sort code path with InnoDB FIC is of no use to us (at least in 
the first phase). So we will use the regular create index 
(create_index()) path.

CREATE TABLE geom3 (g GEOMETRY NOT NULL, SPATIAL INDEX(g)) ;

4) As mentioned earlier, the R-Tree index record are bounding boxes of 
the geometry objects. So we will need to make bounding box on inserting 
geometry datatype when inserting into InnoDB Spatial index. To do so, we 
might need special marked index field type, showing this is a field type 
of bounding box.

Then, we will need to modify code in row_build_index_entry() to handle 
DICT_SPATIAL index, so instead of adding indexed field to the entry, we 
create a bounding box (MBR). This will be the "first" index entry

The function we can reuse from MySQL/MyISAM is a set of APIs in 
sp_key.c. For example, sp_mbr_from_wkb() makes an minimum bound in 
sp_make_key() for MySQL and then insert into MyISAM R-Tree. We can just 
re-use these functions.


5) Once the MBR is made, we can then use it to insert into the index as 
index key. The corresponding InnoDB function is 
row_ins_index_entry_low(). The MyISAM function is 
rtree_insert_level->rtree_insert_req->rtree_add_key. However, at this 
level, MyISAM funcations are only for references. They are not needed 
since we already have the key (MBR). All we will need is to handle the 
spatial index in  InnoDB functions row_ins_index_entry_low().

To insert the key, it needs to position itself (traverse from root to 
leave. And this is what btr_cur_search_to_nth_level() does.


6) Finally, double check the code that puts the record on the page. 
btr_cur_optimistic_insert and btr_cur_pessimistic_insert(), and make 
sure the key records are put on the corresponding index page.


7) We will need to special handle the index split and tree height 
growth. The corresponding InnoDB functions are 
btr_root_raise_and_insert() and btr_page_split_and_insert() etc. And the 
corresponding MyISAM R-Tree APIs are in rt_split.c. 

In addition, the split could result the original bounding box to be changed
(this is different with that of B-tree, in which the original index record will
not change). Thus while we insert the new index record (with bounding box) to
the parent page, we will need to update the original data as well.

8) Index traverse is used in all operation of index tree. To see the index
traversing and key positioning, please refer to worklog 6968 for details.

9) Recovery and logging. The spatial data recovery is handled same as other
data, except the index would need to log the special "bounding box" 
field. So the InnoDB code should be able to handle this new field type.

10) Change buffering, we will disable change Buffering for Spatial index 
at the beginning.

11) Hash index. No need to support hash index on spatial datatype.

12) Index record update: the update is always done by delete and re-insert

13) Concurrency handling and cursor handling for above tree manipulation are
addressed in WL 6968.