WL#6455: InnoDB: GEOMETRY datatypes support

Status: Complete   —   Priority: Medium

In order to implement GIS indexing in InnoDB, InnoDB needs to support GEOMETRY
datatypes, so that InnoDB stores them properly and understands them.  Currently,
InnoDB store all geometry types in BLOB. We can continue to use BLOB (other than
point) as underlying datatype, but the column needs to be marked as proper
Since we use BLOB as underlying datatype, we still can use the prefix index 
on the geometry data column. Also, for the same reason, nothing needs to be done 
for recovery.

1) Add new InnoDB internal data type DATA_GEOMETRY for geometry data.
2: All MySQL Geometry datatypes shall be mapped properly to DATA_GEOMETRY 
datatype, from DATA_BOLB.
3) An prefix index still can be created on all Geometry datatype columns.
   And the Index column shall  have capability to do acurate select like below . 
   Ex: SELECT AsText(g) FROM t_wl6455 WHERE g = POINT(2,2);
This worklog allows InnoDB to be able to "undertand" and access of current GIS
datatypes supported by MySQL. InnoDB will not redefind GIS datatype definitions
already defined by MySQL, instead, we will include the necessary .h file and
re-use them, except for a generic InnoDB GEOMETRY datatype.

In the first phase of our InnoDB GIS effort, we will just support existing MySQL
GIS datatypes (Point, Line_string..). These are standard datatype as described 
in OGC standard 
Most of them are in the standard Well-Known Binary(WKB) format (for more
information, please see attached file "GIS_types").

MySQL server follows the standard define closely, the related classes and
structures are defined in sql/spatial.h; 

Changes in InnoDB:

1) A new generic "Geometry" datatype

In order to handle the geometry data in InnoDB, we need to separate geometry 
data with other data, so that we can handle it separately. Thus, a new data type
DATA_GEOMETRY will be added in InnoDB.

2) The storage format
Then, we need to store them in InnoDB. As the HLD described, since most of the 
geometry datatypes are not fixed, we still use BLOB as underlying datatype. So, 
when geometry data need to be stored in InnoDB, it will be treated like BLOB 
data. Also, when we need to read geometry data from InnoDB, it will be handled 
like BLOB data too. And in the first phase, we will keep it as WKB format. 
(Please find the detail information of WKB format in attached file WKB_format"). 
We may consider some other ways to store geo data for saving space in the future 

3) The interpretation of data
After that, for understanding what's the geometry data meaning, we need to 
interpret it to what InnoDB can understand. Currently, we can just reuse the 
classes defined in sql/spatial.h. Though these classes, we can interpret WKB 
format data to some class instances, and use their member functions to operate 
geometry data.

Before the changes, all MySQL geometry data type(Point, LineString..) map to 
InnoDB internal data type DATA_BLOB. After we add the new innoDB internal data 
type DATA_GEOMETRY, they will map to DATA_GEOMETRY.
So, we need to test all possible operations on geometry data.
For example:
     CREATE INDEX i_p ON t_wl6455 (g(10)); (Since the prefix index is still 
worked on geometry data, we can use prefix index to do accurate select like 

     SELECT AsText(g) FROM t_wl6455 WHERE g = POINT(2,2);
     SELECT AsText(g) FROM t1 where ST_Within(g, GeomFromText('POLYGON((5 1, 7 
1, 7 7, 5 7, 3 3, 5 3, 5 1))'));

     INSERT INTO t_wl6455 VALUES(1, POINT(1,1));
     DELETE ...
     UPDATE ..

     After some operations done on geometry data, crash the server.
     Then restart the server to recover the geometry data.

In the first phase of our InnoDB GIS effort, we will just support existing MySQL
GIS datatypes, they are 

1) point
2) line_string
3) polygon
4) multi_point
5) multi_polygon
6) multi_line_string
7) geometry_collection
8) geometry

The related classes and structures can be seen in sql/spatial.h and 

Classes defined in sql/spatial.h:


For support all these geometry types, we need to do the following step:
1) Add new data type DATA_GEOMETRY:
The datatype definition is in storage/innobase/include/data0type.h, we can add 
new data type DATA_GEOMETRY in it. Accordingly, we need to modify some places, 
like storage/innobase/include/data0type.ic ...

2) Store geometry data:
Following two functions are the interfaces of datatype exchange between InnoDB
and MySQL:
row_mysql_store_col_in_innobase_format ()  ====> MySQL to InnoDB
row_sel_field_store_in_mysql_format ()     ====> InnoDB to MySQL

We need to special handle the DATA_GEOMETRY in this two functions. And as
described in HLD, we still use BLOB as underlying datatype, so we can just
handle it like BLOB. Also, we need to modify some other places accordingly.

3)Interpret geometry data:
sql/spatial.h and sql/spatial.cc defined some classes for geometry data. 
We can include it directly and use them to interpret geometry data.

As described in HLD, we still can use the prefix index on the geometry data 
column. And since no need to create a new log data type, nothing needs to be 
done for recovery.