WL#1326: GIS: Precise spatial operations

Affects: Server-5.6   —   Status: Complete   —   Priority: Low

Currently all spatial relations over geometric objects are performed not over 
the exact object representations, but over their bounding rectangles. That's 
not compatible with the OpenGIS standard.
Also we need to implement spatial operations over geometric objects 
(Intersection etc.).

The goal of this task is to replace the current imprecise implementation of 
geometric relations and to implement geometric operations according to the 
OpenGIS standard.

Don't forget to close BUG#4249 when it's done.

The OpenGIS SQL specifications can be found here:
http://portal.opengeospatial.org/files/?artifact_id=829

We also take into account the OGC 06-104r3 specifications.
To perform in the OpenGIS way the plan is to implement
these methods for testing Spatial Relations between geometric objects:

Equals(g1 Geometry, g2 Geometry) returns 1 if g1 and g2 are equal 0 otherwise

Disjoint(g1 Geometry, g2 Geometry) returns 1 if the intersection of g1 and
   g2 is the empty set

Touches(g1 Geometry, g2 Geometry) returns 1 if the only points in common
   between g1 and g2 lie in the union of the boundaries

Within(g1 Geometry, g2 Geometry) returns 1 if g1 completely contained in g2

Overlaps(g1 Geometry, g2 Geometry) returns 1 if the intersection of g1 and g2
   isn't empty and is different both from g1 and g2

Crosses(g1 Geometry, g2 Geometry) returns 1 if the intersection of g1 and g2
   isn't empty and its dimension is less than the maximum dimension of g1
   and g2

Intersects(g1 Geometry, g2 Geometry) returns 1 if the intersection of g1 and
   g2 is not empty (Intersects <=> !Disjoint)

Contains(g1 Geometry, g2 Geometry) returns 1 if g2 is completely contained in g1
   (Contains(g1, g2) <=> Within(g2, g1))


These methods are present in MySQL now, but work with bounding rectangles, not 
with the precise object's shape. So we left the existing functions as they are
now, and use SQL/MM names for the precise ones. These are:

ST_Equals(g1 Geometry, g2 Geometry) returns 1 if g1 and g2 are equal 0 otherwise

ST_Disjoint(g1 Geometry, g2 Geometry) returns 1 if the intersection of g1 and
   g2 is the empty set

ST_Touches(g1 Geometry, g2 Geometry) returns 1 if the only points in common
   between g1 and g2 lie in the union of the boundaries

ST_Within(g1 Geometry, g2 Geometry) returns 1 if g1 completely contained in g2

ST_Overlaps(g1 Geometry, g2 Geometry) returns 1 if the intersection of g1 and g2
   isn't empty and is different both from g1 and g2

ST_Crosses(g1 Geometry, g2 Geometry) returns 1 if the intersection of g1 and g2
   isn't empty and its dimension is less than the maximum dimension of g1
   and g2

ST_Intersects(g1 Geometry, g2 Geometry) returns 1 if the intersection of g1 and
   g2 is not empty (Intersects <=> !Disjoint)

ST_Contains(g1 Geometry, g2 Geometry) returns 1 if g2 is completely contained in g1
   (Contains(g1, g2) <=> Within(g2, g1))


SQL functions that implement spatial operations:

ST_Intersection(g1 geometry, g2 geometry) returns a Geometry that is the set
   intersection of g1 and g2

ST_Difference(g1 geometry, g2 geometry) returns a Geometry that is the closure
   of the set difference of g1 and g2

ST_Union(g1 geometry, g2 geometry) returns a Geometry that is the set union of
   g1 and g2

ST_SymDifference(g1 geometry, g2 geometry) returns a Geometry that is the
   closure of the set symmetric difference of g1 and g2 (logical XOR of space)

ST_Buffer(g1 geometry, d numeric) returns a Geometry defined by buffering
   a distance d around g1 where d is the distance units for the Spatial
   Reference of g1


Distance function

ST_Distance(g1 geometry, g2 geometry) returns distance between g1 and g2


This task doesn't affect the way Spatial Indexes work. They still operate
with the boundary rectangles, and objects that don't fit the precise condition
will be filtered with the subsequent filter.
Description of the algorithms

To implement the entire lot of functions, I don't see anything better than
the methods based on SliceScan algorithm. The basic idea is that operand
shapes are cut to simple slices, so they become easy to analyse.


Slicescan algorithm:

It's not supposed to count anything by itself, it rather
offers a base for other 'counting' algorithms working upon it.
The basic idea is to split the complex shapes into simpler ones
that would be easy to analyse.


 Y axis
 ^
 |          a         
 |          . .         b .       
 |          .   .     .     . .  b
 |          .     .             .
 |          .      .*         .
 |          .        .     .
 |          a.   .     . .
 |             * . .    *
 |            .      * . . a
 |           .    .
 |          .   .
 |         b .
 |
 |------------------------------------------->  X axis


Let's use the example of two triangle shapes 'a' and 'b' to show how
the 'slicescan' works.

First we sort all points of the shapes by Y coordinate, then
scan these points one-by-one in that Y order, making strip slices
on each step. Note that each scan contains only trapezoids or
triangles - shapes that are pretty easy to handle.



 Y axis
 ^
 |          a         
 |          . .         b .       
 |          .   .     .     . .  b
 |          .     .             .
 |          .      .*         .
 |          .        .     .
 |          a.   .     . .
 |             * . .    *
 |            .      * . . a1
 |           .    .
 |          .   .
 |________ b1._______________________________ S1
 |
 |------------------------------------------->  X axis

First slice S1 contains only b1 point



 Y axis
 ^
 |          a         
 |          . .         b .       
 |          .   .     .     . .  b
 |          .     .             .
 |          .      .*         .
 |          .        .     .
 |          a.   .     . .
 |             * . .    *
 |___________ ._____ * . . a1________________ S2
 |           b21  .b22
 |          .   .
 |________ b1._______________________________ S1
 |
 |------------------------------------------->  X axis

Second slice has three points (b21, b22, a1)
We also get (b1, b21, b22) triangle between S1 and S2 to analyse.


 Y axis
 ^
 |          a         
 |          . .         b .       
 |          .   .     .     . .  b
 |          .     .             .
 |          .      .*         .
 |          .        .     .
 |_________ a3 _b31___a31b32_________________ S3
 |             I2. .    I3
 |___________ ._____ I1. . a1________________ S2
 |           b21  .b22
 |          .   .
 |________ b1._______________________________ S1
 |
 |------------------------------------------->  X axis

Third slice is done with a3 point, but here we can notice that
order of points was changed. a1 moved in the a3 direction, and
got on the left of the 'b' points. Also a31 is on the left
from b32 now. That means that there were intersections between
slices. As we don't want any intersection inside the slice, we
leave S3 slice to be handled later. In this case we find all
intersection points between slices (it's relatively easy as we
only have straight lines inside the slice), sort these intersection
points by Y axis and then do slices with these additional intersection
points. Here we have 3 additional 'intersection' slices with
I1, I2 and I3

So we can be sure that we only have simple shapes (trapezoid)
inside the slice.

_____a21______b21________a22________b22______________ S2
     /         |          |          \
    /          |          |           \
   /           |          |            \
_a11__________b11________a12______________b12________ S1



Now we can implement spatial relations upon the slicescan:

   2.b. Implementation of required operations

Intersects(g1 Geometry, g2 Geometry)
   Check points of the slice from the left to the right
   counting how many g1 points we met.
   If we get g2 point and the g1-points count is
   noteven, then this g2 point is inside the g1 so
   intersection is notempty, so return TRUE.

similarly are implemented:
    Disjoint(g1 Geometry, g2 Geometry)
    Within(g1 Geometry, g2 Geometry)
    Overlaps(g1 Geometry, g2 Geometry)
    Contains(g1 Geometry, g2 Geometry)

Equals(g1 Geometry, g2 Geometry)
    Here we can count the square of nonintersecting area. If it's small
    enough we can say shapes are equal. Unfortunately we can't implement
    Equals strictly enough working with DOUBLEs.

Touches(g1 Geometry, g2 Geometry)
    We can look for trapezoids of 'empty space' - not lying inside g1
    or g2. If we found such a trapezoid having zero(small enough) square
    we should decide that the geometries touch and then check if g1 and g2
    intersect (as Touches should return 0 in the case of intersection).
    Again I should say that it's not possible to implement Touches strictly
    with DOUBLEs.

Relate()
    Here we just count patternMatrix value on each trapezoid until
    we decide that the condition was met. In fact most of the predicates
    above are just Relate() with proper parameters, so maybe it's
    better to implement those as Relate calls. Though they will
    work a bit slower that way and optimization (spatial key usage) looks
    more difficult in this case.


SQL functions that implement spatial operations:

Intersection(g1 geometry, g2 geometry)
Difference(g1 geometry, g2 geometry)
Union(g1 geometry, g2 geometry)
SymDifference(g1 geometry, g2 geometry)
     Here we collect new shapes from slice's trapezoids that
     satisfy the operation's condition (Intersection, union, etc...).

Buffer(g1 geometry, d numeric)
     Here we have to find the Union of these shapes:
     1. The object itself
     2. Circles with radiuses of the d around each node of the g1
     3. Bars with the height of 2*d around each rib of the g1.


Implementation details

Types related to the Precise Geometry library
are marked by names that start with prefix "gcalc_".

a. gcalc_dyn_list class

Here in the geometry library we often need to deal with
linked lists of records, mostly descriptors of various kinds
of points. There are points of shapes, points of 'slices'
in the Slicescan algorithm, and intersections.
So here is an auxiliary class to help with handling such
linked lists efficiently. In fact it works much like MEM_ROOT.
To reduce malloc calls gcalc_dyn_list allocs bigger blocks -
memory for several items at once.
In Item_* it's reasonable to not free all the alloced
memory after one val_smth() execution, so the count of
malloc calls will be even smaller.

Basic functionality:

class gcalc_dyn_list
{
   item *new_item();
   free_item(item);
   free_list();
   clear(keep_prealloc)
};


b. gcalc_heap class

In order to proceed with the Slicescan algorithm we have to
gather all the points of all the shapes. It's reasonable to
implement special structure for that, as this set of
points can technically be passed with the Slicescan several
times.
The main gcalc_heap functionality is used to fill it with
shape points:

class gcalc_heap
{
   int start_shape(shape_kind, shape_info);
   int add_point(x, y);
   int end_shape();

   void prepare();
};

so filling of the gcalc_heap looks like this:

---------------------------------------------
heap.start_shape(shape_polygon, geom_obj_number);

for (i=0; i<n_points; i++)
  heap.add_point(x, y);

heap.complete_shape();
---------------------------------------------

after all the points are collected in the heap
they have to be prepared for the Slicescan pass
(basically just be sorted by Y coordinate)
so: heap.prepare()

After that, the gcal_heap object can be used as a
parameter in further operations.

The structure to store points in the heap is a linked
list. It's faster to sort linked lists than anything else,
it's suitable to add points to the heap one-by-one, and
we handle points one-by-one in Slicescan algorithm,
so the linked list is natural here. Thus
gcalc_heap is gcalc_dyn_list's descendant.


c. gcalc_scan_iterator class.

This class contains Slicescan algorithm implementation.
Basic functionality is:

class gcalc_scan_iterator : public gcalc_dyn_list
{
   gcalc_scan_iterator(gcalc_heap *heap);
   bool eof_points();
   bool eof_trapezoids();
   int step(); /* Returns nonzero if error */ 
};

So this class is used like:

-----------------------------------------

gcalc_scan_iterator si(&heap);

while (si.eof_points())
{
  if (si.step())
    return error;
  /* Do something with the next slice */
}

-----------------------------------------

To operate with the slice gcalc_scan_iterator exposes
two slices, 'freshest' and previous.
point *get_slice0()
point *get_slice1();
Slices are linked lists of slice point descriptors.

The difference between 'eof_points' and 'eof_trapezoids'
is just if that 'previous' slice is available (so user
does one loop more with the 'eof_points' method).

Also user can get information about the point that
was the reason for this slice (whether it was ordinary shape
point, intersection of shapes, topmost or bottommost point of
the shape), distance between slices, etc.


The next two classes help handling single scan.
These classes are lightweight, they can be created/deleted often.

d. gcalc_point_iterator

This is to be used when we need to check every point of the slice.

Example:

-----------------------------------------

gcalc_scan_iterator si(&heap);

while (!si.eof_points())
{
  if (si.step())
    return error;
  
  for (gcalc_point_iterator pit(&si); !pit.eof(); ++pit)
  {
    /* Handle next point of the slice */
  }
}

-----------------------------------------


e. gcalc_trapezoid_iterator

This is to be used when we need to check trapezoids of the slice only.
Sometimes it's more effective than working with the points.

Example:

-----------------------------------------

gcalc_scan_iterator si(&heap);

while (!si.eof_points())
{
  if (si.step())
    return error;
  
  for (gcalc_trapezoid_iterator tit(&si); !tit.eof(); ++tit)
  {
    /* Handle next trapezoid of the slice */
  }
}

-----------------------------------------



DISTANCE() function doesn't require SLICESCAN to be implemented, and I plan to
implement a straightforward algorithm for it during this task (just picking
nodes and ribs to find closest + check for internal points).
But SLICESCAN can make DISTANCE work faster on very big objects. So we will
check if we need it in the future.

References
----------

For the definition of trapezoid, see http://en.wikipedia.org/wiki/Trapezoid
(we use the US English meaning).

Email thread "REVIEW: WL#1326: Exact GIS operations" starting with
[ mysql intranet ]/secure/mailarchive/mail.php?folder=4&mail=18086

Email thread "Review: WL#1326: Exact GIS operations" starting with
[ mysql intranet ] /secure/mailarchive/mail.php?folder=4&mail=30854

Correspondence with Antonin Guttman is a file attached to this worklog task.