# WL#1326: GIS: Precise spatial operations

Affects: Server-5.6
—
Status: Complete
—
Priority: Low

Currently all spatial relations over geometric objects are performed not over the exact object representations, but over their bounding rectangles. That's not compatible with the OpenGIS standard. Also we need to implement spatial operations over geometric objects (Intersection etc.). The goal of this task is to replace the current imprecise implementation of geometric relations and to implement geometric operations according to the OpenGIS standard. Don't forget to close BUG#4249 when it's done. The OpenGIS SQL specifications can be found here: http://portal.opengeospatial.org/files/?artifact_id=829 We also take into account the OGC 06-104r3 specifications.

To perform in the OpenGIS way the plan is to implement these methods for testing Spatial Relations between geometric objects: Equals(g1 Geometry, g2 Geometry) returns 1 if g1 and g2 are equal 0 otherwise Disjoint(g1 Geometry, g2 Geometry) returns 1 if the intersection of g1 and g2 is the empty set Touches(g1 Geometry, g2 Geometry) returns 1 if the only points in common between g1 and g2 lie in the union of the boundaries Within(g1 Geometry, g2 Geometry) returns 1 if g1 completely contained in g2 Overlaps(g1 Geometry, g2 Geometry) returns 1 if the intersection of g1 and g2 isn't empty and is different both from g1 and g2 Crosses(g1 Geometry, g2 Geometry) returns 1 if the intersection of g1 and g2 isn't empty and its dimension is less than the maximum dimension of g1 and g2 Intersects(g1 Geometry, g2 Geometry) returns 1 if the intersection of g1 and g2 is not empty (Intersects <=> !Disjoint) Contains(g1 Geometry, g2 Geometry) returns 1 if g2 is completely contained in g1 (Contains(g1, g2) <=> Within(g2, g1)) These methods are present in MySQL now, but work with bounding rectangles, not with the precise object's shape. So we left the existing functions as they are now, and use SQL/MM names for the precise ones. These are: ST_Equals(g1 Geometry, g2 Geometry) returns 1 if g1 and g2 are equal 0 otherwise ST_Disjoint(g1 Geometry, g2 Geometry) returns 1 if the intersection of g1 and g2 is the empty set ST_Touches(g1 Geometry, g2 Geometry) returns 1 if the only points in common between g1 and g2 lie in the union of the boundaries ST_Within(g1 Geometry, g2 Geometry) returns 1 if g1 completely contained in g2 ST_Overlaps(g1 Geometry, g2 Geometry) returns 1 if the intersection of g1 and g2 isn't empty and is different both from g1 and g2 ST_Crosses(g1 Geometry, g2 Geometry) returns 1 if the intersection of g1 and g2 isn't empty and its dimension is less than the maximum dimension of g1 and g2 ST_Intersects(g1 Geometry, g2 Geometry) returns 1 if the intersection of g1 and g2 is not empty (Intersects <=> !Disjoint) ST_Contains(g1 Geometry, g2 Geometry) returns 1 if g2 is completely contained in g1 (Contains(g1, g2) <=> Within(g2, g1)) SQL functions that implement spatial operations: ST_Intersection(g1 geometry, g2 geometry) returns a Geometry that is the set intersection of g1 and g2 ST_Difference(g1 geometry, g2 geometry) returns a Geometry that is the closure of the set difference of g1 and g2 ST_Union(g1 geometry, g2 geometry) returns a Geometry that is the set union of g1 and g2 ST_SymDifference(g1 geometry, g2 geometry) returns a Geometry that is the closure of the set symmetric difference of g1 and g2 (logical XOR of space) ST_Buffer(g1 geometry, d numeric) returns a Geometry defined by buffering a distance d around g1 where d is the distance units for the Spatial Reference of g1 Distance function ST_Distance(g1 geometry, g2 geometry) returns distance between g1 and g2 This task doesn't affect the way Spatial Indexes work. They still operate with the boundary rectangles, and objects that don't fit the precise condition will be filtered with the subsequent filter.

Description of the algorithms To implement the entire lot of functions, I don't see anything better than the methods based on SliceScan algorithm. The basic idea is that operand shapes are cut to simple slices, so they become easy to analyse. Slicescan algorithm: It's not supposed to count anything by itself, it rather offers a base for other 'counting' algorithms working upon it. The basic idea is to split the complex shapes into simpler ones that would be easy to analyse. Y axis ^ | a | . . b . | . . . . . b | . . . | . .* . | . . . | a. . . . | * . . * | . * . . a | . . | . . | b . | |-------------------------------------------> X axis Let's use the example of two triangle shapes 'a' and 'b' to show how the 'slicescan' works. First we sort all points of the shapes by Y coordinate, then scan these points one-by-one in that Y order, making strip slices on each step. Note that each scan contains only trapezoids or triangles - shapes that are pretty easy to handle. Y axis ^ | a | . . b . | . . . . . b | . . . | . .* . | . . . | a. . . . | * . . * | . * . . a1 | . . | . . |________ b1._______________________________ S1 | |-------------------------------------------> X axis First slice S1 contains only b1 point Y axis ^ | a | . . b . | . . . . . b | . . . | . .* . | . . . | a. . . . | * . . * |___________ ._____ * . . a1________________ S2 | b21 .b22 | . . |________ b1._______________________________ S1 | |-------------------------------------------> X axis Second slice has three points (b21, b22, a1) We also get (b1, b21, b22) triangle between S1 and S2 to analyse. Y axis ^ | a | . . b . | . . . . . b | . . . | . .* . | . . . |_________ a3 _b31___a31b32_________________ S3 | I2. . I3 |___________ ._____ I1. . a1________________ S2 | b21 .b22 | . . |________ b1._______________________________ S1 | |-------------------------------------------> X axis Third slice is done with a3 point, but here we can notice that order of points was changed. a1 moved in the a3 direction, and got on the left of the 'b' points. Also a31 is on the left from b32 now. That means that there were intersections between slices. As we don't want any intersection inside the slice, we leave S3 slice to be handled later. In this case we find all intersection points between slices (it's relatively easy as we only have straight lines inside the slice), sort these intersection points by Y axis and then do slices with these additional intersection points. Here we have 3 additional 'intersection' slices with I1, I2 and I3 So we can be sure that we only have simple shapes (trapezoid) inside the slice. _____a21______b21________a22________b22______________ S2 / | | \ / | | \ / | | \ _a11__________b11________a12______________b12________ S1 Now we can implement spatial relations upon the slicescan: 2.b. Implementation of required operations Intersects(g1 Geometry, g2 Geometry) Check points of the slice from the left to the right counting how many g1 points we met. If we get g2 point and the g1-points count is noteven, then this g2 point is inside the g1 so intersection is notempty, so return TRUE. similarly are implemented: Disjoint(g1 Geometry, g2 Geometry) Within(g1 Geometry, g2 Geometry) Overlaps(g1 Geometry, g2 Geometry) Contains(g1 Geometry, g2 Geometry) Equals(g1 Geometry, g2 Geometry) Here we can count the square of nonintersecting area. If it's small enough we can say shapes are equal. Unfortunately we can't implement Equals strictly enough working with DOUBLEs. Touches(g1 Geometry, g2 Geometry) We can look for trapezoids of 'empty space' - not lying inside g1 or g2. If we found such a trapezoid having zero(small enough) square we should decide that the geometries touch and then check if g1 and g2 intersect (as Touches should return 0 in the case of intersection). Again I should say that it's not possible to implement Touches strictly with DOUBLEs. Relate() Here we just count patternMatrix value on each trapezoid until we decide that the condition was met. In fact most of the predicates above are just Relate() with proper parameters, so maybe it's better to implement those as Relate calls. Though they will work a bit slower that way and optimization (spatial key usage) looks more difficult in this case. SQL functions that implement spatial operations: Intersection(g1 geometry, g2 geometry) Difference(g1 geometry, g2 geometry) Union(g1 geometry, g2 geometry) SymDifference(g1 geometry, g2 geometry) Here we collect new shapes from slice's trapezoids that satisfy the operation's condition (Intersection, union, etc...). Buffer(g1 geometry, d numeric) Here we have to find the Union of these shapes: 1. The object itself 2. Circles with radiuses of the d around each node of the g1 3. Bars with the height of 2*d around each rib of the g1. Implementation details Types related to the Precise Geometry library are marked by names that start with prefix "gcalc_". a. gcalc_dyn_list class Here in the geometry library we often need to deal with linked lists of records, mostly descriptors of various kinds of points. There are points of shapes, points of 'slices' in the Slicescan algorithm, and intersections. So here is an auxiliary class to help with handling such linked lists efficiently. In fact it works much like MEM_ROOT. To reduce malloc calls gcalc_dyn_list allocs bigger blocks - memory for several items at once. In Item_* it's reasonable to not free all the alloced memory after one val_smth() execution, so the count of malloc calls will be even smaller. Basic functionality: class gcalc_dyn_list { item *new_item(); free_item(item); free_list(); clear(keep_prealloc) }; b. gcalc_heap class In order to proceed with the Slicescan algorithm we have to gather all the points of all the shapes. It's reasonable to implement special structure for that, as this set of points can technically be passed with the Slicescan several times. The main gcalc_heap functionality is used to fill it with shape points: class gcalc_heap { int start_shape(shape_kind, shape_info); int add_point(x, y); int end_shape(); void prepare(); }; so filling of the gcalc_heap looks like this: --------------------------------------------- heap.start_shape(shape_polygon, geom_obj_number); for (i=0; i<n_points; i++) heap.add_point(x, y); heap.complete_shape(); --------------------------------------------- after all the points are collected in the heap they have to be prepared for the Slicescan pass (basically just be sorted by Y coordinate) so: heap.prepare() After that, the gcal_heap object can be used as a parameter in further operations. The structure to store points in the heap is a linked list. It's faster to sort linked lists than anything else, it's suitable to add points to the heap one-by-one, and we handle points one-by-one in Slicescan algorithm, so the linked list is natural here. Thus gcalc_heap is gcalc_dyn_list's descendant. c. gcalc_scan_iterator class. This class contains Slicescan algorithm implementation. Basic functionality is: class gcalc_scan_iterator : public gcalc_dyn_list { gcalc_scan_iterator(gcalc_heap *heap); bool eof_points(); bool eof_trapezoids(); int step(); /* Returns nonzero if error */ }; So this class is used like: ----------------------------------------- gcalc_scan_iterator si(&heap); while (si.eof_points()) { if (si.step()) return error; /* Do something with the next slice */ } ----------------------------------------- To operate with the slice gcalc_scan_iterator exposes two slices, 'freshest' and previous. point *get_slice0() point *get_slice1(); Slices are linked lists of slice point descriptors. The difference between 'eof_points' and 'eof_trapezoids' is just if that 'previous' slice is available (so user does one loop more with the 'eof_points' method). Also user can get information about the point that was the reason for this slice (whether it was ordinary shape point, intersection of shapes, topmost or bottommost point of the shape), distance between slices, etc. The next two classes help handling single scan. These classes are lightweight, they can be created/deleted often. d. gcalc_point_iterator This is to be used when we need to check every point of the slice. Example: ----------------------------------------- gcalc_scan_iterator si(&heap); while (!si.eof_points()) { if (si.step()) return error; for (gcalc_point_iterator pit(&si); !pit.eof(); ++pit) { /* Handle next point of the slice */ } } ----------------------------------------- e. gcalc_trapezoid_iterator This is to be used when we need to check trapezoids of the slice only. Sometimes it's more effective than working with the points. Example: ----------------------------------------- gcalc_scan_iterator si(&heap); while (!si.eof_points()) { if (si.step()) return error; for (gcalc_trapezoid_iterator tit(&si); !tit.eof(); ++tit) { /* Handle next trapezoid of the slice */ } } ----------------------------------------- DISTANCE() function doesn't require SLICESCAN to be implemented, and I plan to implement a straightforward algorithm for it during this task (just picking nodes and ribs to find closest + check for internal points). But SLICESCAN can make DISTANCE work faster on very big objects. So we will check if we need it in the future. References ---------- For the definition of trapezoid, see http://en.wikipedia.org/wiki/Trapezoid (we use the US English meaning). Email thread "REVIEW: WL#1326: Exact GIS operations" starting with [ mysql intranet ]/secure/mailarchive/mail.php?folder=4&mail=18086 Email thread "Review: WL#1326: Exact GIS operations" starting with [ mysql intranet ] /secure/mailarchive/mail.php?folder=4&mail=30854 Correspondence with Antonin Guttman is a file attached to this worklog task.

Copyright (c) 2000, 2016, Oracle Corporation and/or its affiliates. All rights reserved.