# WL#9347: Ellipsoidal ST_Distance for point and multipoint

ST_Distance currently only supports computations in Cartesian spatial reference systems (SRSs). This WL extends ST_Distance to detect that its parameters are in a geographic (ellipsoidal) SRS and to compute the distance on the ellipsoid. The WL will implement

- ST_Distance(point, point)
- ST_Distance(point, multipoint)
- ST_Distance(multipoint, point)

Distance between multipoint and multipoint is not yet supported by Boost.Geometry, and will not be implemented by this WL.

Projected SRSs and SRID 0 will still return the same result as before. If the SRID refers to a geographic SRS, the result will be different. If the SRS is geographic and the parameters are not any of the above combinations, an error will be returned until ellipsoidal distance for these parameter type combinations have been implemented.

Ellipsoidal distance between points is implemented in Boost 1.62.0 (BUG#82834).

Most of the work in this WL is adding the framework for geographic geometries, not the implementation of ST_Distance itself.

**User Documentation**

- F-1
- ST_Distance MUST return NULL if at least one of its parameters is NULL or an empty geometry.
- F-2
- ST_Distance MUST NOT return NULL if none of its parameters are NULL or an empty geometry.
- F-3
- If one or more geometry arguments are not syntactically well-formed geometries, ST_Distance MUST raise ER_GIS_INVALID_DATA during function evaluation.
- F-4
- If the geometry arguments are not in the same SRID, ST_Distance MUST raise ER_GIS_DIFFERENT_SRIDS.
- F-5
- If the geometry arguments are in an undefined SRS, ST_Distance MUST raise ER_SRS_NOT_FOUND during function evaluation.
- F-6
- If one or more geometry arguments are geometrically invalid, ST_Distance MUST either return a distance result or raise an error. If a distance is returned, the value is undefined, i.e., it may be any number.
- F-7
- If both parameters are valid geometries in a Cartesian SRS, ST_Distance MUST return the shortest Cartesian distance between the two geometries.
- F-8
- If both parameters are valid geometries in a geographic SRS, and one parameter is a point and the other parameter is a point or a multipoint, ST_Distance MUST return the shortest geodetic distance between the two geometries in that SRS. The result MUST be in meters.
- F-9
- If both parameters are valid geometries in a geographic SRS, and one of the parameters is not a point or a multipoint, or if both are multipoints, ST_Distance MUST raise ER_NOT_IMPLEMENTED_FOR_GEOGRAPHIC_SRS during function evaluation.
- F-10
- If any parameter is a geometry in a geographic SRS and a longitude value is not in the range (-180,180] (in degrees -- other limits in other units), ST_Distance MUST raise ER_LONGITUDE_OUT_OF_RANGE.(*)
- F-11
- If any parameter is a geometry in a geographic SRS and a latitude value is not in the range [-90,90] (in degrees -- other limits in other units), ST_Distance MUST raise ER_LATITUDE_OUT_OF_RANGE.(*)

(*) The exact limits will deviate slightly because of floating point arithmetics.

- I-1
- No new files.
- I-2
- No new syntax.
- I-3
- No new commands.
- I-4
- No new tools.
- I-5
- The semantics of interface SQL01 are changed: If the geometries are in a geographic SRS, and the geometries are point and point, or point and multipoint (in any permutation), ST_Distance will return the geodetic distance between the geometries. If called on other geometries in a geographic SRS, an error will be raised. Calculations in projected SRSs will remain as they are today.
- I-6
- Interface ERR01 is extended with 3 new error messages:

ER_NOT_IMPLEMENTED_FOR_GEOGRAPHIC_SRS, SQLSTATE 22S00, "%.192s(%.40s, %.40s) has not been implemented for geographic spatial reference systems.".

ER_LONGITUDE_OUT_OF_RANGE, SQLSTATE 22S02, "Longitude %f is out of range in function %.192s. It must be within (%f, %f]."

ER_LATITUDE_OUT_OF_RANGE, SQLSTATE 22S03, "Latitude %f is out of range in function %.192s. It must be within [-%f, %f]."

## Contents |

## Overview

This WL implements a lot of infrastructure for geography support:

- A new geometry type hierarchy that supports Cartesian, geographic and spherical coordinate systems. This type hierarchy will eventually replace the Gis_* hierarchy in spatial.h.
- A visitor pattern for traversing geometries.
- Geometry factories for producing geometry objects in a particular coordinate system.
- A new WKB parser.
- New wrapper functions that encapsulate Boost.Geometry operations in order to do error conversion. These wrapper functions will be the interface to the rest of the server code.
- A functor pattern for parameter type dispatching. These functors are only used by the wrapper functions.

## Preparations

In order to prepare for the implementation of geographic distance, the ST_Distance and ST_Distance_Sphere functions are split into two different Item subclasses: Item_func_distance and Item_func_distance_sphere.

This WL is the first step in moving GIS exception handling out of the Item hierarchy and into non-throwing functions that can be used by the items. In order to achieve this, the functions handle_gis_exception and handle_std_exception are moved into a separate file so that they can be used outside of the Item hierarchy.

## Geographic type hierarchy

Boost.Geometry requires that different types are used for Cartesian and geographic geometries. Because of this, we will construct a new type hierarchy, modelled closely after the OpenGIS type hierarcy.

The new hierarchy will look like this:

- Geometry
- Point
- Cartesian_point
- Geographic_point

- Curve
- Linestring
- Cartesian_linestring
- Geographic_linestring

- Linestring
- ... etc. ...

- Point

All classes, except the leaf classes, are abstract. The abstract classes are the types defined by OpenGIS. All GIS code will use the abstract, coordinate system agnostic classes until the geometries are passed to Boost.Geoemtry, which needs to know the exact type.

A Point consists of one double for each coordinate axis. A Linestring is a vector of Points, etc. There are no pointers back to a WKB string, like in the old hierarchy. After a WKB string has been parsed into a geometry, the string is no longer needed or maintained.

Unlike the old type hierarchy, the new types don't know which SRID they are in, so this has to be passed as an extra parameter to GIS operations. The primary reason for this decision is that Point is used to build all other types. If Point were to have an SRID, it would mean 4 bytes extra storage for each point in a geometry.

### Visitors

The geometry type hierarchy implements the [http://c2.com/cgi/wiki?HierarchicalVisitorPattern hierarchical visitor pattern]. A gis::Geometry_visitor has visit functions called when entering and leaving a compound geometry, and one called between the call to each element's visitation. E.g., when visiting a linestring consisting of three points:

visit_enter(Linestring ls) visit(Point p1) visit(Linestring ls) visit(Point p2 visit(Linestring ls) visit(Point p3) visit_leave(Linestring ls)

This visitor pattern can be used to generate WKT and WKB and to perform other types of operations, e.g., changing the polygon ring order or swapping X and Y coordinates.

### Future extensions

There are some known problems with the existing type hierarchy, so in the long term, we want to move to the new hierarchy also for Cartesian types. Therefore, the hierarchy has to be extendable wrt. coordinate systems. We also want to add 3d support, so it has to be extendable wrt. the number of coordinate dimensions.

This hierarchy can be extended with new coordinate systems by adding new leaf classes. E.g., Spherical_point is an obvious candidate for use in ST_Distance_Sphere.

For 3d to be introduced, a larger change is needed. Since both 2d and 3d points are points, they need a common supertype. This can be solved by creating new (abstract) subclasses of Point: Point_xy, Point_xym, Point_xyz, Point_xyzm. Cartesian_point would be renamed Cartesian_point_xy, etc. Similar for all other types. The only instantiable classes will be fully specified in number of dimensions and coordinate system.

### Memory allocation

The new hierarchy makes use of standard containers. Memory for these are allocated on the heap using Malloc_allocator(key_memory_Geometry_objects_data) so that we can track the memory usage.

## WKB parser

The existing WKB parser is tightly tied into the existing geometry type hierarchy. It is also misused in ways that cause UBSan warnings that are not trivial to fix.

In order to parse geometries into geographic types, a new WKB parser is created. The parser takes all geometry types as template parameters, so that the parser itself is completely coordinate system agnostic.

## Boost.Geometry wrapper functions

In order to encapsulate the Boost.Geometry code, it's static typing and exception throwing, a new wrapper method is created for distance calculations:

gis::distance(srs, g1, g2)

This function handles all dispatching of geometry types and all exception handling and conversion to MySQL errors. The geometries can be both Cartesian and geographic, as long as the coordinate system matches the SRS. This presents a uniform interface to the calling code, regardless of SRS.

This pattern will be used on other functions, one by one, as they start supporting geographic calculations.

### Functors

The wrapper function deals with the exception handling. The type dispatching is handled by a functor, which may throw exceptions.

This functor can also be used to implement other GIS operations. While not relevant for Distance, the set operations are sometimes implemented using each other, e.g., symmetric difference by using the difference operation. In this case, gis::symdifference would use the Difference functor.

Since the exceptions are converted to errors by the wrapper function, ST_Symdifference will be reported as the failing function. This solves an error reporting problem we have in the current codebase.

The rule is that the functors are only used inside the wrapper functions. All other server code uses the wrapper interface.

## Files

This WL introduces a number of new source code files:

- gis/geometries.h
- The geometry class hierarchy.
- gis/geometries_cs.h
- The coordinate system specific classes of the geometry class hierarchy. Should only be necessary in internal GIS implementation, not in Item subclasses.
- gis/geometries.cc
- Implementation of the classes in geometries.h and geometries_cs.h.
- gis/geometries_traits.h
- Type traits to link the coordinate system specific geometry classes with BG. Should only be used by GIS function implementation, not by Item subclasses.
- gis/wkb_parser.h / .cc
- WKB parser for geometries.
- gis/geometry_visitor.h
- Base visitor class.
- gis/ring_flip_visitor.h
- Visitor to flip polygon ring direction. Currently only flips Cartesian rings.
- gis/ring_direction.h / .cc
- Interface for detecting ring direction. Currently, it only implements direction detection in Cartesian coordinate systems.
- gis/functor.h
- Base functor class.
- gis/distance_functor.h
- Distance functor interface for use in GIS functions. Definition in gis/distance.cc.
- gis/distance.h
- Distance function interface for use in Item subclasses.
- gis/distance.cc
- Implementation of the gis/distance_functor.h and gis/distance.h interfaces.
- gis/srid.h
- A unified gis::srid_t to replace all previous typedefs.
- sql_exception_handler.h / .cc
- handle_gis_exception() and handle_std_exception(), moved from item_geofunc_internal.h and .cc, and item_func.h and .cc.