WL#7444: GeoJson support for GIS

Status: Complete

This WL adds two functions for converting between GeoJSON [1,2] documents and GIS data types: ST_AsGeoJSON generates a GeoJSON document from a geometry, and ST_GeomFromGeoJSON parses a GeoJSON document into a geometry.

GeoJSON is an open standard for encoding geometric/geographical features. GeoJSON supports the same geometric/geographic datatypes that are already supported by MySQL. GeoJSON also includes the possibility to declare which coordinate reference system (CRS) is used (WKT and WKB lack this).

Implementation limitations: Only 2D geometries are supported. The Feature and FeatureCollection objects are not supported, except that geometry objects are extracted from them. The CRS support is limited to values that identify an SRID.

[1] http://en.wikipedia.org/wiki/GeoJSON
[2] http://geojson.org/geojson-spec.html

User Documentation

NOTE: If not stated otherwise, 2D and 3D refer to "2 coordinate
      dimensions" and "3 coordinate dimensions", that is, the number of
      measurements or axes needed to describe a position in a
      coordinate system.

Functional requirements:

F-1:  The output function MUST correctly convert all valid geometries
      into GeoJSON documents according to the specification.

F-2:  The parsing function MUST correctly convert all valid 2D GeoJSON
      geometry objects into MySQL geometry objects.

F-3:  The parsing function MAY raise an exception condition if the
      GeoJSON document contains a geometry object that is not 2D. If
      it accepts such objects, it MUST strip the extra coordinates
      according to the <options> parameter and MUST raise a completion
      condition.

F-4:  The parsing function SHOULD extract 2D geometries from GeoJSON
      feature objects. In that case, it MUST extract 2D geometries
      from all valid features.

F-5:  The parsing function MAY extract geometries from GeoJSON feature
      collection objects. In that case, it MUST extract all 2D
      geometries from ALL valid feature collections.

F-6:  The functions MUST NOT return NULL unless given a NULL parameter
      value or a GeoJSON NULL object. A GeoJSON NULL object is defined
      by assigning JSON 'null' value to the 'geometry' member of a
      feature object.

F-7:  The functions MUST return NULL if one or more parameters are
      NULL.

F-8:  The parsing function SHOULD NOT allow non-string values as the
      GeoJSON parameter. It SHOULD raise exception conditions in such
      cases.

F-9:  The output function SHOULD NOT allow non-geometry values as the
      geometry parameter. It SHOULD raise exception conditions in such
      cases.

F-10: Empty strings SHOULD NOT be valid GeoJSON strings. Empty strings
      MUST be either invalid GeoJSON strings or equivalent to NULL.

F-11: The parsing function SHOULD NOT allow invalid GeoJSON
      documents. In case of invalid GeoJSON documents, the function
      SHOULD raise an error condition.

F-12: The parsing function SHOULD set SRID 4326 if the GeoJSON
      document doesn't specify a CRS.

F-13: The parsing function SHOULD understand EPSG and OGC CRS URNs [1]
      and map them to the correct SRID. Especially,
      "urn:ogc:def:crs:OGC:1.3:CRS84", "urn:ogc:def:crs:EPSG::4326" and
      "EPSG:4326" SHOULD be recognized as SRID 4326.

F-14: The parsing function SHOULD raise an exception condition if it
      can't understand the CRS, unless a SRID parameter is provided,
      in which case it MUST NOT raise an exception condition.

F-15: The user SHOULD be allowed to provide an optional SRID parameter
      to the parsing function. Such a parameter MUST be a positive 32
      bit unsigned integer or NULL and will override the CRS specified
      in the GeoJSON document.

F-16: The output function SHOULD allow the user to specify the
      maximum number of decimal digits for coordinates as a
      parameter. The parameter MUST be a positive integer or NULL. An
      exception condition MUST be raised if the number is negative.

F-17: The output function MUST NOT add a CRS URN to GeoJSON geometry
      objects by default.

F-18: The output function MUST NOT add a bounding box to GeoJSON
      geometry objects by default.

F-19: The output function SHOULD allow the user to specify an
      options parameter that decides whether a bounding box or a long
      or short format CRS URN is added to the GeoJSON object. The
      option parameter SHOULD be a positive integer value or NULL. An
      exception condition MUST be raised if the number is negative.

F-20: The output function options parameter SHOULD have an option
      for adding a bounding box.

F-21: The output function options parameter MAY have an option for
      adding a CRS URN.

F-22: The output function options parameter MAY have an option for
      selecting short or long format CRS URNs.

F-23: The output function SHOULD NOT add a CRS URN to the GeoJSON
      document if the SRID is 0, even if the user has asked for a CRS
      URN using the options parameter.

F-24: The parsing function SHOULD allow the user to specify an optional 
      'options' parameter that decides how to treat geometries of higher 
      coordinate dimensions (>2D).
      
      Option one: Reject such GeoJSON documents and raise an error condition. 
      This is the default option, effective if the 'options' parameter isn't 
      specified.

      Option two: Accept them and strip off the coordinates for higher
      coordinate dimensions. When higher coordinate dimensions (e.g. 3D)
      are supported in the future, silently interpret the geometry data as
      having higher coordinate dimensions (e.g. 3D).

      Option three: Accept them and strip off the coordinates for higher
      coordinate dimensions. When higher coordinate dimensions (e.g. 3D)
      are supported in the future, raise an error condition to inform users
      of the change of behavior and don't accept such data at that time.

      Option four: Accept such GeoJSON documents and strip off the coordinates
      for higher coordinate dimensions. When higher coordinate dimensions
      (e.g. 3D) are supported in the future, continue to strip off the
      extra coordinates and continue to interpret the geometry data as 2D.
      For users to get support for higher coordinate dimensions, they must
      remove this option from their queries.

F-25: The parsing and output functions SHOULD support empty geometry
      collections. The GeoJSON representation of an empty collection is
      {"type":"GeometryCollection", "geometries":[]}

F-26: Given a nested geometry collection as input, the output function MUST
      return a nested GeoJSON GeometryCollection or raise the error
      condition "ER_NOT_SUPPORTED_YET".

Non-functional requirements:

NF-1: The functions SHOULD stop processing long input parameters after
      the required data is read.

NF-2: The functions SHOULD stop processing immediately after an error
      (e.g., an invalid parameter) is discovered.

NF-3: The functions SHOULD use the rapidjson JSON parser [2].

[1] http://portal.opengeospatial.org/files/?artifact_id=8814
[2] https://github.com/pah/rapidjson

Changes to the interface specification:

I-1: No new files.

I-2: New syntax: Two new functions:

         <geometry> = ST_GEOMFROMGEOJSON(<string>[, <options>[, <srid>]])
         <string> = ST_ASGEOJSON(<geometry>[, <maxdecimaldigits>[, <options>]])

I-3: No new commands.

I-4: No new tools.

I-5: No impact on existing functionality.

Overall design
==============

We only add functions for converting geometry objects to and from
GeoJSON. To do other GeoJSON construction and parsing, e.g.,
extracting feature properties, the user can use the general JSON
functions.


Two new Item classes:

 - Item_func_asgeojson : public Item_str_ascii_func
 - Item_func_geomfromgeojson: public Item_geometry_func

Each has its own Create_func factory class.


Detailed design
===============


Item_func_asgeojson
-------------------

This function will generate simple GeoJSON geometry objects, e.g.:

    { "type": "LineString", "coordinates": [ [0.0, 0.0], [1.0, 1.0] ] }

If given a nested geometry collection as input, the function should produce
a nested GeoJSON GeometryCollection, e.g.:

    { "type": "GeometryCollection", "geometries": [
      { "type": "GeometryCollection", "geometries": [
        { "type": "Point", "coordinates": [10, 12] },
        ...
      ] },
      ...
    ] }
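
As an illustration of the nesting rule, here is a minimal Python sketch that
serializes a nested collection recursively. The (type, payload) tuple model and
the name to_geojson are hypothetical and exist only for this example; they are
not part of the server code. Note that the "geometries" member is a JSON array,
which also covers the empty collection case.

```python
import json


def to_geojson(geometry):
    """Convert a (type, payload) tuple into a GeoJSON-style dict.

    The tuple model is a hypothetical stand-in for a geometry object.
    """
    kind, payload = geometry
    if kind == "GeometryCollection":
        # Members are serialized recursively, so nesting is preserved.
        return {"type": kind,
                "geometries": [to_geojson(member) for member in payload]}
    return {"type": kind, "coordinates": payload}


nested = ("GeometryCollection", [
    ("GeometryCollection", [("Point", [10, 12])]),
    ("LineString", [[0.0, 0.0], [1.0, 1.0]]),
])
print(json.dumps(to_geojson(nested)))
print(json.dumps(to_geojson(("GeometryCollection", []))))
```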

The maxdecimaldigits parameter, if one is provided, limits the number
of decimal digits for coordinates. If not provided, it defaults to
INT_MAX32 (2147483647), which is also the upper limit for this
parameter. The lower limit is 0. If a value outside this range is
provided, an error is returned to the user. The output is rounded, and
behaves like the SQL function ROUND(X, D).
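
The range check and rounding can be sketched in Python as follows. This is
only an approximation: the name limit_decimals is hypothetical, and Python's
built-in round() stands in for ROUND(X, D), so results for exact ties may
differ from what the server produces.

```python
INT_MAX32 = 2147483647  # default and upper limit for maxdecimaldigits


def limit_decimals(coordinates, max_decimal_digits=INT_MAX32):
    """Round each coordinate to at most max_decimal_digits decimals."""
    if not 0 <= max_decimal_digits <= INT_MAX32:
        raise ValueError("maxdecimaldigits is out of range")
    # Python's round() is an assumption standing in for ROUND(X, D);
    # tie-breaking behavior may differ.
    return [round(value, max_decimal_digits) for value in coordinates]


print(limit_decimals([1.23456, -2.71828], 2))
print(limit_decimals([1.23456, -2.71828]))
```

With the default limit the coordinates come back unchanged, since a double
never carries that many decimal digits.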

The options parameter is a bitmask with the following flags:

    0  No options (default values).

    1  Add a bounding box to the output.

    2  Add a short CRS URN to the output. The default format is a
       short format ("EPSG:<srid>").

    4  Add a long format CRS URN ("urn:ogc:def:crs:EPSG::<srid>"). This
       implies 2. This means that, e.g., bitmask 5 and 7 mean the
       same: add a bounding box and a long format CRS URN.
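
A minimal Python sketch of how the bitmask could be decoded, including the
rule that flag 4 implies flag 2 and the requirement that no CRS URN is added
for SRID 0. The constant and function names are hypothetical. How bits above
4 are treated is not stated here, so this sketch simply ignores them.

```python
ADD_BOUNDING_BOX = 1
ADD_SHORT_CRS_URN = 2
ADD_LONG_CRS_URN = 4


def decode_options(options=0, srid=0):
    """Return (want_bbox, crs_name) for an ST_AsGeoJSON options bitmask."""
    if options < 0:
        raise ValueError("options must not be negative")
    want_bbox = bool(options & ADD_BOUNDING_BOX)
    want_long = bool(options & ADD_LONG_CRS_URN)
    # Flag 4 implies flag 2: a long URN is still a CRS URN.
    want_crs = want_long or bool(options & ADD_SHORT_CRS_URN)
    crs_name = None
    if want_crs and srid != 0:  # no CRS URN is emitted for SRID 0
        if want_long:
            crs_name = "urn:ogc:def:crs:EPSG::%d" % srid
        else:
            crs_name = "EPSG:%d" % srid
    return want_bbox, crs_name


print(decode_options(5, 4326))
print(decode_options(7, 4326))
print(decode_options(2, 4326))
```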

If the resulting GeoJSON string is longer than max_allowed_packet, NULL
is returned to the user and a warning is given
(ER_WARN_ALLOWED_PACKET_OVERFLOWED).

Item_func_geomfromgeojson
-------------------------

This function will parse simple GeoJSON geometry objects, e.g.:

    { "type": "Point", "coordinates": [0.0, 0.0] }

Features contain only one geometry object, in the "geometry" member,
so they can also be parsed to create a geometry object:

    { "type": "Feature",
      "geometry": { "type": "Point",
                    "coordinates": [0.0, 0.0]
                  }
      ...
    }

Feature collections contain multiple features, each of which may have a
geometry object. The parsing function will create a geometry
collection with the geometry objects for each feature, in the order
they appear in the GeoJSON document. Example:

    { "type": "FeatureCollection",
      "features" : [ { "type": "Feature",
                       "geometry": { "type": "Point",
                                     "coordinates": [0.0, 0.0]
                                   }
                       ...
                     },
                     { "type": "Feature",
                       "geometry": { "type": "Point",
                                     "coordinates": [1.0, 1.0]
                                   }
                       ...
                     }
                     ...
                   ]
    }

The above example would be parsed into a geometry collection
containing two points, (0,0) and (1,1), in that order.
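
The extraction rule can be sketched in Python with the standard json module.
The function name extract_geometry is hypothetical, and skipping features
whose "geometry" member is null inside a feature collection is an assumption
of this sketch, not something the text above specifies.

```python
import json


def extract_geometry(document):
    """Return the geometry object described by a GeoJSON document."""
    obj = json.loads(document)
    kind = obj["type"]
    if kind == "Feature":
        # May be None when the feature carries a null geometry.
        return obj["geometry"]
    if kind == "FeatureCollection":
        # Keep document order; null geometries are skipped (assumption).
        geometries = [feature["geometry"] for feature in obj["features"]
                      if feature.get("geometry") is not None]
        return {"type": "GeometryCollection", "geometries": geometries}
    return obj  # already a plain geometry object


doc = """
{ "type": "FeatureCollection",
  "features": [
    { "type": "Feature",
      "geometry": { "type": "Point", "coordinates": [0.0, 0.0] } },
    { "type": "Feature",
      "geometry": { "type": "Point", "coordinates": [1.0, 1.0] } }
  ] }
"""
print(extract_geometry(doc))
```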

GeoJSON geometry, feature and feature collection objects may have a
"crs" property. The parsing function will parse named CRS URNs in the
"urn:ogc:def:crs:EPSG::<srid>" and "EPSG:<srid>" namespaces, but not
CRSs given as link objects. Also, "urn:ogc:def:crs:OGC:1.3:CRS84" is
recognized as SRID 4326. If an object has a CRS that is not
understood, an exception condition is raised telling the user that the
CRS is not understood and recommending using the optional SRID
parameter.
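
A minimal Python sketch of the CRS name mapping, under the assumptions that
CRS names are matched case insensitively and that a caller-supplied SRID
always wins. The function name and error text are hypothetical.

```python
import re

DEFAULT_SRID = 4326
CRS84_URN = "urn:ogc:def:crs:ogc:1.3:crs84"
EPSG_PATTERNS = (
    re.compile(r"urn:ogc:def:crs:EPSG::([0-9]+)", re.IGNORECASE),
    re.compile(r"EPSG:([0-9]+)", re.IGNORECASE),
)


def srid_from_crs_name(name, srid_override=None):
    """Map a named CRS to an SRID, or raise if it is not understood."""
    if srid_override is not None:
        return srid_override  # the SRID parameter overrides the document
    if name is None:
        return DEFAULT_SRID  # no CRS in the document
    if name.lower() == CRS84_URN:
        return DEFAULT_SRID
    for pattern in EPSG_PATTERNS:
        match = pattern.fullmatch(name)
        if match:
            return int(match.group(1))
    raise ValueError("CRS %r not understood; consider passing an SRID" % name)


print(srid_from_crs_name("urn:ogc:def:crs:OGC:1.3:CRS84"))
print(srid_from_crs_name("urn:ogc:def:crs:EPSG::3857"))
print(srid_from_crs_name("some-unknown-crs", srid_override=4326))
```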

The parsing is case sensitive when it comes to the member "type" in the
GeoJSON input ("Point", "LineString" etc). See the following snippet from
the specification:

  The value of the type member must be one of: "Point", "MultiPoint",
  "LineString", "MultiLineString", "Polygon", "MultiPolygon", 
  "GeometryCollection", "Feature", or "FeatureCollection". The case of
  the type member values must be as shown here.

The rest of the parsing (member names, the CRS object and so on) is case
insensitive, since the GeoJSON specification says nothing about case there.
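
The two rules can be illustrated with a short Python sketch: member names are
looked up ignoring case, while the value of "type" must match one of the
listed names exactly. The helper names get_member and geojson_type are
hypothetical.

```python
VALID_TYPES = frozenset({
    "Point", "MultiPoint", "LineString", "MultiLineString", "Polygon",
    "MultiPolygon", "GeometryCollection", "Feature", "FeatureCollection",
})


def get_member(obj, name):
    """Look up a member of a JSON object, ignoring the case of its name."""
    wanted = name.lower()
    for key, value in obj.items():
        if key.lower() == wanted:
            return value
    raise KeyError(name)


def geojson_type(obj):
    """Return the "type" value, which must match the listed case exactly."""
    value = get_member(obj, "type")
    if value not in VALID_TYPES:
        raise ValueError("invalid GeoJSON type: %r" % (value,))
    return value


print(geojson_type({"TYPE": "Point", "Coordinates": [0.0, 0.0]}))
```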

All coordinates are stored internally with longitude as x and latitude
as y.

The 'options' parameter has four valid integer values (1, 2, 3 and 4), which
describe how GeoJSON documents that contain geometries with a coordinate
dimension higher than 2D should be handled. Any other value is invalid, and
the function should raise an error condition if one is supplied. The four
values have the following meanings:

    1 Reject such GeoJSON documents and raise an error condition. This is
      the default behavior if the parameter isn't specified.

    2 Accept them and strip off the coordinates for higher coordinate
      dimensions. When higher coordinate dimensions (e.g. 3D) are supported
      in the future, silently interpret the geometry data as having higher
      coordinate dimensions (e.g. 3D).

    3 Accept them and strip off the coordinates for higher coordinate
      dimensions. When higher coordinate dimensions (e.g. 3D) are supported
      in the future, raise an error condition to inform the user of the
      change of behavior and don't accept such data.

    4 Accept such GeoJSON documents and strip off the coordinates for
      higher coordinate dimensions. When higher coordinate dimensions
      (e.g. 3D) are supported in the future, continue to strip off the
      extra coordinates and continue to interpret the geometry data as 2D.

Use the GEOM_DIM global variable as the number of currently supported
coordinate dimensions. For option 3, this means that, given a geometry g
described as GeoJSON with more than two coordinate dimensions: when
g.dim > GEOM_DIM, g is accepted and parsed with its z coordinate stripped
off; when g.dim <= GEOM_DIM, an error condition is raised saying that the
behavior has changed and that the formerly stripped z coordinate is now part
of g's data.
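
The four option values can be sketched in Python for a single position. The
function name handle_position and the geom_dim argument (a stand-in for
GEOM_DIM, so the future 3D case can be exercised) are hypothetical. Stripping
down to exactly two coordinates under options 3 and 4 is an assumption of
this sketch.

```python
GEOM_DIM = 2  # currently supported coordinate dimensions


def handle_position(position, options=1, geom_dim=GEOM_DIM):
    """Apply the ST_GeomFromGeoJSON 'options' rule to one position."""
    if options not in (1, 2, 3, 4):
        raise ValueError("options must be 1, 2, 3 or 4")
    dim = len(position)
    if dim <= 2:
        return tuple(position)  # plain 2D data is never affected
    if options == 1:
        raise ValueError("geometry is not 2D")
    if options == 2:
        # Keep as many coordinates as the server supports.
        return tuple(position[:geom_dim])
    if options == 3 and dim <= geom_dim:
        raise ValueError("behavior change: extra coordinates are now data")
    # Option 3 (dimension still unsupported) and option 4: strip to 2D.
    return tuple(position[:2])


print(handle_position([1.0, 2.0, 3.0], options=2))
print(handle_position([1.0, 2.0, 3.0], options=2, geom_dim=3))
print(handle_position([1.0, 2.0, 3.0], options=4, geom_dim=3))
```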