WL#7444: GeoJSON support for GIS
This WL adds functions for converting between GeoJSON [1,2] documents and GIS data types: ST_AsGeoJSON generates a GeoJSON document from a geometry value, and ST_GeomFromGeoJSON parses a GeoJSON document into a geometry value.
GeoJSON is an open standard for encoding geometric/geographical features. It supports the same geometric/geographic data types that MySQL already supports. GeoJSON can also declare which coordinate reference system (CRS) is used, which WKT and WKB cannot.
Implementation limitations: Only 2D geometries are supported. The Feature and FeatureCollection objects are not supported, except that geometry objects are extracted from them. The CRS support is limited to values that identify an SRID.
[1] http://en.wikipedia.org/wiki/GeoJSON
[2] http://geojson.org/geojson-spec.html
User Documentation
NOTE: If not stated otherwise, 2D and 3D refer to "2 coordinate dimension" and "3 coordinate dimension", i.e., the number of measurements or axes needed to describe a position in a coordinate system.

Functional requirements:

F-1: The output function MUST correctly convert all valid geometries into GeoJSON documents according to the specification.

F-2: The parsing function MUST correctly convert all valid 2D GeoJSON geometry objects into MySQL geometry objects.

F-3: The parsing function MAY raise an exception condition if the GeoJSON document contains a geometry object that is not 2D. If it accepts such objects, it MUST strip the extra coordinates according to the <options> parameter and MUST raise a completion condition.

F-4: The parsing function SHOULD extract 2D geometries from GeoJSON feature objects. In that case, it MUST extract 2D geometries from all valid features.

F-5: The parsing function MAY extract geometries from GeoJSON feature collection objects. In that case, it MUST extract all 2D geometries from ALL valid feature collections.

F-6: The functions MUST NOT return NULL unless given a NULL parameter value or a GeoJSON NULL object. A GeoJSON NULL object is defined by assigning the JSON 'null' value to the 'geometry' member of a feature object.

F-7: The functions MUST return NULL if one or more parameters are NULL.

F-8: The parsing function SHOULD NOT allow non-string values as the GeoJSON parameter. It SHOULD raise exception conditions in such cases.

F-9: The output function SHOULD NOT allow non-geometry values as the geometry parameter. It SHOULD raise exception conditions in such cases.

F-10: Empty strings SHOULD NOT be valid GeoJSON strings. Empty strings MUST be either invalid GeoJSON strings or equivalent to NULL.

F-11: The parsing function SHOULD NOT allow invalid GeoJSON documents. In case of invalid GeoJSON documents, the function SHOULD raise an error condition.

F-12: The parsing function SHOULD set SRID 4326 if the GeoJSON document doesn't specify a CRS.
F-13: The parsing function SHOULD understand EPSG and OGC CRS URNs [1] and map them to the correct SRID. In particular, "urn:ogc:def:crs:OGC:1.3:CRS84", "urn:ogc:def:crs:EPSG::4326" and "EPSG:4326" SHOULD be recognized as SRID 4326.

F-14: The parsing function SHOULD raise an exception condition if it can't understand the CRS, unless an SRID parameter is provided, in which case it MUST NOT raise an exception condition.

F-15: The user SHOULD be allowed to provide an optional SRID parameter to the parsing function. Such a parameter MUST be a positive 32-bit unsigned integer or NULL, and it overrides the CRS specified in the GeoJSON document.

F-16: The output function SHOULD allow the user to specify the maximum number of decimal digits for coordinates as a parameter. The parameter MUST be a non-negative integer or NULL. An exception condition MUST be raised if the number is negative.

F-17: The output function MUST NOT add a CRS URN to GeoJSON geometry objects by default.

F-18: The output function MUST NOT add a bounding box to GeoJSON geometry objects by default.

F-19: The output function SHOULD allow the user to specify an options parameter that decides whether a bounding box or a long or short format CRS URN is added to the GeoJSON object. The options parameter SHOULD be a non-negative integer value or NULL. An exception condition MUST be raised if the number is negative.

F-20: The output function options parameter SHOULD have an option for adding a bounding box.

F-21: The output function options parameter MAY have an option for adding a CRS URN.

F-22: The output function options parameter MAY have an option for selecting short or long format CRS URNs.

F-23: The output function SHOULD NOT add a CRS URN to the GeoJSON document if the SRID is 0, even if the user has asked for a CRS URN using the options parameter.

F-24: The parsing function SHOULD allow the user to specify an optional 'options' parameter that decides how to treat geometries of higher coordinate dimensions (>2D).
  Option one: Reject such GeoJSON documents and raise an error condition. This is the default option, effective if the 'options' parameter isn't specified.

  Option two: Accept them and strip off the coordinates for higher coordinate dimensions. When higher coordinate dimensions (e.g. 3D) are supported in the future, silently interpret the geometry data as being of higher coordinate dimensions (e.g. 3D).

  Option three: Accept them and strip off the coordinates for higher coordinate dimensions. When higher coordinate dimensions (e.g. 3D) are supported in the future, raise an error condition to inform users of the change of behavior, and don't accept such data at that time.

  Option four: Accept such GeoJSON documents and strip off the coordinates for higher coordinate dimensions. When higher coordinate dimensions (e.g. 3D) are supported in the future, continue to strip off the extra coordinates and continue to interpret the geometry data as 2D. To get support for higher coordinate dimensions, users must remove this option from their queries.

F-25: The parsing/output function SHOULD support empty geometry collections. The GeoJSON representation for an empty collection is {"type":"GeometryCollection", "geometries":[]}

F-26: Given a nested geometry collection as input, the output function MUST return a nested GeoJSON GeometryCollection or raise the error condition "ER_NOT_SUPPORTED_YET".

Non-functional requirements:

NF-1: The functions SHOULD stop processing long input parameters after the required data is read.

NF-2: The functions SHOULD stop processing immediately after an error (e.g., an invalid parameter) is discovered.

NF-3: The functions SHOULD use the rapidjson JSON parser [2].
[1] http://portal.opengeospatial.org/files/?artifact_id=8814
[2] https://github.com/pah/rapidjson
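The CRS rules in F-12 to F-15 can be illustrated with a small sketch. It is written in Python for brevity and is not the server code; the function name srid_from_crs_name and its parameters are hypothetical, and case-insensitive matching of the CRS name is an assumption taken from the design notes of this WL.

```python
import re

DEFAULT_SRID = 4326  # F-12: used when the document does not specify a CRS

# Hypothetical patterns for the two EPSG name formats mentioned in F-13.
_EPSG_PATTERNS = (
    re.compile(r"urn:ogc:def:crs:EPSG::(\d+)", re.IGNORECASE),
    re.compile(r"EPSG:(\d+)", re.IGNORECASE),
)
_CRS84 = "urn:ogc:def:crs:ogc:1.3:crs84"


def srid_from_crs_name(name, srid_override=None):
    """Map a named CRS to an SRID; raise ValueError if it is not understood."""
    if srid_override is not None:
        # F-14/F-15: an explicit SRID parameter overrides the document's CRS
        # and suppresses the "CRS not understood" error.
        return srid_override
    if name is None:
        return DEFAULT_SRID
    if name.lower() == _CRS84:
        return 4326
    for pattern in _EPSG_PATTERNS:
        match = pattern.fullmatch(name)
        if match:
            return int(match.group(1))
    raise ValueError(f"CRS not understood: {name!r}; consider passing an SRID")
```

For example, srid_from_crs_name("EPSG:4326") and srid_from_crs_name("urn:ogc:def:crs:OGC:1.3:CRS84") both yield 4326, while an unknown name raises an error unless an SRID override is given.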
Changes to the interface specification:

I-1: No new files.

I-2: New syntax: Two new functions:

     <geometry> = ST_GEOMFROMGEOJSON(<string>[, <options>[, <srid>]])
     <string> = ST_ASGEOJSON(<geometry>[, <maxdecimaldigits>[, <options>]])

I-3: No new commands.

I-4: No new tools.

I-5: No impact on existing functionality.
NOTE: If not stated otherwise, 2D and 3D refer to "2 coordinate dimension" and "3 coordinate dimension", i.e., the number of measurements or axes needed to describe a position in a coordinate system.

Overall design
==============

We only add functions for converting geometry objects to and from GeoJSON. To do other GeoJSON construction and parsing, e.g., extracting feature properties, the user can use the general JSON functions.

Two new Item classes:

- Item_func_asgeojson : public Item_str_ascii_func
- Item_func_geomfromgeojson : public Item_geometry_func

Each has its own Create_func factory class.

Detailed design
===============

Item_func_asgeojson
-------------------

This function will generate simple GeoJSON geometry objects, e.g.:

  {
    "type": "LineString",
    "coordinates": [ [0.0, 0.0], [1.0, 1.0] ]
  }

If given a nested geometry collection as input, the function should produce a nested GeoJSON GeometryCollection, e.g.:

  {
    "type": "GeometryCollection",
    "geometries": [
      {
        "type": "GeometryCollection",
        "geometries": [
          { "type": "Point", "coordinates": [10, 12] },
          ...
        ]
      },
      ...
    ]
  }

The maxdecimaldigits parameter, if provided, limits the number of decimal digits for coordinates. If not provided, it defaults to INT_MAX32 (2147483647), which is also the upper limit for this parameter. The lower limit is 0. If a value outside this range is provided, an error is returned to the user. The output is rounded, and behaves like the SQL function ROUND(X, D).

The options parameter is a bitmask with the following flags:

  0  No options (default values).
  1  Add a bounding box to the output.
  2  Add a CRS URN to the output. The default format is the short format ("EPSG:<srid>").
  4  Add a long format CRS URN ("urn:ogc:def:crs:EPSG::<srid>"). This implies 2.

This means that, e.g., bitmasks 5 and 7 mean the same: add a bounding box and a long format CRS URN.
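As a rough illustration of the options bitmask, the following Python sketch decodes the flags and picks the CRS name to emit. It is not the server code: the names decode_asgeojson_options and crs_name are hypothetical, and the treatment of bits above 4 (ignored here) is an assumption, since the text does not cover them.

```python
FLAG_BBOX = 1       # add a bounding box
FLAG_SHORT_CRS = 2  # add a CRS URN, short format by default
FLAG_LONG_CRS = 4   # add a long format CRS URN; implies FLAG_SHORT_CRS


def decode_asgeojson_options(options=0):
    """Return (add_bbox, add_crs, long_format) for an options bitmask."""
    if options < 0:
        # F-19: a negative options value is an error.
        raise ValueError("options must not be negative")
    add_bbox = bool(options & FLAG_BBOX)
    long_format = bool(options & FLAG_LONG_CRS)
    # Flag 4 implies flag 2, so either bit requests a CRS URN.
    add_crs = long_format or bool(options & FLAG_SHORT_CRS)
    return add_bbox, add_crs, long_format


def crs_name(srid, options=0):
    """Return the CRS URN to emit, or None when no CRS should be added."""
    _, add_crs, long_format = decode_asgeojson_options(options)
    if not add_crs or srid == 0:
        # F-23: never add a CRS URN for SRID 0, even when requested.
        return None
    if long_format:
        return f"urn:ogc:def:crs:EPSG::{srid}"
    return f"EPSG:{srid}"
```

With this decoding, bitmasks 5 and 7 produce the same result, and crs_name(0, 4) yields no CRS at all.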
If the resulting GeoJSON string is longer than max_allowed_packet, NULL is returned to the user and a warning is given (ER_WARN_ALLOWED_PACKET_OVERFLOWED).

Item_func_geomfromgeojson
-------------------------

This function will parse simple GeoJSON geometry objects, e.g.:

  {
    "type": "Point",
    "coordinates": [0.0, 0.0]
  }

Features contain only one geometry object, in the "geometry" member, so they can also be parsed to create a geometry object:

  {
    "type": "Feature",
    "geometry": { "type": "Point", "coordinates": [0.0, 0.0] }
    ...
  }

Feature collections contain multiple features, each of which may have a geometry object. The parsing function will create a geometry collection with the geometry objects of each feature, in the order they appear in the GeoJSON document. Example:

  {
    "type": "FeatureCollection",
    "features" : [
      {
        "type": "Feature",
        "geometry": { "type": "Point", "coordinates": [0.0, 0.0] }
        ...
      },
      {
        "type": "Feature",
        "geometry": { "type": "Point", "coordinates": [1.0, 1.0] }
        ...
      }
      ...
    ]
  }

The above example would be parsed into a geometry collection containing two points, (0,0) and (1,1), in that order.

GeoJSON geometry, feature and feature collection objects may have a "crs" property. The parsing function will parse named CRS URNs in the "urn:ogc:def:crs:EPSG::<srid>" and "EPSG:<srid>" namespaces, but not CRSs given as link objects. Also, "urn:ogc:def:crs:OGC:1.3:CRS84" is recognized as SRID 4326. If an object has a CRS that is not understood, an exception condition is raised, telling the user that the CRS is not understood and recommending the optional SRID parameter.

The parsing is case sensitive when it comes to the member "type" in the GeoJSON input ("Point", "LineString" etc). See the following snippet from the specification:

  The value of the type member must be one of: "Point", "MultiPoint", "LineString", "MultiLineString", "Polygon", "MultiPolygon", "GeometryCollection", "Feature", or "FeatureCollection".
  The case of the type member values must be as shown here.

The rest of the parsing (member names, CRS objects and such) is case insensitive, since the GeoJSON specification doesn't say anything about their case.

All coordinates are stored internally with longitude as x and latitude as y.

The 'options' parameter has four valid integer values (1, 2, 3 and 4), which describe how GeoJSON documents that contain geometries with coordinate dimension higher than 2D should be handled. Any other value is invalid, and the function should raise an error condition if one is supplied. The meanings of the four values are as follows:

  1  Reject such GeoJSON documents and raise an error condition. This is the default behavior if the parameter isn't specified.

  2  Accept them and strip off the coordinates for higher coordinate dimensions. When higher coordinate dimensions (e.g. 3D) are supported in the future, silently interpret the geometry data as higher coordinate dimensions (e.g. 3D).

  3  Accept them and strip off the coordinates for higher coordinate dimensions. When higher coordinate dimensions (e.g. 3D) are supported in the future, raise an error condition to inform the user of the change of behavior, and don't accept such data.

  4  Accept such GeoJSON documents and strip off the coordinates for higher coordinate dimensions. When higher coordinate dimensions (e.g. 3D) are supported in the future, continue to strip off the extra coordinates and continue to interpret the geometry data as 2D.

Use the GEOM_DIM global variable as the currently supported coordinate dimension. For option 3, this means that, given a geometry g described as GeoJSON with more than two coordinate dimensions: when g.dim > GEOM_DIM, g is accepted and parsed and its z coordinate is stripped off; when g.dim <= GEOM_DIM, an error condition is raised to say that there is a change of behavior and that the formerly stripped z coordinate is now part of g's data.
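The four option values and the GEOM_DIM comparison can be sketched as follows. This is an illustrative Python model, not the server code; the function name handle_position and the constant names are hypothetical, and stripping down to GEOM_DIM coordinates for options 2 and 3 is an assumption for the case where more than one extra coordinate is present.

```python
GEOM_DIM = 2  # coordinate dimensions supported today (assumed value)

REJECT = 1              # raise an error for >2D input (default)
STRIP_THEN_UPGRADE = 2  # strip now, accept higher dimensions once supported
STRIP_THEN_ERROR = 3    # strip now, raise an error once supported
ALWAYS_STRIP = 4        # always reduce to 2D


def handle_position(position, options=REJECT, geom_dim=GEOM_DIM):
    """Return the coordinates to store for one GeoJSON position."""
    if options not in (REJECT, STRIP_THEN_UPGRADE, STRIP_THEN_ERROR, ALWAYS_STRIP):
        raise ValueError("options must be 1, 2, 3 or 4")
    dim = len(position)
    if dim < 2:
        raise ValueError("a position needs at least two coordinates")
    if dim == 2:
        return list(position)  # plain 2D input is unaffected by options
    if options == REJECT:
        raise ValueError("geometries with more than 2 dimensions are rejected")
    if options == ALWAYS_STRIP:
        return list(position[:2])
    if dim > geom_dim:
        # Options 2 and 3 while the extra dimension is unsupported: strip it.
        return list(position[:geom_dim])
    if options == STRIP_THEN_ERROR:
        # The dimension is now supported: report the change of behavior.
        raise ValueError("coordinate formerly stripped is now supported")
    return list(position)  # option 2: keep the higher-dimension data
```

With GEOM_DIM at 2, options 2, 3 and 4 all turn [1, 2, 3] into [1, 2]; the options only diverge once geom_dim is raised to 3.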