WL#7928: Geohash encoding and decoding functions

Status: Complete

Geohash is a system for encoding latitude and longitude coordinates of arbitrary precision into a text string. This worklog is about adding functions to encode and decode geohashes.

These functions will make it possible for applications to import and export data from MySQL using the geohash format. They will also make it possible to index and search geographic data in one-dimensional indexes.

Specification: http://en.wikipedia.org/wiki/Geohash

User Documentation

Functional requirements:

F-1:  The encoding function MUST correctly encode all valid
      longitude-latitude coordinates into geohashes according to the
      specification.

F-2:  The decoding functions MUST correctly decode all valid geohashes
      into a longitude, a latitude or a point.

F-3:  The encoding function MUST NOT allow invalid longitude, latitude
      or point values. It MUST raise an exception condition if given
      invalid parameter values.

F-4:  The decoding functions MUST NOT allow invalid geohash
      values. They MUST raise exception conditions if given invalid
      parameter values.

F-5:  The functions MUST NOT return NULL unless given a NULL parameter
      value. In case of NULL values for one or more nullable input
      parameters, the functions MUST return NULL.

F-6:  The functions SHOULD NOT allow non-numeric types as longitude,
      latitude or length parameters. They SHOULD raise exception
      conditions in such cases.

F-7:  The functions SHOULD NOT allow non-string values as geohash
      parameters. They SHOULD raise exception conditions in such
      cases.

F-8:  The user MUST provide a maxlength parameter to encoding
      functions to get a geohash of the requested length (in
      bytes). Such a parameter MUST be an integer number greater than
      0. If this parameter is NULL, the functions MUST return NULL.

F-9:  The user SHOULD be allowed to provide an SRID parameter to
      decoding functions returning geometric objects. Such a parameter
      MUST be a positive 32 bit unsigned integer and SHOULD NOT have a
      default value. If this parameter is NULL, the functions MUST
      return NULL.

F-10: Valid longitude values MUST be in the range [-180,180]. Positive
      values are east of the prime meridian. Longitude parameters MUST
      be nullable.

F-11: Valid latitude values MUST be in the range [-90,90]. Positive
      values are north of the equator. Latitude parameters MUST be
      nullable.

F-12: Longitude and latitude output values SHOULD be DOUBLE
      values. They MUST be DECIMAL or DOUBLE.

F-13: Valid points MUST be of the POINT datatype and have a valid
      longitude parameter as x coordinate and a valid latitude
      parameter as y coordinate. Point parameters MUST be nullable.

F-14: Valid geohash strings MUST only contain the characters
      "0123456789bcdefghjkmnpqrstuvwxyz". Geohash string parameters
      MUST be nullable.

F-15: Empty strings SHOULD NOT be valid geohash strings. Empty strings
      MUST be either invalid geohash strings or equivalent to NULL.
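The value checks in F-10, F-11, F-14 and F-15 can be sketched as three small predicates. This is a minimal, standalone illustration and not the server code: the names kGeohashAlphabet, is_valid_longitude, is_valid_latitude and is_valid_geohash are hypothetical, the check is assumed to be case-sensitive, and the empty string is treated as invalid (one of the two options F-15 permits).

```cpp
#include <cassert>
#include <string>

// Alphabet from F-14: digits plus lowercase letters, without a, i, l and o.
// Hypothetical name; the check below is assumed to be case-sensitive.
const std::string kGeohashAlphabet = "0123456789bcdefghjkmnpqrstuvwxyz";

// F-10: longitude range is inclusive at both ends. NaN fails both
// comparisons and is therefore rejected.
bool is_valid_longitude(double longitude) {
  return longitude >= -180.0 && longitude <= 180.0;
}

// F-11: latitude range is inclusive at both ends.
bool is_valid_latitude(double latitude) {
  return latitude >= -90.0 && latitude <= 90.0;
}

// F-14 / F-15: non-empty and made up of alphabet characters only.
// Assumption: the empty string is invalid rather than equivalent to NULL.
bool is_valid_geohash(const std::string &geohash) {
  assert(kGeohashAlphabet.size() == 32);  // base-32 alphabet sanity check
  if (geohash.empty()) return false;
  return geohash.find_first_not_of(kGeohashAlphabet) == std::string::npos;
}
```

For example, is_valid_geohash("u4pru") holds, while is_valid_geohash("abc") does not because 'a' is not part of the alphabet.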

Non-functional requirements:

NF-1: The functions SHOULD stop processing long input parameters after
      the required precision is reached.

NF-2: The functions SHOULD stop processing immediately after an error
      (e.g., an invalid parameter) is discovered.

Changes to the interface specification:

I-1: No new files.

I-2: New syntax: Four new functions (ST_GEOHASH has two forms):

         <string> = ST_GEOHASH(<longitude>, <latitude>, <maxlength>)
         <string> = ST_GEOHASH(<point>, <maxlength>)
         <double> = ST_LONGFROMGEOHASH(<string>)
         <double> = ST_LATFROMGEOHASH(<string>)
         <point>  = ST_POINTFROMGEOHASH(<string>, <srid>)

I-3: No new commands.

I-4: No new tools.

I-5: No impact on existing functionality.

Overall design
==============

The four new functions are implemented as Item_func subclasses, where
the latitude and longitude decoding functions are derived from a new
common superclass:

 - Item_func_geohash : public Item_str_func
 - Item_func_pointfromgeohash : public Item_geometry_func
 - Item_func_latlongfromgeohash : public Item_real_func
 - Item_func_longfromgeohash : public Item_func_latlongfromgeohash
 - Item_func_latfromgeohash : public Item_func_latlongfromgeohash

Each function has a corresponding Create_func subclass:

 - Create_func_geohash : public Create_native_func
 - Create_func_longfromgeohash : public Create_func_arg1
 - Create_func_latfromgeohash : public Create_func_arg1
 - Create_func_pointfromgeohash : public Create_func_arg2
    
Each of these factory classes is registered in func_array in
item_create.cc using GEOM_BUILDER.


Detailed design
===============


Item_func subclasses
--------------------

The types and number of input parameters are checked in
Item_func_*::fix_fields()/fix_length_and_dec(). Exception conditions
are raised if the number of parameters or the parameter types don't match
the expected function signatures.

At evaluation time, in Item_func_*::val_str() or
Item_func_*::val_real(), exception conditions are raised if the
parameter values are invalid.

Upper and lower limits for longitude and latitude values are defined
in one place for each class.


Item_func_geohash
-----------------

Subclass of Item_str_func. This class will handle two forms of the
same function:

    <string> = ST_GEOHASH(<longitude>, <latitude>, <maxlength>)
    <string> = ST_GEOHASH(<point>, <maxlength>)

At the current time, points of any SRID are allowed. An exception
condition will be raised if the coordinate values are out of range.
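The encoding step itself can be sketched as follows. This is a minimal, standalone illustration of the geohash algorithm from the referenced specification, not the Item_func_geohash code: geohash_encode is a hypothetical name, and standard C++ exceptions stand in for the server's exception conditions. Each output character carries five bits; bits alternate between longitude and latitude, starting with longitude, and each bit records which half of the current interval contains the value.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: encode a longitude/latitude pair into a geohash of
// exactly max_length characters. Invalid input throws (compare F-3, F-8).
std::string geohash_encode(double longitude, double latitude,
                           unsigned max_length) {
  if (!(longitude >= -180.0 && longitude <= 180.0))
    throw std::out_of_range("longitude must be in [-180, 180]");
  if (!(latitude >= -90.0 && latitude <= 90.0))
    throw std::out_of_range("latitude must be in [-90, 90]");
  if (max_length == 0)
    throw std::invalid_argument("max_length must be greater than 0");

  static const char alphabet[] = "0123456789bcdefghjkmnpqrstuvwxyz";
  double lon_lo = -180.0, lon_hi = 180.0;
  double lat_lo = -90.0, lat_hi = 90.0;
  bool use_longitude = true;  // bits alternate, longitude first

  std::string hash;
  hash.reserve(max_length);
  while (hash.size() < max_length) {
    unsigned index = 0;
    for (int bit = 0; bit < 5; ++bit) {
      double &lo = use_longitude ? lon_lo : lat_lo;
      double &hi = use_longitude ? lon_hi : lat_hi;
      const double value = use_longitude ? longitude : latitude;
      const double mid = (lo + hi) / 2.0;
      index <<= 1;
      if (value >= mid) {  // upper half: emit 1 and raise the lower bound
        index |= 1U;
        lo = mid;
      } else {             // lower half: emit 0 and lower the upper bound
        hi = mid;
      }
      use_longitude = !use_longitude;
    }
    hash.push_back(alphabet[index]);
  }
  assert(hash.size() == max_length);
  return hash;
}
```

With this sketch, geohash_encode(180.0, 0.0, 10) yields "xbpbpbpbpb" and geohash_encode(-180.0, -90.0, 15) yields fifteen '0' characters. The point form of ST_GEOHASH would pass the point's x coordinate as longitude and its y coordinate as latitude, as described in F-13.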

Item_func_pointfromgeohash
--------------------------

Subclass of Item_geometry_func. This class will handle one function:

         <point> = ST_POINTFROMGEOHASH(<string>, <srid>)

At the current time, any SRID is allowed.


Item_func_latlongfromgeohash
----------------------------

Subclass of Item_real_func. This class will contain common functions for
decoding geohash strings, including decoding geohash into longitude and
latitude, and rounding these values according to the length of the input
geohash.
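The split between the shared superclass and the two decoding functions can be sketched as below. This is a standalone illustration with hypothetical stand-in classes (LatLongFromGeohash, LongFromGeohash, LatFromGeohash), not the Item_func classes themselves. The rounding rule is an assumption: the cell midpoint is rounded to the fewest decimals that still fall inside the decoded cell, so longer geohashes produce more digits. The server's actual rounding may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>

// Stand-in for the common superclass: owns decoding and rounding.
class LatLongFromGeohash {
 public:
  virtual ~LatLongFromGeohash() = default;

  double evaluate(const std::string &geohash) const {
    static const std::string alphabet = "0123456789bcdefghjkmnpqrstuvwxyz";
    if (geohash.empty()) throw std::invalid_argument("empty geohash");

    double lon_lo = -180.0, lon_hi = 180.0;
    double lat_lo = -90.0, lat_hi = 90.0;
    bool use_longitude = true;  // bits alternate, longitude first

    for (const char c : geohash) {
      const std::size_t index = alphabet.find(c);
      if (index == std::string::npos)
        throw std::invalid_argument("invalid geohash character");
      for (int bit = 4; bit >= 0; --bit) {  // most significant bit first
        double &lo = use_longitude ? lon_lo : lat_lo;
        double &hi = use_longitude ? lon_hi : lat_hi;
        const double mid = (lo + hi) / 2.0;
        if ((index >> bit) & 1U) lo = mid; else hi = mid;
        use_longitude = !use_longitude;
      }
    }
    return pick(round_within_cell(lon_lo, lon_hi),
                round_within_cell(lat_lo, lat_hi));
  }

 protected:
  // Each subclass selects the coordinate it returns.
  virtual double pick(double longitude, double latitude) const = 0;

 private:
  // Assumed rule: fewest decimals such that the rounded midpoint stays
  // inside the half-open cell [lo, hi).
  static double round_within_cell(double lo, double hi) {
    assert(lo < hi);
    const double mid = (lo + hi) / 2.0;
    double scale = 1.0;
    for (int decimals = 0; decimals <= 15; ++decimals, scale *= 10.0) {
      const double rounded = std::round(mid * scale) / scale;
      if (rounded >= lo && rounded < hi) return rounded;
    }
    return mid;  // fall back to the unrounded midpoint
  }
};

class LongFromGeohash : public LatLongFromGeohash {
 protected:
  double pick(double longitude, double) const override { return longitude; }
};

class LatFromGeohash : public LatLongFromGeohash {
 protected:
  double pick(double, double latitude) const override { return latitude; }
};
```

Under these assumptions, the five-character geohash "u4pru" decodes to a longitude of 10.4 and a latitude of 57.63. Keeping the decode loop in the base class means both functions share one validation path and one rounding rule.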


Item_func_longfromgeohash
-------------------------

Subclass of Item_func_latlongfromgeohash. This class will handle one function:

    <double> = ST_LONGFROMGEOHASH(<string>)

The number of digits in the result is defined by the length of the
geohash string.


Item_func_latfromgeohash
------------------------

Subclass of Item_func_latlongfromgeohash. This class will handle one function:

    <double> = ST_LATFROMGEOHASH(<string>)

The number of digits in the result is defined by the length of the
geohash string.


Create_func subclasses
----------------------

These are factory classes to create objects of the corresponding
Item_func subclasses.

Create_func_geohash::create_native() handles the two- and
three-parameter versions of the encoding function and calls the
Item_func_geohash constructor with the correct number of parameters.
The other factory classes benefit from the parameter list expansion
inherited from their base classes.
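The argument-count dispatch described for Create_func_geohash::create_native() can be sketched with stand-in types. Everything here is hypothetical and simplified for illustration: Arg, GeohashFunc and create_geohash are not server classes or functions, and a C++ exception stands in for the error the server would report for a wrong parameter count.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a parsed SQL argument; NOT the server's Item class.
struct Arg {
  std::string text;
};

// Stand-in for Item_func_geohash with one constructor per SQL form.
class GeohashFunc {
 public:
  // ST_GEOHASH(<point>, <maxlength>)
  GeohashFunc(Arg point, Arg max_length)
      : args_{std::move(point), std::move(max_length)} {}
  // ST_GEOHASH(<longitude>, <latitude>, <maxlength>)
  GeohashFunc(Arg longitude, Arg latitude, Arg max_length)
      : args_{std::move(longitude), std::move(latitude),
              std::move(max_length)} {}

  std::size_t arg_count() const { return args_.size(); }

 private:
  std::vector<Arg> args_;
};

// Stand-in for the factory: pick the constructor by argument count and
// reject every other count.
std::unique_ptr<GeohashFunc> create_geohash(const std::vector<Arg> &args) {
  std::unique_ptr<GeohashFunc> func;
  switch (args.size()) {
    case 2:
      func = std::make_unique<GeohashFunc>(args[0], args[1]);
      break;
    case 3:
      func = std::make_unique<GeohashFunc>(args[0], args[1], args[2]);
      break;
    default:
      throw std::invalid_argument("ST_GEOHASH expects 2 or 3 arguments");
  }
  assert(func->arg_count() == args.size());
  return func;
}
```

The fixed-arity functions do not need this switch, which is why their factories can rely on the one- and two-argument base classes listed under the overall design.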