WL#2934: Make/find library for doing float/double to string conversions and vice versa

Affects: Server-5.5 — Status: Complete


We currently rely on the system sprintf() function to do conversions from
doubles and floats to strings (and the reverse), so the results vary from
system to system.

We need to find or write a library for doing these conversions.

There are papers describing the algorithms for float <=> string conversions:
- Guy L. Steele, Jr., Jon L. White. "How to print floating-point numbers
accurately".
http://portal.acm.org/citation.cfm?id=93559&coll=portal&dl=ACM&CFID=551188&CFTOKEN=64149307
- William D. Clinger. "How to read floating point numbers accurately".
http://portal.acm.org/citation.cfm?id=93557&coll=portal&dl=ACM&CFID=1476301&CFTOKEN=64297675

There is also dtoa, a "float to/from string" conversion library which is 
loosely based on the above papers. We need to find out whether it is legally 
possible to include that implementation into MySQL code.

BUG#12860 is an example of a difference between Windows and most Unix 
systems.

BUG#21497 "DOUBLE truncated to unusable value" is an example where standard 
library functions do not provide the necessary precision in some cases (e.g. 
when a number is close to IEEE limits).

BUG#24541 is an example where we do not want the exact string
representation produced by the standard library functions, but rather the
shortest string that yields the input floating point number when read back in
and rounded to nearest, as "mode 0" in dtoa does.
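
The "shortest string that round-trips" requirement can be illustrated with a
brute-force sketch. It is not how dtoa's mode 0 works internally, and
shortest_roundtrip() is a hypothetical helper name. It assumes the C library's
printf() and strtod() are correctly rounded, which is exactly what cannot be
relied upon on every platform:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
  Hypothetical helper (not part of dtoa.c or the server): try 1..17
  significant digits and stop at the first "%.{prec}g" output that strtod()
  reads back as exactly x. Returns the precision used, or -1 if nothing
  round-trips (e.g. for NaN) or the buffer is too small.
  Assumes the libc printf()/strtod() pair is correctly rounded.
*/
int shortest_roundtrip(double x, char *to, size_t to_size)
{
  int prec;

  assert(to != NULL && to_size > 0);
  for (prec= 1; prec <= 17; prec++)
  {
    int len= snprintf(to, to_size, "%.*g", prec, x);
    if (len < 0 || (size_t) len >= to_size)
      return -1;                        /* output did not fit */
    if (strtod(to, NULL) == x)
      return prec;                      /* shortest round-tripping form */
  }
  return -1;
}
```

With a correctly rounding C library, 0.1 needs a single digit, while the
result of 0.1 + 0.2 needs all 17. A conversion that always prints 17 digits
would show the former as 0.10000000000000001.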

BUG#26788 demonstrates that, because libc printf() does not support the format
we need, we have to juggle the %g width and predict whether the result will be
in scientific or decimal notation. The latter is impossible to do in all
cases.

Our legal counsel has given the OK to use dtoa.c in the server.

Kostja has a patched version of dtoa.c which replaces malloc with our allocation
routines.

All current server code which needs to perform double-to-string or vice versa 
conversions can be divided into 3 categories:

1. Convert a double or float number with a fixed precision into its decimal 
string representation using the 'f' format. Since a representation in the 'f' 
format can be up to 341 bytes long, a buffer of sufficient size is allocated.

2. Convert a double or float number without precision specification into a 
string of limited length using the 'e' or 'f' format, whichever provides the 
greatest number of significant digits with a given string length.

3. Convert a string containing a decimal representation of a floating point 
number in the 'e' or 'f' format into a double number.

dtoa.c provides 2 functions, dtoa() and strtod(), which together cover the 
above 3 tasks.

The dtoa() interface is similar to that of ecvt(3) and fcvt(3), i.e. it takes 
a double argument and the number of significant digits (depending on 
the 'mode' of operation, it may be either the total number of significant 
digits, or the number of significant digits after the decimal point) and 
returns a string containing _only_ significant digits of the resulting decimal 
representation without a decimal point. The position of the decimal point 
relative to the start of the string is returned as an output argument, decpt.
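
To make the (digits, decpt) convention concrete, here is a minimal sketch of
turning such a pair into an 'f'-format string. digits_to_fixed() and
put_char() are hypothetical names used only for illustration. The sketch does
no rounding or zero-padding to a requested precision, so it is not my_fcvt():

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append one character, always leaving room for the terminating '\0'. */
static int put_char(char *to, size_t to_size, size_t *pos, char c)
{
  if (*pos + 1 >= to_size)
    return -1;
  to[(*pos)++]= c;
  return 0;
}

/*
  Hypothetical helper: build an 'f'-format string from a string of
  significant digits and the decimal point position 'decpt', as a
  dtoa()-style interface would return them.
  Returns 0 on success, -1 if the output buffer is too small.
*/
int digits_to_fixed(const char *digits, int decpt, int negative,
                    char *to, size_t to_size)
{
  int n, i, rc= 0;
  size_t pos= 0;

  assert(digits != NULL && to != NULL && to_size > 0);
  n= (int) strlen(digits);

  if (negative)
    rc|= put_char(to, to_size, &pos, '-');

  if (decpt <= 0)
  {
    /* |value| < 1: "0.", then -decpt zeros, then all the digits */
    rc|= put_char(to, to_size, &pos, '0');
    rc|= put_char(to, to_size, &pos, '.');
    for (i= decpt; i < 0; i++)
      rc|= put_char(to, to_size, &pos, '0');
    for (i= 0; i < n; i++)
      rc|= put_char(to, to_size, &pos, digits[i]);
  }
  else
  {
    /* integer part: the first decpt digits, zero-filled if they run out */
    for (i= 0; i < decpt; i++)
      rc|= put_char(to, to_size, &pos, i < n ? digits[i] : '0');
    if (n > decpt)
    {
      rc|= put_char(to, to_size, &pos, '.');
      for (i= decpt; i < n; i++)
        rc|= put_char(to, to_size, &pos, digits[i]);
    }
  }
  to[pos]= '\0';
  return rc ? -1 : 0;
}
```

For example, the pair ("1", decpt=-1) becomes "0.01", ("12345", decpt=2)
becomes "12.345", and ("5", decpt=3) becomes "500".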

Since the result string of dtoa() is not a valid 'e'- or 'f'-format 
representation of a floating point number, and dtoa() has no notion of 
a 'field width', 2 wrappers around dtoa() corresponding to cases 1 and 2 above 
were implemented.

Below are their prototypes along with their comments explaining the semantics:

--- cut ---
/**
   @brief
   Converts a given floating point number to a zero-terminated string
   representation using the 'f' format.

   @details
   This function is a wrapper around dtoa() to do the same as
   sprintf(to, "%-.*f", precision, x), though the conversion is usually more
   precise. The only difference is in handling [-,+]infinity and nan values,
   in which case we print '0\0' to the output string and indicate an overflow.

   @param x           the input floating point number.
   @param precision   the number of digits after the decimal point.
                      All properties of sprintf() apply:
                      - if the number of significant digits after the decimal
                        point is less than precision, the resulting string is
                        right-padded with zeros
                      - if the precision is 0, no decimal point appears
                      - if a decimal point appears, at least one digit appears
                        before it
   @param to          pointer to the output buffer. The longest string which
                      my_fcvt() can return is FLOATING_POINT_BUFFER bytes
                      (including the terminating '\0').
   @param to_end      if not NULL, is set to point to the last written
                      character (which is always '\0').
   @retval TRUE       returned when the input number is [-,+]infinity or nan.
                      The output string in this case is always '0'.
   @retval FALSE      returned in case of successful conversion.
                 
*/

my_bool my_fcvt(double x, int precision, char *to, char **to_end);

/**
   @brief
   Converts a given floating point number to a zero-terminated string
   representation with a given field width using the 'e' format
   (aka scientific notation) or the 'f' one.

   @details
   The format is chosen automatically to provide the greatest number of
   significant digits (and thus, precision) with a given field width. In many
   cases, the result is similar to that of sprintf(to, "%g", x) with a few
   notable differences:
   - the conversion is usually more precise than C library functions.
   - there is no 'precision' argument. instead, we specify the number of
     characters available for conversion (i.e. a field width).
   - the result never exceeds the specified field width. If the field is too
     short to contain even a rounded decimal representation, my_gcvt()
     indicates overflow and sets '0\0' as the result of the conversion.
   - float-type arguments are handled differently than the double ones. For
     the float input number (i.e. when the 'type' argument is
     MY_GCVT_ARG_FLOAT) we deliberately limit the precision of conversion to
     FLT_DIG digits to avoid garbage past the significant digits.
   - unlike sprintf(), in cases where the 'e' format is preferred,  we don't
     zero-pad the exponent to save space for significant digits. The '+' sign
     for a positive exponent does not appear for the same reason.

   @param x           the input floating point number.
   @param type        is either MY_GCVT_ARG_FLOAT or MY_GCVT_ARG_DOUBLE.
                      Specifies the type of the input number (see notes
                      above).
   @param width       field width in characters. The minimal field width to
                      hold any number representation (albeit rounded) is 7
                      characters ("-Ne-NNN").
   @param to          pointer to the output buffer. The result is always
                      zero-terminated, and the longest returned string is thus
                      'width + 1' bytes.
   @param to_end      if not NULL, is set to point to the last written
                      character (which is always '\0').
   @retval TRUE       returned when the input number is [-,+]infinity, nan, or
                      cannot be represented with a given field width.
                      The output string in this case is always '0'.
   @retval FALSE      returned in case of successful conversion.
   

   @todo
   Check if it is possible and makes sense to do our own rounding on top of
   dtoa() instead of calling dtoa() twice in (rare) cases when the resulting
   string representation does not fit in the specified field width and we want
   to re-round the input number with fewer significant digits.
*/

my_bool my_gcvt(double x, my_gcvt_arg_type type, int width, char *to,
                char **to_end);

--- cut ---

The my_gcvt() wrapper is quite tricky, because the 'field width' constraint 
means that dtoa() results must be re-rounded in some cases. For example, 
'0.01' cannot be represented precisely in a 3-character string. When passed 3 
as the 'number of significant digits' argument, dtoa() returns '1', decpt=-1, 
because 1 is the only significant digit in '0.01'. This is currently solved 
by calling dtoa() a second time to re-round the result to a specified number 
of significant digits after the point.
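
The re-rounding problem can be sketched with snprintf() standing in for
dtoa(). fit_fixed() is a hypothetical helper that handles only the 'f'
format. It simply retries with fewer digits after the point until the result
fits, so it illustrates the constraint and does not reproduce the my_gcvt()
implementation:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
  Hypothetical helper: format x in the 'f' format so that the result fits
  in 'width' characters, dropping digits after the decimal point (and thus
  re-rounding) until it does. snprintf() is used as a stand-in for dtoa().
  Returns 0 on success. Returns 1 and stores "0" if x does not fit even
  with no digits after the point.
*/
int fit_fixed(double x, int width, char *to, size_t to_size)
{
  char buf[512];
  int decimals;

  assert(to != NULL && to_size >= 2 && width > 0);
  for (decimals= width; decimals >= 0; decimals--)
  {
    int len= snprintf(buf, sizeof buf, "%.*f", decimals, x);
    if (len >= 0 && len <= width &&
        (size_t) len < sizeof buf && (size_t) len < to_size)
    {
      memcpy(to, buf, (size_t) len + 1);  /* copy including '\0' */
      return 0;
    }
  }
  to[0]= '0';
  to[1]= '\0';
  return 1;
}
```

With a width of 3, 0.01 comes out as "0.0": the only significant digit is
lost to rounding, which is why the number of digits after the point, and not
the number of significant digits, has to drive the second pass.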

strtod() is needed mainly because it is more precise than the 
current, "naive" implementation. Previously, imprecise results from 
my_strtod() were "hidden" by our imprecise way of double->string conversion 
via sprintf(). Since dtoa() provides more precision in a double->string 
conversion, results obtained by an imprecise reverse conversion may lead to 
unexpected behavior in some cases.

The wrapper over dtoa's strtod() replaces the current my_strtod(). Here is the 
prototype with comments:

--- cut ---
/**
   @brief
   Converts string to double (string does not have to be zero-terminated)

   @details
   This is a wrapper around dtoa's version of strtod().

   @param str     input string
   @param end     address of a pointer to the first character after the input
                  string. Upon return the pointer is set to point to the first
                  rejected character.
   @param error   Upon return is set to EOVERFLOW in case of underflow or
                  overflow.
   
   @return        The resulting double value. In case of underflow, 0.0 is
                  returned. In case of overflow, signed infinity is returned.
*/

double my_strtod(const char *str, char **end, int *error);
--- cut ---
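
The calling convention (an in/out 'end' pointer for input that is not
zero-terminated, plus an error flag) can be sketched as follows. This is a
stand-in built on the C library strtod(), not the dtoa-based implementation.
sketch_strtod(), SKETCH_EOVERFLOW and SKETCH_MAX_LEN are hypothetical names,
and the libc function may differ in what it accepts and in how it reports
underflow:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_EOVERFLOW 1    /* hypothetical error code for this sketch */
#define SKETCH_MAX_LEN   511  /* longest input this sketch will look at */

/*
  Hypothetical stand-in for the my_strtod() interface, using libc strtod().
  On entry *end points just past the input; on return it points at the
  first rejected character. *error is set to SKETCH_EOVERFLOW when libc
  reports a range error, 0 otherwise.
*/
double sketch_strtod(const char *str, char **end, int *error)
{
  char buf[SKETCH_MAX_LEN + 1];
  char *stop;
  size_t len;
  double result;

  assert(str != NULL && end != NULL && *end >= str && error != NULL);
  len= (size_t) (*end - str);
  if (len > SKETCH_MAX_LEN)
    len= SKETCH_MAX_LEN;

  /* libc strtod() needs a zero-terminated string, so make a copy */
  memcpy(buf, str, len);
  buf[len]= '\0';

  errno= 0;
  result= strtod(buf, &stop);
  *error= (errno == ERANGE) ? SKETCH_EOVERFLOW : 0;
  *end= (char *) str + (stop - buf);
  return result;
}
```

A caller passes the address of a pointer to the end of the input and checks
both where parsing stopped and whether the error flag was set.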