WL#11935: Add folding of constants when compared to fields

Affects: Server-8.0 — Status: Complete

We want to fold cases during optimize time where we currently handle type skews
at execution time. In this WL, we handle comparisons[1] between a field and a
constant value where the constant is of another type (or out of range) with
respect to the type of the field.

For comparison operators, we can let MySQL convert the type of the constant
to match the other expression, or perform constant folding if a conversion is
not possible.

For cases when both operands are non-constant, we need to inject CASTs to allow
the comparison operators to work on operands of the same type. This work is
to be covered by separate work.

In the special case that we are comparing a field to a constant, the work
described in this WL applies.

This WL will facilitate speed up at execution at the cost of a little more
analysis at optimize[2] time.

For example, given this table:

   CREATE TABLE t (ti TINYINT UNSIGNED NOT NULL);

the condition in this query:

   SELECT * FROM t WHERE ti < 256;

contains the integral constant 256 which is out range of TINYINT
[-128,127]. MySQL would earlier compare using full 64 bits lenght on both
operand, but that is overkill in this case: we can fold the WHERE expression to

   SELECT * FROM t WHERE TRUE 

which allows the optimizer to remove the WHERE expression.  If the field is
nullable, the expression is folded to

   SELECT * FROM t WHERE ti IS NOT NULL

to preserve SQL semantics.

[1] The comparison operands presently covered are >, >=, <, <=, <>, = and <=>.
    We do not fold constants used with BETWEEN and IN as part of this WL.

[2] For prepared statements, the constant's value is not known at prepare time,
so the conversion/constant folding of this WL is performed at optimize time.

F-1 Convert constants in the comparisons (see above) to the type of the
    field, if that is semantically equivalent to what MySQL currently does.

F-2 Fold comparison expressions, if the constant is out of range of
    the field's type.

F-3 The query results should not be impacted by this change (modulo
    bug fixes), but the EXPLAIN result will often change.

F-4 Nested comparisons inside AND/OR or other function arguments will
    also be handled, allowing simplification of conditions if folding
    takes places.

F-5 Comparisons of the kind described will also be handled inside
    expressions, e.g. FUNC(ti < 256) -> FUNC(ti IS NOT NULL)

F-6 The order of the field and the constant in the comparison may be
    f,c or c,f.

F-7 Constants given as dynamic parameters to a prepared statement are
    handled.

F-8 An extra optimization is handled: if a field is not nullable, the
    expression
        f IS NOT NULL
    will be folded to TRUE.

N-1 Performance of existing queries should not be impacted. Note that folding
incurs some extra work at prepare/optimize time. On the other hand we should
save some cycles at execute time, so overall, performance should not be
negatively impacted in a significant way.

The main method performing this logic is `fold_condition' called from
`internal_remove_eq_conds' called from `remove_eq_conds' in the
optimizer. This allows the folding to tie into the existing
simplification apparatus present.

In addition, fold_condition is called when generating eq queries generated
from the multi-equality analysis in eliminate_item_equal.

In some cases, the amount of warnings may change due to this patch,
since constants are checked at optimize time, not for every row's execution.

`fold_condition' analyzes the comparison expressions using the main dispatcher
method `analyze_field_constant'. This will convert the constant if possible, or
signal that some folding can take place, which is then carried out in
`fold_condition'.

The following field types are subject to treatment (column one).  The relevant
analysis method used for each is listed in column two.

TINYINT                analyze_int_field_constant
SHORTINT               analyze_int_field_constant
MEDIUMINT              analyze_int_field_constant
INT                    analyze_int_field_constant
BIGINT                 analyze_int_field_constant
DECIMAL                analyze_decimal_field_constant
FLOAT[(m,n)]           analyze_real_field_constant
DOUBLE[(m,n)]          analyze_real_field_constant
TIME                   analyze_time_field_constant
DATE                   analyze_timestamp_field_constant
DATETIME[.nnnnnn]      analyze_timestamp_field_constant
TIMESTAMP[.nnnnnn]     analyze_timestamp_field_constant
YEAR                   analyze_year_field_constant


How each of these is treated is indicated below. The treatment depends of the
MySQL result type of the constants in each case: INT_RESULT, DECIMAL_RESULT,
REAL_RESULT or STRING_RESULT.

1. TINYINT

   This integer type will show the pattern of the other integer
   types. Only BIGINT needs special treatment, so the explanations
   given for TINYINT will apply and be referenced from the sections
   for the other integer types.
   
   1.1 INT_RESULT

   The constant is analyzed for range. If outside, the comparison is folded. If
   the constant is on a range border, we simplify >= or <=, e.g.

     SELECT * FROM t WHERE f <= -128     ->   SELECT * FROM t WHERE f = 128;
   
   1.2 DECIMAL_RESULT

   If a decimal has a fraction part, it can't be equal to an integer; inversely
   it can't not be unequal. These cases are folded.

   For the other comparisons, we round up (or down depending on sign) to an
   integer before we check the integer range in two steps: first whether its
   representable in a 64 bit int, and if so, applying the same check for integer
   type as for INT_RESULT.
   
   1.3 REAL_RESULT

   A very small value that can't be represented as decimal, is represented as
   +/- 0.1 before we handle it the same way as decimals, see above. Folded
   directly if it overflows decimal's range.

   1.4 STRING_RESULT

   Numbers given as 0xnnnn have this result type, and represented as a
   Item::VARBIN_ITEM.  If so, it is treated as if it were a constant with
   INT_RESULT, see above.

   If not, it is treated as for REAL_RESULT, see above.   

   This only happens if the string is not an integer or within legal range of
   the field type, btw. If it is an integer and within its range, it is
   converted to an Item_int_with_ref (with INT_RESULT) before folding happens.

2. SHORTINT, MEDIUMINT, INT

   See TINYINT description.
   
5. BIGINT                 
   5.1 INT_RESULT

   See TINYINT description.
   
   5.2 DECIMAL_RESULT

   See TINYINT description.
   
   5.3 REAL_RESULT

   See TINYINT description.
   
   5.4 STRING_RESULT

   Same as TINYINT. Special logic to handle constant at the limit, since MySQL
   internally uses a 64 bits int to represent the constant (longlong).

6. DECIMAL                
   6.1 INT_RESULT

   Checks if the integer overflows the range of the decimal type's integer
   part, if so, fold.  If not convert to a corresponding decimal constant with
   the same number of decimals as the field's type.

   6.2 DECIMAL_RESULT

   We first check overflow, i.e. if the constant has more digits in the integer
   part than allowed for the fields decimal type. If so, we fold the
   comparison. If the constant has more significant fractional digits than the
   field's type, we truncate it, and fold iff we have = or <>.  If we have >= or
   <=, we adjust the operator since we truncated, e.g. if the field's type is
   DECIMAL(3,1):
   
     SELECT * FROM t WHERE f >= 10.13   ->   SELECT * FROM t WHERE f > 10.1

   If the constant has fewer decimal digits than the field's type, we convert it
   to a constant with the same number for digits.

   6.3 REAL_RESULT

   We first check if the real value overflows the integer part of a MySQL
   DECIMAL (using max precision). If not, we fold.  If not, we apply the same
   checks as for DECIMAL_RESULT above, but converting the constant to a decimal
   constant.

   If the real value underflows, i.e. we have too few fractional digits to 
   represent a very small real value, we convert the value to decimal 0 and 
   compensate in the folding logic, cf. explanation for DECIMAL_RESULT (see
   RP_INSIDE_TRUNCATED in the code).
   
   6.4 STRING_RESULT

   0xnnnn numbers are treated as INT_RESULT, see section 1.4.

   If not, we use the logic of DECIMAL_RESULT.

   NOTE: Any real value inside a string which underflows decimals representation
   gets truncated silently to 0 by existing MySQL code (probably a bug: a string
   non-constant doesn't show this behavior), and thus breaks the assumption that
   the constant with and without string quotes should behave the same:

   a) Small real constant
   
   CREATE TABLE t(i INT, d DECIMAL(5,2)); 
   INSERT INTO t VALUES (0, 0);          
   mysql> SELECT * FROM t WHERE i = 1.0E-308;
   Empty set (0.01 sec)
   mysql> SELECT * FROM t WHERE d = 1.0E-308;
   Empty set (0.00 sec)

   b) Small real constant inside a string:

   mysql> SELECT * FROM t WHERE i = '1.0E-308';
   Empty set (0.00 sec)

   mysql> SELECT * FROM t WHERE d = '1.0E-308';
   +------+------+
   | i    | d    |
   +------+------+
   |    0 | 0.00 |
   +------+------+
   1 row in set (0.00 sec)

   The new code replicates this behavior. This can be avoided if we instead of
   directly using the logic of DECIMAL_RESULT (above), we apply the logic of
   REAL_RESULT.
   
7. FLOAT[(m,n)]

   7.1 INT_RESULT, DECIMAL_RESULT, REAL_RESULT, STRING_RESULT
   
   If the value overflows the range of float, we fold.  If the value has more
   decimals than 'n', if given, we truncate, but compensate during folding: for
   = and <>, we fold to false, true (modulo nulls); for the other operators we
   adjust the operator, e.g. for FLOAT(5,2):

      SELECT * FROM t WHERE f < 123.223   -> 
      SELECT * FROM t WHERE f <= 123.22000122070312

   If the value has more integer digits than m, if given, we fold.  If the
   constant is not a real constant, we convert it to one.
   
8. DOUBLE[(m,n)]

   Same logic as for FLOAT. 
   
9. TIME                   
   9.1 INT_RESULT

   If the data_type of the constant is TIME, this means that the value is a
   valid TIME value, so no need for any folding. If not, it could not be
   interpreted as a TIME value, so we do nothing. MySQL will then compare
   using double. This would need CAST injection.

   9.2 STRING_RESULT

   This means that the constant is not a valid TIME value. MySQL will then
   compare using strings. This need CAST injection, so we do nothing here.
   
   9.3 REAL_RESULT, DECIMAL_RESULT

   The number could not be interpreted as datetime, so MySQL will compare as
   DOUBLE.  In this case, we will need CAST injection, so we do nothing here.
   
10. DATE                  
   10.1 INT_RESULT, STRING_RESULT

   See DATETIME, but with the following difference: if a string contains
   a datetime value that has a non-zero time part, the constant is truncated
   to a DATE literal, and by adjusting operator (>, >=, <, <=) or folding 
   to true or false (=, <>, !=, <=>). If the time part is zero, it is simply
   truncated to a DATE literal.

   10.2 DECIMAL_RESULT, REAL_RESULT

   The number could not be interpreted as datetime, so MySQL will compare as
   double.  In this case, we will need CAST injection, so we do nothing here.

11. DATETIME[.nnnnnn]     
   11.1 INT_RESULT, STRING_RESULT

   This covers both string constants and TIMESTAMP literals. Any integer
   interpretable as a valid timestamp will have been converted to INT_RESULT.
   DATE constants get promoted to TIMESTAMP values by giving them 00:00:00 time.
   Some wrong values are still compared as DATETIME, e.g. '2018-02-31 06:14:07'
   (illegal day in February) - they are representable internally as a DATETIME -
   while worse values lead to comparison as strings (e.g. '2018') or DOUBLE
   (2018) by MySQL currently. The latter two are not covered by this WL and
   needs CAST injection. The same goes for zeros in year or month (except for
   the special null value 0000-00-00 which is accepted), unless the current
   sql_mode allows it (no NO_ZERO_IN_DATE).

   Acceptable values are converted to Item_datetime_literal (INT_RESULT). We do
   not presently truncate excessive number of second fraction decimals in the
   constant (compared to what is allowed for the field's type), but retain the
   number given. If more than 6 are given, normal rounding happens. We do not
   attempt folding if the number of decimals in the constant exceeds the field
   type's number. 


   11.2 DECIMAL_RESULT, REAL_RESULT

   The number could not be interpreted as datetime, so MySQL will compare as
   double.  In this case, we will need CAST injection, so we do nothing here.

12. TIMESTAMP[.nnnnnn]    
   12.1 INT_RESULT, STRING_RESULT

   As for datetime, but we analyze and fold DATETIME values outside the legal
   range for TIMESTAMP, unless it contains a zero month or date, cf explanation
   in section 11.1.

   12.2 DECIMAL_RESULT, REAL_RESULT

   The number could not be interpreted as datetime, so MySQL will compare as
   double.  In this case, we will need CAST injection, so we do nothing here.
   
13. YEAR                  

   13.1 INT_RESULT, DECIMAL_RESULT, REAL_RESULT, STRING_RESULT

   The YEAR range is weird:  Y = {0} U [1901, 2155].

   If the constant is a valid year, this is detected during prepare, and the
   constant has received datatype == MYSQL_TYPE_YEAR. We identify 0 and 2155 as
   min and max respectively for folding of <= or >= to < or >.

   If the constant is outside the legal range we have three cases:
   a) c < 0
   b) c > 0 && c < 1901 (the minimum year, if not 0)
   c) c > 2155

   a) and c) are handled in the normal way for folding. b) is treated specially,
   see RP_INSIDE_YEAR_HOLE in the code: for = and <>, if is treated as a normal
   outside range, for >, >=, < and <= it is treated as within the range, i.e. no
   folding. No attempt is made to convert constants in the range [1,1900] into a
   "legal" year. This is possible if we adjust the operators >= or <=,
   though. Reviewer: should we do this?

   If the constant is not an integer, it is converted into one.