WL#11935: Add folding of constants when compared to fields
Affects: Server-8.0
—
Status: Complete
We want to fold cases during optimize time where we currently handle type skews
at execution time. In this WL, we handle comparisons[1] between a field and a
constant value where the constant is of another type (or out of range) with
respect to the type of the field.
For comparison operators, we can let MySQL convert the type of the constant
to match the other expression, or perform constant folding if a conversion is
not possible.
For cases when both operands are non-constant, we need to inject CASTs to allow
the comparison operators to work on operands of the same type. This work is
to be covered by separate work.
In the special case that we are comparing a field to a constant, the work
described in this WL applies.
This WL will facilitate speed up at execution at the cost of a little more
analysis at optimize[2] time.
For example, given this table:
CREATE TABLE t (ti TINYINT UNSIGNED NOT NULL);
the condition in this query:
SELECT * FROM t WHERE ti < 256;
contains the integral constant 256 which is out range of TINYINT
[-128,127]. MySQL would earlier compare using full 64 bits lenght on both
operand, but that is overkill in this case: we can fold the WHERE expression to
SELECT * FROM t WHERE TRUE
which allows the optimizer to remove the WHERE expression. If the field is
nullable, the expression is folded to
SELECT * FROM t WHERE ti IS NOT NULL
to preserve SQL semantics.
[1] The comparison operands presently covered are >, >=, <, <=, <>, = and <=>.
We do not fold constants used with BETWEEN and IN as part of this WL.
[2] For prepared statements, the constant's value is not known at prepare time,
so the conversion/constant folding of this WL is performed at optimize time.
F-1 Convert constants in the comparisons (see above) to the type of the
field, if that is semantically equivalent to what MySQL currently does.
F-2 Fold comparison expressions, if the constant is out of range of
the field's type.
F-3 The query results should not be impacted by this change (modulo
bug fixes), but the EXPLAIN result will often change.
F-4 Nested comparisons inside AND/OR or other function arguments will
also be handled, allowing simplification of conditions if folding
takes places.
F-5 Comparisons of the kind described will also be handled inside
expressions, e.g. FUNC(ti < 256) -> FUNC(ti IS NOT NULL)
F-6 The order of the field and the constant in the comparison may be
f,c or c,f.
F-7 Constants given as dynamic parameters to a prepared statement are
handled.
F-8 An extra optimization is handled: if a field is not nullable, the
expression
f IS NOT NULL
will be folded to TRUE.
N-1 Performance of existing queries should not be impacted. Note that folding
incurs some extra work at prepare/optimize time. On the other hand we should
save some cycles at execute time, so overall, performance should not be
negatively impacted in a significant way.
The main method performing this logic is `fold_condition' called from `internal_remove_eq_conds' called from `remove_eq_conds' in the optimizer. This allows the folding to tie into the existing simplification apparatus present. In addition, fold_condition is called when generating eq queries generated from the multi-equality analysis in eliminate_item_equal. In some cases, the amount of warnings may change due to this patch, since constants are checked at optimize time, not for every row's execution.
`fold_condition' analyzes the comparison expressions using the main dispatcher
method `analyze_field_constant'. This will convert the constant if possible, or
signal that some folding can take place, which is then carried out in
`fold_condition'.
The following field types are subject to treatment (column one). The relevant
analysis method used for each is listed in column two.
TINYINT analyze_int_field_constant
SHORTINT analyze_int_field_constant
MEDIUMINT analyze_int_field_constant
INT analyze_int_field_constant
BIGINT analyze_int_field_constant
DECIMAL analyze_decimal_field_constant
FLOAT[(m,n)] analyze_real_field_constant
DOUBLE[(m,n)] analyze_real_field_constant
TIME analyze_time_field_constant
DATE analyze_timestamp_field_constant
DATETIME[.nnnnnn] analyze_timestamp_field_constant
TIMESTAMP[.nnnnnn] analyze_timestamp_field_constant
YEAR analyze_year_field_constant
How each of these is treated is indicated below. The treatment depends of the
MySQL result type of the constants in each case: INT_RESULT, DECIMAL_RESULT,
REAL_RESULT or STRING_RESULT.
1. TINYINT
This integer type will show the pattern of the other integer
types. Only BIGINT needs special treatment, so the explanations
given for TINYINT will apply and be referenced from the sections
for the other integer types.
1.1 INT_RESULT
The constant is analyzed for range. If outside, the comparison is folded. If
the constant is on a range border, we simplify >= or <=, e.g.
SELECT * FROM t WHERE f <= -128 -> SELECT * FROM t WHERE f = 128;
1.2 DECIMAL_RESULT
If a decimal has a fraction part, it can't be equal to an integer; inversely
it can't not be unequal. These cases are folded.
For the other comparisons, we round up (or down depending on sign) to an
integer before we check the integer range in two steps: first whether its
representable in a 64 bit int, and if so, applying the same check for integer
type as for INT_RESULT.
1.3 REAL_RESULT
A very small value that can't be represented as decimal, is represented as
+/- 0.1 before we handle it the same way as decimals, see above. Folded
directly if it overflows decimal's range.
1.4 STRING_RESULT
Numbers given as 0xnnnn have this result type, and represented as a
Item::VARBIN_ITEM. If so, it is treated as if it were a constant with
INT_RESULT, see above.
If not, it is treated as for REAL_RESULT, see above.
This only happens if the string is not an integer or within legal range of
the field type, btw. If it is an integer and within its range, it is
converted to an Item_int_with_ref (with INT_RESULT) before folding happens.
2. SHORTINT, MEDIUMINT, INT
See TINYINT description.
5. BIGINT
5.1 INT_RESULT
See TINYINT description.
5.2 DECIMAL_RESULT
See TINYINT description.
5.3 REAL_RESULT
See TINYINT description.
5.4 STRING_RESULT
Same as TINYINT. Special logic to handle constant at the limit, since MySQL
internally uses a 64 bits int to represent the constant (longlong).
6. DECIMAL
6.1 INT_RESULT
Checks if the integer overflows the range of the decimal type's integer
part, if so, fold. If not convert to a corresponding decimal constant with
the same number of decimals as the field's type.
6.2 DECIMAL_RESULT
We first check overflow, i.e. if the constant has more digits in the integer
part than allowed for the fields decimal type. If so, we fold the
comparison. If the constant has more significant fractional digits than the
field's type, we truncate it, and fold iff we have = or <>. If we have >= or
<=, we adjust the operator since we truncated, e.g. if the field's type is
DECIMAL(3,1):
SELECT * FROM t WHERE f >= 10.13 -> SELECT * FROM t WHERE f > 10.1
If the constant has fewer decimal digits than the field's type, we convert it
to a constant with the same number for digits.
6.3 REAL_RESULT
We first check if the real value overflows the integer part of a MySQL
DECIMAL (using max precision). If not, we fold. If not, we apply the same
checks as for DECIMAL_RESULT above, but converting the constant to a decimal
constant.
If the real value underflows, i.e. we have too few fractional digits to
represent a very small real value, we convert the value to decimal 0 and
compensate in the folding logic, cf. explanation for DECIMAL_RESULT (see
RP_INSIDE_TRUNCATED in the code).
6.4 STRING_RESULT
0xnnnn numbers are treated as INT_RESULT, see section 1.4.
If not, we use the logic of DECIMAL_RESULT.
NOTE: Any real value inside a string which underflows decimals representation
gets truncated silently to 0 by existing MySQL code (probably a bug: a string
non-constant doesn't show this behavior), and thus breaks the assumption that
the constant with and without string quotes should behave the same:
a) Small real constant
CREATE TABLE t(i INT, d DECIMAL(5,2));
INSERT INTO t VALUES (0, 0);
mysql> SELECT * FROM t WHERE i = 1.0E-308;
Empty set (0.01 sec)
mysql> SELECT * FROM t WHERE d = 1.0E-308;
Empty set (0.00 sec)
b) Small real constant inside a string:
mysql> SELECT * FROM t WHERE i = '1.0E-308';
Empty set (0.00 sec)
mysql> SELECT * FROM t WHERE d = '1.0E-308';
+------+------+
| i | d |
+------+------+
| 0 | 0.00 |
+------+------+
1 row in set (0.00 sec)
The new code replicates this behavior. This can be avoided if we instead of
directly using the logic of DECIMAL_RESULT (above), we apply the logic of
REAL_RESULT.
7. FLOAT[(m,n)]
7.1 INT_RESULT, DECIMAL_RESULT, REAL_RESULT, STRING_RESULT
If the value overflows the range of float, we fold. If the value has more
decimals than 'n', if given, we truncate, but compensate during folding: for
= and <>, we fold to false, true (modulo nulls); for the other operators we
adjust the operator, e.g. for FLOAT(5,2):
SELECT * FROM t WHERE f < 123.223 ->
SELECT * FROM t WHERE f <= 123.22000122070312
If the value has more integer digits than m, if given, we fold. If the
constant is not a real constant, we convert it to one.
8. DOUBLE[(m,n)]
Same logic as for FLOAT.
9. TIME
9.1 INT_RESULT
If the data_type of the constant is TIME, this means that the value is a
valid TIME value, so no need for any folding. If not, it could not be
interpreted as a TIME value, so we do nothing. MySQL will then compare
using double. This would need CAST injection.
9.2 STRING_RESULT
This means that the constant is not a valid TIME value. MySQL will then
compare using strings. This need CAST injection, so we do nothing here.
9.3 REAL_RESULT, DECIMAL_RESULT
The number could not be interpreted as datetime, so MySQL will compare as
DOUBLE. In this case, we will need CAST injection, so we do nothing here.
10. DATE
10.1 INT_RESULT, STRING_RESULT
See DATETIME, but with the following difference: if a string contains
a datetime value that has a non-zero time part, the constant is truncated
to a DATE literal, and by adjusting operator (>, >=, <, <=) or folding
to true or false (=, <>, !=, <=>). If the time part is zero, it is simply
truncated to a DATE literal.
10.2 DECIMAL_RESULT, REAL_RESULT
The number could not be interpreted as datetime, so MySQL will compare as
double. In this case, we will need CAST injection, so we do nothing here.
11. DATETIME[.nnnnnn]
11.1 INT_RESULT, STRING_RESULT
This covers both string constants and TIMESTAMP literals. Any integer
interpretable as a valid timestamp will have been converted to INT_RESULT.
DATE constants get promoted to TIMESTAMP values by giving them 00:00:00 time.
Some wrong values are still compared as DATETIME, e.g. '2018-02-31 06:14:07'
(illegal day in February) - they are representable internally as a DATETIME -
while worse values lead to comparison as strings (e.g. '2018') or DOUBLE
(2018) by MySQL currently. The latter two are not covered by this WL and
needs CAST injection. The same goes for zeros in year or month (except for
the special null value 0000-00-00 which is accepted), unless the current
sql_mode allows it (no NO_ZERO_IN_DATE).
Acceptable values are converted to Item_datetime_literal (INT_RESULT). We do
not presently truncate excessive number of second fraction decimals in the
constant (compared to what is allowed for the field's type), but retain the
number given. If more than 6 are given, normal rounding happens. We do not
attempt folding if the number of decimals in the constant exceeds the field
type's number.
11.2 DECIMAL_RESULT, REAL_RESULT
The number could not be interpreted as datetime, so MySQL will compare as
double. In this case, we will need CAST injection, so we do nothing here.
12. TIMESTAMP[.nnnnnn]
12.1 INT_RESULT, STRING_RESULT
As for datetime, but we analyze and fold DATETIME values outside the legal
range for TIMESTAMP, unless it contains a zero month or date, cf explanation
in section 11.1.
12.2 DECIMAL_RESULT, REAL_RESULT
The number could not be interpreted as datetime, so MySQL will compare as
double. In this case, we will need CAST injection, so we do nothing here.
13. YEAR
13.1 INT_RESULT, DECIMAL_RESULT, REAL_RESULT, STRING_RESULT
The YEAR range is weird: Y = {0} U [1901, 2155].
If the constant is a valid year, this is detected during prepare, and the
constant has received datatype == MYSQL_TYPE_YEAR. We identify 0 and 2155 as
min and max respectively for folding of <= or >= to < or >.
If the constant is outside the legal range we have three cases:
a) c < 0
b) c > 0 && c < 1901 (the minimum year, if not 0)
c) c > 2155
a) and c) are handled in the normal way for folding. b) is treated specially,
see RP_INSIDE_YEAR_HOLE in the code: for = and <>, if is treated as a normal
outside range, for >, >=, < and <= it is treated as within the range, i.e. no
folding. No attempt is made to convert constants in the range [1,1900] into a
"legal" year. This is possible if we adjust the operators >= or <=,
though. Reviewer: should we do this?
If the constant is not an integer, it is converted into one.
Copyright (c) 2000, 2025, Oracle Corporation and/or its affiliates. All rights reserved.