WL#11935: Add folding of constants when compared to fields
Affects: Server-8.0
—
Status: Complete
We want to fold cases during optimize time where we currently handle type skews at execution time. In this WL, we handle comparisons[1] between a field and a constant value where the constant is of another type (or out of range) with respect to the type of the field. For comparison operators, we can let MySQL convert the type of the constant to match the other expression, or perform constant folding if a conversion is not possible. For cases when both operands are non-constant, we need to inject CASTs to allow the comparison operators to work on operands of the same type. This work is to be covered by separate work. In the special case that we are comparing a field to a constant, the work described in this WL applies. This WL will facilitate speed up at execution at the cost of a little more analysis at optimize[2] time. For example, given this table: CREATE TABLE t (ti TINYINT UNSIGNED NOT NULL); the condition in this query: SELECT * FROM t WHERE ti < 256; contains the integral constant 256 which is out range of TINYINT [-128,127]. MySQL would earlier compare using full 64 bits lenght on both operand, but that is overkill in this case: we can fold the WHERE expression to SELECT * FROM t WHERE TRUE which allows the optimizer to remove the WHERE expression. If the field is nullable, the expression is folded to SELECT * FROM t WHERE ti IS NOT NULL to preserve SQL semantics. [1] The comparison operands presently covered are >, >=, <, <=, <>, = and <=>. We do not fold constants used with BETWEEN and IN as part of this WL. [2] For prepared statements, the constant's value is not known at prepare time, so the conversion/constant folding of this WL is performed at optimize time.
F-1 Convert constants in the comparisons (see above) to the type of the field, if that is semantically equivalent to what MySQL currently does. F-2 Fold comparison expressions, if the constant is out of range of the field's type. F-3 The query results should not be impacted by this change (modulo bug fixes), but the EXPLAIN result will often change. F-4 Nested comparisons inside AND/OR or other function arguments will also be handled, allowing simplification of conditions if folding takes places. F-5 Comparisons of the kind described will also be handled inside expressions, e.g. FUNC(ti < 256) -> FUNC(ti IS NOT NULL) F-6 The order of the field and the constant in the comparison may be f,c or c,f. F-7 Constants given as dynamic parameters to a prepared statement are handled. F-8 An extra optimization is handled: if a field is not nullable, the expression f IS NOT NULL will be folded to TRUE. N-1 Performance of existing queries should not be impacted. Note that folding incurs some extra work at prepare/optimize time. On the other hand we should save some cycles at execute time, so overall, performance should not be negatively impacted in a significant way.
The main method performing this logic is `fold_condition' called from `internal_remove_eq_conds' called from `remove_eq_conds' in the optimizer. This allows the folding to tie into the existing simplification apparatus present. In addition, fold_condition is called when generating eq queries generated from the multi-equality analysis in eliminate_item_equal. In some cases, the amount of warnings may change due to this patch, since constants are checked at optimize time, not for every row's execution.
`fold_condition' analyzes the comparison expressions using the main dispatcher method `analyze_field_constant'. This will convert the constant if possible, or signal that some folding can take place, which is then carried out in `fold_condition'. The following field types are subject to treatment (column one). The relevant analysis method used for each is listed in column two. TINYINT analyze_int_field_constant SHORTINT analyze_int_field_constant MEDIUMINT analyze_int_field_constant INT analyze_int_field_constant BIGINT analyze_int_field_constant DECIMAL analyze_decimal_field_constant FLOAT[(m,n)] analyze_real_field_constant DOUBLE[(m,n)] analyze_real_field_constant TIME analyze_time_field_constant DATE analyze_timestamp_field_constant DATETIME[.nnnnnn] analyze_timestamp_field_constant TIMESTAMP[.nnnnnn] analyze_timestamp_field_constant YEAR analyze_year_field_constant How each of these is treated is indicated below. The treatment depends of the MySQL result type of the constants in each case: INT_RESULT, DECIMAL_RESULT, REAL_RESULT or STRING_RESULT. 1. TINYINT This integer type will show the pattern of the other integer types. Only BIGINT needs special treatment, so the explanations given for TINYINT will apply and be referenced from the sections for the other integer types. 1.1 INT_RESULT The constant is analyzed for range. If outside, the comparison is folded. If the constant is on a range border, we simplify >= or <=, e.g. SELECT * FROM t WHERE f <= -128 -> SELECT * FROM t WHERE f = 128; 1.2 DECIMAL_RESULT If a decimal has a fraction part, it can't be equal to an integer; inversely it can't not be unequal. These cases are folded. For the other comparisons, we round up (or down depending on sign) to an integer before we check the integer range in two steps: first whether its representable in a 64 bit int, and if so, applying the same check for integer type as for INT_RESULT. 1.3 REAL_RESULT A very small value that can't be represented as decimal, is represented as +/- 0.1 before we handle it the same way as decimals, see above. Folded directly if it overflows decimal's range. 1.4 STRING_RESULT Numbers given as 0xnnnn have this result type, and represented as a Item::VARBIN_ITEM. If so, it is treated as if it were a constant with INT_RESULT, see above. If not, it is treated as for REAL_RESULT, see above. This only happens if the string is not an integer or within legal range of the field type, btw. If it is an integer and within its range, it is converted to an Item_int_with_ref (with INT_RESULT) before folding happens. 2. SHORTINT, MEDIUMINT, INT See TINYINT description. 5. BIGINT 5.1 INT_RESULT See TINYINT description. 5.2 DECIMAL_RESULT See TINYINT description. 5.3 REAL_RESULT See TINYINT description. 5.4 STRING_RESULT Same as TINYINT. Special logic to handle constant at the limit, since MySQL internally uses a 64 bits int to represent the constant (longlong). 6. DECIMAL 6.1 INT_RESULT Checks if the integer overflows the range of the decimal type's integer part, if so, fold. If not convert to a corresponding decimal constant with the same number of decimals as the field's type. 6.2 DECIMAL_RESULT We first check overflow, i.e. if the constant has more digits in the integer part than allowed for the fields decimal type. If so, we fold the comparison. If the constant has more significant fractional digits than the field's type, we truncate it, and fold iff we have = or <>. If we have >= or <=, we adjust the operator since we truncated, e.g. if the field's type is DECIMAL(3,1): SELECT * FROM t WHERE f >= 10.13 -> SELECT * FROM t WHERE f > 10.1 If the constant has fewer decimal digits than the field's type, we convert it to a constant with the same number for digits. 6.3 REAL_RESULT We first check if the real value overflows the integer part of a MySQL DECIMAL (using max precision). If not, we fold. If not, we apply the same checks as for DECIMAL_RESULT above, but converting the constant to a decimal constant. If the real value underflows, i.e. we have too few fractional digits to represent a very small real value, we convert the value to decimal 0 and compensate in the folding logic, cf. explanation for DECIMAL_RESULT (see RP_INSIDE_TRUNCATED in the code). 6.4 STRING_RESULT 0xnnnn numbers are treated as INT_RESULT, see section 1.4. If not, we use the logic of DECIMAL_RESULT. NOTE: Any real value inside a string which underflows decimals representation gets truncated silently to 0 by existing MySQL code (probably a bug: a string non-constant doesn't show this behavior), and thus breaks the assumption that the constant with and without string quotes should behave the same: a) Small real constant CREATE TABLE t(i INT, d DECIMAL(5,2)); INSERT INTO t VALUES (0, 0); mysql> SELECT * FROM t WHERE i = 1.0E-308; Empty set (0.01 sec) mysql> SELECT * FROM t WHERE d = 1.0E-308; Empty set (0.00 sec) b) Small real constant inside a string: mysql> SELECT * FROM t WHERE i = '1.0E-308'; Empty set (0.00 sec) mysql> SELECT * FROM t WHERE d = '1.0E-308'; +------+------+ | i | d | +------+------+ | 0 | 0.00 | +------+------+ 1 row in set (0.00 sec) The new code replicates this behavior. This can be avoided if we instead of directly using the logic of DECIMAL_RESULT (above), we apply the logic of REAL_RESULT. 7. FLOAT[(m,n)] 7.1 INT_RESULT, DECIMAL_RESULT, REAL_RESULT, STRING_RESULT If the value overflows the range of float, we fold. If the value has more decimals than 'n', if given, we truncate, but compensate during folding: for = and <>, we fold to false, true (modulo nulls); for the other operators we adjust the operator, e.g. for FLOAT(5,2): SELECT * FROM t WHERE f < 123.223 -> SELECT * FROM t WHERE f <= 123.22000122070312 If the value has more integer digits than m, if given, we fold. If the constant is not a real constant, we convert it to one. 8. DOUBLE[(m,n)] Same logic as for FLOAT. 9. TIME 9.1 INT_RESULT If the data_type of the constant is TIME, this means that the value is a valid TIME value, so no need for any folding. If not, it could not be interpreted as a TIME value, so we do nothing. MySQL will then compare using double. This would need CAST injection. 9.2 STRING_RESULT This means that the constant is not a valid TIME value. MySQL will then compare using strings. This need CAST injection, so we do nothing here. 9.3 REAL_RESULT, DECIMAL_RESULT The number could not be interpreted as datetime, so MySQL will compare as DOUBLE. In this case, we will need CAST injection, so we do nothing here. 10. DATE 10.1 INT_RESULT, STRING_RESULT See DATETIME, but with the following difference: if a string contains a datetime value that has a non-zero time part, the constant is truncated to a DATE literal, and by adjusting operator (>, >=, <, <=) or folding to true or false (=, <>, !=, <=>). If the time part is zero, it is simply truncated to a DATE literal. 10.2 DECIMAL_RESULT, REAL_RESULT The number could not be interpreted as datetime, so MySQL will compare as double. In this case, we will need CAST injection, so we do nothing here. 11. DATETIME[.nnnnnn] 11.1 INT_RESULT, STRING_RESULT This covers both string constants and TIMESTAMP literals. Any integer interpretable as a valid timestamp will have been converted to INT_RESULT. DATE constants get promoted to TIMESTAMP values by giving them 00:00:00 time. Some wrong values are still compared as DATETIME, e.g. '2018-02-31 06:14:07' (illegal day in February) - they are representable internally as a DATETIME - while worse values lead to comparison as strings (e.g. '2018') or DOUBLE (2018) by MySQL currently. The latter two are not covered by this WL and needs CAST injection. The same goes for zeros in year or month (except for the special null value 0000-00-00 which is accepted), unless the current sql_mode allows it (no NO_ZERO_IN_DATE). Acceptable values are converted to Item_datetime_literal (INT_RESULT). We do not presently truncate excessive number of second fraction decimals in the constant (compared to what is allowed for the field's type), but retain the number given. If more than 6 are given, normal rounding happens. We do not attempt folding if the number of decimals in the constant exceeds the field type's number. 11.2 DECIMAL_RESULT, REAL_RESULT The number could not be interpreted as datetime, so MySQL will compare as double. In this case, we will need CAST injection, so we do nothing here. 12. TIMESTAMP[.nnnnnn] 12.1 INT_RESULT, STRING_RESULT As for datetime, but we analyze and fold DATETIME values outside the legal range for TIMESTAMP, unless it contains a zero month or date, cf explanation in section 11.1. 12.2 DECIMAL_RESULT, REAL_RESULT The number could not be interpreted as datetime, so MySQL will compare as double. In this case, we will need CAST injection, so we do nothing here. 13. YEAR 13.1 INT_RESULT, DECIMAL_RESULT, REAL_RESULT, STRING_RESULT The YEAR range is weird: Y = {0} U [1901, 2155]. If the constant is a valid year, this is detected during prepare, and the constant has received datatype == MYSQL_TYPE_YEAR. We identify 0 and 2155 as min and max respectively for folding of <= or >= to < or >. If the constant is outside the legal range we have three cases: a) c < 0 b) c > 0 && c < 1901 (the minimum year, if not 0) c) c > 2155 a) and c) are handled in the normal way for folding. b) is treated specially, see RP_INSIDE_YEAR_HOLE in the code: for = and <>, if is treated as a normal outside range, for >, >=, < and <= it is treated as within the range, i.e. no folding. No attempt is made to convert constants in the range [1,1900] into a "legal" year. This is possible if we adjust the operators >= or <=, though. Reviewer: should we do this? If the constant is not an integer, it is converted into one.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.