WL#13456: Inject CAST nodes for comparisons with STRING non-const expressions
WL#12108 created the infrastructure around injecting CAST nodes in comparisons between non-const expressions of different data types. This WL will implement support for these string-based data type expressions too.
F-1 to F-5 requirements from WL#12108
F-6 Casts to DOUBLE should be injected when comparing a string based data type
with a numeric one (or YEAR). The cast needs to be injected to the arguments that are not of type FLOAT/DOUBLE/REAL.
F-7 Cast to DATETIME should be injected when comparing a string based data type
with an argument of type DATETIME or TIMESTAMP. The cast needs to be injected only to the arguments that is not of DATETIME/TIMESTAMP data type.
F-8 Cast to DATETIME should be injected to both arguments when comparing DATE with
TIME.
F-9 Cast to DATE should be injected when comparing a string based data type with
an argument of type DATE. The cast will be injected for the argument that has the string based data type.
Where numeric data types are defined as:
* Integer Types (Exact Value): INTEGER, INT, SMALLINT, TINYINT, MEDIUMINT, BIGINT * Fixed-Point Types (Exact Value) - DECIMAL, NUMERIC * Floating-Point Types (Approximate Value) - FLOAT, DOUBLE, REAL * Bit-Value Type - BIT
String-based data types: CHAR, VARCHAR, BINARY, VARBINARY, BLOB, TEXT, ENUM, and SET
When comparing two columns, one of which is of a STRING-based data type, the comparator used will be:
(1) DOUBLE: If one of the columns is of type INT (or any of the sub-types), DECIMAL, DOUBLE/FLOAT/REAL or YEAR (internally also represented as INT). For this case one or both arguments that are not of type DOUBLE/FLOAT/REAL will get wrapped into CASTs to DOUBLE.
example:
SELECT * FROM t1 INNER JOIN t2 ON t1.d_int * 3 = t2.d_varchar;
becomes:
SELECT * FROM t1 INNER JOIN t2 ON CAST (t1.d_int AS DOUBLE) = CAST(t2.d_varchar AS DOUBLE);
(2) DATETIME: If one of the column is of type DATETIME/TIMESTAMP or one of the arguments is of type DATE and the other one of type TIME. For these cases the arguments that are not of DATETIME/TIMESTAMP type will get wrapped into CASTs to DATETIME.
example:
SELECT * FROM t1 INNER JOIN t2 ON t1.d_date = t2.d_time;
becomes:
SELECT * FROM t1 INNER JOIN t2 ON CAST(t1.d_date AS DATETIME) = CAST(t2.d_time AS DATETIME);
2nd example:
SELECT * FROM t1 INNER JOIN t2 ON t1.d_datetime = t2.d_varchar;
becomes:
SELECT * FROM t1 INNER JOIN t2 ON t1.d_datetime = CAST(t2.d_varchar AS DATETIME);
(3) DATE: If one of the column is of type DATE the argument that is not of DATE type will get wrapped into CAST to DATE.
example:
SELECT * FROM t1 INNER JOIN t2 ON t1.d_date = t2.d_str;
becomes:
SELECT * FROM t1 INNER JOIN t2 ON t1.d_date = CAST(t2.d_str AS DATE);
The CAST should be injected in the same manner for non-constant expressions as it is done for columns (fields).
example:
SELECT * FROM t WHERE col_str = 1 * col_int;
becomes:
SELECT * FROM t WHERE CAST(col_str AS DOUBLE) = CAST((1 * col_int) AS DOUBLE));
Note:
For comparison as double(real) there are only two scenarios where we skip injecting the CAST nodes:
- when comparison is between numeric arguments - when arguments are of type YEAR and TIME (we fallback to comparison as DOUBLE as a last resort)
The Arg_comparator's inject_cast_nodes method previously returned early when it recognized that one of the arguments was a STRING based data type (for the REAL comparator) or a STRING based data type excluding the temporal ones (for the DATETIME comparator). This was done by checking that the arguments are only of temporal type or numeric. With this WL, STRING based arguments are also accepted and wrapped with CAST nodes as needed.
Identifying the data type for an argument in MySQL is done using the following functions belonging to the Item class:
- result_type(): which can be INT_RESULT, REAL_RESULT and DECIMAL_RESULT for the numeric data types (plus YEAR), or STRING_RESULT for the STRING related data types (CHAR, VARCHAR, BLOB, VARBINARY..), or the DATE/TIME types.
- data_type(): returns an exact data type (some of which are only used internally like MYSQL_TYPE_TIME2) to more precisely discriminate between the arguments