WL#12108: Inject type cast nodes into the item tree to avoid data type mismatches
Affects: Server-8.0 — Status: Complete
Add implicit type cast operations into the item tree inside expressions and conditions that have a mismatch between the provided argument's data type and the expected data type. MySQL can compare arguments of any data type with arguments of any other data type, either by internally converting one of the arguments to the other one's data type, or converting both to DOUBLE, but in many cases this is not standard compliant. The standard SQL way of converting to different data types is to use explicit type casts. A step in this direction is to transparently inject CAST nodes, this way the query will get converted into a compliant query and executed with the semantics of a compliant query. This change implies MySQL will go through the item tree and insert the type cast nodes automatically if the user neglects to do so, at least for the time being. This way, the behavior of the server is backward compatible and the executed query becomes equivalent to a standard compliant one. Example: when t1.d_time is TIME and t2.d_date is DATE SELECT * from t1 inner join t2 on t1.d_time = t2.d_date; after this WL the query will be executed as: SELECT * from t1 inner join t2 ON CAST(t1.d_time AS DATETIME) = CAST(t2.d_date AS DATETIME); This is not a solution to handle the problem with data types with different value ranges ex. for SMALLINT and LONGINT the operation will be executed on the longest type of integer without inserting any cast nodes. Why do it this way? - long term solution - internally, converts MySQL's non standard functionality into standard-compliant functionality - mostly transparent to the end-users (still, EXPLAIN shows the actual query that gets executed internally) - in the long run it will help us ditch the val_* functions
F-1 Cast nodes will be injected in the comparisons between non-constant fields/ non-constant functions(regular functions and aggregation functions), in a way that it stays semantically equivalent to what MySQL currently does at the moment. This way we're replacing the implicit casts that are currently done by default with explicit casts. (UDFs, stored functions, stored procedures are excluded, for now but will be added in a future WL for completeness) F-2 The query results should not be any different than the ones before this change but the EXPLAIN output will reflect these changes by showing the cast nodes injected. F-3 The order of the fields should not matter as the transformation should be symmetric. F-4 The execution order of the fields will not be changed when wrapping any of the arguments in CAST nodes. F-5 The operators that should be covered by these changes are: =, >=, >, <, <=, <> and <=> e.g. WHERE col_date < col_datetime, will become: WHERE cast(col_date as DATETIME) < col_datetime F-6 This WL will be limited to casts * from numeric/temporal arguments to DOUBLE in comparisons between a numeric and a temporal data type argument: * from DATE/TIME arguments to DATETIME in comparisons between arguments of different temporal types Where numeric data types are defined as: * Integer Types (Exact Value): INTEGER, INT, SMALLINT, TINYINT, MEDIUMINT, BIGINT * Fixed-Point Types (Exact Value) - DECIMAL, NUMERIC * Floating-Point Types (Approximate Value) - FLOAT, DOUBLE, REAL * Bit-Value Type - BIT And temporal data types: * DATE, DATETIME, and TIMESTAMP Types * TIME Type N-1 Performance of existing queries should not be negatively impacted in a significant way.
MySQL sets the comparison type in Arg_comparator::set_compare_func, Arg_comparator::set_cmp_func, and it makes the decision based on the result type of the arguments(INT_RESULT, DECIMAL_RESULT, REAL_RESULT, STRING_RESULT) and, for arguments having temporal data types it also takes into consideration the more specific data type they have. For DATETIME comparisons: Arg_comparator::set_cmp_func decides if the comparison will be done as DATETIME, setting the comparison function to `Arg_comparator::compare_datetime`. We'll inject a CAST node if both arguments are either fields or functions and: - if one of the arguments is DATETIME/TS and the other one is DATE or TIME, then the latter one gets converted to DATETIME - if one argument is DATE and the other one is TIME, both arguments get converted to DATETIME For example: SELECT * FROM t1 WHERE col_date_time >= col_date; Will become: SELECT * FROM t1 WHERE CAST(col_date_time AS DATETIME) >= CAST(col_date AS DATETIME); And SELECT * FROM t1 WHERE col_time < col_date; Will become: SELECT * FROM t1 WHERE CAST(col_time AS DATETIME) < CAST(col_date AS DATETIME); But SELECT * FROM t1 WHERE col_date = col_date2; Will remain unchanged For comparisons as DOUBLE: In Arg_comparator::set_compare_func, it is already decided that the comparison will be done as floating-point (DOUBLE more precisely). In this case the cast to DOUBLE is injected if the following conditions are met: - one of the arguments is of temporal type and the other one is numeric - both arguments are fields or functions Only the arguments that are not FLOAT/DOUBLE will be CAST to DOUBLE. Ex: SELECT * FROM t1 WHERE col_int <> col_date; Will become: SELECT * FROM t1 WHERE CAST(col_int AS DOUBLE) <> CAST(col_date AS DOUBLE); Since the injection of the CAST nodes will happen towards the end of `JOIN::optimize()` no query plan will be altered. This also means that the nodes needs to be tracked and reverted after execution.
Copyright (c) 2000, 2019, Oracle Corporation and/or its affiliates. All rights reserved.