WL#2649: Number-to-string conversions
Affects: Server-5.5
—
Status: Complete
Automatic number-to-string conversions which currently return with [VAR]BINARY data type will return with [VAR]CHAR data type with character set = character_set_connection, collation = collation_connection, with exceptions as noted in section '' of the High-Level Specification. Also we will cease to return [VAR]BINARY for a few other functions which do not involve automatic number-to-string conversions.
An example of automatic number-to-string conversion is: CREATE TABLE tk AS SELECT CONCAT(1),COALESCE(1,'a'),UPPER(1); SHOW CREATE TABLE tk; The SHOW will show that the data types are all VARBINARY. Typically the conversion occurs for string functions with numeric inputs when string inputs would be more normal. Functions which are known to return VARBINARY are: (String Functions) CHAR CONCAT CONCAT_WS ELT EXPORT_SET INSERT LCASE LEFT LOWER LPAD LTRIM MID QUOTE REPEAT REPLACE REVERSE RIGHT RPAD RTRIM SOUNDEX SUBSTRING TRIM UCASE UPPER when string inputs would be more normal. (Date and Time Functions) ADDDATE ADDTIME DATE_ADD DATE_SUB DAYNAME GET_FORMAT MONTHNAME SUBDATE SUBTIME TIMESTAMPADD. (Not all functions which return VARBINARY are affected by this task. Only functions which meet the criteria in this worklog task description are affected by this task.) All implicit number-to-string-conversion functions which currently return [VAR]BINARY will return with [VAR]CHAR and character set = character_set_connection, collation = collation_connection, except for functions listed later in section "Exceptions". Since the result currently is VARBINARY, Peter Gulutzan asked for a vote in the thread "Re: BINARY and VARBINARY". The vote was in favour of character_set_connection and collation_connection. Exceptions ---------- Although "SELECT CHAR(5)" returns VARBINARY, it's documented: "By default, CHAR() returns a binary string. To produce a string in a given character set, use the optional USING clause ...". So CHAR(n) results will continue to be VARBINARY. If a function would return a UTF8 character string, then it is unaffected by this worklog task. That is, it will continue to return a UTFf8 character string. For example, CHARSET('a') returns VARCHAR(64) CHARACTER SET UTF8. If a function would return VARBINARY for a string input, then it is unaffected by this worklog task (except as stated elewhere in this description). That is, it will continue to return VARBINARY even if there is an implicit conversion to a string. For example, AES_DECRYPT(1,1) will still return VARBINARY, because AES_DECRYPT('a','a') returns VARBINARY. If there is aggregation, as with CASE, COALESCE, CONCAT(multiple values), CONCAT_WS, GREATEST, GROUP_CONCAT(multiple values), LEAST, MAX, MIN, UNION, the rules for aggregation are what matters, the result when there are aggregated multiple values of wildly different data types is VARBINARY. > And now, consider this: > > > > mysql> set @@group_concat_max_len = 100; > > Query OK, 0 rows affected (0.00 sec) > > > > mysql> create table t1 (s1 int); > > Query OK, 0 rows affected (0.06 sec) > > > > mysql> create table t2 as select group_concat(s1) from t1; > > Query OK, 1 row affected (0.10 sec) > > Records: 1 Duplicates: 0 Warnings: 0 > > > > mysql> show create table t2; > > +-------+-------------------------------------------------------------------------------------------------------------+ > > | Table | Create Table | > > +-------+-------------------------------------------------------------------------------------------------------------+ > > | t2 | CREATE TABLE `t2` ( > > `group_concat(s1)` varbinary(100) DEFAULT NULL > > ) ENGINE=MyISAM DEFAULT CHARSET=latin1 | > > +-------+-------------------------------------------------------------------------------------------------------------+ > > 1 row in set (0.00 sec) > > > > This doesn't match what the manual says > > http://dev.mysql.com/doc/refman/6.0/en/group-by-functions.html#function_group-concat > > Does anyone think it should be noted in the worklog task description? Yes, it will be naturally fixed when number-to-string conversion will be fixed. But I suppose the manual should be corrected anyway. If you mean the part "The type returned by GROUP_CONCAT() is always VARCHAR" - it will be VARBINARY if argument type is VARBINARY. Exceptions to the exceptions ---------------------------- After checking with MySQL's connectors people, Sergei Golubchik suggested that the following functions, which return VARBINARY now, should return VARCHAR (with connection character set: MD5, OLD_PASSWORD, PASSWORD, SHA, SHA1, SHA2. The following functions, which return LONGBLOB now, should return VARCHAR (with connection character set): ASTEXT, ASWKT. All encryption functions that return HEX-encoded strings that return VARBINARY now, should return VARCHAR. The implementor may in all cases use TEXT or LONGTEXT rather than VARCHAR, if he/she finds it necessary due to length, or finds it appropriate so as to look more compatible with other functions that return LONGBLOB. There will be no change for other functions which may appear similar to some of the above. For example we considered changing AES_DECRYPT and DES_ENCRYPT and UNCOMPRESS and UNHEX, and decided: no change. See dev-private email thread "Re: ARCH REVIEW REQUEST: WL#2649 Number-to-string conversions": [mysql intranet]/secure/mailarchive/mail.php?folder=4&mail=27531 Other automatic conversions --------------------------- Although the title of this worklog task is "Number-to-string conversions", the same effect occurs for any conversion of datetime to string. For example: CREATE TABLE tm (s1 TIME); SELECT UCASE(s1) FROM tm; /* result is VARBINARY(8) */ So let us extend the principle as follows: If datetime-to-string conversion causes VARBINARY result, and numeric-to-string conversion would cause VARBINARY result, and non-conversion would not cause VARBINARY result, and the function is not listed among "Exceptions" above, then result should have character_set_connection, collation_connection. This will not cause difficulty because the characters that might appear in DATE or TIME or DATETIME or TIMESTAMP values, i.e. - / . : + 0-9 A-Z, are in all character sets. The assumption is that date values will never contain a (possibly Japanese) month name. ascii ----- On previous occasions: - Sergei Golubchik suggested the result should be ascii (email thread "charset of the int->char conversion") - Monty Widenius suggested that for the expression select "0"=0 there should be a number-to-string conversion and the result is ascii - in Prague Alexander Barkov and Peter Gulutzan got the impression that Sergei and Monty wanted the result to be ascii, collation ascii_general_ci. However, any such impressions are superseded. A previous version of this worklog description said: "Also, when concatenating or comparing ascii with another character set, e.g. CONCAT(CONCAT(1),'a'), the ascii will be converted to the other character set. There will be no change for explicit conversions." However, that too is superseded. Now this worklog has nothing to do with ascii. Implicit conversions to dates or times -------------------------------------- ADDDATE(date_column,5) is a DATE. ADDDATE(datetime_or_timestamp_column,5) is a DATETIME. ADDDATE(any_other_kind_of_column,5) is a VARBINARY. Nobody wants the result to be VARBINARY. So, for data types other than DATE or DATETIME (including TIME!), there are two proposals and we'll pick one of them. Alexander Barkov says: The result should be [VAR]CHAR with connection character set. Peter Gulutzan says: For ADDDATE(expr,...), DATE_SUB(expr,...), SUBDATE(expr,...), TIMESTAMP(expr,...), TIMESTAMPADD(...,...,expr), do an implicit CAST(expr AS DATETIME) and return DATETIME. For ADDTIME(expr,...), SUBTIME(expr,...), TIMEDIFF(expr,...) do an implicit CAST(CAST(expr AS TIME) AS DATETIME) and return DATETIME, or do an implicit CAST(expr as DATETIME) and return DATETIME. Peter has no idea how MySQL decides which implicit CAST to do, but it seems to be happening already. See also BUG#31384 "DATE_ADD() and DATE_SUB() return binary data". Internal emails about this subject are in this worklog task's "Progress" report for on and around 2009-02-26. People affected: Gleb Shchepa, Alexander Barkov, Evgeny Potemkin, Peter Gulutzan. This in an incompatible change. DAYNAME and MONTHNAME ---------------------- The original plan was to do something about DAYNAME and MONTHNAME here. See also BUG#24494 dayname() function ignores my collation, uses binary or some Case Sensitive [Closed]. There it was clarified that DAYNAME() and MONTHNAME() return [VAR]BINARY]. The comments also suggest that this is adequately documented. But, once again, we don't want VARBINARY. But we have already changed the behaviour of DAYNAME() and MONTHNAME() for BUG#37575 "UCASE fails on monthname and dayname". So for this worklog task they have no relevance. See also BUG#31555 IFNULL(some_field, 123) doesn't convert charset from UTF8 MONTHNAME Proofs ------ Execute any of the following one-line scripts and you'll see that SHOW CREATE TABLE indicates that t has a column with data type BINARY, VARBINARY, or LONGBLOB. When this task is complete the data type will be CHAR or VARCHAR or TEXT or LONGTEXT. create table t as select ADDDATE(5,5); show create table t; create table t as select ADDTIME(5,5); show create table t; create table t as select ASTEXT(5); show create table t; create table t as select ASWKT(5); show create table t; create table t as select CONCAT(5); show create table t; create table t as select CONCAT_WS(5,5); show create table t; create table t as select DATE_ADD(5,INTERVAL 5 SECOND); show create table t; create table t as select DATE_SUB(5,INTERVAL 5 SECOND); show create table t; create table t as select ELT(5,5); show create table t; create table t as select EXPORT_SET(5,5,5); show create table t; create table t as select GROUP_CONCAT(1); show create table t; create table t as select IF(1,'a',5); show create table t; create table t as select IFNULL(1,'a'); show create table t; create table t as select INSERT(5,0,0,'a'); show create table t; create table t as select LCASE(5); show create table t; create table t as select LEFT(5,5); show create table t; create table t as select LOWER(5); show create table t; create table t as select LPAD(5,3,' '); show create table t; create table t as select LTRIM(5); show create table t; create table t as select MD5('a'); show create table t; create table t as select MID(5,5,5); show create table t; create table t as select OLD_PASSWORD('a'); create table t as select PASSWORD('a'); show create table t; create table t as select QUOTE(5); show create table t; create table t as select REPEAT(5,5); show create table t; create table t as select REPLACE(5,5,5); show create table t; create table t as select REVERSE(5); show create table t; create table t as select RIGHT(5,5); show create table t; create table t as select RPAD(5,5,5); show create table t; create table t as select RTRIM(5); show create table t; create table t as select SHA('a'); show create table t; create table t as select SHA1('a'); show create table t; create table t as select SHA2('a','a'); show create table t; create table t as select SOUNDEX(5); show create table t; create table t as select SUBDATE(5,5); show create table t; create table t as select SUBSTR(5,5,5); show create table t; create table t as select SUBSTRING(5,5,5); show create table t; create table t as select SUBTIME(5,5); show create table t; create table t as select TIMESTAMPADD(minute,1,1); show create table t; create table t as select TRIM(5); show create table t; create table t as select UCASE(5); show create table t; create table t as select UPDATEXML(5,5,5); show create table t; Bug reports ----------- Most of the following were marked "not a bug" or "to be fixed later". Since behaviour is changing, we might revisit them and change comments. BUG#6257 UPPER(MD5('xxx')) BUG#7008 UPPER(MD5()) does not work BUG#7739 UPPER(CONCAT(...)) lowercase when CONCAT() deals with mixed types BUG#10320 UPPER(MD5(NOW())) or UPPER(SHA1(NOW())) do not capitalize BUG#12030 CONCAT with INTEGER modifies result-set encoding BUG#14783 UPPER/UCASE function not working alongside ENCRYPT/MD5 functions BUG#19412 UPPER, & UCASE on CONCAT expression of mixed type fail BUG#26269 UPPER(CONCAT(...)) does not convert to uppercase BUG#27216 coalesce returns varbinary for dates BUG#30235 Bad result expected in DATETIME Function with MyODBC BUG#31384 DATE_ADD() and DATE_SUB() return binary data BUG#31555 IFNULL(some_field, 123) doesn't convert charset from UTF8 BUG#31601 decode function recognize the charset as binary BUG#31073 UPPER() does not work on MD5() function when Inserting BUG#34068 CURRENT_DATE() returns a binary string for non-binary data BUG#34388 CONCAT Change result DataType BUG#36347 Order by on decrypted data appears to be case sensitive BUG#46735 GROUP_CONCAT colation or something
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.