WL#2649: Number-to-string conversions

Affects: Server-5.5   —   Status: Complete   —   Priority: Low

Automatic number-to-string conversions which currently return
with [VAR]BINARY data type will return with [VAR]CHAR data type
with character set = character_set_connection,
collation = collation_connection, with exceptions as
noted in section '<Exceptions>' of the High-Level Specification.

Also we will cease to return [VAR]BINARY for a few other functions
which do not involve automatic number-to-string conversions.


An example of automatic number-to-string conversion is:
CREATE TABLE tk AS SELECT CONCAT(1),COALESCE(1,'a'),UPPER(1);
SHOW CREATE TABLE tk;
The SHOW will show that the data types are all VARBINARY.

Typically the conversion occurs for string functions with
numeric inputs when string inputs would be more normal.
Functions which are known to return VARBINARY are:
(String Functions)
CHAR CONCAT CONCAT_WS ELT EXPORT_SET INSERT
LCASE LEFT LOWER LPAD LTRIM MID QUOTE REPEAT REPLACE
REVERSE RIGHT RPAD RTRIM SOUNDEX SUBSTRING TRIM UCASE UPPER
when string inputs would be more normal.
(Date and Time Functions)
ADDDATE ADDTIME DATE_ADD DATE_SUB DAYNAME GET_FORMAT
MONTHNAME SUBDATE SUBTIME TIMESTAMPADD.
(Not all functions which return VARBINARY are affected by this task.
Only functions which meet the criteria in this worklog task description
are affected by this task.)

All implicit number-to-string-conversion functions which
currently return [VAR]BINARY will return with [VAR]CHAR and
character set = character_set_connection,
collation = collation_connection,
except for functions listed later in section "Exceptions".

Since the result currently is VARBINARY, Peter Gulutzan
asked for a vote in the thread "Re: BINARY and VARBINARY".
The vote was in favour of character_set_connection and
collation_connection.

Exceptions
----------

Although "SELECT CHAR(5)" returns VARBINARY, it's documented:
"By default, CHAR() returns a binary string. To produce a string
in a given character set, use the optional USING clause ...".
So CHAR(n) results will continue to be VARBINARY.

If a function would return a UTF8 character string,
then it is unaffected by this worklog task. That is,
it will continue to return a UTFf8 character string.
For example, CHARSET('a') returns VARCHAR(64) CHARACTER SET UTF8.

If a function would return VARBINARY for a string input,
then it is unaffected by this worklog task (except as
stated elewhere in this description). That is, it
will continue to return VARBINARY even if there is an
implicit conversion to a string. For example,
AES_DECRYPT(1,1) will still return VARBINARY,
because AES_DECRYPT('a','a') returns VARBINARY.

If there is aggregation, as with CASE, COALESCE, CONCAT(multiple values),
CONCAT_WS, GREATEST, GROUP_CONCAT(multiple values), LEAST, MAX, MIN, UNION,
the rules for aggregation are what matters, the result when there are
aggregated multiple values of wildly different data types is VARBINARY.

> And now, consider this:
> > 
> > mysql> set @@group_concat_max_len = 100;
> > Query OK, 0 rows affected (0.00 sec)
> > 
> > mysql> create table t1 (s1 int);
> > Query OK, 0 rows affected (0.06 sec)
> > 
> > mysql> create table t2 as select group_concat(s1) from t1;
> > Query OK, 1 row affected (0.10 sec)
> > Records: 1  Duplicates: 0  Warnings: 0
> > 
> > mysql> show create table t2;
> >
+-------+-------------------------------------------------------------------------------------------------------------+
> > | Table | Create Table                                                     
                                          |
> >
+-------+-------------------------------------------------------------------------------------------------------------+
> > | t2    | CREATE TABLE `t2` (
> >   `group_concat(s1)` varbinary(100) DEFAULT NULL
> > ) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
> >
+-------+-------------------------------------------------------------------------------------------------------------+
> > 1 row in set (0.00 sec)
> > 
> > This doesn't match what the manual says
> >
http://dev.mysql.com/doc/refman/6.0/en/group-by-functions.html#function_group-concat
> > Does anyone think it should be noted in the worklog task description?

Yes, it will be naturally fixed when number-to-string conversion will be
fixed. But I suppose the manual should be corrected anyway. If you mean
the part "The type returned by GROUP_CONCAT() is always VARCHAR" - it
will be VARBINARY if argument type is VARBINARY.
 



Exceptions to the exceptions
----------------------------

After checking with MySQL's connectors people, Sergei Golubchik
suggested that the following functions, which return VARBINARY now,
should return VARCHAR (with connection character set:
MD5, OLD_PASSWORD, PASSWORD, SHA, SHA1, SHA2. The following
functions, which return LONGBLOB now, should return VARCHAR
(with connection character set):
ASTEXT, ASWKT.

All encryption functions that return HEX-encoded strings
that return VARBINARY now, should return VARCHAR.

The implementor may in all cases use TEXT or LONGTEXT rather
than VARCHAR, if he/she finds it necessary due to length, or
finds it appropriate so as to look more compatible with other
functions that return LONGBLOB.

There will be no change for other functions which may appear
similar to some of the above. For example we considered changing
AES_DECRYPT and DES_ENCRYPT and UNCOMPRESS and UNHEX, and decided:
no change. See dev-private email thread "Re: ARCH REVIEW REQUEST:
WL#2649 Number-to-string conversions":
[mysql intranet]/secure/mailarchive/mail.php?folder=4&mail=27531

Other automatic conversions
---------------------------

Although the title of this worklog task is
"Number-to-string conversions", the same effect
occurs for any conversion of datetime to string.
For example:
CREATE TABLE tm (s1 TIME);
SELECT UCASE(s1) FROM tm;    /* result is VARBINARY(8) */

So let us extend the principle as follows:
If datetime-to-string conversion causes VARBINARY result,
and numeric-to-string conversion would cause VARBINARY result,
and non-conversion would not cause VARBINARY result,
and the function is not listed among "Exceptions" above,
then result should have character_set_connection, collation_connection.

This will not cause difficulty because the characters that might
appear in DATE or TIME or DATETIME or TIMESTAMP values, i.e.
- / . : + 0-9 A-Z, are in all character sets. The assumption
is that date values will never contain a (possibly Japanese)
month name.

ascii
-----

On previous occasions:
- Sergei Golubchik suggested the result should be ascii
  (email thread "charset of the int->char conversion")
- Monty Widenius suggested that for the expression select "0"=0 
  there should be a number-to-string conversion and the result is ascii
- in Prague Alexander Barkov  and Peter Gulutzan got the impression that
  Sergei and Monty wanted the result to be ascii, collation ascii_general_ci.
However, any such impressions are superseded.

A previous version of this worklog description said:
"Also, when concatenating or comparing ascii with another 
character set, e.g. CONCAT(CONCAT(1),'a'), the ascii 
will be converted to the other character set. 
There will be no change for explicit conversions."
However, that too is superseded.

Now this worklog has nothing to do with ascii.

Implicit conversions to dates or times
--------------------------------------

ADDDATE(date_column,5) is a DATE.
ADDDATE(datetime_or_timestamp_column,5) is a DATETIME.
ADDDATE(any_other_kind_of_column,5) is a VARBINARY.

Nobody wants the result to be VARBINARY. So, for
data types other than DATE or DATETIME (including TIME!),
there are two proposals and we'll pick one of them.

Alexander Barkov says:
The result should be [VAR]CHAR with connection character set.

Peter Gulutzan says:
For ADDDATE(expr,...), DATE_SUB(expr,...), SUBDATE(expr,...),
TIMESTAMP(expr,...), TIMESTAMPADD(...,...,expr), do an implicit
CAST(expr AS DATETIME) and return DATETIME.
For ADDTIME(expr,...), SUBTIME(expr,...), TIMEDIFF(expr,...)
do an implicit CAST(CAST(expr AS TIME) AS DATETIME) and return DATETIME, or
do an implicit CAST(expr as DATETIME) and return DATETIME.
Peter has no idea how MySQL decides which implicit CAST to do,
but it seems to be happening already.

See also BUG#31384 "DATE_ADD() and DATE_SUB() return binary data".
Internal emails about this subject are in this worklog task's
"Progress" report for on and around 2009-02-26. People affected:
Gleb Shchepa, Alexander Barkov, Evgeny Potemkin, Peter Gulutzan.

This in an incompatible change.

DAYNAME and MONTHNAME
----------------------

The original plan was to do something about DAYNAME and MONTHNAME here.
See also BUG#24494 dayname() function ignores my collation, uses binary
or some Case Sensitive [Closed]. There it was clarified that DAYNAME()
and MONTHNAME() return [VAR]BINARY]. The comments also suggest that
this is adequately documented. But, once again, we don't want VARBINARY.

But we have already changed the behaviour of DAYNAME() and MONTHNAME()
for BUG#37575 "UCASE fails on monthname and dayname". So for this
worklog task they have no relevance.

See also BUG#31555 IFNULL(some_field, 123) doesn't convert charset from UTF8
MONTHNAME

Proofs
------

Execute any of the following one-line scripts and you'll
see that SHOW CREATE TABLE indicates that t has a column
with data type BINARY, VARBINARY, or LONGBLOB. When this
task is complete the data type will be CHAR or VARCHAR
or TEXT or LONGTEXT.

create table t as select ADDDATE(5,5); show create table t;
create table t as select ADDTIME(5,5); show create table t;
create table t as select ASTEXT(5); show create table t;
create table t as select ASWKT(5); show create table t;
create table t as select CONCAT(5); show create table t;
create table t as select CONCAT_WS(5,5); show create table t;
create table t as select DATE_ADD(5,INTERVAL 5 SECOND); show create table t;
create table t as select DATE_SUB(5,INTERVAL 5 SECOND); show create table t;
create table t as select ELT(5,5); show create table t;
create table t as select EXPORT_SET(5,5,5); show create table t;
create table t as select GROUP_CONCAT(1); show create table t;
create table t as select IF(1,'a',5); show create table t;
create table t as select IFNULL(1,'a'); show create table t;
create table t as select INSERT(5,0,0,'a'); show create table t;
create table t as select LCASE(5); show create table t;
create table t as select LEFT(5,5); show create table t;
create table t as select LOWER(5); show create table t;
create table t as select LPAD(5,3,' '); show create table t;
create table t as select LTRIM(5); show create table t;
create table t as select MD5('a'); show create table t;
create table t as select MID(5,5,5); show create table t;
create table t as select OLD_PASSWORD('a');
create table t as select PASSWORD('a'); show create table t;
create table t as select QUOTE(5); show create table t;
create table t as select REPEAT(5,5); show create table t;
create table t as select REPLACE(5,5,5); show create table t;
create table t as select REVERSE(5); show create table t;
create table t as select RIGHT(5,5); show create table t;
create table t as select RPAD(5,5,5); show create table t;
create table t as select RTRIM(5); show create table t;
create table t as select SHA('a'); show create table t;
create table t as select SHA1('a'); show create table t;
create table t as select SHA2('a','a'); show create table t;
create table t as select SOUNDEX(5); show create table t;
create table t as select SUBDATE(5,5); show create table t;
create table t as select SUBSTR(5,5,5); show create table t;
create table t as select SUBSTRING(5,5,5); show create table t;
create table t as select SUBTIME(5,5); show create table t;
create table t as select TIMESTAMPADD(minute,1,1); show create table t;
create table t as select TRIM(5); show create table t;
create table t as select UCASE(5); show create table t;
create table t as select UPDATEXML(5,5,5); show create table t;

Bug reports
-----------

Most of the following were marked "not a bug" or "to be fixed later".
Since behaviour is changing, we might revisit them and change comments.

BUG#6257 UPPER(MD5('xxx'))
BUG#7008 UPPER(MD5()) does not work
BUG#7739 UPPER(CONCAT(...)) lowercase when CONCAT() deals with mixed types
BUG#10320 UPPER(MD5(NOW())) or UPPER(SHA1(NOW())) do not capitalize
BUG#12030 CONCAT with INTEGER modifies result-set encoding
BUG#14783 UPPER/UCASE function not working alongside ENCRYPT/MD5 functions
BUG#19412 UPPER, & UCASE on CONCAT expression of mixed type fail
BUG#26269 UPPER(CONCAT(...)) does not convert to uppercase
BUG#27216 coalesce returns varbinary for dates
BUG#30235 Bad result expected in DATETIME Function with MyODBC
BUG#31384 DATE_ADD() and DATE_SUB() return binary data
BUG#31555 IFNULL(some_field, 123) doesn't convert charset from UTF8
BUG#31601 decode function recognize the charset as binary
BUG#31073 UPPER() does not work on MD5() function when Inserting
BUG#34068 CURRENT_DATE() returns a binary string for non-binary data
BUG#34388 CONCAT Change result DataType
BUG#36347 Order by on decrypted data appears to be case sensitive
BUG#46735 GROUP_CONCAT colation or something