WL#3792: BINARY and VARBINARY changes

Affects: Server-7.1   —   Status: Un-Assigned

SQL:2008, the replacement for the SQL:2003
standard, has a new non-core feature:
Feature T021, "BINARY and VARBINARY data types".

MySQL will make some changes to its existing
support for BINARY/VARBINARY when the standard
is out.
Data types
----------

For a long time, there was controversy whether
BINARY/VARBINARY were data types, or merely variants
of CHAR/VARCHAR data types. The decision, finally,
was that BINARY and VARBINARY are data types.

The SQL:200x document says that they are data types.
There will be no change.

Padding 0x00 for comparison
---------------------------

With CHAR/VARCHAR, 'A   '= 'A'. That is, all
collations are "pad space"; in effect MySQL adds
spaces to the end of the shorter string before
comparing. The question arose: for BINARY/VARBINARY,
should there be 0x00 padding? If so,
X'01' = X'0100'. If not, X'01' < X'0100'.

The decision was: X'01' < X'0100'. There is no
padding for BINARY/VARBINARY.

For example, this is MySQL current behaviour:

mysql> create table tb (s1 binary(5));
Query OK, 0 rows affected (0.01 sec)

mysql> insert into tb values (X'01');
Query OK, 1 row affected (0.01 sec)

mysql> select count(*) from tb where s1 = X'01';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.35 sec)

The SQL:200x document says that such behaviour is
"implementation-defined". There will be no change.

Truncating 0x00 for assignment
-------------------------------

Currently, if one tries to assign X'0100' to a
one-byte BINARY/VARBINARY target (that is, the value
is longer than the maximum due to trailing 0x00 bytes),
MySQL gives a warning in non-strict mode and an error in
strict mode (if sql_mode=STRICT_TRANS_TABLES or
STRICT_ALL_TABLES).

For example, this is MySQL current behaviour:

mysql> create table ta (s1 binary(5), s2 varbinary(5));
Query OK, 0 rows affected (0.01 sec)

mysql> insert into ta values (X'010000000000',X'010000000000');
Query OK, 1 row affected, 2 warnings (0.00 sec)

mysql> show warnings;
+---------+------+-----------------------------------------+
| Level   | Code | Message                                 |
+---------+------+-----------------------------------------+
| Warning | 1265 | Data truncated for column 's1' at row 1 |
| Warning | 1265 | Data truncated for column 's2' at row 1 |
+---------+------+-----------------------------------------+
2 rows in set (0.00 sec)

mysql> set sql_mode=traditional;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into ta values (X'010000000000',X'010000000000');
ERROR 1406 (22001): Data too long for column 's1' at row 1

The SQL:200x document says that such behaviour is wrong --
trailing X'00' bytes can be truncated without error.
Compare what happens with inserting 'A       ' into a
CHAR(1) target -- MySQL doesn't give either warning or
error for that.

However, Alexander Barkov voted not to change MySQL's
current behaviour. There will be no change.

BINARY VARYING
--------------

Currently MySQL allows only one way to describe a
varyiable-length binary data type: VARBINARY.

The SQL:200x document says: allow BINARY VARYING as
a VARBINARY synonym. For comparison, there is a
CHAR VARYING synonym for VARCHAR.

There will be a change: MySQL will allow BINARY
VARYING in definitions of columns and variables.

The SQL:200x document says that one should see
'BINARY VARYING' (not 'VARBINARY') for data_type
columns in information_schema. However, MySQL
already says 'varchar' instead of 'CHARACTER
VARYING', and already says 'varbinary' instead
of 'BINARY VARYING'. There was no vote about
this detail. There will be no change.

Literals
--------

Currently MySQL accepts [var]binary literals in
the standard form: X'hh...' where hh... means
pairs of hexadecimal digits, repeated indefinitely.

For example, this is MySQL current behaviour:

mysql> create table td as select x'01';
Query OK, 1 row affected (0.03 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> select data_type from information_schema.columns
    -> where table_name='td';
+-----------+
| data_type |
+-----------+
| varbinary |
+-----------+
1 row in set (0.00 sec)

The SQL:200x document says that binary literals may
contain spaces, and may be combined.
That is: X'ABCD' and X'AB' CD' are the same.
And: X'ABCD' and X'AB CD' are the same.

This is similar to what happens for other literals,
see WL#3529 Unicode Escape Sequences. It is optional,
not part of Feature T021.

There will be a change: MySQL will allow such literals.

References
----------

WL#3759 "Data types: BLOB/TEXT
[VAR]CHAR/[VAR]BINARY should be different types"

dev-private thread "BINARY and VARBINARY"
https://intranet.mysql.com/secure/mailarchive/mail.php?folder=4&mail=18197

feature request:
BUG#26799  	Char and binary types without padding/stripping