The BINARY and VARBINARY
types are similar to CHAR and
VARCHAR, except that they
contain binary strings rather than nonbinary strings. That is,
they contain byte strings rather than character strings. This
means that they have no character set, and sorting and
comparison are based on the numeric values of the bytes in the
values.
The permissible maximum length is the same for
BINARY and VARBINARY as
it is for CHAR and
VARCHAR, except that the length
for BINARY and VARBINARY
is a length in bytes rather than in characters.
The BINARY and VARBINARY
data types are distinct from the CHAR
BINARY and VARCHAR BINARY data
types. For the latter types, the BINARY
attribute does not cause the column to be treated as a binary
string column. Instead, it causes the binary collation for the
column character set to be used, and the column itself
contains nonbinary character strings rather than binary byte
strings. For example, CHAR(5) BINARY is
treated as CHAR(5) CHARACTER SET latin1 COLLATE
latin1_bin, assuming that the default character set
is latin1. This differs from
BINARY(5), which stores 5-bytes binary
strings that have no character set or collation. For
information about differences between nonbinary string binary
collations and binary strings, see
Section 10.1.7.6, “The _bin and binary Collations”.
If strict SQL mode is not enabled and you assign a value to a
BINARY or VARBINARY
column that exceeds the column's maximum length, the value is
truncated to fit and a warning is generated. For cases of
truncation, you can cause an error to occur (rather than a
warning) and suppress insertion of the value by using strict
SQL mode. See Section 5.1.7, “Server SQL Modes”.
When BINARY values are stored, they are
right-padded with the pad value to the specified length. The
pad value and how it is handled is version specific:
As of MySQL 5.0.15, the pad value is
0x00 (the zero byte). Values are
right-padded with 0x00 on insert, and
no trailing bytes are removed on select. All bytes are
significant in comparisons, including ORDER
BY and DISTINCT operations.
0x00 bytes and spaces are different in
comparisons, with 0x00 < space.
Example: For a BINARY(3) column,
'a ' becomes
'a \0' when inserted.
'a\0' becomes
'a\0\0' when inserted. Both inserted
values remain unchanged when selected.
Before MySQL 5.0.15, the pad value is space. Values are
right-padded with space on insert, and trailing spaces are
removed on select. Trailing spaces are ignored in
comparisons, including ORDER BY and
DISTINCT operations.
0x00 bytes and spaces are different in
comparisons, with 0x00 < space.
Example: For a BINARY(3) column,
'a ' becomes
'a ' when inserted and
'a' when selected.
'a\0' becomes
'a\0 ' when inserted and
'a\0' when selected.
For VARBINARY, there is no padding on
insert and no bytes are stripped on select. All bytes are
significant in comparisons, including ORDER
BY and DISTINCT operations.
0x00 bytes and spaces are different in
comparisons, with 0x00 < space.
(Exceptions: Before MySQL 5.0.3, trailing spaces are removed
when values are stored. Before MySQL 5.0.15, trailing 0x00
bytes are removed for ORDER BY operations.)
Note: The InnoDB storage engine
continues to preserve trailing spaces in
BINARY and VARBINARY
column values through MySQL 5.0.18. Beginning with MySQL
5.0.19, InnoDB uses trailing space
characters in making comparisons as do other MySQL storage
engines.
For those cases where trailing pad bytes are stripped or
comparisons ignore them, if a column has an index that
requires unique values, inserting into the column values that
differ only in number of trailing pad bytes will result in a
duplicate-key error. For example, if a table contains
'a', an attempt to store
'a\0' causes a duplicate-key error.
You should consider the preceding padding and stripping
characteristics carefully if you plan to use the
BINARY data type for storing binary data
and you require that the value retrieved be exactly the same
as the value stored. The following example illustrates how
0x00-padding of BINARY
values affects column value comparisons:
mysql>CREATE TABLE t (c BINARY(3));Query OK, 0 rows affected (0.01 sec) mysql>INSERT INTO t SET c = 'a';Query OK, 1 row affected (0.01 sec) mysql>SELECT HEX(c), c = 'a', c = 'a\0\0' from t;+--------+---------+-------------+ | HEX(c) | c = 'a' | c = 'a\0\0' | +--------+---------+-------------+ | 610000 | 0 | 1 | +--------+---------+-------------+ 1 row in set (0.09 sec)
If the value retrieved must be the same as the value specified
for storage with no padding, it might be preferable to use
VARBINARY or one of the
BLOB data types instead.

User Comments
When BINARY or VARBINARY values are stored, e.g. from literal strings like 'abc' or 'Hello', there is of course a character set involved. It' s the standard character set of the operating system that is used to translate each character 'a','b','c' or 'H','e','l','o' to its byte value. (Or byte values for multi-byte character sets.)
Thus, the operating system, with its standard character set, defines how characters are converted into binary values. Only there is no MySQL character set definition involved.
To escape binary values, MySQL supports the following formats: X'val', x'val', or 0xval. The former is SQL standard, the latter is ODBC standard.
Examples:
mysql> SELECT x'4D7953514C';
-> 'MySQL'
mysql> SELECT 0x4D7953514C;
-> 'MySQL'
@Andreas, what it means that a value "has no character set" is that what you describe breaks down if the operating system has a different charset when inserting and when querying. If one user inserts text as binary and he's using for example ASCII, then another user reads the binary as text but he's using utf-16, he'll read a different text.
When you use char types the server takes care of things so that anyone will read the same text that was inserted, regardless of everyone's local encoding settings.
I'm not really sure that the documentation explains whether the binary column is indexed as a number or as a binarystring. This becomes relevant regarding where the LSB is in UUID fields. Typically with sequences (eg auto-increment) it's the least significant value which changes most frequently, whereas with UUID() I notice that it is the most significant value which changes most frequently. Which generates the most efficient indices?
In the end, I guess I am wondering if it would be more efficient to index the reverse of the UUID, or leave it as it is?
So the options I'm interested in are as follows (assumption is that a primary field of BINARY(16) is populated with the function):
CREATE FUNCTION id() RETURNS binary(16) RETURN unhex(REPLACE(UUID(),'-',''));
or
CREATE FUNCTION id() RETURNS binary(16) RETURN unhex(reverse(REPLACE(UUID(),'-','')));
Add your own comment.