This section describes how the binary
collation for binary strings compares to _bin
collations for nonbinary strings.
Binary strings (as stored using the
BINARY
,
VARBINARY
, and
BLOB
data types) have a character
set and collation named binary
. Binary
strings are sequences of bytes and the numeric values of those
bytes determine comparison and sort order. See
Section 12.10.8, “The Binary Character Set”.
Nonbinary strings (as stored using the
CHAR
,
VARCHAR
, and
TEXT
data types) have a character
set and collation other than binary
. A given
nonbinary character set can have several collations, each of
which defines a particular comparison and sort order for the
characters in the set. For most character sets, one of these is
the binary collation, indicated by a _bin
suffix in the collation name. For example, the binary collations
for latin1
and big5
are
named latin1_bin
and
big5_bin
, respectively.
utf8mb4
is an exception that has two binary
collations, utf8mb4_bin
and
utf8mb4_0900_bin
; see
Section 12.10.1, “Unicode Character Sets”.
The binary
collation differs from
_bin
collations in several respects,
discussed in the following sections:
Binary strings are sequences of bytes. For the
binary
collation, comparison and sorting
are based on numeric byte values. Nonbinary strings are
sequences of characters, which might be multibyte. Collations
for nonbinary strings define an ordering of the character
values for comparison and sorting. For _bin
collations, this ordering is based on numeric character code
values, which is similar to ordering for binary strings except
that character code values might be multibyte.
A nonbinary string has a character set and is automatically
converted to another character set in many cases, even when
the string has a _bin
collation:
When assigning column values to another column that has a different character set:
UPDATE t1 SET utf8mb4_bin_column=latin1_column; INSERT INTO t1 (latin1_column) SELECT utf8mb4_bin_column FROM t2;
When assigning column values for
INSERT
orUPDATE
using a string literal:SET NAMES latin1; INSERT INTO t1 (utf8mb4_bin_column) VALUES ('string-in-latin1');
When sending results from the server to a client:
SET NAMES latin1; SELECT utf8mb4_bin_column FROM t2;
For binary string columns, no conversion occurs. For cases similar to those preceding, the string value is copied byte-wise.
Collations for nonbinary character sets provide information
about lettercase of characters, so characters in a nonbinary
string can be converted from one lettercase to another, even
for _bin
collations that ignore lettercase
for ordering:
mysql> SET NAMES utf8mb4 COLLATE utf8mb4_bin;
mysql> SELECT LOWER('aA'), UPPER('zZ');
+-------------+-------------+
| LOWER('aA') | UPPER('zZ') |
+-------------+-------------+
| aa | ZZ |
+-------------+-------------+
The concept of lettercase does not apply to bytes in a binary string. To perform lettercase conversion, the string must first be converted to a nonbinary string using a character set appropriate for the data stored in the string:
mysql> SET NAMES binary;
mysql> SELECT LOWER('aA'), LOWER(CONVERT('aA' USING utf8mb4));
+-------------+------------------------------------+
| LOWER('aA') | LOWER(CONVERT('aA' USING utf8mb4)) |
+-------------+------------------------------------+
| aA | aa |
+-------------+------------------------------------+
MySQL collations have a pad attribute, which has a value of
PAD SPACE
or NO PAD
:
Most MySQL collations have a pad attribute of
PAD SPACE
.The Unicode collations based on UCA 9.0.0 and higher have a pad attribute of
NO PAD
; see Section 12.10.1, “Unicode Character Sets”.
For nonbinary strings (CHAR
,
VARCHAR
, and TEXT
values), the string collation pad attribute determines
treatment in comparisons of trailing spaces at the end of
strings:
For
PAD SPACE
collations, trailing spaces are insignificant in comparisons; strings are compared without regard to trailing spaces.NO PAD
collations treat trailing spaces as significant in comparisons, like any other character.
The differing behaviors can be demonstrated using the two
utf8mb4
binary collations, one of which is
PAD SPACE
, the other of which is
NO PAD
. The example also shows how to use
the INFORMATION_SCHEMA
COLLATIONS
table to determine the
pad attribute for collations.
mysql> SELECT COLLATION_NAME, PAD_ATTRIBUTE
FROM INFORMATION_SCHEMA.COLLATIONS
WHERE COLLATION_NAME LIKE 'utf8mb4%bin';
+------------------+---------------+
| COLLATION_NAME | PAD_ATTRIBUTE |
+------------------+---------------+
| utf8mb4_bin | PAD SPACE |
| utf8mb4_0900_bin | NO PAD |
+------------------+---------------+
mysql> SET NAMES utf8mb4 COLLATE utf8mb4_bin;
mysql> SELECT 'a ' = 'a';
+------------+
| 'a ' = 'a' |
+------------+
| 1 |
+------------+
mysql> SET NAMES utf8mb4 COLLATE utf8mb4_0900_bin;
mysql> SELECT 'a ' = 'a';
+------------+
| 'a ' = 'a' |
+------------+
| 0 |
+------------+
“Comparison” in this context does not include
the LIKE
pattern-matching
operator, for which trailing spaces are significant,
regardless of collation.
For binary strings (BINARY
,
VARBINARY
, and BLOB
values), all bytes are significant in comparisons, including
trailing spaces:
mysql> SET NAMES binary;
mysql> SELECT 'a ' = 'a';
+------------+
| 'a ' = 'a' |
+------------+
| 0 |
+------------+
CHAR(
columns
store nonbinary strings N
)N
characters long. For inserts, values shorter than
N
characters are extended with
spaces. For retrievals, trailing spaces are removed.
BINARY(
columns store binary strings N
)N
bytes long. For inserts, values shorter than
N
bytes are extended with
0x00
bytes. For retrievals, nothing is
removed; a value of the declared length is always returned.
mysql> CREATE TABLE t1 (
a CHAR(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin,
b BINARY(10)
);
mysql> INSERT INTO t1 VALUES ('x','x');
mysql> INSERT INTO t1 VALUES ('x ','x ');
mysql> SELECT a, b, HEX(a), HEX(b) FROM t1;
+------+------------------------+--------+----------------------+
| a | b | HEX(a) | HEX(b) |
+------+------------------------+--------+----------------------+
| x | 0x78000000000000000000 | 78 | 78000000000000000000 |
| x | 0x78200000000000000000 | 78 | 78200000000000000000 |
+------+------------------------+--------+----------------------+