The utf8mb3 character set has these characteristics:
Supports BMP characters only (no support for supplementary characters).
Requires a maximum of three bytes per multibyte character.
Applications that use UTF-8 data but require supplementary character support should use utf8mb4 rather than utf8mb3 (see Section 10.9.1, “The utf8mb4 Character Set (4-Byte UTF-8 Unicode Encoding)”).
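As a rough illustration of the three-byte limit, the following sketch compares a BMP character with a supplementary character. The hex literals are the UTF-8 encodings of U+20AC (euro sign) and U+1F600 (an emoji), chosen here only for illustration, and the table name t_mb4 is hypothetical.

```sql
-- U+20AC is in the BMP: its UTF-8 encoding is three bytes, so it fits in utf8mb3.
SELECT LENGTH(_utf8mb3 0xE282AC) AS bytes,
       CHAR_LENGTH(_utf8mb3 0xE282AC) AS chars;

-- U+1F600 is a supplementary character: its UTF-8 encoding is four bytes,
-- so it requires utf8mb4.
SELECT LENGTH(_utf8mb4 0xF09F9880) AS bytes,
       CHAR_LENGTH(_utf8mb4 0xF09F9880) AS chars;

-- Hypothetical table for data that may contain supplementary characters.
CREATE TABLE t_mb4 (s1 VARCHAR(10) CHARACTER SET utf8mb4);
```

LENGTH() counts bytes and CHAR_LENGTH() counts characters, so the difference between the two character sets shows up in the bytes column.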
Exactly the same set of characters is available in utf8mb3 and ucs2. That is, they have the same repertoire.
utf8 is an alias for utf8mb3; the character limit is implicit, rather than explicit in the name.
utf8mb3 can be used in CHARACTER SET clauses, and utf8mb3_collation_substring in COLLATE clauses, where collation_substring is bin, czech_ci, danish_ci, esperanto_ci, estonian_ci, and so forth. For example:
CREATE TABLE t (s1 CHAR(1) CHARACTER SET utf8mb3);
SELECT * FROM t WHERE s1 COLLATE utf8mb3_general_ci = 'x';
DECLARE x VARCHAR(5) CHARACTER SET utf8mb3 COLLATE utf8mb3_danish_ci;
SELECT CAST('a' AS CHAR CHARACTER SET utf8) COLLATE utf8_czech_ci;
MySQL immediately converts instances of utf8mb3 in statements to utf8, so in statements such as SHOW CREATE TABLE or SELECT CHARACTER_SET_NAME FROM INFORMATION_SCHEMA.COLUMNS or SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS, users will see the name utf8 or utf8_collation_substring.
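The name conversion can be observed with a sketch such as the following. The table name t_alias is hypothetical, and the statements assume that a default database is selected and that the server converts names as described above.

```sql
-- Define a column using the explicit utf8mb3 names.
CREATE TABLE t_alias (
  s1 CHAR(1) CHARACTER SET utf8mb3 COLLATE utf8mb3_danish_ci
);

-- Inspect how the server reports the character set and collation.
SHOW CREATE TABLE t_alias;

SELECT CHARACTER_SET_NAME, COLLATION_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 't_alias';
```

On a server that behaves as described, both statements should report utf8 and utf8_danish_ci rather than the utf8mb3 names used in the definition.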
utf8mb3 is also valid in contexts other than CHARACTER SET clauses. For example:
mysqld --character-set-server=utf8mb3
SET NAMES 'utf8mb3'; /* and other SET statements that have similar effect */
SELECT _utf8mb3 'a';
To save space with utf8mb3, use VARCHAR instead of CHAR. Otherwise, MySQL must reserve three bytes for each character in a CHAR column that uses utf8mb3 because that is the maximum possible length. For example, MySQL must reserve 30 bytes for a CHAR(10) column that uses utf8mb3.
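The trade-off can be sketched with two hypothetical tables, t_char and t_varchar. The byte counts in the comments follow the arithmetic above and the general storage rules for VARCHAR; actual on-disk usage depends on the storage engine and row format.

```sql
-- Fixed-length column: up to 10 characters x 3 bytes = 30 bytes reserved,
-- regardless of how many bytes the stored value actually needs.
CREATE TABLE t_char (c CHAR(10) CHARACTER SET utf8mb3);

-- Variable-length column: stores only the bytes the value needs plus a
-- length prefix (one byte here, because the maximum is 30 bytes).
CREATE TABLE t_varchar (c VARCHAR(10) CHARACTER SET utf8mb3);

-- 'abc' needs 3 bytes of character data in utf8mb3.
INSERT INTO t_char VALUES ('abc');
INSERT INTO t_varchar VALUES ('abc');
```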
For additional information about data type storage, see Section 11.7, “Data Type Storage Requirements”. For information about InnoDB physical row storage, including how InnoDB tables that use COMPACT row format handle UTF-8 CHAR(N) columns internally, see Section 14.8.1.2, “The Physical Row Structure of an InnoDB Table”.