This section describes issues that you may face when converting
utf8 character set to the
utf8mb4 character set, or vice versa.
The discussion here focuses primarily on converting between
similar principles apply to converting between the
ucs2 character set and character sets such
character sets differ as follows:
utf8supports only characters in the Basic Multilingual Plane (BMP).
utf8mb4additionally supports supplementary characters that lie outside the BMP.
utf8uses a maximum of three bytes per character.
utf8mb4uses a maximum of four bytes per character.
One advantage of converting from
utf8mb4 is that this enables applications to
use supplementary characters. One tradeoff is that this may
increase data storage space requirements.
In most respects, converting from
utf8mb4 should present few problems. These
are the primary potential areas of incompatibility:
Consequently, to convert tables from
utf8mb4, it may be necessary to change some
column or index definitions.
Tables can be converted from
utf8mb4 by using
TABLE. Suppose that a table was originally defined as
CREATE TABLE t1 ( col1 CHAR(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL, col2 CHAR(10) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL ) CHARACTER SET utf8;
The following statement converts
t1 to use
ALTER TABLE t1 DEFAULT CHARACTER SET utf8mb4, MODIFY col1 CHAR(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL, MODIFY col2 CHAR(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL;
In terms of table content, conversion from
For a BMP character,
utf8mb4have identical storage characteristics: same code values, same encoding, same length.
For a supplementary character,
utf8cannot store the character at all, whereas
utf8mb4requires four bytes. Because
utf8cannot store the character at all,
utf8columns have no supplementary characters and you need not worry about converting characters or losing data when converting to
In terms of table structure, the catch when converting from
utf8mb4 is that
the maximum length of a column or index key is unchanged in
terms of bytes. Therefore, it is smaller in
terms of characters because the maximum
length of a character is four bytes instead of three. For the
TEXT data types, watch for these
issues when converting your MySQL tables:
Check all definitions of
utf8columns and make sure they will not exceed the maximum length for the storage engine.
Check all indexes on
utf8columns and make sure they will not exceed the maximum length for the storage engine. Sometimes the maximum can change due to storage engine enhancements.
If the preceding conditions apply, you must either reduce the
defined length of columns or indexes, or continue to use
utf8 rather than
Here are some examples where structural changes may be needed:
TINYTEXTcolumn can hold up to 255 bytes, so it can hold up to 85 3-byte or 63 4-byte characters. Suppose that you have a
TINYTEXTcolumn that uses
utf8but must be able to contain more than 63 characters. You cannot convert it to
utf8mb4unless you also change the data type to a longer type such as
InnoDBhas a maximum index length of 767 bytes, so for
utf8mb4columns, you can index a maximum of 255 or 191 characters, respectively. If you currently have
utf8columns with indexes longer than 191 characters, you will need to index a smaller number of characters. In an
InnoDBtable, these column and index definitions are legal:
col1 VARCHAR(500) CHARACTER SET utf8, INDEX (col1(255))
utf8mb4instead, the index must be smaller:
col1 VARCHAR(500) CHARACTER SET utf8mb4, INDEX (col1(191))
The preceding types of changes are most likely to be required
only if you have very long columns or indexes. Otherwise, you
should be able to convert your tables from
ALTER TABLE as
The following items summarize other potential areas of incompatibility:
Performance of 4-byte UTF-8 (
utf8mb4) is slower than for 3-byte UTF-8 (
utf8). To avoid this penalty, continue to use
SET NAMES 'utf8mb4'causes use of the 4-byte character set for connection character sets. As long as no 4-byte characters are sent from the server, there should be no problems. Otherwise, applications that expect to receive a maximum of three bytes per character may have problems. Conversely, applications that expect to send 4-byte characters must ensure that the server understands them. More generally, applications cannot send
utf32character data to an older server that does not understand it.
utf32are not recognized before MySQL 5.5.
For replication, if character sets that support supplementary characters are to be used on the master, all slaves must understand them as well. If you attempt to replicate from a newer master to an older slave,
utf8data will be seen as
utf8by the slave and should replicate correctly. But you cannot send
utf32data to an older slave that does not understand it.
utf32are not recognized before MySQL 5.5.
Also, keep in mind the general principle that if a table has different definitions on the master and slave, this can lead to unexpected results. For example, the differences in maximum index key length make it risky to use
utf8on the master and
utf8mb4on the slave.
If you have converted to
utf32, and then
decide to convert back to
ucs2 (for example, to downgrade to an older
version of MySQL), these considerations apply:
ucs2data should present no problems.
The server must be recent enough to recognize definitions referring to the character set from which you are converting.
For object definitions that refer to the
utf8mb4character set, you can dump them with mysqldump prior to downgrading, edit the dump file to change instances of
utf8, and reload the file in the older server, as long as there are no 4-byte characters in the data. The older server will see
utf8in the dump file object definitions and create new objects that use the (3-byte)