This section describes issues that you may face when converting
        character data between the utf8mb3 and
        utf8mb4 character sets.
          This discussion focuses primarily on converting between
          utf8mb3 and utf8mb4, but
          similar principles apply to converting between the
          ucs2 character set and character sets such
          as utf16 or utf32.
        The utf8mb3 and utf8mb4
        character sets differ as follows:
- utf8mb3supports only characters in the Basic Multilingual Plane (BMP).- utf8mb4additionally supports supplementary characters that lie outside the BMP.
- utf8mb3uses a maximum of three bytes per character.- utf8mb4uses a maximum of four bytes per character.
          This discussion refers to the utf8mb3 and
          utf8mb4 character set names to be explicit
          about referring to 3-byte and 4-byte UTF-8 character set data.
        One advantage of converting from utf8mb3 to
        utf8mb4 is that this enables applications to
        use supplementary characters. One tradeoff is that this may
        increase data storage space requirements.
      
        In terms of table content, conversion from
        utf8mb3 to utf8mb4
        presents no problems:
- For a BMP character, - utf8mb4and- utf8mb3have identical storage characteristics: same code values, same encoding, same length.
- For a supplementary character, - utf8mb4requires four bytes to store it, whereas- utf8mb3cannot store the character at all. When converting- utf8mb3columns to- utf8mb4, you need not worry about converting supplementary characters because there are none.
In terms of table structure, these are the primary potential incompatibilities:
- For the variable-length character data types ( - VARCHARand the- TEXTtypes), the maximum permitted length in characters is less for- utf8mb4columns than for- utf8mb3columns.
- For all character data types ( - CHAR,- VARCHAR, and the- TEXTtypes), the maximum number of characters that can be indexed is less for- utf8mb4columns than for- utf8mb3columns.
        Consequently, to convert tables from utf8mb3
        to utf8mb4, it may be necessary to change
        some column or index definitions.
      
        Tables can be converted from utf8mb3 to
        utf8mb4 by using ALTER
        TABLE. Suppose that a table has this definition:
      
CREATE TABLE t1 (
  col1 CHAR(10) CHARACTER SET utf8mb3 COLLATE utf8mb3_unicode_ci NOT NULL,
  col2 CHAR(10) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin NOT NULL
) CHARACTER SET utf8mb3;
        The following statement converts t1 to use
        utf8mb4:
      
ALTER TABLE t1
  DEFAULT CHARACTER SET utf8mb4,
  MODIFY col1 CHAR(10)
    CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  MODIFY col2 CHAR(10)
    CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL;
        The catch when converting from utf8mb3 to
        utf8mb4 is that the maximum length of a
        column or index key is unchanged in terms of
        bytes. Therefore, it is smaller in terms of
        characters because the maximum length of a
        character is four bytes instead of three. For the
        CHAR,
        VARCHAR, and
        TEXT data types, watch for these
        issues when converting your MySQL tables:
- Check all definitions of - utf8mb3columns and make sure they do not exceed the maximum length for the storage engine.
- Check all indexes on - utf8mb3columns and make sure they do not exceed the maximum length for the storage engine. Sometimes the maximum can change due to storage engine enhancements.
        If the preceding conditions apply, you must either reduce the
        defined length of columns or indexes, or continue to use
        utf8mb3 rather than
        utf8mb4.
      
Here are some examples where structural changes may be needed:
- A - TINYTEXTcolumn can hold up to 255 bytes, so it can hold up to 85 3-byte or 63 4-byte characters. Suppose that you have a- TINYTEXTcolumn that uses- utf8mb3but must be able to contain more than 63 characters. You cannot convert it to- utf8mb4unless you also change the data type to a longer type such as- TEXT.- Similarly, a very long - VARCHARcolumn may need to be changed to one of the longer- TEXTtypes if you want to convert it from- utf8mb3to- utf8mb4.
- InnoDBhas a maximum index length of 767 bytes for tables that use- COMPACTor- REDUNDANTrow format, so for- utf8mb3or- utf8mb4columns, you can index a maximum of 255 or 191 characters, respectively. If you currently have- utf8mb3columns with indexes longer than 191 characters, you must index a smaller number of characters.- In an - InnoDBtable that uses- COMPACTor- REDUNDANTrow format, these column and index definitions are legal:- col1 VARCHAR(500) CHARACTER SET utf8mb3, INDEX (col1(255))- To use - utf8mb4instead, the index must be smaller:- col1 VARCHAR(500) CHARACTER SET utf8mb4, INDEX (col1(191))Note- For - InnoDBtables that use- COMPRESSEDor- DYNAMICrow format, index key prefixes longer than 767 bytes (up to 3072 bytes) are permitted. Tables created with these row formats enable you to index a maximum of 1024 or 768 characters for- utf8mb3or- utf8mb4columns, respectively. For related information, see Section 17.21, “InnoDB Limits”, and DYNAMIC Row Format.
        The preceding types of changes are most likely to be required
        only if you have very long columns or indexes. Otherwise, you
        should be able to convert your tables from
        utf8mb3 to utf8mb4 without
        problems, using ALTER TABLE as
        described previously.
      
The following items summarize other potential incompatibilities:
- SET NAMES 'utf8mb4'causes use of the 4-byte character set for connection character sets. As long as no 4-byte characters are sent from the server, there should be no problems. Otherwise, applications that expect to receive a maximum of three bytes per character may have problems. Conversely, applications that expect to send 4-byte characters must ensure that the server understands them.
- For replication, if character sets that support supplementary characters are to be used on the source, all replicas must understand them as well. - Also, keep in mind the general principle that if a table has different definitions on the source and replica, this can lead to unexpected results. For example, the differences in maximum index key length make it risky to use - utf8mb3on the source and- utf8mb4on the replica.
        If you have converted to utf8mb4,
        utf16, utf16le, or
        utf32, and then decide to convert back to
        utf8mb3 or ucs2 (for
        example, to downgrade to an older version of MySQL), these
        considerations apply:
- utf8mb3and- ucs2data should present no problems.
- The server must be recent enough to recognize definitions referring to the character set from which you are converting. 
- For object definitions that refer to the - utf8mb4character set, you can dump them with mysqldump prior to downgrading, edit the dump file to change instances of- utf8mb4to- utf8, and reload the file in the older server, as long as there are no 4-byte characters in the data. The older server sees- utf8in the dump file object definitions and create new objects that use the (3-byte)- utf8character set.