Documentation Home
MySQL 5.7 Reference Manual
Related Documentation Download this Manual
PDF (US Ltr) - 35.4Mb
PDF (A4) - 35.6Mb
PDF (RPM) - 33.8Mb
EPUB - 8.5Mb
HTML Download (TGZ) - 8.3Mb
HTML Download (Zip) - 8.4Mb
HTML Download (RPM) - 7.2Mb
Eclipse Doc Plugin (TGZ) - 9.2Mb
Eclipse Doc Plugin (Zip) - 11.3Mb
Man Pages (TGZ) - 198.4Kb
Man Pages (Zip) - 302.3Kb
Info (Gzip) - 3.2Mb
Info (Zip) - 3.2Mb
Excerpts from this Manual

MySQL 5.7 Reference Manual  /  ...  /  The utf8 Character Set (3-Byte UTF-8 Unicode Encoding) The utf8 Character Set (3-Byte UTF-8 Unicode Encoding)

UTF-8 (Unicode Transformation Format with 8-bit units) is an alternative way to store Unicode data. It is implemented according to RFC 3629, which describes encoding sequences that take from one to four bytes. (An older standard for UTF-8 encoding, RFC 2279, describes UTF-8 sequences that take from one to six bytes. RFC 3629 renders RFC 2279 obsolete; for this reason, sequences with five and six bytes are no longer used.)

The idea of UTF-8 is that various Unicode characters are encoded using byte sequences of different lengths:

  • Basic Latin letters, digits, and punctuation signs use one byte.

  • Most European and Middle East script letters fit into a 2-byte sequence: extended Latin letters (with tilde, macron, acute, grave and other accents), Cyrillic, Greek, Armenian, Hebrew, Arabic, Syriac, and others.

  • Korean, Chinese, and Japanese ideographs use 3-byte or 4-byte sequences.

The utf8 character set is the same in MySQL 5.7 as before 5.7 and has exactly the same characteristics:

  • No support for supplementary characters (BMP characters only).

  • A maximum of three bytes per multibyte character.

Exactly the same set of characters is available in utf8 and ucs2. That is, they have the same repertoire.


To save space with UTF-8, use VARCHAR instead of CHAR. Otherwise, MySQL must reserve three bytes for each character in a CHAR CHARACTER SET utf8 column because that is the maximum possible character length. For example, MySQL must reserve 30 bytes for a CHAR(10) CHARACTER SET utf8 column.

For additional information about data type storage, see Section 12.8, “Data Type Storage Requirements”. For information about InnoDB physical row storage, including how InnoDB tables that use COMPACT row format handle UTF-8 CHAR(N) columns internally, see Section, “Physical Row Structure”.

User Comments
Sign Up Login You must be logged in to post a comment.