MySQL 5.7 Reference Manual

11.1.9 Unicode Support

The Unicode Standard includes characters from the Basic Multilingual Plane (BMP) and supplementary characters that lie outside the BMP. This section describes support for Unicode in MySQL. For information about the Unicode Standard itself, visit the Unicode Consortium Web site.

BMP characters have these characteristics:

  • Their code point values are between 0 and 65535 (or U+0000 and U+FFFF).

  • They can be encoded in a variable-length encoding using 8, 16, or 24 bits (1 to 3 bytes).

  • They can be encoded in a fixed-length encoding using 16 bits (2 bytes).

  • They are sufficient for almost all characters in major languages.

Supplementary characters lie outside the BMP. Their code point values are between U+10000 and U+10FFFF. Unicode support for supplementary characters requires character sets whose range extends beyond the BMP and which therefore need more space per character than BMP-only character sets.
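
The ranges above can be checked with a short script. This is a minimal sketch in Python, not MySQL code: the helper names is_bmp and utf8_width are hypothetical, and the sample characters are chosen only for illustration. It classifies a character by code point and derives its UTF-8 width from the code point ranges.

```python
BMP_MAX = 0xFFFF  # last code point of the Basic Multilingual Plane


def is_bmp(ch: str) -> bool:
    """Return True if the single character ch lies in the BMP."""
    return ord(ch) <= BMP_MAX


def utf8_width(ch: str) -> int:
    """Bytes needed to encode ch in UTF-8, derived from its code point."""
    cp = ord(ch)
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp <= BMP_MAX:
        return 3
    return 4  # supplementary characters always need four bytes


# Illustrative samples: U+0041, U+00E9, U+20AC (all BMP) and U+1F600 (supplementary).
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    # Cross-check the arithmetic against Python's own UTF-8 encoder.
    assert utf8_width(ch) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} bmp={is_bmp(ch)} utf8_bytes={utf8_width(ch)}")
```

Every BMP sample fits in one to three bytes, while the supplementary sample needs four.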

MySQL supports these Unicode character sets:

  • utf8, a UTF-8 encoding of the Unicode character set using one to three bytes per character.

  • ucs2, the UCS-2 encoding of the Unicode character set using two bytes per character.

  • utf8mb4, a UTF-8 encoding of the Unicode character set using one to four bytes per character.

  • utf16, the UTF-16 encoding for the Unicode character set using two or four bytes per character. Like ucs2 but with an extension for supplementary characters.

  • utf16le, the UTF-16LE encoding for the Unicode character set. Like utf16 but little-endian rather than big-endian.

  • utf32, the UTF-32 encoding for the Unicode character set using four bytes per character.

Table 11.2, “Unicode Character Set General Characteristics”, summarizes the general characteristics of Unicode character sets supported by MySQL.

Table 11.2 Unicode Character Set General Characteristics

Character Set   Supported Characters     Required Storage Per Character
utf8            BMP only                 1, 2, or 3 bytes
ucs2            BMP only                 2 bytes
utf8mb4         BMP and supplementary    1, 2, 3, or 4 bytes
utf16           BMP and supplementary    2 or 4 bytes
utf16le         BMP and supplementary    2 or 4 bytes
utf32           BMP and supplementary    4 bytes
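
The storage figures in the table can be reproduced outside the server. The sketch below is in Python and rests on an assumption: each MySQL character set is approximated by a Python codec, with utf8 and ucs2 modeled as BMP-only by an explicit check. The CHARSETS mapping and the storage_bytes helper are hypothetical names, not part of MySQL.

```python
# Assumed stand-ins: a Python codec for each MySQL character set, plus a flag
# saying whether the MySQL character set is limited to the BMP.
CHARSETS = {
    "utf8":    ("utf-8", True),
    "ucs2":    ("utf-16-be", True),
    "utf8mb4": ("utf-8", False),
    "utf16":   ("utf-16-be", False),
    "utf16le": ("utf-16-le", False),
    "utf32":   ("utf-32-be", False),
}


def storage_bytes(charset: str, ch: str):
    """Bytes needed for ch in charset, or None if the charset cannot hold it."""
    codec, bmp_only = CHARSETS[charset]
    if bmp_only and ord(ch) > 0xFFFF:
        return None  # supplementary character in a BMP-only character set
    return len(ch.encode(codec))


# Illustrative samples: one ASCII, one three-byte BMP, one supplementary character.
samples = {"A": "A", "euro sign": "\u20ac", "emoji": "\U0001F600"}
for name in CHARSETS:
    print(name, {label: storage_bytes(name, ch) for label, ch in samples.items()})
```

The explicit "-be" and "-le" codec names are used because Python's plain "utf-16" and "utf-32" codecs prepend a byte order mark, which would inflate the counts.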

Characters outside the BMP compare as REPLACEMENT CHARACTER and are replaced by '?' when converted to a Unicode character set that supports only BMP characters (utf8 or ucs2).
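
To see which characters in a string would be affected, the conversion result can be modeled outside the server. This is only a sketch in Python of the '?' substitution described above. The helper name to_bmp_only is hypothetical, and it does not reproduce the comparison behavior or any server-side processing.

```python
def to_bmp_only(text: str) -> str:
    """Replace each supplementary character with '?', modeling conversion
    to a BMP-only character set such as utf8 or ucs2."""
    return "".join(ch if ord(ch) <= 0xFFFF else "?" for ch in text)


def supplementary_chars(text: str) -> list:
    """List the characters in text that lie outside the BMP."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


sample = "caf\u00e9 \U0001F600"      # BMP text followed by one supplementary character
print(to_bmp_only(sample))           # only the last character becomes '?'
print(len(supplementary_chars(sample)))
```

A check like supplementary_chars can be run over application data before choosing between a BMP-only and a 4-byte character set.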

If you use character sets that support supplementary characters and thus are wider than the BMP-only utf8 and ucs2 character sets, there are potential incompatibility issues for your applications; see Section 11.1.9.8, “Converting Between 3-Byte and 4-Byte Unicode Character Sets”. That section also describes how to convert tables from utf8 to the (4-byte) utf8mb4 character set, and what constraints may apply in doing so.

A similar set of collations is available for most Unicode character sets. For example, each has a Danish collation, the names of which are ucs2_danish_ci, utf16_danish_ci, utf32_danish_ci, utf8_danish_ci, and utf8mb4_danish_ci. The exception is utf16le, which has only two collations. For information about Unicode collations and their differentiating properties, including collation properties for supplementary characters, see Section 11.1.10.1, “Unicode Character Sets”.

The MySQL implementation of UCS-2, UTF-16, and UTF-32 stores characters in big-endian byte order and does not use a byte order mark (BOM) at the beginning of values. The implementation of UTF-16LE is little-endian. Other database systems might use little-endian byte order or a BOM. In such cases, values must be converted when transferring data between those systems and MySQL.

MySQL uses no BOM for UTF-8 values.
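
The kind of conversion this implies might look like the following. It is a minimal sketch in Python, assuming that input without a BOM is already big-endian. The function names to_mysql_utf16 and strip_utf8_bom are hypothetical and are not part of any MySQL client library.

```python
import codecs


def to_mysql_utf16(data: bytes) -> bytes:
    """Normalize UTF-16 bytes from another system to big-endian without a BOM.

    Assumption: input that carries no BOM is already big-endian.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF16_BE):
        text = data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    else:
        text = data.decode("utf-16-be")
    # "utf-16-be" writes big-endian code units and never emits a BOM.
    return text.encode("utf-16-be")


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM, if present, and leave other bytes unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


print(to_mysql_utf16(b"\xff\xfeA\x00").hex())  # little-endian input with a BOM
print(strip_utf8_bom(b"\xef\xbb\xbfhi"))
```

Data that is little-endian but carries no BOM cannot be detected this way; the caller has to know the source byte order.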

Client applications that communicate with the server using Unicode should set the client character set accordingly; for example, by issuing a SET NAMES 'utf8' statement. ucs2, utf16, utf16le, and utf32 cannot be used as a client character set, which means that they do not work for SET NAMES or SET CHARACTER SET. (See Section 11.1.4, “Connection Character Sets and Collations”.)
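
An application could guard against the restriction above before sending the statement. The sketch below is in Python. The helper set_names_statement and its name check are hypothetical, and the list of rejected character sets is taken only from the paragraph above. It builds the statement text and does not connect to a server.

```python
import re

# Character sets that, per the text above, cannot be used as a client character set.
NOT_CLIENT_CHARSETS = {"ucs2", "utf16", "utf16le", "utf32"}


def set_names_statement(charset: str) -> str:
    """Build a SET NAMES statement, rejecting unusable client character sets."""
    name = charset.lower()
    # Hypothetical safety check: allow only simple identifier-like names.
    if not re.fullmatch(r"[a-z0-9_]+", name):
        raise ValueError(f"invalid character set name: {charset!r}")
    if name in NOT_CLIENT_CHARSETS:
        raise ValueError(f"{name} cannot be used as a client character set")
    return f"SET NAMES '{name}'"


print(set_names_statement("utf8"))
```

The resulting string would then be passed to whatever query call the client library in use provides.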

The following sections provide additional detail on the Unicode character sets in MySQL.


User Comments
  Posted by Haakon Meland Eriksen on January 24, 2006
Connect using the same character set as your data so that it displays correctly. This example connects to the MySQL server using UTF-8:

mysql --default-character-set=utf8 -uyour_username -p -h your_databasehost.your_domain.com your_database

If you run into trouble with a PHP-based web application, check the character set configuration of these components:

1) the MySQL database
2) php.ini
3) httpd.conf
4) your server

  Posted by lorenz pressler on May 2, 2006
If you fetch data from your MySQL database via PHP (everything in UTF-8)
but still get '?' for some special characters in your browser
(<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />),
try this:

After mysql_connect() and mysql_select_db(), add this line:
mysql_query("SET NAMES utf8");

This worked for me.
I first tried utf8_encode(), but that only worked for äüöéè
and so on, not for Cyrillic and other characters.

  Posted by Eliram on August 6, 2008
I had a problem submitting Unicode data from ASP pages to the MySQL server even though everything was set to utf8.

It turned out that the cause was my ODBC driver, which was version 3.5.1. Installing version 5.1 solved the problem.

http://dev.mysql.com/downloads/connector/odbc/

  Posted by David Busby on August 3, 2012
As of MySQL 5.x, you can use the init_connect setting to force UTF-8 compliance for any client connection.

I have blogged about this here: http://blog.oneiroi.co.uk/mysql/mysql-forcing-utf-8-compliance-for-all-connections

This removes the need to use SET NAMES in your PHP/ASP/Ruby/C++ code.
