Documentation Home
MySQL 5.6 Reference Manual
Related Documentation Download this Manual
PDF (US Ltr) - 31.1Mb
PDF (A4) - 31.1Mb
PDF (RPM) - 30.4Mb
EPUB - 7.8Mb
HTML Download (TGZ) - 7.6Mb
HTML Download (Zip) - 7.6Mb
HTML Download (RPM) - 6.5Mb
Eclipse Doc Plugin (TGZ) - 8.3Mb
Eclipse Doc Plugin (Zip) - 10.1Mb
Man Pages (TGZ) - 182.5Kb
Man Pages (Zip) - 293.9Kb
Info (Gzip) - 2.9Mb
Info (Zip) - 2.9Mb
Excerpts from this Manual

MySQL 5.6 Reference Manual  /  ...  /  Unicode Support

10.1.11 Unicode Support

The initial implementation of Unicode support (in MySQL 4.1) included two character sets for storing Unicode data:

  • utf8, a UTF-8 encoding of the Unicode character set using one to three bytes per character.

  • ucs2, the UCS-2 encoding of the Unicode character set using 16 bits per character.

These two character sets support the characters from the Basic Multilingual Plane (BMP) of Unicode Version 3.0. BMP characters have these characteristics:

  • Their code values are between 0 and 65535 (or U+0000 .. U+FFFF).

  • They can be encoded with 8, 16, or 24 bits, as in utf8.

  • They can be encoded with a fixed 16-bit word, as in ucs2.

  • They are sufficient for almost all characters in major languages.

Characters not supported by the aforementioned character sets include supplementary characters that lie outside the BMP. Characters outside the BMP compare as REPLACEMENT CHARACTER and convert to '?' when converted to a Unicode character set.

Unicode support for supplementary characters requires character sets that have a broader range (including non-BMP characters) and therefore take more space. The following table shows a brief feature comparison of the original and expanded Unicode support.

Before MySQL 5.5MySQL 5.5 and up
All Unicode 3.0 charactersAll Unicode 5.0 and 6.0 characters
No supplementary charactersWith supplementary characters
utf8 character set for up to three bytes, BMP onlyNo change
ucs2 character set, BMP onlyNo change
 utf8mb4 character set for up to four bytes, BMP or supplemental
 utf16 character set, BMP or supplemental
 utf16le character set, BMP or supplemental (5.6.1 and up)
 utf32 character set, BMP or supplemental

If you want to use the character sets that are wider than the original utf8 and ucs2 character sets, there are potential incompatibility issues for your applications; see Section 10.1.11.8, “Converting Between 3-Byte and 4-Byte Unicode Character Sets”. That section also describes how to convert tables from utf8 to the (4-byte) utf8mb4 character set, and what constraints may apply in doing so.

MySQL supports these Unicode character sets:

  • utf8, a UTF-8 encoding of the Unicode character set using one to three bytes per character.

  • utf8mb4, a UTF-8 encoding of the Unicode character set using one to four bytes per character.

  • ucs2, the UCS-2 encoding of the Unicode character set using 16 bits per character.

  • utf16, the UTF-16 encoding for the Unicode character set; like ucs2 but with an extension for supplementary characters.

  • utf16le, the UTF-16LE encoding for the Unicode character set; like utf16 but little-endian rather than big-endian.

  • utf32, the UTF-32 encoding for the Unicode character set using 32 bits per character.

utf8 and ucs2 support BMP characters. utf8mb4, utf16, utf16le, and utf32 support BMP and supplementary characters.

A similar set of collations is available for most Unicode character sets. For example, each has a Danish collation, the names of which are ucs2_danish_ci, utf16_danish_ci, utf32_danish_ci, utf8_danish_ci, and utf8mb4_danish_ci. The exception is utf16le, which has only two collations. For a description of Unicode collations and their differentiating properties, including collation properties for supplementary characters, see Section 10.1.14.1, “Unicode Character Sets”.

The MySQL implementation of UCS-2, UTF-16, and UTF-32 stores characters in big-endian byte order and does not use a byte order mark (BOM) at the beginning of values. Other database systems might use little-endian byte order or a BOM. In such cases, conversion of values will need to be performed when transferring data between those systems and MySQL. The implementation of UTF-16LE is little-endian.

MySQL uses no BOM for UTF-8 values.

Client applications that need to communicate with the server using Unicode should set the client character set accordingly; for example, by issuing a SET NAMES 'utf8' statement. ucs2, utf16, utf16le, and utf32 cannot be used as a client character set, which means that they do not work for SET NAMES or SET CHARACTER SET. (See Section 10.1.5, “Connection Character Sets and Collations”.)

The following sections provide additional detail on the Unicode character sets in MySQL.


User Comments
  Posted by Haakon Meland Eriksen on January 24, 2006
Connect with the same characterset as your data to display correctly. This example connects to the MySQL-server using UTF-8:

mysql --default-character-set=utf8 -uyour_username -p -h your_databasehost.your_domain.com your_database

If you get into trouble from a PHP-based web application, check the characterset configurations of these components:

1) the MySQL database
2) php.ini
3) httpd.conf
4) your server

  Posted by lorenz pressler on May 2, 2006
if you get data via php from your mysql-db (everything utf-8)
but still get '?' for some special characters in your browser
(<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />),
try this:

after mysql_connect() , and mysql_select_db() add this lines:
mysql_query("SET NAMES utf8");

worked for me.
i tried first with the utf8_encode, but this only worked for äüöéè...
and so on, but not for kyrillic and other chars.
  Posted by Eliram on August 6, 2008
I had a problem submitting unicode data from ASP pages to the MySQL server while everything was set to utf8 .

It turns out the problem was that my ODBC driver was version 3.5.1 and that's what caused the problem. Installing version 5.1 solved the problem.

http://dev.mysql.com/downloads/connector/odbc/
  Posted by David Busby on August 3, 2012
As of mySQL 5.x you can use the init_connect commands to force UTF-8 compliance from any client connection.

I have blogged about this here: http://blog.oneiroi.co.uk/mysql/mysql-forcing-utf-8-compliance-for-all-connections

Removing the need to use SET NAME in your PHP/ASP/Ruby/C++ code.

Sign Up Login You must be logged in to post a comment.