MySQL Globalization

Chapter 1 Character Sets, Collations, Unicode

Table of Contents

1.1 Character Sets and Collations in General
1.2 Character Sets and Collations in MySQL
1.2.1 Character Set Repertoire
1.2.2 UTF-8 for Metadata
1.3 Specifying Character Sets and Collations
1.3.1 Collation Naming Conventions
1.3.2 Server Character Set and Collation
1.3.3 Database Character Set and Collation
1.3.4 Table Character Set and Collation
1.3.5 Column Character Set and Collation
1.3.6 Character String Literal Character Set and Collation
1.3.7 The National Character Set
1.3.8 Character Set Introducers
1.3.9 Examples of Character Set and Collation Assignment
1.3.10 Compatibility with Other DBMSs
1.4 Connection Character Sets and Collations
1.5 Configuring Application Character Set and Collation
1.6 Error Message Character Set
1.7 Column Character Set Conversion
1.8 Collation Issues
1.8.1 Using COLLATE in SQL Statements
1.8.2 COLLATE Clause Precedence
1.8.3 Character Set and Collation Compatibility
1.8.4 Collation Coercibility in Expressions
1.8.5 The binary Collation Compared to _bin Collations
1.8.6 Examples of the Effect of Collation
1.8.7 Using Collation in INFORMATION_SCHEMA Searches
1.9 Unicode Support
1.9.1 The utf8mb4 Character Set (4-Byte UTF-8 Unicode Encoding)
1.9.2 The utf8mb3 Character Set (3-Byte UTF-8 Unicode Encoding)
1.9.3 The utf8 Character Set (Deprecated alias for utf8mb3)
1.9.4 The ucs2 Character Set (UCS-2 Unicode Encoding)
1.9.5 The utf16 Character Set (UTF-16 Unicode Encoding)
1.9.6 The utf16le Character Set (UTF-16LE Unicode Encoding)
1.9.7 The utf32 Character Set (UTF-32 Unicode Encoding)
1.9.8 Converting Between 3-Byte and 4-Byte Unicode Character Sets
1.10 Supported Character Sets and Collations
1.10.1 Unicode Character Sets
1.10.2 West European Character Sets
1.10.3 Central European Character Sets
1.10.4 South European and Middle East Character Sets
1.10.5 Baltic Character Sets
1.10.6 Cyrillic Character Sets
1.10.7 Asian Character Sets
1.10.8 The Binary Character Set
1.11 Restrictions on Character Sets
1.12 Setting the Error Message Language
1.13 Adding a Character Set
1.13.1 Character Definition Arrays
1.13.2 String Collating Support for Complex Character Sets
1.13.3 Multi-Byte Character Support for Complex Character Sets
1.14 Adding a Collation to a Character Set
1.14.1 Collation Implementation Types
1.14.2 Choosing a Collation ID
1.14.3 Adding a Simple Collation to an 8-Bit Character Set
1.14.4 Adding a UCA Collation to a Unicode Character Set
1.15 Character Set Configuration
1.16 MySQL Server Locale Support

MySQL includes character set support that enables you to store data using a variety of character sets and perform comparisons according to a variety of collations. The default MySQL server character set and collation are utf8mb4 and utf8mb4_0900_ai_ci, but you can specify character sets at the server, database, table, column, and string literal levels. To maximize interoperability and future-proofing of your data and applications, we recommend that you use the utf8mb4 character set whenever possible.
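
As an illustration of these levels, the following statements are a minimal sketch that sets a character set and collation for a database, a table, a single column, and a string literal. The names demo_db, customers, and legacy_code are hypothetical and chosen only for this example; the server-level defaults are normally set at startup and are only inspected here.

```sql
-- Server level: inspect the current defaults.
SELECT @@character_set_server, @@collation_server;

-- Database level (demo_db is a hypothetical name).
CREATE DATABASE demo_db
  CHARACTER SET utf8mb4
  COLLATE utf8mb4_0900_ai_ci;

-- Table level, with one column overriding the table default.
CREATE TABLE demo_db.customers (
  id INT UNSIGNED NOT NULL PRIMARY KEY,
  name VARCHAR(100),  -- inherits utf8mb4 from the table default
  legacy_code VARCHAR(20) CHARACTER SET latin1 COLLATE latin1_swedish_ci
) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

-- String literal level: an introducer plus a COLLATE clause.
SELECT _utf8mb4'abc' COLLATE utf8mb4_bin;
```

A setting given at a more specific level overrides the default inherited from the level above it, which is why legacy_code can use latin1 inside a utf8mb4 table.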

Note

UTF8 is a deprecated synonym for utf8mb3, and you should expect it to be removed in a future version of MySQL. Specify utf8mb3 or (preferably) utf8mb4 instead.
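
To locate columns that still use the 3-byte character set, a query along the following lines may help. This is a sketch, not an official migration procedure: depending on the server version, the character set may be reported as utf8 or utf8mb3, so the query checks for both, and the list of excluded system schemas is an assumption to adjust as needed.

```sql
-- List columns whose character set is the 3-byte UTF-8 variant.
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE CHARACTER_SET_NAME IN ('utf8', 'utf8mb3')
  AND TABLE_SCHEMA NOT IN ('mysql', 'sys', 'information_schema', 'performance_schema')
ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION;
```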

This chapter discusses the following topics:

  • What are character sets and collations?

  • The multiple-level default system for character set assignment.

  • Syntax for specifying character sets and collations.

  • Affected functions and operations.

  • Unicode support.

  • The character sets and collations that are available, with notes.

  • Selecting the language for error messages.

  • Selecting the locale for day and month names.

Character set issues affect not only data storage, but also communication between client programs and the MySQL server. If you want the client program to communicate with the server using a character set different from the default, you need to indicate which one. For example, to use the latin1 character set, issue this statement after connecting to the server:

SET NAMES 'latin1';

For more information about configuring character sets for application use and character set-related issues in client/server communication, see Section 1.5, “Configuring Application Character Set and Collation”, and Section 1.4, “Connection Character Sets and Collations”.
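
To confirm that a SET NAMES statement took effect, you can inspect the session variables that it sets. The following is a minimal sketch that assumes a freshly opened connection:

```sql
-- Change the connection character set for the current session.
SET NAMES 'latin1';

-- SET NAMES assigns these three session variables together.
SELECT @@character_set_client,
       @@character_set_connection,
       @@character_set_results;

-- The connection collation becomes the default collation of that character set.
SELECT @@collation_connection;
```

After the statement runs, each of the three character set variables should report latin1. The setting applies only to the current session and does not change the character set of stored data.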