A collation is a set of rules that defines how to compare and sort character strings. Each collation in MySQL belongs to a single character set. Every character set has at least one collation, and most have two or more collations.
A collation orders characters based on weights. Each character in a character set maps to a weight. Characters with equal weights compare as equal, and characters with unequal weights compare according to the relative magnitude of their weights.
The WEIGHT_STRING()
function can be
used to see the weights for the characters in a string. The value
that it returns to indicate weights is a binary string, so it is
convenient to use
HEX(WEIGHT_STRING(
to display the weights in printable form. The following example
shows that weights do not differ for lettercase for the letters in
str
))'AaBb'
if it is a nonbinary case-insensitive
string, but do differ if it is a binary string:
mysql> SELECT HEX(WEIGHT_STRING('AaBb' COLLATE latin1_swedish_ci));
+------------------------------------------------------+
| HEX(WEIGHT_STRING('AaBb' COLLATE latin1_swedish_ci)) |
+------------------------------------------------------+
| 41414242 |
+------------------------------------------------------+
mysql> SELECT HEX(WEIGHT_STRING(BINARY 'AaBb'));
+-----------------------------------+
| HEX(WEIGHT_STRING(BINARY 'AaBb')) |
+-----------------------------------+
| 41614262 |
+-----------------------------------+
MySQL supports several collation implementations, as discussed in Section 1.14.1, “Collation Implementation Types”. Some of these can be added to MySQL without recompiling:
Simple collations for 8-bit character sets.
UCA-based collations for Unicode character sets.
Binary (
) collations.xxx
_bin
The following sections describe how to add user-defined collations of the first two types to existing character sets. All existing character sets already have a binary collation, so there is no need here to describe how to add one.
Summary of the procedure for adding a new user-defined collation:
Choose a collation ID.
Add configuration information that names the collation and describes the character-ordering rules.
Restart the server.
Verify that the server recognizes the collation.
The instructions here cover only user-defined collations that can be added without recompiling MySQL. To add a collation that does require recompiling (as implemented by means of functions in a C source file), use the instructions in Section 1.13, “Adding a Character Set”. However, instead of adding all the information required for a complete character set, just modify the appropriate files for an existing character set. That is, based on what is already present for the character set's current collations, add data structures, functions, and configuration information for the new collation.
If you modify an existing user-defined collation, that may affect the ordering of rows for indexes on columns that use the collation. In this case, rebuild any such indexes to avoid problems such as incorrect query results. See Rebuilding or Repairing Tables or Indexes.
Additional Resources
Example showing how to add a collation for full-text searches: Adding a User-Defined Collation for Full-Text Indexing
The Unicode Collation Algorithm (UCA) specification: http://www.unicode.org/reports/tr10/
The Locale Data Markup Language (LDML) specification: http://www.unicode.org/reports/tr35/