Pre-General Availability Draft: 2017-11-22
MySQL collation names follow these conventions:
A collation name starts with the name of the character set with which it is associated, generally followed by one or more suffixes indicating other collation characteristics. For example, latin1_swedish_ci is a collation for the latin1 character set. The binary character set has a single collation, also named binary, with no suffixes.
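The naming convention can be illustrated with a minimal Python sketch that splits a collation name into its character set prefix and its suffixes. The function name split_collation is hypothetical, and the sketch assumes that character set names contain no underscore, which holds for the names used in this section.

```python
def split_collation(name: str) -> tuple[str, list[str]]:
    """Split a collation name into (character set, list of suffixes).

    Assumption: the character set name is everything before the first
    underscore. A name without an underscore, such as "binary", has no
    suffixes.
    """
    charset, _, rest = name.partition("_")
    suffixes = rest.split("_") if rest else []
    return charset, suffixes


if __name__ == "__main__":
    for name in ("latin1_swedish_ci", "utf8mb4_hu_0900_ai_ci", "binary"):
        print(name, "->", split_collation(name))
```

This works only on the text of the name; it does not check whether a server actually provides the collation.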
A language-specific collation includes a locale code or language name. For example, utf8mb4_tr_0900_ai_ci and utf8mb4_hu_0900_ai_ci sort characters for the utf8mb4 character set using the rules of Turkish and Hungarian, respectively. utf8mb4_turkish_ci and utf8mb4_hungarian_ci are similar but based on a less recent version of the Unicode Collation Algorithm.
Collation suffixes indicate whether a collation is case and accent sensitive, or binary. The following table shows the suffixes used to indicate these characteristics.
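Table 10.1 Collation Case/Accent Sensitivity Suffixes
Suffix    Meaning
_ai       accent insensitive
_as       accent sensitive
_ci       case insensitive
_cs       case sensitive
_bin      binary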
For nonbinary collation names that do not specify accent sensitivity, it is determined by case sensitivity. If a collation name does not contain _ai or _as, _ci in the name implies _ai and _cs in the name implies _as. For example, latin1_general_ci is explicitly case insensitive and implicitly accent insensitive, latin1_general_cs is explicitly case sensitive and implicitly accent sensitive, and utf8mb4_0900_ai_ci is explicitly case and accent insensitive.
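The rule for implied accent sensitivity can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that sensitivity suffixes appear only at the end of a collation name, so that a locale code elsewhere in the name is not mistaken for a suffix.

```python
# Suffixes that describe sensitivity or binary comparison.
SENSITIVITY_SUFFIXES = {"ai", "as", "ci", "cs", "ks", "bin"}


def trailing_suffixes(collation: str) -> set[str]:
    """Collect the sensitivity suffixes found at the end of the name."""
    parts = collation.split("_")[1:]  # drop the character set prefix
    found = set()
    while parts and parts[-1] in SENSITIVITY_SUFFIXES:
        found.add(parts.pop())
    return found


def case_accent_sensitivity(collation: str) -> tuple[bool, bool]:
    """Return (case_sensitive, accent_sensitive) for a nonbinary collation."""
    suffixes = trailing_suffixes(collation)
    if "cs" in suffixes:
        case_sensitive = True
    elif "ci" in suffixes:
        case_sensitive = False
    else:
        raise ValueError(f"{collation!r} has no _ci or _cs suffix")
    if "as" in suffixes:
        accent_sensitive = True
    elif "ai" in suffixes:
        accent_sensitive = False
    else:
        # No explicit accent suffix: _ci implies _ai, and _cs implies _as.
        accent_sensitive = case_sensitive
    return case_sensitive, accent_sensitive


if __name__ == "__main__":
    for name in ("latin1_general_ci", "latin1_general_cs", "utf8mb4_0900_ai_ci"):
        print(name, case_accent_sensitivity(name))
```

Names without a _ci or _cs suffix, such as binary collations, are rejected with ValueError instead of being guessed at.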
For Japanese collations, the _ks suffix indicates that a collation is kana sensitive; that is, it distinguishes Katakana characters from Hiragana characters. Japanese collations without the _ks suffix are not kana sensitive and treat Katakana and Hiragana characters as equal for sorting.
For the binary collation of the binary character set, comparisons are based on numeric byte values. For the _bin collation of a nonbinary character set, comparisons are based on numeric character code values, which differ from byte values for multibyte characters. For more information, see Section 10.1.8.5, “The binary Collation Compared to _bin Collations”.
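The difference between byte values and character code values can be made concrete with a small Python model. It illustrates the two comparison bases only and does not reproduce how the server compares strings. The UTF-16LE encoding is an assumption, chosen because its byte layout makes the difference visible with two characters.

```python
# 'A' is U+0041 and 'Ā' is U+0100.
words = ["A", "\u0100"]

# Order by numeric character code values.
by_code_value = sorted(words, key=lambda s: [ord(ch) for ch in s])

# Order by the numeric values of the encoded bytes.
# In UTF-16LE, 'A' is the bytes 41 00 and 'Ā' is the bytes 00 01.
by_byte_value = sorted(words, key=lambda s: s.encode("utf-16-le"))

if __name__ == "__main__":
    print("by code value:", [hex(ord(w)) for w in by_code_value])
    print("by byte value:", [hex(ord(w)) for w in by_byte_value])
```

Sorting by code value places U+0041 first, while sorting by byte value places U+0100 first because its leading byte is 00.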
For Unicode character sets, collation names may include a version number to indicate the version of the Unicode Collation Algorithm (UCA) on which the collation is based. UCA-based collations without a version number in the name use the version-4.0.0 UCA weight keys. For example:
utf8mb4_0900_ai_ci is based on UCA 9.0.0 weight keys (http://www.unicode.org/Public/UCA/9.0.0/allkeys.txt).
utf8mb4_unicode_520_ci is based on UCA 5.2.0 weight keys (http://www.unicode.org/Public/UCA/5.2.0/allkeys.txt).
utf8mb4_unicode_ci (with no version named) is based on UCA 4.0.0 weight keys (http://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt).
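As a rough illustration, the version convention can be read off a name with a short Python sketch. The function name uca_version is hypothetical. The digit decoding (last digit as patch level, the digit before it as minor version, the remaining digits as major version) is an assumption inferred from the two versioned examples above, and the input is assumed to name a UCA-based collation.

```python
def uca_version(collation: str) -> str:
    """Return the UCA version implied by a UCA-based collation name.

    Assumption: a purely numeric part of the name encodes the version as
    <major digits><minor digit><patch digit>, so "0900" means 9.0.0 and
    "520" means 5.2.0. Names with no numeric part use 4.0.0.
    """
    for part in collation.split("_")[1:]:  # skip the character set prefix
        if part.isdigit() and len(part) >= 3:
            major = int(part[:-2])
            minor = int(part[-2])
            patch = int(part[-1])
            return f"{major}.{minor}.{patch}"
    return "4.0.0"


if __name__ == "__main__":
    for name in ("utf8mb4_0900_ai_ci", "utf8mb4_unicode_520_ci", "utf8mb4_unicode_ci"):
        print(name, "-> UCA", uca_version(name))
```

The 4.0.0 default applies only to UCA-based collations; the sketch does not check whether a given name is UCA-based.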
For Unicode character sets, the xxx_general_mysql500_ci collations preserve the pre-5.1.24 ordering of the original xxx_general_ci collations and permit upgrades for tables created before MySQL 5.1.24 (Bug #27877).