MySQL Globalization

1.3.1 Collation Naming Conventions

MySQL collation names follow these conventions:

  • A collation name starts with the name of the character set with which it is associated, generally followed by one or more suffixes indicating other collation characteristics. For example, utf8_general_ci and latin1_swedish_ci are collations for the utf8 and latin1 character sets, respectively. The binary character set has a single collation, also named binary, with no suffixes.

  • A language-specific collation includes a language name. For example, utf8_turkish_ci and utf8_hungarian_ci sort characters for the utf8 character set using the rules of Turkish and Hungarian, respectively.

  • Collation suffixes indicate whether a collation is case and accent sensitive, or binary. The following table shows the suffixes used to indicate these characteristics.

    Table 1.1 Collation Case Sensitivity Suffixes

    Suffix   Meaning
    _ai      Accent insensitive
    _as      Accent sensitive
    _ci      Case insensitive
    _cs      Case sensitive
    _bin     Binary

    For nonbinary collation names that do not specify accent sensitivity, accent sensitivity is determined by case sensitivity: if a collation name does not contain _ai or _as, then _ci in the name implies _ai and _cs in the name implies _as. For example, latin1_general_ci is explicitly case insensitive and implicitly accent insensitive, and latin1_general_cs is explicitly case sensitive and implicitly accent sensitive.

    For the binary collation of the binary character set, comparisons are based on numeric byte values. For the _bin collation of a nonbinary character set, comparisons are based on numeric character code values, which differ from byte values for multibyte characters. For more information, see Section 1.8.5, “The binary Collation Compared to _bin Collations”.

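  • For Unicode character sets, collation names may include a version number that indicates the version of the Unicode Collation Algorithm (UCA) on which the collation is based. UCA-based collations without a version number in the name use the version-4.0.0 UCA weight keys. For example, utf8_unicode_ci (no version in the name) is based on UCA 4.0.0 weight keys, whereas utf8_unicode_520_ci is based on UCA 5.2.0 weight keys.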

  • For Unicode character sets, the xxx_general_mysql500_ci collations preserve the pre-5.1.24 ordering of the original xxx_general_ci collations and permit upgrades for tables created before MySQL 5.1.24 (Bug #27877).
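The naming conventions above are regular enough to sketch as a small parser. The Python below is illustrative only: parse_collation and CollationInfo are hypothetical names, not part of MySQL or any client library. It assumes that character set names contain no underscore and that a three-digit token such as the 520 in utf8_unicode_520_ci encodes a UCA version (5.2.0). It does not try to decide whether a name without a version number is UCA-based, so uca_version is None in that case.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Suffixes listed in Table 1.1.
SUFFIXES = {"ai", "as", "ci", "cs", "bin"}


@dataclass
class CollationInfo:
    charset: str
    qualifiers: List[str] = field(default_factory=list)  # e.g. ["general"], ["turkish"]
    uca_version: Optional[str] = None         # set only when a version appears in the name
    case_sensitive: Optional[bool] = None     # None when not applicable (binary collations)
    accent_sensitive: Optional[bool] = None
    binary: bool = False


def parse_collation(name: str) -> CollationInfo:
    """Hypothetical helper: split a collation name following the naming conventions.

    Assumes character set names never contain an underscore.
    """
    if name == "binary":
        # The binary character set has a single collation, also named binary, with no suffixes.
        return CollationInfo(charset="binary", binary=True)

    charset, *rest = name.split("_")

    # Peel recognized suffixes (_ai, _as, _ci, _cs, _bin) off the end of the name.
    suffixes: List[str] = []
    while rest and rest[-1] in SUFFIXES:
        suffixes.insert(0, rest.pop())

    info = CollationInfo(charset=charset)
    for token in rest:
        # Assumption: a three-digit token such as "520" means UCA 5.2.0.
        if re.fullmatch(r"\d{3}", token):
            info.uca_version = ".".join(token)
        else:
            info.qualifiers.append(token)  # language name or other characteristic

    if "bin" in suffixes:
        info.binary = True
        return info

    if "ci" in suffixes:
        info.case_sensitive = False
    elif "cs" in suffixes:
        info.case_sensitive = True

    if "ai" in suffixes:
        info.accent_sensitive = False
    elif "as" in suffixes:
        info.accent_sensitive = True
    else:
        # No _ai or _as: _ci implies _ai and _cs implies _as.
        info.accent_sensitive = info.case_sensitive
    return info


if __name__ == "__main__":
    for n in ["latin1_general_ci", "latin1_general_cs", "utf8_unicode_520_ci", "utf8_bin", "binary"]:
        print(n, parse_collation(n))
```

Treating the implied accent sensitivity as a fallback, applied only when neither _ai nor _as is present, mirrors the rule stated for nonbinary collations; for _bin collations the sketch leaves both sensitivity flags unset rather than guessing.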
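The difference between comparing numeric byte values (the binary collation) and numeric character code values (a _bin collation) can be illustrated outside MySQL. The plain Python sketch below is not MySQL code; byte_key and code_point_key are hypothetical helpers. With UTF-8, byte order and code point order happen to agree, so the sketch uses UTF-16LE, which stores the low byte first, to make the two orderings visibly diverge.

```python
from typing import List


def byte_key(s: str, encoding: str) -> bytes:
    """Sort key resembling the binary collation: the raw encoded bytes."""
    return s.encode(encoding)


def code_point_key(s: str) -> List[int]:
    """Sort key resembling a _bin collation: numeric character code values."""
    return [ord(ch) for ch in s]


# 'Ā' is U+0100 and 'ÿ' is U+00FF.
words = ["\u0100", "\u00ff"]

# In UTF-16LE, 'Ā' encodes as 00 01 and 'ÿ' as FF 00, so byte order puts 'Ā' first
# even though its code value (256) is larger than that of 'ÿ' (255).
by_bytes = sorted(words, key=lambda w: byte_key(w, "utf-16-le"))
by_code_points = sorted(words, key=code_point_key)

if __name__ == "__main__":
    print(by_bytes, by_code_points)
    # A multibyte character has one code value but several byte values.
    print(code_point_key("é"), list(byte_key("é", "utf-8")))  # [233] [195, 169]
```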