MySQL 8.0 Reference Manual

Pre-General Availability Draft: 2017-04-21

Collation Naming Conventions

MySQL collation names follow these conventions:

  • A collation name starts with the name of the character set with which it is associated, generally followed by one or more suffixes indicating other collation characteristics. For example, utf8mb4_general_ci and latin1_swedish_ci are collations for the utf8mb4 and latin1 character sets, respectively. The binary character set has a single collation, also named binary, with no suffixes.

  • A language-specific collation includes a locale code or language name. For example, utf8mb4_tr_0900_ai_ci and utf8mb4_hu_0900_ai_ci sort characters for the utf8mb4 character set using the rules of Turkish and Hungarian, respectively. utf8mb4_turkish_ci and utf8mb4_hungarian_ci are similar but based on a less recent version of the Unicode Collation Algorithm.

  • Collation suffixes indicate whether a collation is case and accent sensitive, or binary. The following table shows the suffixes used to indicate these characteristics.

    Table 11.1 Collation Case/Accent Sensitivity Suffixes

    Suffix   Meaning
    _ai      Accent insensitive
    _as      Accent sensitive
    _ci      Case insensitive
    _cs      Case sensitive
    _bin     Binary

    For nonbinary collation names that do not specify accent sensitivity, it is determined by case sensitivity. If a collation name does not contain _ai or _as, _ci in the name implies _ai and _cs in the name implies _as. For example, latin1_general_ci is explicitly case insensitive and implicitly accent insensitive, latin1_general_cs is explicitly case sensitive and implicitly accent sensitive, and utf8mb4_0900_ai_ci is explicitly case and accent insensitive.

    For the binary collation of the binary character set, comparisons are based on numeric byte values. For the _bin collation of a nonbinary character set, comparisons are based on numeric character code values, which differ from byte values for multibyte characters. For more information, see “The binary Collation Compared to _bin Collations”.

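  • For Unicode character sets, collation names may include a version number to indicate the version of the Unicode Collation Algorithm (UCA) on which the collation is based. UCA-based collations without a version number in the name use the version-4.0.0 UCA weight keys. For example, utf8mb4_0900_ai_ci is based on UCA 9.0.0 weight keys, utf8mb4_unicode_520_ci is based on UCA 5.2.0 weight keys, and utf8mb4_unicode_ci (with no version named) is based on UCA 4.0.0 weight keys.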

  • For Unicode character sets, the xxx_general_mysql500_ci collations preserve the pre-5.1.24 ordering of the original xxx_general_ci collations and permit upgrades for tables created before MySQL 5.1.24 (Bug #27877).
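
The conventions above can be sketched as a small name parser. This is an illustration only, not part of MySQL: parse_collation, CollationInfo, and KNOWN_CHARSETS are hypothetical names, the character set list is limited to the sets mentioned in this section, and the digit-to-version mapping (for example, 0900 read as 9.0.0) is an assumption inferred from the collation names.

```python
import re
from typing import NamedTuple, Optional

# Character sets mentioned in the text; a real server knows many more.
# This short tuple is an assumption made to keep the sketch self-contained.
KNOWN_CHARSETS = ("utf8mb4", "latin1", "binary")


class CollationInfo(NamedTuple):
    charset: str
    case_sensitive: Optional[bool]    # None for binary collations
    accent_sensitive: Optional[bool]  # None for binary collations
    binary: bool
    uca_version: Optional[str]        # None when the name carries no version


def parse_collation(name: str) -> CollationInfo:
    """Hypothetical helper: split a collation name by the naming conventions."""
    charset = next(
        (c for c in KNOWN_CHARSETS if name == c or name.startswith(c + "_")),
        None,
    )
    if charset is None:
        raise ValueError(f"unrecognized character set prefix in {name!r}")

    # Everything after the character set name is a list of "_"-separated suffixes.
    parts = name[len(charset):].split("_")[1:]

    binary = name == "binary" or "bin" in parts
    if binary:
        case = accent = None
    else:
        case = True if "cs" in parts else False if "ci" in parts else None
        if "as" in parts:
            accent = True
        elif "ai" in parts:
            accent = False
        else:
            accent = case  # _ci implies _ai, and _cs implies _as

    # Assumed encoding of the version suffix: the last two digits are the
    # minor and patch numbers, the leading digits are the major number.
    uca_version = None
    for part in parts:
        if re.fullmatch(r"\d{3,4}", part):
            uca_version = f"{int(part[:-2])}.{part[-2]}.{part[-1]}"
            break

    return CollationInfo(charset, case, accent, binary, uca_version)
```

With this sketch, parse_collation("latin1_general_cs") reports a collation that is case sensitive and, by implication, accent sensitive, while parse_collation("utf8mb4_0900_ai_ci") reports case and accent insensitivity with an assumed UCA version of 9.0.0. A name with no purely numeric suffix, such as one containing mysql500, yields no version.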
