WL#5476: Croatian collation
Affects: Server-5.6 — Status: Complete — Priority: Medium
Support Croatian collation for Unicode character sets.
Add Unicode-4.0 based collations

  utf8_croatian_ci
  ucs2_croatian_ci
  utf8mb4_croatian_ci
  utf16_croatian_ci
  utf32_croatian_ci

These collations will all be variants of utf8_unicode_ci etc., that is,
the base collation is what we used for old collations with Unicode 4.0.
We will have tailoring for Croatian letters: Č, Ć, Dž, Đ, Lj, Nj, Š, Ž.
The new collations are case insensitive.
There will be no support for secondary or tertiary sorts.

utf8_croatian_520_ci etc.
-------------------------

Originally there was an intent to add five Unicode-5.2.0 based collations:

  utf8_croatian_520_ci
  ucs2_croatian_520_ci
  utf8mb4_croatian_520_ci
  utf16_croatian_520_ci
  utf32_croatian_520_ci

with the same tailoring as in utf8_croatian_ci etc., but these
collations will all be based on Unicode 5.2.
This part of the plan is cancelled. We must be careful about adding
new collations, because InnoDB will support only a few more.

UCA and CLDR
------------

For new collations MySQL follows the Unicode Collation Algorithm (UCA)
with tailoring according to a Common Locale Data Repository (CLDR)
specification, which in this case is the file 'hr.xml', attached.
For details see section "Principles" in WL#2673.
Translating from the XML, the hr.xml CLDR specification says that
Croatian tailoring is thus:

  C WITH CARON follows C
  C WITH ACUTE follows C WITH CARON
  "D + Z WITH CARON" follows D (contraction)
  "DZ WITH CARON" is equal to "D + Z WITH CARON"
  D WITH STROKE follows "DZ WITH CARON"
  "L + J" follows L (contraction)
  LJ is equal to "L + J"
  "N + J" follows N (contraction)
  NJ is equal to "N + J"
  S WITH CARON follows S
  Z WITH CARON follows Z

Sorting order is:

  A B C Č Ć D DŽ Ǆ Đ E F G H I J K L LJ Ǉ M N NJ Ǌ O P Q R S Š T U V W X Y Z Ž

Tailoring
---------

  &C < č <<< Č < ć <<< Ć
  &D < dž = ǆ <<< dŽ <<< Dž = ǅ <<< DŽ = Ǆ < đ <<< Đ
  &L < lj = ǉ <<< lJ <<< Lj = ǈ <<< LJ = Ǉ
  &N < nj = ǌ <<< nJ <<< Nj = ǋ <<< NJ = Ǌ
  &S < š <<< Š
  &Z < ž <<< Ž

The same, using code notation:

  &C < \u010D <<< \u010C < \u0107 <<< \u0106
  &D < d\u017E = \u01C6 <<< d\u017D <<< D\u017E = \u01C5 <<< D\u017D = \u01C4 < \u0111 <<< \u0110
  &L < lj = \u01C9 <<< lJ <<< Lj = \u01C8 <<< LJ = \u01C7
  &N < nj = \u01CC <<< nJ <<< Nj = \u01CB <<< NJ = \u01CA
  &S < \u0161 <<< \u0160
  &Z < \u017E <<< \u017D

References
----------

Unicode database
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
(This is for reference only -- we won't support the latest version.)

Croatian CLDR file
http://unicode.org/cldr/trac/browser/trunk/common/collation/hr.xml

Real Croatian collations for cp1250, latin2
http://forge.mysql.com/worklog/task.php?id=3286
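As an illustration of the primary-level ordering described under "Tailoring", the following Python sketch builds a sort key in which the contractions dž, lj and nj each count as a single letter and case is ignored. It is not the server implementation: the helper name croatian_key and the handling of characters outside the Croatian alphabet are assumptions made only for this example, and the real collations use full UCA weights.

```python
import unicodedata

# Primary-level letter order taken from the "Sorting order" list.
# The digraphs dž, lj and nj each count as one letter.
ALPHABET = [
    "a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f", "g", "h", "i", "j",
    "k", "l", "lj", "m", "n", "nj", "o", "p", "q", "r", "s", "š", "t", "u",
    "v", "w", "x", "y", "z", "ž",
]
WEIGHT = {letter: rank for rank, letter in enumerate(ALPHABET)}
CONTRACTIONS = {"dž", "lj", "nj"}

# Single-code-point digraphs (U+01C6, U+01C9, U+01CC) are equal to their
# two-letter spellings. Their upper-case and title-case forms are mapped
# to these lower-case forms by str.lower().
DIGRAPH_CODE_POINTS = {"\u01c6": "dž", "\u01c9": "lj", "\u01cc": "nj"}


def croatian_key(text):
    """Return a primary-level sort key (hypothetical helper, not MySQL code)."""
    # Compose base letter + combining mark, then drop case differences,
    # since the collations are case insensitive.
    s = unicodedata.normalize("NFC", text).lower()
    for code_point, spelling in DIGRAPH_CODE_POINTS.items():
        s = s.replace(code_point, spelling)
    key = []
    i = 0
    while i < len(s):
        pair = s[i:i + 2]
        if pair in CONTRACTIONS:
            # A contraction consumes two characters but yields one weight.
            key.append((1, WEIGHT[pair]))
            i += 2
        elif s[i] in WEIGHT:
            key.append((1, WEIGHT[s[i]]))
            i += 1
        else:
            # Assumption: anything outside the alphabet sorts before the
            # letters, ordered by code point. Real UCA weights differ.
            key.append((0, ord(s[i])))
            i += 1
    return key


words = ["ljeto", "luk", "Džep", "Dubrovnik", "Đakovo"]
ordered = sorted(words, key=croatian_key)
```

With this key, "luk" sorts before "ljeto" because LJ is a separate letter that follows L, whereas a plain code-point sort would place "ljeto" first.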
Copyright (c) 2000, 2016, Oracle Corporation and/or its affiliates. All rights reserved.