WL#3286: Real Croatian collations for cp1250, latin2
Affects: Server-7.1 — Status: Un-Assigned — Priority: Medium
The latin2_croatian_ci and cp1250_croatian_ci collations in MySQL are simplified versions of the real Croatian sorting rules, and we have received complaints about wrong sorting order from Croatian users. The problem is that these collations do not support the contractions DŽ, LJ and NJ, which must be treated as single letters.

The sorting order should be:
A,B,C,Č,Ć,D,DŽ,Đ,E,F,G,H,I,J,K,L,LJ,M,N,NJ,O,P,Q,R,S,Š,T,U,V,W,X,Y,Z,Ž

MySQL is also missing Croatian collations for Unicode character sets, such as utf8_croatian_ci and ucs2_croatian_ci, but those collations have been moved to a separate worklog, WL#5476.

The goal of this worklog item is to create "real" Croatian collations for 8-bit character sets only:
- cp1250
- latin2

Resources:
http://bugs.mysql.com/bug.php?id=16373
http://bugs.mysql.com/bug.php?id=44523
http://forums.mysql.com/read.php?20,260051,260051
http://en.wikipedia.org/wiki/Gajica
http://www.evertype.com/alphabets/croatian.pdf
http://www.omniglot.com/writing/serbo-croat.htm

There is also a worklog task "Add Hungarian collations". This topic was discussed in May 2006 in dev-private.

For new collations MySQL follows the Unicode Collation Algorithm (UCA) with tailoring according to a Common Locale Data Repository (CLDR) specification, which in this case is the attached file 'hr.xml'.
Note: WL#2673 "Unicode Collation Algorithm new version" will make it possible to treat "DŽ" as a contraction.
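
To illustrate what treating DŽ, LJ and NJ as contractions means for ordering, here is a minimal Python sketch. It is not the MySQL implementation: the names ALPHABET, WEIGHT and croatian_key are hypothetical, the weights are simply positions in the alphabet listed above, the comparison is case-insensitive, and characters outside the alphabet are assumed to sort first. It also always treats the two-letter sequences as single letters, with no attempt to detect words where they are separate letters.

```python
# Alphabet order taken from this worklog; DŽ, LJ and NJ are single letters.
ALPHABET = (
    "A,B,C,Č,Ć,D,DŽ,Đ,E,F,G,H,I,J,K,L,LJ,M,N,NJ,"
    "O,P,Q,R,S,Š,T,U,V,W,X,Y,Z,Ž"
).split(",")

# Hypothetical weights: 1-based position of each letter in the alphabet.
WEIGHT = {letter: index + 1 for index, letter in enumerate(ALPHABET)}


def croatian_key(text):
    """Build a case-insensitive sort key that honours the contractions."""
    upper = text.upper()
    key = []
    i = 0
    while i < len(upper):
        pair = upper[i:i + 2]
        if len(pair) == 2 and pair in WEIGHT:
            # Two characters form one letter (DŽ, LJ or NJ): one weight.
            key.append(WEIGHT[pair])
            i += 2
        else:
            # Assumption: characters outside the alphabet get weight 0.
            key.append(WEIGHT.get(upper[i], 0))
            i += 1
    return tuple(key)


if __name__ == "__main__":
    words = ["njiva", "nula", "nada", "ljeto", "luk", "đak", "džem", "dud"]
    print(sorted(words, key=croatian_key))
```

With this key, "nula" sorts before "njiva" and "luk" before "ljeto", because NJ and LJ follow N and L as letters of their own. A plain character-by-character comparison would put "njiva" and "ljeto" first, which is the kind of wrong order the bug reports describe.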
Copyright (c) 2000, 2015, Oracle Corporation and/or its affiliates. All rights reserved.