The latin2_croatian_ci and cp1250_croatian_ci collations
in MySQL are simplified versions of the real Croatian sorting rules.
Croatian users have complained that they produce the wrong sort order.
The problem is that these collations do not support the contractions
DŽ, LJ and NJ, each of which must be treated as a single letter.
The sorting order should be:
A,B,C,Č,Ć,D,DŽ,Đ,E,F,G,H,I,J,K,L,LJ,M,N,NJ,O,P,Q,R,S,Š,T,U,V,W,X,Y,Z,Ž
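To make the effect of the contractions concrete, here is a minimal sketch
in Python of a sort key that follows the order listed above. It is only an
illustration, not the MySQL implementation: the names ALPHABET, WEIGHT,
DIGRAPHS and croatian_key are hypothetical, comparison is case-insensitive,
and every "dž", "lj" or "nj" pair is greedily treated as one letter, which
ignores the few words where the two letters belong to different morphemes.
Characters outside the alphabet are assumed to sort after all letters.

```python
# Croatian alphabet in the order given above; digraphs are single letters.
ALPHABET = [
    "A", "B", "C", "Č", "Ć", "D", "DŽ", "Đ", "E", "F", "G", "H", "I", "J",
    "K", "L", "LJ", "M", "N", "NJ", "O", "P", "Q", "R", "S", "Š", "T", "U",
    "V", "W", "X", "Y", "Z", "Ž",
]
WEIGHT = {letter: index for index, letter in enumerate(ALPHABET)}
DIGRAPHS = ("DŽ", "LJ", "NJ")


def croatian_key(word):
    """Return a list of weights usable as a sort key (hypothetical helper)."""
    text = word.upper()
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            # Contraction: two characters map to one weight.
            key.append(WEIGHT[pair])
            i += 2
        else:
            # Assumption: unknown characters sort after all letters.
            key.append(WEIGHT.get(text[i], len(ALPHABET) + ord(text[i])))
            i += 1
    return key


words = ["njiva", "nos", "ljeto", "luk", "đak", "džep", "dom"]
result = sorted(words, key=croatian_key)
```

With a plain character-by-character comparison "njiva" would sort before
"nos" and "ljeto" before "luk"; with the contraction-aware key the words
starting with NJ and LJ come after all words starting with N and L.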
MySQL is also missing Croatian collations for Unicode character sets,
such as utf8_croatian_ci and ucs2_croatian_ci, but those collations have
been moved to a separate task, WL#5476.
The goal of this worklog item is to create "real" Croatian collations
for 8-bit character sets only:
- cp1250
- latin2
Resources:
http://bugs.mysql.com/bug.php?id=16373
http://bugs.mysql.com/bug.php?id=44523
http://forums.mysql.com/read.php?20,260051,260051
http://en.wikipedia.org/wiki/Gajica
http://www.evertype.com/alphabets/croatian.pdf
http://www.omniglot.com/writing/serbo-croat.htm
There is also a worklog task "Add Hungarian collations".
There was discussion of this topic in May 2006 in dev-private.
For new collations MySQL follows the Unicode Collation Algorithm (UCA)
with tailoring according to the Common Locale Data Repository (CLDR)
specification, which in this case is the attached file 'hr.xml'.