Adopt a Danish collation which follows standards and is consistent across many character sets. Start with a new Danish collation for latin9.
A collation for latin9 which is based on Unicode UCA and CLDR Danish tailoring.
Like latin1_danish_ci but without the bugs and more like the standards.

Principles
----------

New collations are based on Unicode Collation Algorithm (UCA), and
are tailored according to Common Locale Data Repository (CLDR)
from the Unicode site.
Fuller description of the principles is in WL#5170 Swedish collation.

Names
-----

Since the convention is
character set name _ language name _ UCA version _ case-insensitivity abbreviation
the new collation is
latin9_danish_520_ci

The Rules
---------

The tailoring rules come from the CLDR file da.xml, attached to this
worklog task, or through these steps:
Go to http://cldr.unicode.org/
Click "CLDR Releases/Downloads"
Click "CLDR 1.7.2"
Click "core.zip"
Unzip core.zip
Copy ./common/collation/da.xml

Remember that, according to "Principles", any ligatures are sorted
as equal to the first character of the expansion, because we want
to keep the collations simple (one weight per character, primary
weights only).

We'll use the da.xml "standard", not "proposed", rules.

These are the apparent rules, with comparisons to other collations:

                              latin9_   latin1_  utf8_    Microsoft  Oracle
                              swedish_  danish_  danish_
                              520_ci    ci       ci
                              --------  -------  -------  ---------  ------
AE BEFORE EZH                 no        yes      yes      yes        yes
O STROKE AFTER AE             no        yes      yes      yes        yes
A RING AFTER O STROKE         no        yes      yes      yes        yes
ETH = D                       yes       yes      no, > D  yes        yes
D STROKE = D                  yes       -        no       -          -
THORN = TH                    yes       no       no       no, > T    no, = T
O STROKE = O DIAERESIS        yes       yes      yes      yes        yes
AE = A DIAERESIS              yes       yes      yes      yes        yes
O DOUBLE ACUTE = O DIAERESIS  yes       -        no       -          -
U DIAERESIS = Y               yes       yes      yes      yes        yes
U DOUBLE ACUTE = Y            yes       no       yes      -          -
A DIAERESIS = E OGONEK        yes       no       no       -          -
OE = O DIAERESIS              yes       no       no       no         no
A RING = AA                   no        no       yes      yes        no

We may ignore the A RING = AA rule for a simple collation.

The complete character list
---------------------------

See section "The complete character list" in WL#5170 Swedish collation.
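
As a rough illustration of "one weight per character, primary weights
only", the equalities listed in "The Rules" can be sketched in Python.
This is not the server implementation and not real UCA weights: the names
ALPHABET, SAME_AS, OTHER and primary_key are hypothetical, the order
a..z, æ, ø, å is assumed from the Danish alphabet, and THORN = TH and
A RING = AA are left out because they need more than one character.
Characters outside the table are simply reduced to their base letter, and
non-letters share one catch-all weight, which is a simplification.

```python
import unicodedata

# Primary order assumed for the Danish letters: a..z, then æ, ø, å.
ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
WEIGHT = {ch: i for i, ch in enumerate(ALPHABET)}

# Primary-level equalities taken from the rules table.
# Each key sorts exactly like its value.
SAME_AS = {
    "ð": "d",  # ETH = D
    "đ": "d",  # D STROKE = D
    "ä": "æ",  # AE = A DIAERESIS
    "ę": "æ",  # A DIAERESIS = E OGONEK
    "ö": "ø",  # O STROKE = O DIAERESIS
    "ő": "ø",  # O DOUBLE ACUTE = O DIAERESIS
    "œ": "ø",  # OE = O DIAERESIS
    "ü": "y",  # U DIAERESIS = Y
    "ű": "y",  # U DOUBLE ACUTE = Y
}

# Hypothetical catch-all weight for anything that is not a known letter.
OTHER = len(ALPHABET)


def primary_key(text):
    """Return a tuple of primary weights, one per character (case-insensitive)."""
    key = []
    for ch in unicodedata.normalize("NFC", text).lower():
        ch = SAME_AS.get(ch, ch)
        if ch not in WEIGHT:
            # Other accented letters fall back to their base letter.
            ch = unicodedata.normalize("NFD", ch)[0]
        key.append(WEIGHT.get(ch, OTHER))
    return tuple(key)


words = ["Århus", "Zebra", "Øl", "Ærø", "Abe"]
print(sorted(words, key=primary_key))
# ['Abe', 'Zebra', 'Ærø', 'Øl', 'Århus']
```

With this key, "Öl" and "Øl" compare equal, as do "Müller" and "Myller",
which is what the "=" rows of the table ask for.
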
The only different weights are for the characters which differ for
Danish as described in section "Rules" above, and we won't use
any of the Swedish tailoring, and our concern now is latin9 not latin1.

Some Problems
-------------

Before we can accept this task, we need to agree:
* The "Principles" agreed for WL#5170 are okay generally.
* The "Rules" section above correctly reflects CLDR da.xml
* The CLDR da.xml does not contain errors.

References
----------

WL#5170 Swedish collation
WL#5171 Norwegian collation
BUG#11699 Danish collations sort strange with [ and ]
BUG#37571 Inconsistence in Danish collations
http://www.collation-charts.org/mysql60/mysql604.latin1_danish_ci.html
http://www.collation-charts.org/mysql60/mysql604.utf8_danish_ci.html