WL#5213: Danish collation

Affects: Server-Prototype Only — Status: Un-Assigned


Adopt a Danish collation which follows standards
and is consistent across many character sets.
Start with a new Danish collation for latin9.

WL#5170: Swedish collation

The new collation is for latin9 and is based
on the Unicode Collation Algorithm (UCA) with the CLDR Danish tailoring.
It resembles latin1_danish_ci,
but without its bugs and closer to the standards.


Principles
----------

New collations are based on the Unicode Collation
Algorithm (UCA), and are tailored according to the
Common Locale Data Repository (CLDR)
from the Unicode site. A fuller description of
the principles is in WL#5170 Swedish collation.

Names
-----

The naming convention is
character set name _ language name _ UCA version _ case-insensitivity abbreviation
so the new collation is named
latin9_danish_520_ci
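
As an illustration only, the convention can be written as a small
Python helper. The function name collation_name is hypothetical, and
the reading of "520" as UCA version 5.2.0 written without dots is an
assumption of this sketch.

```python
def collation_name(charset: str, language: str, uca_version: str,
                   sensitivity: str = "ci") -> str:
    """Join the four parts of the naming convention with underscores."""
    # Assumption: the UCA version is written without dots, "5.2.0" -> "520".
    version = uca_version.replace(".", "")
    return "_".join([charset, language, version, sensitivity])


print(collation_name("latin9", "danish", "5.2.0"))
```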

The Rules
---------

The tailoring rules come from the CLDR file da.xml,
which is attached to this worklog task or can be obtained through these steps:
Go to http://cldr.unicode.org/
Click "CLDR Releases/Downloads"
Click "CLDR 1.7.2"
Click "core.zip"
Unzip core.zip
Copy ./common/collation/da.xml
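
The last two steps can also be scripted. This is a minimal Python
sketch, assuming core.zip has already been downloaded by hand through
the steps above. The function name extract_da_xml is hypothetical,
and no download URL is assumed.

```python
import zipfile
from pathlib import Path

# Path of the file inside the archive, as given in the steps above.
MEMBER = "common/collation/da.xml"


def extract_da_xml(core_zip: str, dest_dir: str) -> Path:
    """Extract da.xml from a local core.zip and return the extracted path."""
    with zipfile.ZipFile(core_zip) as archive:
        if MEMBER not in archive.namelist():
            raise FileNotFoundError(f"{MEMBER} not found in {core_zip}")
        # extract() recreates the member's directory layout under dest_dir.
        return Path(archive.extract(MEMBER, path=dest_dir))
```

For example, extract_da_xml("core.zip", ".") would leave the file at
./common/collation/da.xml.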

Remember that, according to "Principles",
any ligatures are sorted as equal to the
first character of the expansion, because
we want to keep the collations simple
(one weight per character, primary weights only).

We'll use the da.xml "standard", not "proposed", rules.
These are the apparent rules, with comparisons to other collations:

                                latin9_  latin1_ utf8_   Microsoft Oracle
                                swedish_ danish_ danish_
                                520_ci   ci      ci
                                -------  ------- ------- --------- ------

AE BEFORE EZH                    no      yes     yes     yes       yes
O STROKE AFTER AE                no      yes     yes     yes       yes
A RING AFTER O STROKE            no      yes     yes     yes       yes
ETH = D                          yes     yes     no, > D yes       yes    
D STROKE = D                     yes     -       no      -         -
THORN = TH                       yes     no      no      no, > T   no, = T
O STROKE = O DIAERESIS           yes     yes     yes     yes       yes
AE = A DIAERESIS                 yes     yes     yes     yes       yes
O DOUBLE ACUTE = O DIAERESIS     yes     -       no      -         -
U DIAERESIS = Y                  yes     yes     yes     yes       yes
U DOUBLE ACUTE = Y               yes     no      yes     -         -
A DIAERESIS = E OGONEK           yes     no      no      -         -
OE = O DIAERESIS                 yes     no      no      no        no
A RING = AA                      no      no      yes     yes       no

We may ignore the A RING = AA rule, since AA is a two-character sequence
and a simple collation has one weight per character.
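
To make the effect of these rules concrete, here is a rough Python
sketch of a primary-weight sort key. It is not the proposed
implementation: the weight numbers are invented for illustration,
only the rows involving Æ, Ä, Ø, Ö, Œ, Å, Ü and Ð are modelled,
the A RING = AA rule is ignored, and every other non-basic character
gets a placeholder weight of 0.

```python
import unicodedata

# Hypothetical primary weights: a-z get 1..26, then the three Danish letters.
BASE = {chr(ord("a") + i): i + 1 for i in range(26)}
AE, O_STROKE, A_RING = 27, 28, 29

# One weight per character, primary weights only.
TAILORED = {
    "æ": AE, "ä": AE,                                  # AE = A DIAERESIS
    "ø": O_STROKE, "ö": O_STROKE, "œ": O_STROKE,       # O STROKE = O DIAERESIS = OE
    "å": A_RING,                                       # A RING AFTER O STROKE
    "ü": BASE["y"],                                    # U DIAERESIS = Y
    "ð": BASE["d"],                                    # ETH = D
}


def primary_key(text: str) -> list:
    """Return a list of illustrative primary weights, one per character."""
    key = []
    for ch in text.lower():
        if ch in TAILORED:
            key.append(TAILORED[ch])
            continue
        # Other accented letters fall back to their base letter;
        # anything still unknown gets the placeholder weight 0.
        base = unicodedata.normalize("NFD", ch)[0]
        key.append(BASE.get(base, 0))
    return key


words = ["Århus", "Odense", "Ørsted", "Ærø", "Zebra"]
print(sorted(words, key=primary_key))
# prints ['Odense', 'Zebra', 'Ærø', 'Ørsted', 'Århus']
```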

The complete character list
---------------------------

See section "The complete character list" in
WL#5170 Swedish collation. The weights differ only
for the characters that Danish treats differently,
as described in section "The Rules" above.
We won't use any of the Swedish tailoring,
and our concern now is latin9, not latin1.

Some Problems
-------------

Before we can accept this task, we need to agree that:
* The "Principles" agreed for WL#5170 are generally okay.
* The "Rules" section above correctly reflects CLDR da.xml.
* The CLDR da.xml does not contain errors.
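
To help check the second point, the rules in da.xml can be listed
mechanically and compared with the table by hand. The Python sketch
below assumes that da.xml uses the XML rule syntax, with a
<collation type="standard"> element containing a <rules> element
whose children are elements such as <reset>, <p>, <s> and <t>. That
layout is an assumption about the CLDR 1.7.2 file, not something this
worklog states, and the function name standard_rules is hypothetical.

```python
import xml.etree.ElementTree as ET


def standard_rules(path: str) -> list:
    """Return (tag, text) pairs for the rules of the "standard" collation."""
    root = ET.parse(path).getroot()
    for collation in root.iter("collation"):
        if collation.get("type") != "standard":
            continue
        rules = collation.find("rules")
        if rules is None:
            return []
        # Only direct children are listed; nested elements are not expanded.
        return [(el.tag, el.text or "") for el in rules]
    return []
```

Each (tag, text) pair can then be read against one row of the table,
for example a reset on Y followed by a secondary difference for
U DIAERESIS.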

References
----------

WL#5170 Swedish collation
WL#5171 Norwegian collation

BUG#11699 Danish collations sort strange with [ and ]
BUG#37571 Inconsistence in Danish collations

http://www.collation-charts.org/mysql60/mysql604.latin1_danish_ci.html
http://www.collation-charts.org/mysql60/mysql604.utf8_danish_ci.html