WL#5211: Spanish collation

Affects: Server-Prototype Only — Status: Un-Assigned

Adopt a Spanish collation which follows standards
and is consistent across many character sets.
Start with a new Spanish collation for latin9.

Related task: WL#5170: Swedish collation

The new collation for latin9 is based on the
Unicode Collation Algorithm (UCA) with the CLDR Spanish tailoring.
It resembles latin1_spanish_ci,
but without its bugs and closer to the standards.


Principles
----------

New collations are based on the Unicode Collation
Algorithm (UCA), and are tailored according to
the Common Locale Data Repository (CLDR)
from the Unicode site. A fuller description of
the principles is in WL#5170 Swedish collation.

Names
-----

Since the convention is
  character set name _ language name _ UCA version _ case-insensitivity abbreviation
the new collation is
  latin9_spanish_520_ci
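
The naming convention above can be sketched as a small helper. This is only an illustration: the function name collation_name and its parameters are hypothetical, and it assumes that "520" is the UCA version 5.2.0 written without dots and that "cs" would be the abbreviation for a case-sensitive variant.

```python
def collation_name(charset, language, uca_version, case_insensitive=True):
    """Build a collation name from its four parts (hypothetical helper)."""
    # Assumption: the version tag is the UCA version with the dots removed,
    # so "5.2.0" becomes "520".
    version_tag = uca_version.replace(".", "")
    # Assumption: "ci" marks case-insensitive, "cs" case-sensitive.
    suffix = "ci" if case_insensitive else "cs"
    return "_".join([charset, language, version_tag, suffix])


print(collation_name("latin9", "spanish", "5.2.0"))
```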

The Rules
---------

The tailoring rules come from the CLDR file es.xml,
which is attached to this worklog task and can also
be obtained through these steps:
Go to http://cldr.unicode.org/
Click "CLDR Releases/Downloads"
Click "CLDR 1.7.2"
Click "core.zip"
Unzip core.zip
Copy ./common/collation/es.xml
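
The last two steps (unzip, then copy es.xml) could be scripted roughly as follows. This is a sketch using Python's standard zipfile module; the function name read_es_xml is hypothetical, and it assumes the archive stores the file under common/collation/es.xml, with or without a leading "./".

```python
import zipfile


def read_es_xml(zip_source, member="common/collation/es.xml"):
    """Return the raw bytes of es.xml from a downloaded core.zip.

    zip_source may be a file path or a binary file-like object.
    """
    def normalize(name):
        # Tolerate archive entries stored with a leading "./".
        return name[2:] if name.startswith("./") else name

    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if normalize(name) == member:
                return archive.read(name)
    raise KeyError(f"{member} not found in archive")
```

For example, read_es_xml("core.zip") would return the file contents, assuming core.zip has already been downloaded into the current directory.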

Remember that, according to "Principles",
any ligatures are sorted as equal to the
first character of the expansion, because
we want to keep the collations simple
(one weight per character, primary weights only).

So Peter Gulutzan thinks these are the rules:

For es.xml collation_type="standard" (not "traditional"):

Ñ > N

This passage in the CLDR for collation_type="standard" is hard to understand:
"
ae
æ
Æ
"
For our limited purpose, it appears we must say Æ = A.
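
To make the two rules concrete, here is a minimal sketch of a sort key with one primary weight per character. It is not the server implementation: the weight values, the names primary_weight and sort_key, and the use of accent stripping for other letters are all assumptions made for illustration, and input is assumed to be precomposed (as it is in a single-byte character set such as latin9).

```python
import unicodedata

# Hypothetical primary weights: A..Z get even numbers, leaving gaps.
_WEIGHTS = {ch: 2 * i for i, ch in enumerate("ABCDEFGHIJKLMNOPQRSTUVWXYZ")}
_WEIGHTS["Ñ"] = _WEIGHTS["N"] + 1  # Ñ > N, and still below O
_WEIGHTS["Æ"] = _WEIGHTS["A"]      # Æ = A (first character of the expansion)


def primary_weight(ch):
    """Return a single case-insensitive primary weight for one character."""
    upper = ch.upper()
    if upper in _WEIGHTS:
        return _WEIGHTS[upper]
    # Assumption: other accented letters weigh the same as their base letter.
    base = unicodedata.normalize("NFD", upper)[0]
    # Anything else is placed after the letters; this placement is arbitrary.
    return _WEIGHTS.get(base, 1000 + ord(ch))


def sort_key(text):
    return [primary_weight(ch) for ch in text]


print(sorted(["ocho", "ñu", "nube", "nada"], key=sort_key))
```

With these weights, "ñu" sorts after every word beginning with "n" and before "ocho", while "Æ" and "a" compare as equal.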

The complete character list
---------------------------

See section "The complete character list" in
WL#5170 Swedish collation. The only weights
that differ are those for the characters Æ and Ñ,
as described in section "The Rules" above,
and we won't use any of the Swedish tailoring.

Some Problems
-------------

Before we can accept this task, we need to agree that:
* The "Principles" agreed for WL#5170 are generally okay.
* The "Rules" section above correctly reflects CLDR es.xml.
* The CLDR es.xml does not contain errors.