About Worklog
MySQL Worklogs are design specifications for changes that may define past work, or be considered for future development.

WL#5212: General collation

Affects: Server-Prototype Only — Status: Un-Assigned


Adopt general collations that follow the standards
and are consistent across many character sets.
Start with new general collations for latin9.

Related task: WL#5170: Swedish collation

Two collations for latin9 that are based on the
Unicode Collation Algorithm (UCA) with no tailoring.
They are like latin1_general_ci and latin1_general_cs,
but without those collations' bugs and closer to the standards.

Principles
----------

New collations are based on the Unicode Collation
Algorithm (UCA) and are tailored according to the
Common Locale Data Repository (CLDR) published by
the Unicode Consortium. A fuller description of
the principles is in WL#5170 Swedish collation.
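To make the "UCA plus CLDR tailoring" principle concrete, here is a minimal Python sketch. It assumes a toy weight table in which each character maps to (primary, secondary, tertiary) weights, as in the UCA, and treats a tailoring as a set of overrides applied on top of the base table. The weights and the names BASE_WEIGHTS, SWEDISH_TAILORING, make_table and sort_key are hypothetical; they are not real DUCET or CLDR values, and real UCA details (ignorables, contractions, expansions) are left out. A general collation with no tailoring corresponds to using the base table unchanged.

```python
# Toy weights, invented for illustration (NOT real DUCET or CLDR values).
# Each character maps to (primary, secondary, tertiary) weights, as in the UCA:
# primary = base letter, secondary = accent, tertiary = case.
BASE_WEIGHTS = {
    "a": (1, 1, 1), "A": (1, 1, 2),
    "ä": (1, 2, 1), "Ä": (1, 2, 2),
    "o": (15, 1, 1), "O": (15, 1, 2),
    "z": (26, 1, 1), "Z": (26, 1, 2),
}

# Hypothetical tailoring: Swedish treats "ä" as its own letter sorted after "z".
SWEDISH_TAILORING = {"ä": (27, 1, 1), "Ä": (27, 1, 2)}


def make_table(tailoring=None):
    """Return a copy of the base table with any tailoring overrides applied."""
    table = dict(BASE_WEIGHTS)
    if tailoring:
        table.update(tailoring)
    return table


def sort_key(s, table):
    """UCA-style key: compare all primaries first, then secondaries, then tertiaries."""
    weights = [table[ch] for ch in s]
    return tuple(tuple(w[level] for w in weights) for level in range(3))


general = make_table()                   # no tailoring: a "general" collation
swedish = make_table(SWEDISH_TAILORING)  # base table plus Swedish overrides

words = ["zoo", "äo", "ao"]
print(sorted(words, key=lambda w: sort_key(w, general)))  # ['ao', 'äo', 'zoo']
print(sorted(words, key=lambda w: sort_key(w, swedish)))  # ['ao', 'zoo', 'äo']
```

Without tailoring, "ä" differs from "a" only at the secondary (accent) level; the tailoring moves it to a new primary weight, which is the kind of change the general collations would not make.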

Names
-----

The naming convention is
character set name _ language name _ UCA version _ case-sensitivity abbreviation (ci or cs)
so the new collations are (520 stands for UCA version 5.2.0)
latin9_general_520_ci
latin9_general_520_cs
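The naming convention can be expressed as a small function. This is a hypothetical helper for illustration only (collation_name is not part of any MySQL API); it assumes the UCA version is given in dotted form and written into the name with the dots removed.

```python
def collation_name(charset, language, uca_version, case_sensitive):
    """Assemble charset_language_ucaversion_ci|cs, e.g. "5.2.0" becomes "520"."""
    version = uca_version.replace(".", "")
    suffix = "cs" if case_sensitive else "ci"
    return f"{charset}_{language}_{version}_{suffix}"


print(collation_name("latin9", "general", "5.2.0", False))  # latin9_general_520_ci
print(collation_name("latin9", "general", "5.2.0", True))   # latin9_general_520_cs
```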

The Rules
---------

For a "general" collation, we have three choices:

1 Use the CLDR rules for English.
2 Reproduce whatever we did for latin1_general_ci.
3 Use the UCA DUCET (Default Unicode Collation Element Table) weights unchanged, with no special rules.

Peter Gulutzan thinks we'll pick '3'.
So far nobody has said otherwise.

The _cs collation is the same as the _ci collation except that it is case-sensitive.
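One way to picture the _ci/_cs difference is as a difference in comparison depth. In DUCET, case differences are carried at the tertiary level, so the sketch below assumes that _ci compares only the primary and secondary levels while _cs also compares the tertiary level. Which levels each new collation actually compares is an assumption here, not something this worklog specifies; the weights and the names WEIGHTS, sort_key, ci_key and cs_key are hypothetical.

```python
# Toy weights (invented, not real DUCET values): (primary, secondary, tertiary).
# Case differences sit at the tertiary level, as they do in DUCET.
WEIGHTS = {
    "a": (1, 1, 1), "A": (1, 1, 2),
    "b": (2, 1, 1), "B": (2, 1, 2),
}


def sort_key(s, levels):
    """Sort key built from only the first `levels` UCA levels, one tuple per level."""
    weights = [WEIGHTS[ch] for ch in s]
    return tuple(tuple(w[level] for w in weights) for level in range(levels))


def ci_key(s):
    # Assumption: _ci stops before the tertiary level, so case is ignored.
    return sort_key(s, levels=2)


def cs_key(s):
    # _cs also compares the tertiary level, so case matters.
    return sort_key(s, levels=3)


print(ci_key("ab") == ci_key("AB"))           # True: equal under _ci
print(cs_key("ab") == cs_key("AB"))           # False: distinct under _cs
print(sorted(["Ab", "ab", "b"], key=cs_key))  # ['ab', 'Ab', 'b']
```

Under _cs, case only breaks ties: "ab" and "Ab" still sort together before "b", because the primary level is compared first.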

The complete character list
---------------------------

See section "The complete character list" in
WL#5170 Swedish collation. The only characters whose
weights differ from that list will be listed here (none so far),
and none of the Swedish tailoring is used.

Some Problems
-------------

Before we can accept this task, we need to agree that:
* The "Principles" agreed for WL#5170 are generally acceptable here.
* The "Rules" section above correctly reflects what we want for 'general'.