WL#3764: Sinhala Collation

Affects: Server-5.5 — Status: Complete

Description
High Level Architecture

Support Sinhala (Sri Lanka) collation for ucs2 and utf8 character sets.

The original patch came from Harshula Jayasuriya, for the
community edition. Decisions and discussions are in the thread
"Re: Patch to add Sinhala (Sri Lanka) collation to MySQL"
(recipients: Bar, Lenz, Peter, Sergei).

See also:
BUG#26474 Add Sinhala script (Sri Lanka) collation to MySQL

From Harshula: """

The Unicode codepage for Sinhala can be broken into 4 categories.
U+0D85 - U+0D96 = independent vowels
U+0D9A - U+0DC6 = consonants
U+0DCA - U+0DF3 = dependent vowels
U+0D82 - U+0D83 = consonant modifiers

The collation order of the groups is:
1) independent vowels
2) consonant modifiers
3) consonants
4) dependent vowels.
"""
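
The grouping above can be sketched in a few lines of Python. This is only an illustration of the quoted group order, not the collation tables from the patch: the names `GROUPS`, `group_rank` and `sort_key` are hypothetical, and ordering characters by code point within a group is an assumption made here for simplicity.

```python
# Inclusive code point ranges as quoted above, listed in collation order.
GROUPS = [
    ("independent vowels", 0x0D85, 0x0D96),
    ("consonant modifiers", 0x0D82, 0x0D83),
    ("consonants", 0x0D9A, 0x0DC6),
    ("dependent vowels", 0x0DCA, 0x0DF3),
]


def group_rank(ch):
    """Return the position of ch's group in the collation order."""
    cp = ord(ch)
    for rank, (_name, low, high) in enumerate(GROUPS):
        if low <= cp <= high:
            return rank
    # Anything outside the quoted ranges sorts after all four groups.
    return len(GROUPS)


def sort_key(ch):
    # Assumption: within one group, characters sort by code point.
    return (group_rank(ch), ord(ch))


# One character from each group, deliberately out of order.
letters = ["\u0D9A", "\u0D82", "\u0DCF", "\u0D85"]
ordered = sorted(letters, key=sort_key)
```

Note that the consonant modifiers (U+0D82 - U+0D83) have lower code points than the independent vowels but sort after them, so a plain binary comparison would not reproduce this order.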

The standard speaks rather obliquely:  """

a) Conjunct letters (බ ඳ අකර) are decomposed into the equivalent
   sequence e.g. ක -> කක.
b) Touching letters are decomposed into the equivalent sequence e.g.
   සව -> සව, මම -> මම.
c) The yansaya and rakaransaya are decomposed into their equivalent
   forms e.g. ක -> කය and ක -> කර.
d) The repaya is decomposed in its equivalent form e.g. ර -> රම.
e) The letter ඥ is decomposed as follows:
        ඥ -> ජඤ.
   Thus, ඥ න is collated as being equivalent to ජඤ න.

...

The algorithm for the Simple collation is the same as for the Dictionary
collation sequence, except that the decomposition in step d) of 4.1 is omitted.
Therefore, ඥ is not decomposed into ජඤ but treated as a single letter.

"""
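
The difference between the two sequences for the letter ඥ can be sketched as follows. This is a minimal illustration that assumes the decomposition is a plain string substitution applied before comparison. The function names `dictionary_form` and `simple_form` are hypothetical, and the other decomposition steps (conjuncts, touching letters, yansaya, rakaransaya, repaya) are left out.

```python
JNYA = "\u0DA5"          # ඥ
JA_NYA = "\u0DA2\u0DA4"  # ජ followed by ඤ
NA = "\u0DB1"            # න


def dictionary_form(text):
    """Dictionary sequence: ඥ is decomposed into ජඤ before comparing."""
    return text.replace(JNYA, JA_NYA)


def simple_form(text):
    """Simple sequence: ඥ is kept as a single letter."""
    return text


a = JNYA + " " + NA    # ඥ න
b = JA_NYA + " " + NA  # ජඤ න

# Under the dictionary sequence the two strings compare as equivalent.
same_in_dictionary = dictionary_form(a) == dictionary_form(b)
# Under the simple sequence they remain distinct.
same_in_simple = simple_form(a) == simple_form(b)
```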

(Additional notes: The standard does not describe the letters in terms of
Unicode code points, and the literal character data cannot be extracted from
the PDF specification by normal means.)