WL#3764: Sinhala Collation

Affects: Server-5.5   —   Status: Complete   —   Priority: Medium

Support Sinhala (Sri Lanka) collation for ucs2 and utf8 character sets.

The original patch came from Harshula Jayasuriya, for the
community edition. Decisions and discussions are in the thread
"Re: Patch to add Sinhala (Sri Lanka) collation to MySQL"
(recipients: Bar, Lenz, Peter, Sergei).

See also:
BUG#26474 Add Sinhala script (Sri Lanka) collation to MySQL
From Harshula: """

The Unicode codepage for Sinhala can be broken into 4 categories:
U+0D85 - U+0D96 = independent vowels
U+0D9A - U+0DC6 = consonants
U+0DCA - U+0DF3 = dependent vowels
U+0D82 - U+0D83 = consonant modifiers

The collation order of the groups is:
1) independent vowels
2) consonant modifiers
3) consonants
4) dependent vowels.
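
As a rough illustration of the group ordering above, here is a minimal Python
sketch. It only ranks single code points by group and is not the collation
implemented in the server; the names SINHALA_GROUPS, group_rank and char_key
are hypothetical, and ordering by code point within a group is an assumption.

```python
# Code point ranges and ranks taken from the note above.
# Each entry: (first, last, rank, description)
SINHALA_GROUPS = [
    (0x0D85, 0x0D96, 1, "independent vowels"),
    (0x0D82, 0x0D83, 2, "consonant modifiers"),
    (0x0D9A, 0x0DC6, 3, "consonants"),
    (0x0DCA, 0x0DF3, 4, "dependent vowels"),
]


def group_rank(ch):
    """Return the group rank (1-4) of one character, or 0 if outside the listed ranges."""
    cp = ord(ch)
    for first, last, rank, _name in SINHALA_GROUPS:
        if first <= cp <= last:
            return rank
    return 0


def char_key(ch):
    """Hypothetical sort key: group rank first, then code point (an assumption)."""
    return (group_rank(ch), ord(ch))


# One character from each group, deliberately out of order.
sample = ["\u0D9A", "\u0DCF", "\u0D82", "\u0D85"]
ordered = sorted(sample, key=char_key)
print([f"U+{ord(c):04X}" for c in ordered])
# prints ['U+0D85', 'U+0D82', 'U+0D9A', 'U+0DCF']
```

Note that plain code point order would put U+0D82 (a consonant modifier)
before U+0D85 (an independent vowel); the group ranks reverse that.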

The standard speaks rather obliquely:  """

a) Conjunct letters (බ ඳ අකර) are decomposed into the equivalent
   <pure consonant, consonant-with-vowel> sequence e.g. ක -> කක.
b) Touching letters are decomposed into the equivalent <pure consonant,
   consonant-with-vowel> sequence e.g. සව -> සව, මම -> මම.
c) The yansaya and rakaransaya are decomposed into their equivalent forms
   e.g. ක -> කය and ක -> කර.
d) The repaya is decomposed in its equivalent form e.g. ර -> රම.
e) The letter ඥ is decomposed as follows: ඥ -> ජඤ.
   Thus, ඥ න is collated as being equivalent to ජඤ න.


The algorithm for the Simple collation is the same as for the Dictionary
collation sequence, except that the decomposition in step d) of 4.1 is omitted.
Therefore, ඥ is not decomposed into ජඤ but treated as a single letter.
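
The difference between the two variants can be sketched in Python as a
pre-processing step applied before comparison. This is only an illustration
under stated assumptions: the function name prepare_for_collation is
hypothetical, the decomposed form is taken literally from the text above
(U+0DA2 U+0DA4), and the standard's actual sequence may contain a combining
mark that did not survive extraction.

```python
# U+0DA5 is the single letter that the Dictionary collation decomposes.
TAALUJA_SANYOOGA = "\u0DA5"

# Decomposed form as printed in the text above (U+0DA2 followed by U+0DA4).
# Assumption: the standard's real sequence may include a mark lost in extraction.
DECOMPOSED = "\u0DA2\u0DA4"


def prepare_for_collation(text, dictionary=True):
    """Hypothetical helper: decompose U+0DA5 for Dictionary collation only.

    With dictionary=False (Simple collation) the letter is left as a
    single unit, as the description above states.
    """
    if dictionary:
        return text.replace(TAALUJA_SANYOOGA, DECOMPOSED)
    return text


composed = "\u0DA5\u0DB1"          # the letter followed by U+0DB1
spelled_out = "\u0DA2\u0DA4\u0DB1"  # the decomposed spelling followed by U+0DB1

# Dictionary collation: both spellings reduce to the same string.
print(prepare_for_collation(composed) == prepare_for_collation(spelled_out))

# Simple collation: the two spellings stay distinct.
print(prepare_for_collation(composed, dictionary=False)
      == prepare_for_collation(spelled_out, dictionary=False))
```

The first comparison prints True and the second prints False, which mirrors
the equivalence stated for the Dictionary sequence and its absence in the
Simple sequence.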


(Additional notes:  The standard is not described using Unicode code-points. 
The PDF specification can't be scraped via normal means to get literal data.)