WL#10753: Add Russian collations for utf8mb4
Affects: Server-8.0
—
Status: Complete
Russian is one of the major languages, so we add Russian collations for the utf8mb4 character set. Two collations are added: utf8mb4_ru_0900_ai_ci and utf8mb4_ru_0900_as_cs.
F-1: The collations shall be compiled ones.
F-2: The collations shall work for all characters in the range [U+0, U+10FFFF].
F-3: The collations shall sort characters of the Russian language correctly according to the language-specific rules defined by CLDR.
F-4: The collations shall sort characters not belonging to the language according to their order in DUCET.
F-5: For characters whose weight is not assigned in DUCET, the collations shall sort them by their implicit weight, which is constructed in the UCA way.
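The implicit weight construction referred to in F-5 can be illustrated with a minimal Python sketch. It follows the general formula from the UCA specification (AAAA = base + (CP >> 15), BBBB = (CP & 0x7FFF) | 0x8000). The function name implicit_weights is hypothetical, and the default base 0xFBC0 is the UCA value for code points that are not unified ideographs. The sketch ignores the other bases UCA defines (for example for Han ideographs) and is not taken from the server code.

```python
def implicit_weights(cp, base=0xFBC0):
    """Return the two implicit primary weights (AAAA, BBBB) for a code point.

    Assumption: `base` is 0xFBC0, the UCA base for code points that are
    not unified ideographs; other bases are outside this sketch.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range [U+0, U+10FFFF]")
    aaaa = base + (cp >> 15)        # high bits are folded into the base
    bbbb = (cp & 0x7FFF) | 0x8000   # low 15 bits, with the top bit set
    return aaaa, bbbb


# U+10FFFF, the last code point covered by F-2
print([hex(w) for w in implicit_weights(0x10FFFF)])  # ['0xfbe1', '0xffff']
```

Per UCA, the two values become the primary weights of two consecutive collation elements, so code points without a DUCET entry sort after assigned characters and in code point order among themselves.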
CLDR defines a single, very simple rule for the Russian language: [reorder Cyrl]. We reorder according to this rule, which places all Cyrillic characters before those of other scripts such as Latin and Greek.

CLDR does not define any weight-tailoring rules for Russian, so it falls back to the "root collation" (root.xml of CLDR), which applies to every language for which no tailoring rule is defined. The root collation does define a few weight-tailoring rules for some Cyrillic characters, but these rules belong to the 'eor' collation type, whereas the default collation type for Russian is 'standard'. According to the CLDR definition, we therefore do not need to import the 'eor' rules, and the Russian collations have no special weight-tailoring rules.

Because reordering is already implemented, implementing these two Russian collations is straightforward: we only need to add the reordering rule.
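The effect of [reorder Cyrl] can be sketched with a toy Python model. It is only an illustration. The script list, the name-prefix script detection in script_of, and the use of code points as within-script weights are all simplifying assumptions. A real implementation works on DUCET primary weights rather than code points.

```python
import unicodedata

# Toy default script order: a tiny subset of the DUCET script order.
DEFAULT_ORDER = ["LATIN", "GREEK", "CYRILLIC"]


def reorder(default_order, first):
    """Move the scripts in `first` to the front; keep the rest in order."""
    return list(first) + [s for s in default_order if s not in first]


def script_of(ch):
    """Hypothetical helper: guess the script from the Unicode name prefix."""
    name = unicodedata.name(ch, "")
    for script in DEFAULT_ORDER:
        if name.startswith(script):
            return script
    return None


def sort_key(text, order):
    """Build a key of (script rank, code point) pairs, one per character."""
    rank = {script: i for i, script in enumerate(order)}
    # Characters of unknown script get rank -1 and therefore sort first.
    return [(rank.get(script_of(ch), -1), ord(ch)) for ch in text]


RU_ORDER = reorder(DEFAULT_ORDER, ["CYRILLIC"])  # models [reorder Cyrl]

words = ["zebra", "яблоко", "alpha", "москва", "ωμέγα"]
print(sorted(words, key=lambda w: sort_key(w, DEFAULT_ORDER)))
# ['alpha', 'zebra', 'ωμέγα', 'москва', 'яблоко']
print(sorted(words, key=lambda w: sort_key(w, RU_ORDER)))
# ['москва', 'яблоко', 'alpha', 'zebra', 'ωμέγα']
```

Only the relative position of the script groups changes. The order of characters inside each script is untouched, which matches the observation above that no weight tailoring is needed for Russian.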