WL#9479: Upgrade Unicode data to 9.0.0

Affects: Server-8.0   —   Status: Complete

We added collation utf8mb4_800_ai_ci in WL#9125 and 20 language-specific
collations in WL#9108. All of these collations use Unicode data of version
8.0.0. The Unicode Consortium, however, announced Unicode 9.0.0 on Jun 21. To
build our new collations on the latest Unicode data, we will upgrade our data
and collations as well.

F-1: Make the collation weight tables of the new collations use the latest data

F-2: Make the case mapping tables of the new collations use the latest data

F-3: Make the new collations sort characters correctly

What's the difference between Unicode 9.0.0 and 8.0.0?
1. Unicode 9.0.0 added new case mappings for 10 characters.
2. Unicode 9.0.0 added Tangut and a few lesser-used characters.
3. Unicode 9.0.0 added a few emoji characters.
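
The new case mappings can be read directly from UnicodeData.txt, where fields
12, 13 and 14 of each semicolon-separated record hold the simple uppercase,
lowercase and titlecase mappings. The following Python sketch is illustrative
only: parse_unicode_data_line and CaseMapping are hypothetical names, not part
of the server code, and the two sample records are assumed to match the
Unicode 9.0.0 UnicodeData.txt entries for U+1C80 and U+A7AE.

```python
from typing import NamedTuple, Optional


class CaseMapping(NamedTuple):
    """Simple case mappings of one UnicodeData.txt record (hypothetical type)."""
    codepoint: int
    name: str
    upper: Optional[int]
    lower: Optional[int]
    title: Optional[int]


def _hex_or_none(field: str) -> Optional[int]:
    # An empty field means "no mapping defined for this character".
    return int(field, 16) if field else None


def parse_unicode_data_line(line: str) -> CaseMapping:
    """Extract the case mapping fields from one UnicodeData.txt record."""
    fields = line.strip().split(";")
    if len(fields) != 15:
        raise ValueError("expected 15 fields, got %d" % len(fields))
    return CaseMapping(
        codepoint=int(fields[0], 16),
        name=fields[1],
        upper=_hex_or_none(fields[12]),   # Simple_Uppercase_Mapping
        lower=_hex_or_none(fields[13]),   # Simple_Lowercase_Mapping
        title=_hex_or_none(fields[14]),   # Simple_Titlecase_Mapping
    )


# Sample records, assumed to match UnicodeData.txt 9.0.0.
SAMPLE_LINES = [
    "1C80;CYRILLIC SMALL LETTER ROUNDED VE;Ll;0;L;;;;;N;;;0412;;0412",
    "A7AE;LATIN CAPITAL LETTER SMALL CAPITAL I;Lu;0;L;;;;;N;;;;026A;",
]

for sample in SAMPLE_LINES:
    mapping = parse_unicode_data_line(sample)
    print("U+%04X %s -> %r" % (mapping.codepoint, mapping.name, mapping[2:]))
```

A table generator along these lines would run over the whole file and keep
only records with at least one non-empty mapping field.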

What do we do to upgrade to Unicode 9.0.0?
The differences between Unicode 9.0.0 and 8.0.0 are small. To upgrade to
Unicode 9.0.0, we need to:
1. Import all collation weights defined in DUCET 9.0.0 to replace the weight
   table we are using now.
   The weights of many characters have changed in the new DUCET. For example,
   in DUCET 8.0.0 the weight of 'a' is [.1BC2.0020.0002], but in DUCET 9.0.0
   it is [.1C47.0020.0002].
2. Import the case mapping info defined in UnicodeData.txt and CaseFolding.txt
   published by Unicode to replace the case mapping table we are using now.
   9 Cyrillic characters and 1 Latin character have new case mappings. These
   characters are:
   1C80;CYRILLIC SMALL LETTER ROUNDED VE
   1C81;CYRILLIC SMALL LETTER LONG-LEGGED DE
   1C82;CYRILLIC SMALL LETTER NARROW O
   1C83;CYRILLIC SMALL LETTER WIDE ES
   1C84;CYRILLIC SMALL LETTER TALL TE
   1C85;CYRILLIC SMALL LETTER THREE-LEGGED TE
   1C86;CYRILLIC SMALL LETTER TALL HARD SIGN
   1C87;CYRILLIC SMALL LETTER TALL YAT
   1C88;CYRILLIC SMALL LETTER UNBLENDED UK
   A7AE;LATIN CAPITAL LETTER SMALL CAPITAL I
3. Add code to calculate the implicit weight of Tangut characters, because
   Unicode defines a special algorithm for them.
   All newly added Tangut characters are in the range U+17000..U+187EC. For
   these characters, we compose the implicit weight as
   [FB00.0020.0002][BBBB.0000.0000], where
   BBBB = (codepoint - 0x17000) | 0x8000.
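4. Change all collation names to include the correct Unicode version, "0900".
   The version string changes from "800" to "0900" in preparation for the
   coming Unicode 10. With the leading zero, collation names sort in version
   order: compared as strings, "0900" sorts before a four-digit version
   string such as "1000", whereas "900" would sort after it.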
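
The Tangut implicit weight from step 3 can be sketched as follows. This is a
minimal Python illustration of the formula given above, not the server
implementation; the names tangut_implicit_weight and format_weight are
hypothetical, and the range check simply mirrors the U+17000..U+187EC range
stated in step 3.

```python
TANGUT_FIRST = 0x17000   # first Tangut code point named in step 3
TANGUT_LAST = 0x187EC    # last Tangut code point named in step 3
TANGUT_AAAA = 0xFB00     # fixed primary weight of the first collation element


def tangut_implicit_weight(codepoint):
    """Return the two collation elements for a Tangut code point.

    Each element is a (primary, secondary, tertiary) tuple.
    """
    if not TANGUT_FIRST <= codepoint <= TANGUT_LAST:
        raise ValueError("U+%04X is outside the Tangut range" % codepoint)
    bbbb = (codepoint - TANGUT_FIRST) | 0x8000
    return [(TANGUT_AAAA, 0x0020, 0x0002), (bbbb, 0x0000, 0x0000)]


def format_weight(elements):
    """Render collation elements in the [PPPP.SSSS.TTTT] notation used above."""
    return "".join("[%04X.%04X.%04X]" % element for element in elements)


print(format_weight(tangut_implicit_weight(0x17000)))
print(format_weight(tangut_implicit_weight(0x187EC)))
```

Because BBBB grows with the code point, Tangut characters sort among
themselves in code point order, and the shared FB00 primary keeps them
together as one block.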

Reference:
http://www.unicode.org/versions/Unicode9.0.0/