WL#10480: Add Japanese kana sensitive collation to utf8mb4

Affects: Server-8.0 — Status: Complete

The existing utf8mb4_ja_0900_as_cs collation sorts characters using three
levels of weight. Customer feedback indicates that a collation which is
additionally kana sensitive would be useful.

The new collation is named utf8mb4_ja_0900_as_cs_ks, where 'ks' stands for
'kana sensitive'.

The '_ks' suffix currently applies only to Japanese. DUCET assigns hiragana
and katakana weights that differ at the third level, so non-Japanese
collations can already distinguish them. The default Japanese collating rules,
however, define hiragana and katakana as differing only at the quaternary
level, which means the default Japanese collation, utf8mb4_ja_0900_as_cs,
compares hiragana and katakana as equal. This is why we introduce this new
suffix and collation.

User Documentation
==================

* https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-2.html
* https://dev.mysql.com/doc/refman/8.0/en/charset-collation-names.html (_ks suffix)
* https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html

F-1: The collation shall be a compiled collation.

F-2: The collation shall work for all characters in range [U+0, U+10FFFF].

F-3: The collation shall sort characters of Japanese language correctly
     according to language specific rules defined by CLDR.

F-4: The collation shall sort characters not belonging to the language
     according to their order in DUCET.

F-5: Characters whose weight is not assigned in DUCET shall be sorted by
     their implicit weight values, which are constructed in the way UCA
     defines.

F-6: The collation shall be case / accent sensitive and Kana sensitive.

NF-1: The performance regression of this collation's functions should be no
      more than 33% compared to utf8mb4_ja_0900_as_cs, since the number of
      weight levels increases from 3 to 4.
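
As an illustration of the implicit weights mentioned in F-5, the following is
a minimal Python sketch of the derivation described in UCA (UTS #10). The
function name implicit_weights and its base parameter are hypothetical and are
not part of the server code. UCA uses base 0xFB40 for core CJK unified
ideographs, 0xFB80 for other unified ideographs, and 0xFBC0 for other
unassigned code points; it also special-cases some scripts, which this sketch
omits.

```python
def implicit_weights(code_point, base=0xFBC0):
    """Return the two implicit collation elements UCA derives for a code point.

    Each element is a (primary, secondary, tertiary) tuple. The caller picks
    the base; this sketch does not classify code points itself.
    """
    aaaa = base + (code_point >> 15)          # high bits select the first primary
    bbbb = (code_point & 0x7FFF) | 0x8000     # low 15 bits, with the top bit set
    return [(aaaa, 0x0020, 0x0002), (bbbb, 0x0000, 0x0000)]


# U+4E00 is a core CJK unified ideograph, so base 0xFB40 applies.
cjk = implicit_weights(0x4E00, base=0xFB40)
# U+10FFFF has no DUCET entry and falls under the default base 0xFBC0.
last = implicit_weights(0x10FFFF)
```

Because the first primary weight grows with the code point's high bits and the
second with its low bits, characters that share a base sort in code point
order.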

A quaternary weight is necessary to implement a Japanese kana-sensitive
collation. It distinguishes Katakana from Hiragana. Under the current CLDR
rules defined for Japanese (e.g. &き<<<<キ), Hiragana equals Katakana at
the default collating level (3).

The following sample shows how kana sensitivity affects sort order.
We have the rules: &き<<<<キ, &ゅ<<<<ュ, &ゆ<<<<ユ, and &う<<<<ウ.
A. kana insensitive (default, what utf8mb4_ja_0900_as_cs does)
きゅう = キュウ < きゆう = キユウ
きゅう = キュウ because the corresponding characters of the two strings have
equal weights on the first three levels.
キュウ < きゆう because the tertiary weight of ュ is less than that of ゆ.
きゆう = キユウ because the corresponding characters of the two strings have
equal weights on the first three levels.
B. kana sensitive
きゅう < キュウ < きゆう < キユウ
きゅう < キュウ because the quaternary weight of き is less than that of キ.
キュウ < きゆう because the tertiary weight of ュ is less than that of ゆ.
きゆう < キユウ because the quaternary weight of き is less than that of キ.
(Keep in mind that we compare the primary weights of all characters first. If
the primary weights are equal, we compare the secondary weights, and so on
level by level until we find a difference or reach the end of the weights.
That is why キュウ < きゆう even though き <<<< キ: the tertiary difference
is found before any quaternary weight is compared.)
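
The level-by-level comparison used in this sample can be sketched in Python as
follows. The weight values in TOY_WEIGHTS are invented for illustration and
are not the real DUCET or CLDR weights; sort_key is a hypothetical helper, not
a server function. It builds a key with all primary weights first, then all
secondary weights, and so on, with a 0 separator between levels.

```python
# Toy (primary, secondary, tertiary, quaternary) weights. The numbers are
# illustrative only; they merely respect the relations used in the sample:
# small ゅ/ュ are tertiary-less than ゆ/ユ, and hiragana is quaternary-less
# than katakana.
TOY_WEIGHTS = {
    "き": (1, 1, 2, 1), "キ": (1, 1, 2, 2),
    "ゅ": (2, 1, 1, 1), "ュ": (2, 1, 1, 2),
    "ゆ": (2, 1, 2, 1), "ユ": (2, 1, 2, 2),
    "う": (3, 1, 2, 1), "ウ": (3, 1, 2, 2),
}

LEVEL_SEPARATOR = 0  # lower than any real weight


def sort_key(text, levels):
    """Concatenate the weights of `text` level by level, up to `levels`."""
    key = []
    for level in range(levels):
        if level > 0:
            key.append(LEVEL_SEPARATOR)
        key.extend(TOY_WEIGHTS[ch][level] for ch in text)
    return key


words = ["キユウ", "きゆう", "キュウ", "きゅう"]

# Three levels: kana insensitive, so hiragana and katakana forms tie.
insensitive_keys = {w: sort_key(w, 3) for w in words}

# Four levels: kana sensitive, so every string gets a distinct position.
kana_sensitive = sorted(words, key=lambda w: sort_key(w, 4))
print(kana_sensitive)
```

With three levels the keys of きゅう and キュウ are identical, as are those of
きゆう and キユウ; adding the fourth level breaks both ties without changing
the order already decided at the tertiary level.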

UCA defines its own way of assigning quaternary weights: a sufficiently large
weight (e.g. 0xFFFF) for every normal character and 0x0000 for combining
marks, with weights then shifted on that basis.
In most cases this quaternary weight is not needed. For example, Latin
characters can be distinguished from Kanji, and Kanji from Katakana /
Hiragana, by three levels of weight. For Japanese, it is only necessary when
the strings being compared contain Katakana / Hiragana characters. We
therefore simplify the way quaternary weights are assigned. Instead of adding
piles of unnecessary 0xFFFF entries to the weight tables, we assign a
quaternary weight to a character only when we know it is Katakana / Hiragana.
The value of the quaternary weight can be any positive integer. Because of the
level separator (0x0000), the quaternary weight does not affect the result of
a comparison that is already determined by the first three levels of weight.
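
A minimal Python sketch of this simplified assignment is shown below. All
names and weight values are hypothetical and chosen only for illustration; the
actual weight tables and key layout in the server may differ. Only kana
characters contribute a quaternary weight, and the 0x0000 separator after each
of the first three levels keeps that weight from influencing earlier levels.

```python
# Hypothetical three-level weights; not real DUCET / CLDR values.
BASE_WEIGHTS = {
    "a": (10, 1, 1),
    "b": (11, 1, 1),
    "き": (20, 1, 1),
    "キ": (20, 1, 1),
}

# Quaternary weights exist only for kana. Any positive integers work, as long
# as hiragana sorts before the matching katakana.
KANA_QUATERNARY = {"き": 1, "キ": 2}

LEVEL_SEPARATOR = 0x0000


def sort_key(text):
    """Build a four-level key in which only kana carry a quaternary weight."""
    key = []
    for level in range(3):
        key.extend(BASE_WEIGHTS[ch][level] for ch in text)
        key.append(LEVEL_SEPARATOR)
    # Non-kana characters add nothing at the quaternary level.
    key.extend(KANA_QUATERNARY[ch] for ch in text if ch in KANA_QUATERNARY)
    return key


no_kana = sort_key("ab")        # ends with the third separator, no 4th level
hira = sort_key("aき")
kata = sort_key("aキ")
other = sort_key("bき")
```

Here "aき" sorts before "aキ" purely on the quaternary weight, while "aキ"
still sorts before "bき" because the primary weights already differ.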

Reference:
http://www.unicode.org/reports/tr10/#Variable_Weighting