WL#9109: Add case and accent sensitive collations for utf8mb4

Affects: Server-8.0 — Status: Complete

Description
Requirements
High Level Architecture

Case- and accent-insensitive collations were added by WL#9108 and WL#9125.
This worklog adds case- and accent-sensitive collations.

DUCET defines three levels of collation weight. The first (primary) level
compares base letters; the second (secondary) level compares accents when the
base letters are equal; and the third (tertiary) level compares case when both
the base letters and their accents are equal.

Our case- and accent-insensitive collations use only the first level of
collation weight defined in DUCET. This WL uses the weights of all three levels.

F-1: The function must still work with case- / accent-insensitive collations.
F-2: The function must return secondary / tertiary weights when they are
     requested.
F-3: The function must pad the end of the string with the space character's
     correct secondary / tertiary weights.

As described in the High-Level Description, DUCET defines three levels of
collation weight. With accent- and case-sensitive collations, we first compare
the two strings' primary weights. If they are equal, we compare their secondary
weights. If those are equal too, we compare their tertiary weights.

For example, the following four characters compare equal under the accent- and
case-insensitive collations:
    006F  ; [.1D58.0020.0002] # LATIN SMALL LETTER O
    004F  ; [.1D58.0020.0008] # LATIN CAPITAL LETTER O
    00D3  ; [.1D58.0020.0008][.0000.0024.0002] # LATIN CAPITAL LETTER O WITH ACUTE
    00D2  ; [.1D58.0020.0008][.0000.0025.0002] # LATIN CAPITAL LETTER O WITH GRAVE
This is because their primary weights are all 0x1D58. With an accent- and
case-sensitive collation, the order should be: 006F <<< 004F << 00D3 << 00D2.
006F sorts before 004F because its tertiary weight '0002' is less than 004F's
tertiary weight '0008'. 004F sorts before 00D3 because 004F has no further
secondary weight after '0020' (treated as '0000'), while 00D3 continues with
'0024'. 00D3 sorts before 00D2 because '0024' is less than '0025'. Here '<<<'
means 'less than at the case level' and '<<' means 'less than at the accent
level'. In this way, we can distinguish all four characters.
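
The level-by-level comparison above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not the server's
implementation: the table name DUCET and the helpers level_weights() and
compare() are hypothetical, and the table holds only the four collation
elements quoted above.

```python
from functools import cmp_to_key

# Collation elements copied from the four DUCET lines quoted above.
# Each tuple is (primary, secondary, tertiary). Hypothetical mini-table.
DUCET = {
    "\u006F": [(0x1D58, 0x0020, 0x0002)],
    "\u004F": [(0x1D58, 0x0020, 0x0008)],
    "\u00D3": [(0x1D58, 0x0020, 0x0008), (0x0000, 0x0024, 0x0002)],
    "\u00D2": [(0x1D58, 0x0020, 0x0008), (0x0000, 0x0025, 0x0002)],
}

def level_weights(text, level):
    """Return the non-zero weights of `text` at one level (0, 1 or 2)."""
    return [ce[level] for ch in text for ce in DUCET[ch] if ce[level] != 0]

def compare(a, b, levels=3):
    """Return -1, 0 or 1. A later level is consulted only when all
    earlier levels are equal."""
    for level in range(levels):
        wa, wb = level_weights(a, level), level_weights(b, level)
        if wa != wb:
            return -1 if wa < wb else 1
    return 0

chars = ["\u00D2", "\u00D3", "\u004F", "\u006F"]
# Sensitive order: 006F <<< 004F << 00D3 << 00D2
ordered = sorted(chars, key=cmp_to_key(compare))
```

With levels=1 the same function behaves like the existing insensitive
collations: compare("\u006F", "\u00D2", levels=1) returns 0.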

The strnxfrm function returns the weights of the characters in a string. With
this WL, it returns the weight data one level after another: the primary
level first, followed by the secondary level and then the tertiary level.
For example, for the strings 'o\u00D3' and 'O\u00D2', our current strnxfrm
returns "0x1D58, 0x1D58" for both strings, so they sort equal.
After this implementation, for string 'o\u00D3', the weights returned from
strnxfrm should be:
    "0x1D58, 0x1D58, 0000, 0x0020, 0x0020, 0x0024, 0000, 0x0002, 0x0008,
     0x0002",
and for string 'O\u00D2', the weights returned should be:
    "0x1D58, 0x1D58, 0000, 0x0020, 0x0020, 0x0025, 0000, 0x0008, 0x0008,
     0x0002".
In this way, we'll be able to distinguish these two strings.
The '0000' in the weights above is the weight separator, which marks the end
of one level and the start of the next. It is needed because the secondary
weight range in DUCET is [0020, 0192] and the tertiary weight range is
[0002, 001F], and these ranges might overlap after weights are shifted for
specific languages.
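
As a rough sketch of the layout described above, the following Python builds
a level-by-level weight sequence with a 0000 separator between levels. The
names DUCET, LEVEL_SEPARATOR and sort_key() are hypothetical, the table holds
only the collation elements quoted earlier in this document, and the real
strnxfrm writes into a byte buffer rather than returning a list.

```python
# Hypothetical mini-table: (primary, secondary, tertiary) per collation
# element, copied from the DUCET lines quoted earlier in this document.
DUCET = {
    "\u006F": [(0x1D58, 0x0020, 0x0002)],
    "\u004F": [(0x1D58, 0x0020, 0x0008)],
    "\u00D3": [(0x1D58, 0x0020, 0x0008), (0x0000, 0x0024, 0x0002)],
    "\u00D2": [(0x1D58, 0x0020, 0x0008), (0x0000, 0x0025, 0x0002)],
}

LEVEL_SEPARATOR = 0x0000  # written between two consecutive levels

def sort_key(text, levels=3):
    """Return all weights of `text`, one level after another.

    Zero weights inside a level are skipped, so the only 0x0000 values
    in the result are the level separators.
    """
    key = []
    for level in range(levels):
        if level > 0:
            key.append(LEVEL_SEPARATOR)
        key.extend(ce[level] for ch in text for ce in DUCET[ch]
                   if ce[level] != 0)
    return key

key_a = sort_key("o\u00D3")
key_b = sort_key("O\u00D2")
```

Comparing key_a and key_b element by element gives the same result as
comparing the two strings level by level: they first differ at the secondary
weights 0x0024 and 0x0025, so 'o\u00D3' sorts before 'O\u00D2'. Calling
sort_key(text, levels=1) reproduces the primary-only output of the
insensitive collations.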

Spaces padded to the right of a string use the weights of the space character
(0x20), which is defined as:
    0020  ; [*0209.0020.0002] # SPACE
If we add, for instance, one padding space to the string 'o\u00D3', our
current strnxfrm returns "0x1D58, 0x1D58, 0x0209". After this implementation, it
should append the space's primary weight to the end of the characters' primary
weights, the space's secondary weight to the end of the characters' secondary
weights, and likewise for the tertiary weight. The weights returned should be:
    "0x1D58, 0x1D58, 0x0209, 0000, 0x0020, 0x0020, 0x0024, 0x0020, 0000,
     0x0002, 0x0008, 0x0002, 0x0002".
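
The per-level padding can be sketched as follows. As before, this is a
minimal illustration rather than the server code: DUCET, SPACE_CE,
LEVEL_SEPARATOR and padded_sort_key() are hypothetical names, and the pad
count is passed in directly instead of being derived from a column width.

```python
# Hypothetical mini-table: (primary, secondary, tertiary) per collation
# element, copied from the DUCET lines quoted earlier in this document.
DUCET = {
    "\u006F": [(0x1D58, 0x0020, 0x0002)],
    "\u00D3": [(0x1D58, 0x0020, 0x0008), (0x0000, 0x0024, 0x0002)],
}

SPACE_CE = (0x0209, 0x0020, 0x0002)  # 0020 ; [*0209.0020.0002] # SPACE
LEVEL_SEPARATOR = 0x0000

def padded_sort_key(text, pad_spaces, levels=3):
    """Return the level-by-level weights of `text`, with the space
    character's weight for that level appended `pad_spaces` times
    at the end of each level."""
    key = []
    for level in range(levels):
        if level > 0:
            key.append(LEVEL_SEPARATOR)
        key.extend(ce[level] for ch in text for ce in DUCET[ch]
                   if ce[level] != 0)
        key.extend([SPACE_CE[level]] * pad_spaces)
    return key

padded = padded_sort_key("o\u00D3", pad_spaces=1)
primary_only = padded_sort_key("o\u00D3", pad_spaces=1, levels=1)
```

Here padded holds the 13 weights listed above, and primary_only holds
0x1D58, 0x1D58, 0x0209, matching what the current strnxfrm returns for the
insensitive collations.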