WL#13054: Add utf8mb4 binary no-pad collation

Affects: Server-8.0   —   Status: Complete

We have a binary collation for utf8mb4, utf8mb4_bin. But it has the PAD_SPACE
attribute, so strings are compared as if padded with trailing spaces. JSON
needs a binary collation for utf8mb4 that doesn't pad with spaces.

We use this WL to track the work of adding a new binary collation for utf8mb4.

F-1: The collation shall be a compiled one.

F-2: The collation shall work for all valid Unicode code points in range
     [U+0, U+10FFFF].

F-3: The collation sorts Unicode code points by their code point order.

F-4: This new collation will be NO_PAD, which means it won't pad strings with
     trailing spaces.

Because we only need to change the collation so that it does not add pad
spaces, we can re-use most of the code of the collation utf8mb4_bin.
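
The difference between the two pad attributes can be sketched as follows. This
is only a simplified model of the comparison semantics, not the server code:
the function names are hypothetical, and PAD_SPACE is assumed to behave as if
the shorter string were extended with U+0020 before a code point comparison.

```python
def compare_pad_space(a: str, b: str) -> int:
    """Model of PAD_SPACE (assumption): pad the shorter string with U+0020."""
    width = max(len(a), len(b))
    ka = [ord(c) for c in a.ljust(width)]
    kb = [ord(c) for c in b.ljust(width)]
    return (ka > kb) - (ka < kb)


def compare_no_pad(a: str, b: str) -> int:
    """Model of NO_PAD: compare code points as-is, trailing spaces count."""
    ka = [ord(c) for c in a]
    kb = [ord(c) for c in b]
    return (ka > kb) - (ka < kb)


if __name__ == "__main__":
    print(compare_pad_space("abc", "abc  "))  # 0: equal once padded
    print(compare_no_pad("abc", "abc  "))     # -1: the shorter string sorts first
```

Under this model, 'abc' and 'abc  ' are equal with PAD_SPACE but distinct with
NO_PAD, which is the behavior JSON needs.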

But there is one more thing we want to change: what to return as a
character's weight. utf8mb4_bin returns three bytes for any one character.
The three bytes are the bytes of the character's Unicode code point, with
leading zero bytes if the code point needs fewer than three bytes. For example,
the weight bytes for U+1234 are 0x00, 0x12, and 0x34. (Please see 

We can make it simpler and return the character's utf8mb4 encoded bytes as its
weight. For example, U+1234 is encoded in utf8mb4 as 0xE1, 0x88, 0xB4, so we
can give U+1234 the weight 0xE188B4 too. For utf8mb4 bytes, we don't need to
consider endianness, so we don't need to do any bit shifting. And since the
length of the weight is easy to know, we don't need to check the boundary of
the weight buffer for every byte. This gives some performance improvement.
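
The two weight schemes can be compared side by side with a small sketch. The
function names are hypothetical and the code only illustrates the byte values
described above; it is not the server implementation.

```python
def weight_utf8mb4_bin(ch: str) -> bytes:
    """Three-byte weight: the code point, big-endian, with leading zero bytes."""
    return ord(ch).to_bytes(3, "big")


def weight_utf8_bytes(ch: str) -> bytes:
    """Proposed weight: the character's utf8mb4 (UTF-8) bytes, unchanged."""
    return ch.encode("utf-8")


if __name__ == "__main__":
    print(weight_utf8mb4_bin("\u1234").hex())  # 001234
    print(weight_utf8_bytes("\u1234").hex())   # e188b4
```

The second form needs no shifting because the bytes are copied as stored, and
its length is known from the encoding itself.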

For the collating result, note that the first byte of a multi-byte utf8mb4
character is always greater than its following bytes. The first byte of a
utf8mb4 character can be one of the following (we only consider the possible
values here, not character validity):
0xxx xxxx // First byte of a one-byte character, from 0x00 to 0x7F
110x xxxx // First byte of a two-byte character, from 0xC0 to 0xDF
1110 xxxx // First byte of a three-byte character, from 0xE0 to 0xEF
1111 0xxx // First byte of a four-byte character, from 0xF0 to 0xF7

Except for the first byte, all other bytes have the same form: 10xx xxxx.
Their values range from 0x80 to 0xBF. We can see that there is no overlap
between the values of the leading byte and the following bytes.
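
The byte ranges listed above can be expressed as a small classifier. This is
an illustrative sketch with hypothetical names; like the list, it looks only
at the bit pattern and does not check character validity.

```python
def classify_utf8_byte(b: int) -> str:
    """Return the class of a single byte value according to its bit pattern."""
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte value")
    if b <= 0x7F:
        return "lead1"         # 0xxx xxxx
    if b <= 0xBF:
        return "continuation"  # 10xx xxxx
    if b <= 0xDF:
        return "lead2"         # 110x xxxx
    if b <= 0xEF:
        return "lead3"         # 1110 xxxx
    if b <= 0xF7:
        return "lead4"         # 1111 0xxx
    return "invalid"           # 0xF8..0xFF match none of the patterns


if __name__ == "__main__":
    print([classify_utf8_byte(b) for b in "\u1234".encode("utf-8")])
```

Because each byte value falls into exactly one class, a continuation byte can
never be mistaken for a leading byte.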

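When we compare two characters byte by byte, from the first byte to the last,
the collating result is the same as when we use the Unicode code point as the
character's weight. A character with a longer encoding has both a larger code
point and a larger first byte, and within one encoding length the code point
bits are stored most significant first.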
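
The equivalence of the two orderings can be checked with a short sketch. The
names and the sample values are chosen here for illustration; surrogates
(U+D800 to U+DFFF) are skipped because they have no valid UTF-8 encoding.

```python
# Sample code points around the boundaries between 1-, 2-, 3- and 4-byte forms.
SAMPLE = [0x0, 0x41, 0x7F, 0x80, 0x7FF, 0x800, 0x1234,
          0xD7FF, 0xE000, 0xFFFF, 0x10000, 0x10FFFF]


def same_order(code_points) -> bool:
    """True if sorting by UTF-8 bytes matches sorting by code point."""
    chars = [chr(cp) for cp in code_points]
    by_code_point = sorted(chars, key=ord)
    by_utf8_bytes = sorted(chars, key=lambda c: c.encode("utf-8"))
    return by_code_point == by_utf8_bytes


if __name__ == "__main__":
    print(same_order(SAMPLE))
```

A wider check can pass any iterable of non-surrogate code points to
same_order, for example a stride over the whole range up to U+10FFFF.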