WL#4024: gb18030 Chinese character set

Affects: Server-5.7   —   Status: Complete

In 2000, mainland China introduced a new character set,
gb18030: "Chinese National Standard GB 18030-2000:
Information Technology -- Chinese ideograms coded character
set for information interchange -- Extension for the basic set".
The standard was revised in 2005, so the current version is
GB 18030-2005. It supersedes the two Chinese character sets
that MySQL already supports, gb2312 and gbk.

Since gb18030 is upward compatible with gb2312 and gbk,
and since the characters it adds are rare, it has been
possible to use gb2312 or gbk when the true target is
gb18030. But the Chinese government does not approve of
this substitution, and it causes problems for users:
characters that exist only in gb18030 cannot be stored.
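The compatibility described above can be checked with Python's standard `gb2312`, `gbk` and `gb18030` codecs. Treating these codecs as stand-ins for MySQL's own converters is an assumption made only for illustration. The sketch shows that a common ideograph encodes to the same bytes in all three character sets, while a supplementary ideograph such as U+20000 (CJK Extension B) is encodable only in gb18030.

```python
def encodings_of(ch: str) -> dict:
    """Return {codec: encoded bytes, or None if the codec cannot encode ch}."""
    result = {}
    for codec in ("gb2312", "gbk", "gb18030"):
        try:
            result[codec] = ch.encode(codec)
        except UnicodeEncodeError:
            result[codec] = None
    return result

common = encodings_of("中")          # same bytes (D6 D0) in all three
rare = encodings_of("\U00020000")    # only gb18030 can encode it

print(common)
print(rare)
```

A gbk column simply has no way to hold the second character, which is why storing gb18030 data as gbk breaks down.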

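A prerequisite: MySQL must support supplementary Unicode
characters (code points above U+FFFF), as described in
WL#1213 "Implement 4-byte UTF8, UTF16 and UTF32", because
gb18030 covers every Unicode code point, including the
supplementary ones, which it encodes as four-byte sequences.
This was implemented in MySQL-6.0.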
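Why supplementary-character support is needed becomes clearer from how GB 18030 encodes the supplementary planes: U+10000..U+10FFFF map linearly onto the four-byte range 0x90308130..0xE3329A35, where bytes 1 and 3 take the 126 values 0x81..0xFE and bytes 2 and 4 take the 10 values 0x30..0x39. The function below is a minimal sketch of that mapping (the name `gb18030_supplementary` is hypothetical); Python's `gb18030` codec is used only to cross-check it.

```python
def gb18030_supplementary(cp: int) -> bytes:
    """Encode a supplementary code point (U+10000..U+10FFFF) as GB 18030 four bytes."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    n = cp - 0x10000                 # linear offset from U+10000
    n, b4 = divmod(n, 10)            # byte 4: 0x30..0x39
    n, b3 = divmod(n, 126)           # byte 3: 0x81..0xFE
    b1, b2 = divmod(n, 10)           # byte 2: 0x30..0x39, rest goes to byte 1
    return bytes([0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])

# Cross-check against the codec for a few code points.
for cp in (0x10000, 0x20000, 0x1F600, 0x10FFFF):
    assert chr(cp).encode("gb18030") == gb18030_supplementary(cp)
```

The mapping is purely arithmetic, so a server implementation needs no lookup table for this range, but it does need an internal character representation wide enough for code points above U+FFFF.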

The MySQL character set name will be gb18030.
gb18030 will support UPPER/LOWER conversion
for all letters, including Latin extended characters
and non-Latin letters (Greek, Cyrillic, etc.).
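One way to get UPPER/LOWER for every letter is to decode to Unicode, apply a per-character case mapping, and re-encode. The sketch below assumes that approach; it is not taken from the MySQL implementation, and the helper names are hypothetical. It keeps mappings one-to-one, leaving unchanged any character whose case form is longer than one character (for example 'ß', whose full uppercase is "SS"), on the assumption that the server converts character by character.

```python
def _simple_case(ch: str, fn) -> str:
    """Apply a case function, but only if it maps one character to one character."""
    mapped = fn(ch)
    return mapped if len(mapped) == 1 else ch

def gb18030_upper(data: bytes) -> bytes:
    text = data.decode("gb18030")
    return "".join(_simple_case(c, str.upper) for c in text).encode("gb18030")

def gb18030_lower(data: bytes) -> bytes:
    text = data.decode("gb18030")
    return "".join(_simple_case(c, str.lower) for c in text).encode("gb18030")

sample = "café αβγ жизнь 中文".encode("gb18030")
print(gb18030_upper(sample).decode("gb18030"))  # CAFÉ ΑΒΓ ЖИЗНЬ 中文
```

Chinese ideographs have no case, so they pass through unchanged.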


Collation names will be:

- gb18030_bin

- gb18030_chinese_ci
this collation will sort according to the UPPER map,
i.e. it uses UPPER(letter) as the weight for each letter.
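The difference between the two collations can be sketched as two sort keys. The worklog does not say how a weight is represented, so this sketch assumes that gb18030_bin compares the raw encoded bytes and that gb18030_chinese_ci weighs each character as the gb18030 encoding of its one-to-one uppercase form. Trailing-space handling and any other server details are not modeled, and all function names are hypothetical.

```python
def simple_upper(ch: str) -> str:
    """One-to-one uppercase; multi-character results (e.g. 'ß' -> 'SS') are ignored."""
    up = ch.upper()
    return up if len(up) == 1 else ch

def bin_key(s: str) -> bytes:
    """gb18030_bin (assumed): compare the encoded bytes directly."""
    return s.encode("gb18030")

def chinese_ci_key(s: str) -> list[bytes]:
    """gb18030_chinese_ci (assumed): weight of each char = gb18030 bytes of UPPER(char)."""
    return [simple_upper(ch).encode("gb18030") for ch in s]

words = ["apple", "Banana", "APPLE", "banana"]
print(sorted(words, key=bin_key))         # uppercase ASCII bytes sort first
print(sorted(words, key=chinese_ci_key))  # case-insensitive; ties keep input order
```

Because 'apple' and 'APPLE' get identical keys under the second collation, they compare as equal, which is what the `_ci` suffix promises.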


