In 2000, mainland China introduced a new character set: gb18030, "Chinese National Standard GB 18030-2000: Information Technology -- Chinese ideograms coded character set for information interchange -- Extension for the basic set". The standard was revised in 2005, so the current version is GB 18030-2005.

gb18030 supersedes the two Chinese character sets that MySQL currently supports, gb2312 and gbk. Since gb18030 is upward compatible with gb2312 and gbk, and since the characters that are new in gb18030 are rare, it has been possible to use gb2312 or gbk when the true target is gb18030. But the Chinese government doesn't approve of that substitution, and it does cause problems for users who need the new characters.

Prerequisite: MySQL must support supplementary Unicode characters, as described in WL#1213 "Implement 4-byte UTF8, UTF16 and UTF32". This was implemented in MySQL-6.0.

The MySQL character set name will be gb18030. gb18030 will support UPPER/LOWER conversion for all letters, including Latin extended characters and non-Latin letters (Greek, Cyrillic, etc.).

Collation names will be:
- gb18030_bin
- gb18030_chinese_ci
  This collation will sort according to the UPPER map, i.e. using UPPER(letter) as the weight for each letter.

References
----------
Wikipedia article "GB 18030"
http://en.wikipedia.org/wiki/GB_18030

"GB18030-2000 - The New Chinese National Standard"
http://developers.sun.com/dev/gadc/technicalpublications/articles/gb18030.html

lists.mysql.com "Re: how to continue invalid gbk char?"
http://lists.mysql.com/internals/34872

Windows code page 54936 - Windows XP and later: GB18030 Simplified Chinese (4 byte); Chinese Simplified (GB18030)
http://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx

GB18030 at "Oracle Globalization Support"
http://download.oracle.com/docs/cd/B32110_01/content.1013/b32191/globesup.htm

GB18030 at "International Features in Microsoft SQL Server 2005"
http://msdn.microsoft.com/en-us/library/bb330962(SQL.90).aspx
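
Illustration
------------
The upward compatibility and the UPPER-based weighting described above can be sketched outside MySQL. The following is a minimal illustration only, assuming Python's built-in gb18030, gbk and gb2312 codecs as stand-ins for the MySQL character sets; the helper names byte_length, fits_in and upper_weight_key are hypothetical and not part of this worklog. Python's str.upper() is also only an approximation of the UPPER map that gb18030_chinese_ci would use.

```python
def byte_length(text: str, codec: str = "gb18030") -> int:
    """Return the number of bytes `text` occupies in the given codec."""
    return len(text.encode(codec))


def fits_in(text: str, codec: str) -> bool:
    """Return True if every character of `text` can be encoded in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def upper_weight_key(text: str) -> str:
    """Approximate a *_ci sort key: use the uppercase form as the weight.

    Assumption: Python's str.upper() stands in for the UPPER map here.
    """
    return text.upper()


if __name__ == "__main__":
    # ASCII letter, a common Han character, a supplementary (non-BMP) character.
    samples = ["A", "\u4e2d", "\U00020000"]
    for ch in samples:
        print(
            f"U+{ord(ch):04X}",
            "gb18030 bytes:", byte_length(ch),
            "fits gbk:", fits_in(ch, "gbk"),
            "fits gb2312:", fits_in(ch, "gb2312"),
        )

    # Letters that differ only in case receive the same weight.
    words = ["b", "A", "a", "B"]
    print(sorted(words, key=upper_weight_key))
```

Characters that gb2312 and gbk already cover keep the same bytes in gb18030, while a supplementary character such as U+20000 needs the 4-byte form that only gb18030 provides, which is why the 4-byte Unicode support from WL#1213 is a prerequisite.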
