WL#4616: Implement UTF-16LE
Affects: Server-5.6
—
Status: Complete
As of version 5.5, MySQL supports only the UTF16 (i.e. big-endian) character set.
This task is about adding UTF-16LE (i.e. little-endian).

Rationale
=========

- We need UTF16-LE as a prerequisite for "WL#4616 Support Unicode for Windows
  command line client", as Windows console API functions are all UTF16-LE.
- Sun Globalization rules require us to support UTF-16LE.
- UTF16-LE can be useful for other Unicode applications, especially on Windows.

Jeff Balint (connectors) wrote:
>We are currently doing character set conversions in the driver
>(character_set_results=null). We convert it to UTF8, and have our own
>code to convert it to UTF16. If UTF-16LE support is added to the server
>and driver, it should improve performance for apps using many/large
>unicode strings.

Johannes Schlüter wrote:
>PHP 6, which will, probably, be release in 1.5 - 2 years, will use
>Utf-16 internally. In the current development tree of the PHP connectors
>we're converting to UTF-8. So from that side it'd be nice to save that
>conversion.
>...
>PHP 6's Unicode implementation is based on IBM's ICU library which uses
>system dependent endianess. So being able to pass through Utf-16 BE and
>LE would, again, save the conversion on our side.

Peter Laursen (Basic Quality Contributor) writes in BUG#52494:
> Server side support for UTF16 (LE) would be
> very nice and would solve quite a lot of problems/issues for Windows
> users working in a multi-lingual environment.
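To make the big-endian versus little-endian distinction in the description concrete, here is a minimal Python sketch that encodes one string both ways. It uses only Python's standard codecs, not MySQL; the sample string and the variable names are arbitrary choices made for illustration.

```python
# Illustration only: Python's built-in codecs, not MySQL code.
text = "A\u20ac"  # 'A' (U+0041) and the euro sign (U+20AC); arbitrary sample

be = text.encode("utf-16-be")  # big-endian, the byte order of MySQL's utf16
le = text.encode("utf-16-le")  # little-endian, the byte order this task adds

print(be.hex(" "))  # 00 41 20 ac
print(le.hex(" "))  # 41 00 ac 20

# Swapping the two bytes of every 16-bit code unit turns one form into the other.
swapped = b"".join(be[i:i + 2][::-1] for i in range(0, len(be), 2))
assert swapped == le
```

A client that already holds little-endian data (as the Windows console API does) would otherwise have to perform this byte swap, or a full conversion through UTF8, on every string.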
Character set name
==================

The MySQL character set name will be utf16le.

Built-in collation names
========================

As of WL#4616, utf16le will be used mostly for conversion purposes.
We will NOT implement the full set of language-specific collations for utf16le.
We can add all UCA-based utf16le collations later, when InnoDB supports
2-byte collation IDs.

WL#4616 will add two built-in collations:

- utf16le_general_ci, the default collation, case insensitive
  (similar to utf16_general_ci)
- utf16le_bin, case sensitive collation with codepoint-to-codepoint
  comparison style

Note: in MySQL-5.5.8 we reverted the part of the patch for:
  WL#55980 Character sets: supplementary character _bin ordering is wrong
which made utf16_bin sort in byte-by-byte order. Now utf16_bin implements
code point order. So utf16le_bin and utf16_bin will give exactly the same
character order as each other, and the same order as utf32_bin/utf8mb4_bin.

User-defined collations
=======================

We will not add utf16le_unicode_ci at this point. That means adding
user-defined collations using the character set definition file Index.xml
will not be possible for utf16le. We'll add utf16le_unicode_ci (together
with the ability to have user-defined collations) later, when InnoDB
supports 2-byte collation IDs.
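The difference between byte-by-byte order and code point order, mentioned in the note on the _bin collations, can be sketched in Python. This only models the two orderings with Python's standard codecs; it is not the server's comparison code, and the three sample characters and the helper `names` are assumptions chosen for illustration.

```python
# Illustration only: models the orderings in Python, not MySQL's implementation.
chars = ["\U00010000", "\uff9d", "A"]  # U+10000 (supplementary), U+FF9D, U+0041

# Code point order: the behaviour described for utf16_bin and utf16le_bin.
by_code_point = sorted(chars, key=ord)

# Byte-by-byte order over the encoded form, for comparison.
by_utf16be_bytes = sorted(chars, key=lambda c: c.encode("utf-16-be"))
by_utf16le_bytes = sorted(chars, key=lambda c: c.encode("utf-16-le"))

def names(seq):
    """Render each character as U+XXXX for readable output."""
    return [f"U+{ord(c):04X}" for c in seq]

print(names(by_code_point))     # ['U+0041', 'U+FF9D', 'U+10000']
print(names(by_utf16be_bytes))  # ['U+0041', 'U+10000', 'U+FF9D']
print(names(by_utf16le_bytes))  # ['U+10000', 'U+0041', 'U+FF9D']
```

In this sample, the supplementary character U+10000 is stored as the surrogate pair D800 DC00, so a plain byte comparison places it before U+FF9D in big-endian form and before everything in little-endian form. Comparing by code point avoids that dependence on the storage byte order, which is why the two _bin collations can agree.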
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.