WL#5331: Support Unicode for Windows command line client
Affects: Server-5.6
Status: Complete
The purpose of this task is to support Unicode in command line clients, specifically on Windows, and to do it right, i.e. by using the Unicode (UTF-16LE) APIs that Windows provides for reading from and writing to the console. The goal is that the only thing a user has to do to use Unicode on the command line is to change the console settings to use a TrueType font (Lucida Console, Consolas) instead of the default raster fonts. The MSI installer can help here (by changing a shortcut to mysql.exe to use a TrueType font), but installer changes are outside the scope of this WL.
Affected MySQL tools
====================

This task is about the command line client mysql.exe only. In the future we can think of extending other tools as well:

- mysqladmin
- mysqlshow
- mysqlcheck
- mysqlimport
- maybe others...

Unicode API vs standard C API
=============================

There was an idea to introduce a new command line parameter, --unicode, which would switch from the standard C stdin/stdout routines (fprintf/fputc/fgets) to the Windows console API (ReadConsoleW/WriteConsoleW):

  mysql --unicode test

By default, mysql.exe would keep using the fprintf/fputc/fgets family of functions. The --unicode parameter would be valid only if the input stream (stdin) is determined to be a console (i.e. not a file or a pipe); otherwise, an error would be printed. The --unicode parameter would be available on Windows only (not on Linux, Mac, etc.).

The idea of the --unicode parameter was canceled. mysql.exe will always use the Unicode API to read lines in interactive mode, and will always use the Unicode API for output when stdout/stderr is a Windows console.

Connection character set
========================

As MySQL server does not support UTF-16LE as a connection character set, we'll convert back and forth between the connection character set and the Windows wide string character set (UTF-16LE).

We could change the default connection character set for mysql.exe to UTF8 (so that all of Unicode, or at least the BMP, is covered "out of the box"), but:

- We want the new code to stabilize first.
- Changing from the OEM character set to UTF8 could lead to "illegal mix of collations" errors in some cases. We want to avoid that.

So, the connection character set will continue to be determined by the usual character set detection procedure (as implemented by WL#1349), in this order:

- Command line parameter: mysql --default-character-set=xxx
- my.ini: default-character-set=xxx
- Automatic detection according to the machine localization, as reported by GetLocaleInfo().
To start mysql.exe in full-featured Unicode mode, users will run:

  mysql.exe --default-character-set=utf8 test

which will "use the Unicode API to communicate with the console" and "use UTF8 to communicate with the MySQL server". Although the command line may look long and inconvenient, that should not be a problem: the plan is to create a shortcut later anyway (in a separate installer WL). Another way to avoid a long command line is to put "default-character-set=utf8" into my.ini.

Convert command line parameters
===============================

We'll also correctly translate mysql.exe command line parameters (as they can contain Unicode characters) to the connection character set. This is required, for example, for the -e parameter with SQL queries that contain Unicode characters:

  mysql -uroot -e "SELECT"

We cannot use argc or argv from the main() routine, since the C runtime has already translated them from Unicode to the current ANSI codepage (characters not representable in that codepage are lost). Instead, we're going to use the GetCommandLineW() and CommandLineToArgvW() APIs and translate the results to the client encoding.

Implementation
==============

Vladislav Vaintroub previously committed a preliminary patch implementing the use of the Unicode console API in mysql.exe (see the References section). The final patch is planned to be based mostly on Vlad's original patch, with some details changed:

- The connection character set will be detected according to WL#1349, rather than being set to utf8.
- The utf16le conversion functions will be removed, as our string library now supports utf16le.

References
==========

- Vlad's patch: http://lists.mysql.com/commits/105379
- Michael Kaplan's blog: http://blogs.msdn.com/b/michkap/archive/2010/04/07/9989346.aspx
- Creating shortcuts: http://msdn.microsoft.com/en-us/library/xsy6k3ys
- Discussion thread "UTF-16LE in Windows" (Roel Van de Paar, Alexander Barkov):
  [ mysql intranet ]/secure/mailarchive/mail.php?folder=6&mail=14934
  [ mysql intranet ]/secure/mailarchive/mail.php?folder=6&mail=14936
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.