WL#5331: Support Unicode for Windows command line client

Affects: Server-5.6   —   Status: Complete

The purpose of this task is to support Unicode in the command line client on
Windows, and to do it properly, i.e. by using the Unicode (UTF-16LE) APIs that
Windows provides for reading from and writing to the console.

The goal of this worklog is that the only thing a user has to do to use Unicode
on the command line is to change the console settings to use a TrueType font
(Lucida Console, Consolas) instead of the default raster font. The MSI installer
could help here (by changing the mysql.exe shortcut to use a TrueType font), but
installer changes are outside the scope of this WL.

Affected MySQL tools
====================
This task covers only the command line client, mysql.exe.

In the future, the other tools could be extended as well:
- mysqladmin
- mysqlshow
- mysqlcheck
- mysqlimport
- possibly others

Unicode API vs standard C API
=============================
There was an idea to introduce a new command line parameter, --unicode,
which would switch from the standard C stdin/stdout routines
(fprintf/fputc/fgets) to the Windows console API (ReadConsoleW/WriteConsoleW):

  mysql --unicode test

Without it, mysql.exe would keep using the fprintf/fputc/fgets family of
functions by default.

The --unicode parameter would only be valid if the input stream (stdin) is a
console (i.e. not a file or a pipe); otherwise, an error would be printed.
The parameter would be available on Windows only, not on Linux, Mac or other
platforms.

The --unicode parameter was eventually dropped. Instead, mysql.exe will always
use the Unicode API to read lines in interactive mode, and will always use the
Unicode API for output whenever stdout/stderr is a Windows console.

Connection character set
========================
As the MySQL server does not support UTF-16LE as a connection character set,
mysql.exe will convert back and forth between the connection character set
and the Windows wide-string character set (UTF-16LE).
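
As an illustration of the input direction (a line read from the console as
UTF-16LE, converted before being sent to the server), here is a minimal C
sketch assuming the connection character set is UTF8. The function name
utf16_to_utf8() is hypothetical; the real code is expected to rely on the
MySQL string library instead. Code units are taken as native uint16_t values,
which on (little-endian) Windows is exactly UTF-16LE. Note that MySQL's utf8
stores at most 3 bytes per character (the BMP); the 4-byte sequences this
sketch emits for supplementary characters would need utf8mb4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts n UTF-16 code units to NUL-terminated UTF-8.
   Returns the number of bytes written (excluding the NUL), or -1 if dst
   (cap bytes) is too small. Unpaired surrogates become U+FFFD. */
long utf16_to_utf8(const uint16_t *src, size_t n, char *dst, size_t cap)
{
  size_t i = 0, o = 0;
  assert(src != NULL || n == 0);
  assert(dst != NULL);
  while (i < n) {
    uint32_t cp = src[i++];
    if (cp >= 0xD800 && cp <= 0xDBFF) {            /* high surrogate */
      if (i < n && src[i] >= 0xDC00 && src[i] <= 0xDFFF)
        cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t) (src[i++] - 0xDC00);
      else
        cp = 0xFFFD;                               /* unpaired high surrogate */
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      cp = 0xFFFD;                                 /* stray low surrogate */
    }
    size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    if (o + need + 1 > cap)                        /* +1 keeps room for NUL */
      return -1;
    switch (need) {
    case 1:
      dst[o++] = (char) cp;
      break;
    case 2:
      dst[o++] = (char) (0xC0 | (cp >> 6));
      dst[o++] = (char) (0x80 | (cp & 0x3F));
      break;
    case 3:
      dst[o++] = (char) (0xE0 | (cp >> 12));
      dst[o++] = (char) (0x80 | ((cp >> 6) & 0x3F));
      dst[o++] = (char) (0x80 | (cp & 0x3F));
      break;
    default:
      dst[o++] = (char) (0xF0 | (cp >> 18));
      dst[o++] = (char) (0x80 | ((cp >> 12) & 0x3F));
      dst[o++] = (char) (0x80 | ((cp >> 6) & 0x3F));
      dst[o++] = (char) (0x80 | (cp & 0x3F));
      break;
    }
  }
  if (o + 1 > cap)                                 /* only possible if n == 0 */
    return -1;
  dst[o] = '\0';
  return (long) o;
}
```

For example, the code units {0x0041, 0x00E9, 0x20AC} ("A", "é", "€")
become the 6 bytes 41 C3 A9 E2 82 AC.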

We could change the default connection character set of mysql.exe to UTF8
(so that all of Unicode, or at least the BMP, is covered "out of the box"),
but:
- We want the new code to stabilize first.
- Changing from the OEM character set to UTF8 could lead to
  "illegal mix of collations" errors in some cases, which we want to avoid.


So, the connection character set will continue to be determined by the
usual character set detection procedure (implemented in WL#1349), in this
order:
- Command line parameter: mysql --default-character-set=xxx
- my.ini: default-character-set=xxx
- Automatic detection based on the machine's locale settings,
  as reported by GetLocaleInfo().
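
The lookup order above amounts to "first source that supplies a value wins".
A minimal C sketch, where pick_connection_charset() and its parameters are
hypothetical names: each argument is NULL (or empty) when that source gave no
value, and locale_charset stands for the name derived from GetLocaleInfo()
(the locale-to-charset mapping itself is not shown). The sketch assumes
locale detection always yields a name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: applies the precedence
   command line > my.ini > locale detection. */
const char *pick_connection_charset(const char *cmdline_charset,
                                    const char *my_ini_charset,
                                    const char *locale_charset)
{
  /* Assumption: automatic detection is the last resort and always succeeds. */
  assert(locale_charset != NULL);
  if (cmdline_charset != NULL && *cmdline_charset != '\0')
    return cmdline_charset;       /* --default-character-set=xxx */
  if (my_ini_charset != NULL && *my_ini_charset != '\0')
    return my_ini_charset;        /* default-character-set=xxx in my.ini */
  return locale_charset;          /* derived from GetLocaleInfo() */
}
```

With this order, putting default-character-set=utf8 into my.ini has the same
effect as the command line option, unless the option is also given explicitly.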

To start mysql.exe in the full-featured Unicode mode, users will run:

  mysql.exe --default-character-set=utf8 test

which uses the Unicode API to communicate with the console and UTF8 to
communicate with the MySQL server.
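
For the output direction, result data arrives from the server in the
connection character set and must be turned into UTF-16LE before it can be
written with WriteConsoleW(). Below is a minimal C sketch of that step for a
UTF8 connection; utf8_to_utf16() is a hypothetical name, and the real code is
expected to use the MySQL string library. uint16_t stands in for the Windows
16-bit wchar_t so the sketch is portable. Malformed input (truncated, overlong,
surrogate or out-of-range sequences) is replaced with U+FFFD one byte at a
time, which is one possible policy among several.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts n bytes of UTF-8 to NUL-terminated UTF-16.
   Returns the number of code units written (excluding the terminator),
   or -1 if dst (cap units) is too small. */
long utf8_to_utf16(const char *src, size_t n, uint16_t *dst, size_t cap)
{
  const unsigned char *s = (const unsigned char *) src;
  size_t i = 0, o = 0;
  assert(dst != NULL && cap > 0);
  while (i < n) {
    unsigned char c = s[i];
    uint32_t cp;
    size_t len, k;
    if (c < 0x80)                    { cp = c;        len = 1; }
    else if (c >= 0xC2 && c <= 0xDF) { cp = c & 0x1F; len = 2; }
    else if (c >= 0xE0 && c <= 0xEF) { cp = c & 0x0F; len = 3; }
    else if (c >= 0xF0 && c <= 0xF4) { cp = c & 0x07; len = 4; }
    else                             { cp = 0xFFFD;   len = 1; } /* bad lead */
    if (len > 1) {
      /* Consume continuation bytes (10xxxxxx) without reading past n. */
      for (k = 1; k < len && i + k < n && (s[i + k] & 0xC0) == 0x80; k++)
        cp = (cp << 6) | (s[i + k] & 0x3F);
      if (k < len                                            /* truncated */
          || (len == 3 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF)))
          || (len == 4 && (cp < 0x10000 || cp > 0x10FFFF))) {
        cp = 0xFFFD;                 /* skip just the lead byte */
        len = 1;
      }
    }
    if (cp >= 0x10000) {             /* needs a surrogate pair */
      if (o + 2 >= cap)
        return -1;
      cp -= 0x10000;
      dst[o++] = (uint16_t) (0xD800 + (cp >> 10));
      dst[o++] = (uint16_t) (0xDC00 + (cp & 0x3FF));
    } else {
      if (o + 1 >= cap)
        return -1;
      dst[o++] = (uint16_t) cp;
    }
    i += len;
  }
  dst[o] = 0;                        /* o < cap is guaranteed by the checks */
  return (long) o;
}
```

On Windows, the resulting buffer and its length (in code units) are what
would be handed to WriteConsoleW().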

Although this command line may look long and inconvenient, that should not
be a problem: the plan is to create a shortcut later anyway (in a separate
installer WL). Another way to avoid the long command line is to put
"default-character-set=utf8" into the [mysql] or [client] section of my.ini.


Convert command line parameters
===============================
mysql.exe will also correctly convert its command line parameters (which can
contain Unicode characters) to the connection character set.

This is required, for example, for the -e parameter when the SQL query
contains Unicode characters:

  mysql -uroot -e "SELECT "

We cannot use argc/argv as passed to main(), because the C runtime has
already converted the original Unicode command line to the current ANSI
code page, and characters that code page cannot represent are lost.
Instead, mysql.exe will use the GetCommandLineW() and CommandLineToArgvW()
APIs and convert the resulting wide strings to the client character set.
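
The approach above can be sketched in C as follows. This is a minimal sketch,
not the mysql.exe code: get_utf8_args() and free_utf8_args() are hypothetical
names, and the target encoding is assumed to be UTF-8 (for another connection
character set the wide strings would be converted to that character set
instead). GetCommandLineW(), CommandLineToArgvW(), WideCharToMultiByte() and
LocalFree() are the real Win32 functions; on non-Windows platforms the sketch
simply copies argv.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>   /* CommandLineToArgvW(); link with shell32.lib */
#endif

/* Hypothetical helper: frees an array returned by get_utf8_args(). */
void free_utf8_args(char **args)
{
  size_t i;
  if (!args)
    return;
  for (i = 0; args[i]; i++)
    free(args[i]);
  free(args);
}

/* Hypothetical helper: returns a malloc'ed, NULL-terminated array of UTF-8
   arguments and stores their count in *argc_out, or returns NULL on failure.
   On Windows the ANSI argv is ignored and the arguments are re-parsed from
   the original UTF-16 command line. */
char **get_utf8_args(int argc, char **argv, int *argc_out)
{
  char **out;
  int i;
  assert(argc_out != NULL);
#ifdef _WIN32
  int wargc;
  LPWSTR *wargv = CommandLineToArgvW(GetCommandLineW(), &wargc);
  (void) argv;                                   /* possibly lossy, unused */
  if (!wargv)
    return NULL;
  argc = wargc;
#endif
  out = calloc((size_t) argc + 1, sizeof(char *)); /* zeroed: NULL-terminated */
  for (i = 0; out && i < argc; i++) {
#ifdef _WIN32
    /* First call returns the required size (including the NUL). */
    int len = WideCharToMultiByte(CP_UTF8, 0, wargv[i], -1, NULL, 0, NULL, NULL);
    out[i] = len > 0 ? malloc((size_t) len) : NULL;
    if (out[i])
      WideCharToMultiByte(CP_UTF8, 0, wargv[i], -1, out[i], len, NULL, NULL);
#else
    size_t len = strlen(argv[i]) + 1;
    out[i] = malloc(len);
    if (out[i])
      memcpy(out[i], argv[i], len);
#endif
    if (!out[i]) {                               /* allocation/conversion failed */
      free_utf8_args(out);
      out = NULL;
    }
  }
#ifdef _WIN32
  LocalFree(wargv);                              /* CommandLineToArgvW result */
#endif
  if (out)
    *argc_out = argc;
  return out;
}
```

A typical call at the top of main() would be
`char **args = get_utf8_args(argc, argv, &argc);`, after which args, not argv,
is passed to the option parser.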


Implementation
==============
Vladislav Vaintroub previously committed a preliminary patch that uses the
Unicode console API in mysql.exe (see the References section). The final
patch will be based mostly on Vlad's original patch, with a few changes:
- The connection character set will still be detected according to WL#1349,
  rather than being hard-wired to utf8.
- The patch's own utf16le conversion functions will be removed, as our
  string library now supports utf16le.


References
==========
- Vlad's patch:
  http://lists.mysql.com/commits/105379
- Michael Kaplan's blog:
  http://blogs.msdn.com/b/michkap/archive/2010/04/07/9989346.aspx
- Creating shortcuts:
  http://msdn.microsoft.com/en-us/library/xsy6k3ys
- Discussion thread "UTF-16LE in Windows" (Roel Van de Paar, Alexander Barkov):
  [ mysql intranet ]/secure/mailarchive/mail.php?folder=6&mail=14934
  [ mysql intranet ]/secure/mailarchive/mail.php?folder=6&mail=14936