WL#826: Alternative collation in indexes

Affects: Server-7.1   —   Status: Un-Assigned

Add a possibility to create an index with collation different from
the collation in the column description.

For example:
CREATE TABLE t1 (f1 CHAR(5) CHARACTER SET latin1 COLLATE latin1_general_ci);
CREATE INDEX k1 ON t1 (f1 COLLATE latin1_general_ci);
CREATE INDEX k1 ON t1 (f1 COLLATE latin1_german2_ci);
CREATE INDEX k1 ON t1 (f1 COLLATE latin1_bin);

This can be very useful with Unicode columns, as we
have many collations for UCS2/UTF8. However, having
for example, 15 language specific indexes on the same
column is an enormous waste of space. Another approach
would be to be able to reuse for example an index for
language A for searches in language B. It seems to
be possible in Unicode, at least with restrictions,
because all Unicode collations, unlike MySQL 8bit collations,
are "almost the same". I.e. they are the same for almost 64K characters,
and slightly different in some, up to only several dozen, language specific
characters. So we could reuse indexes with different collations by means of 
extending a search range, applying language  ordering difference.
This needs to be additionally studied.

The simplest case is to reuse a non-binary index for binary searches.

Another use case came up in discussion of
WL#896 Primary, Secondary and Tertiary Sorts.
The question was: if there is a _ci index and
a _w123 index, will the optimizer give priority
to the _w123 index? Bar replied: 
If we have a _ci and _w123 index on the same column, then _w123      
index should be picked up in the optimizer for ORDER BY,      
but a _ci index for GROUP/DISTINCT/WHERE optimization.      
What do you think?