In the great majority of statements, it is obvious what
collation MySQL uses to resolve a comparison operation. For
example, in the following cases, it should be clear that the
collation is the collation of column x:
SELECT x FROM T ORDER BY x;
SELECT x FROM T WHERE x = x;
SELECT DISTINCT x FROM T;
However, with multiple operands, there can be ambiguity. For example:
SELECT x FROM T WHERE x = 'Y';
Should the comparison use the collation of the column
x, or of the string literal 'Y'? Both x and
'Y' have collations, so which collation takes precedence?
Standard SQL resolves such questions using what used to be called “coercibility” rules. MySQL assigns coercibility values as follows:
An explicit COLLATE clause has a
coercibility of 0. (Not coercible at all.)
The concatenation of two strings with different collations has a coercibility of 1.
The collation of a column has a coercibility of 2.
A “system constant” (the string returned by functions such as USER() or VERSION()) has a coercibility of 3.
The collation of a literal has a coercibility of 4.
NULL or an expression that is derived from
NULL has a coercibility of 5.
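The values listed above can be summarized as a small lookup. The following Python sketch is only an illustration of the list: the member names and the helper `takes_precedence` are hypothetical labels chosen here, not identifiers exposed by MySQL.

```python
from enum import IntEnum


class Coercibility(IntEnum):
    """Coercibility values as listed in the text (MySQL 4.1.11 and later).

    The member names are illustrative labels, not MySQL identifiers.
    """
    EXPLICIT = 0          # explicit COLLATE clause (not coercible at all)
    NO_COLLATION = 1      # concatenation of strings with different collations
    COLUMN = 2            # collation of a column
    SYSTEM_CONSTANT = 3   # result of functions such as USER() or VERSION()
    LITERAL = 4           # collation of a string literal
    NULL = 5              # NULL or an expression derived from NULL


def takes_precedence(a: Coercibility, b: Coercibility):
    """Return the operand kind whose collation is used, or None on a tie.

    A lower value means the collation is harder to coerce, so it wins.
    """
    if a == b:
        return None
    return min(a, b)


if __name__ == "__main__":
    # A column compared with a literal: the column's collation is used.
    print(takes_precedence(Coercibility.COLUMN, Coercibility.LITERAL).name)
```

A tie returns `None` here because equal coercibility is handled by separate rules rather than by the numeric value alone.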
The preceding coercibility values are current as of MySQL
4.1.11. Before MySQL 4.1.11, there is no system constant or
NULL coercibility. Functions such as
USER() have a coercibility of 2
rather than 3, and literals have a coercibility of 3 rather than 4.
MySQL uses coercibility values with the following rules to resolve ambiguities:
Use the collation with the lowest coercibility value.
If both sides have the same coercibility, then:
If both sides are Unicode, or both sides are not Unicode, it is an error.
If one side has a Unicode character set and the other side has a non-Unicode character set, the side with the Unicode character set wins, and automatic character set conversion is applied to the non-Unicode side. For example, the following statement does not return an error:
SELECT CONCAT(utf8_column, latin1_column) FROM t1;
It returns a result that has a character set of
utf8 and the same collation as
utf8_column. Values of
latin1_column are automatically converted to utf8 before concatenating.
For an operation with operands from the same character
set but that mix a _bin collation and a
_ci or _cs collation, the _bin collation is
used. This is similar to how operations that mix
nonbinary and binary strings evaluate the operands as
binary strings, except that it is for collations
rather than data types.
Although automatic conversion is not in the SQL standard, the SQL standard document does say that every character set is (in terms of supported characters) a “subset” of Unicode. Because it is a well-known principle that “what applies to a superset can apply to a subset,” we believe that a collation for Unicode can apply for comparisons with non-Unicode strings.
|column1 = 'A'|Use collation of column1|
|column1 = 'A' COLLATE x|Use collation of 'A' COLLATE x|
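The resolution rules described above can be modeled in a few lines. The following Python sketch is a simplified reading of those rules, not the server's implementation. The `Operand` type, the `resolve` function, the `UNICODE_CHARSETS` set, and the order in which the tie-breaking rules are checked are all assumptions made for illustration. Treating two identical collations as unambiguous is likewise an assumption.

```python
from dataclasses import dataclass

# Assumed set of Unicode character sets, for illustration only.
UNICODE_CHARSETS = {"utf8", "ucs2"}


@dataclass(frozen=True)
class Operand:
    charset: str       # e.g. "latin1"
    collation: str     # e.g. "latin1_swedish_ci"
    coercibility: int  # 0..5, as listed in the text


class CollationError(Exception):
    """Raised when the rules cannot choose a collation."""


def resolve(a: Operand, b: Operand) -> Operand:
    """Return the operand whose character set and collation are used."""
    # Rule 1: the lowest coercibility value wins.
    if a.coercibility != b.coercibility:
        return a if a.coercibility < b.coercibility else b

    # Assumption: identical collations leave nothing to resolve.
    if a.collation == b.collation:
        return a

    # Same character set, _bin mixed with _ci or _cs: _bin is used.
    if a.charset == b.charset:
        a_bin = a.collation.endswith("_bin")
        b_bin = b.collation.endswith("_bin")
        if a_bin != b_bin:
            return a if a_bin else b
        raise CollationError("same coercibility, conflicting collations")

    # Exactly one side is Unicode: that side wins, the other is converted.
    a_uni = a.charset in UNICODE_CHARSETS
    b_uni = b.charset in UNICODE_CHARSETS
    if a_uni != b_uni:
        return a if a_uni else b

    # Both Unicode or both non-Unicode: an error.
    raise CollationError("same coercibility, no Unicode side to prefer")


if __name__ == "__main__":
    utf8_col = Operand("utf8", "utf8_general_ci", 2)
    latin1_col = Operand("latin1", "latin1_swedish_ci", 2)
    # Mirrors CONCAT(utf8_column, latin1_column): the utf8 side wins.
    print(resolve(utf8_col, latin1_col).collation)
```

The sketch checks the same-character-set `_bin` case before the Unicode test, since otherwise that case would always fall under "both sides are Unicode, or both sides are not Unicode" and raise an error.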
The COERCIBILITY() function can
be used to determine the coercibility of a string expression:
mysql> SELECT COERCIBILITY('A' COLLATE latin1_swedish_ci);
-> 0
mysql> SELECT COERCIBILITY(VERSION());
-> 3
mysql> SELECT COERCIBILITY('A');
-> 4
For implicit conversion of a numeric or temporal value to a
string, such as occurs for the argument 1
in the expression CONCAT(1,
'abc'), the result is a binary string for which the
character set and collation are binary. See
Section 11.2, “Type Conversion in Expression Evaluation”.