Every character string literal has a character set and a collation.
For the simple statement SELECT
', the string has
the connection default character set and collation defined by
the string'character_set_connection
and collation_connection system
variables.
A character string literal may have an optional character set
introducer and COLLATE clause, to designate
it as a string that uses a particular character set and
collation:
[_charset_name]'string' [COLLATE collation_name]
The _
expression is formally called an
introducer. It tells the parser, “the
string that follows uses character set
charset_namecharset_name.” An introducer
does not change the string to the introducer character set like
CONVERT() would do. It does not
change the string value, although padding may occur. The
introducer is just a signal. See
Section 12.3.8, “Character Set Introducers”.
Examples:
SELECT 'abc';
SELECT _latin1'abc';
SELECT _binary'abc';
SELECT _utf8mb4'abc' COLLATE utf8mb4_danish_ci;
Character set introducers and the COLLATE
clause are implemented according to standard SQL specifications.
MySQL determines the character set and collation of a character string literal in the following manner:
If both
_charset_nameandCOLLATEare specified, character setcollation_namecharset_nameand collationcollation_nameare used.collation_namemust be a permitted collation forcharset_name.If
_charset_nameis specified butCOLLATEis not specified, character setcharset_nameand its default collation are used. To see the default collation for each character set, use theSHOW CHARACTER SETstatement or query theINFORMATION_SCHEMACHARACTER_SETStable.If
_charset_nameis not specified butCOLLATEis specified, the connection default character set given by thecollation_namecharacter_set_connectionsystem variable and collationcollation_nameare used.collation_namemust be a permitted collation for the connection default character set.Otherwise (neither
_charset_namenorCOLLATEis specified), the connection default character set and collation given by thecollation_namecharacter_set_connectionandcollation_connectionsystem variables are used.
Examples:
A nonbinary string with
latin1character set andlatin1_german1_cicollation:SELECT _latin1'Müller' COLLATE latin1_german1_ci;A nonbinary string with
utf8mb4character set and its default collation (that is,utf8mb4_0900_ai_ci):SELECT _utf8mb4'Müller';A binary string with
binarycharacter set and its default collation (that is,binary):SELECT _binary'Müller';A nonbinary string with the connection default character set and
utf8mb4_0900_ai_cicollation (fails if the connection character set is notutf8mb4):SELECT 'Müller' COLLATE utf8mb4_0900_ai_ci;A string with the connection default character set and collation:
SELECT 'Müller';
An introducer indicates the character set for the following
string, but does not change how the parser performs escape
processing within the string. Escapes are always interpreted by
the parser according to the character set given by
character_set_connection.
The following examples show that escape processing occurs using
character_set_connection even
in the presence of an introducer. The examples use
SET NAMES (which changes
character_set_connection, as
discussed in Section 12.4, “Connection Character Sets and Collations”), and display
the resulting strings using the
HEX() function so that the exact
string contents can be seen.
Example 1:
mysql> SET NAMES latin1;
mysql> SELECT HEX('à\n'), HEX(_sjis'à\n');
+------------+-----------------+
| HEX('à\n') | HEX(_sjis'à\n') |
+------------+-----------------+
| E00A | E00A |
+------------+-----------------+
Here, à (hexadecimal value
E0) is followed by \n, the
escape sequence for newline. The escape sequence is interpreted
using the
character_set_connection value
of latin1 to produce a literal newline
(hexadecimal value 0A). This happens even for
the second string. That is, the _sjis
introducer does not affect the parser's escape processing.
Example 2:
mysql> SET NAMES sjis;
mysql> SELECT HEX('à\n'), HEX(_latin1'à\n');
+------------+-------------------+
| HEX('à\n') | HEX(_latin1'à\n') |
+------------+-------------------+
| E05C6E | E05C6E |
+------------+-------------------+
Here, character_set_connection
is sjis, a character set in which the
sequence of à followed by
\ (hexadecimal values 05
and 5C) is a valid multibyte character.
Hence, the first two bytes of the string are interpreted as a
single sjis character, and the
\ is not interpreted as an escape character.
The following n (hexadecimal value
6E) is not interpreted as part of an escape
sequence. This is true even for the second string; the
_latin1 introducer does not affect escape
processing.