Documentation Home
Connectors and APIs Manual
Download this Manual
PDF (US Ltr) - 4.1Mb
PDF (A4) - 4.1Mb


Connectors and APIs Manual  /  ...  /  Using Character Sets and Unicode

3.5.7 Using Character Sets and Unicode

All strings sent from the JDBC driver to the server are converted automatically from native Java Unicode form to the connection's character encoding, including all queries sent using Statement.execute(), Statement.executeUpdate(), and Statement.executeQuery(), as well as all PreparedStatement and CallableStatement parameters, excluding parameters set using the following methods:

  • setBlob()

  • setBytes()

  • setClob()

  • setNClob()

  • setAsciiStream()

  • setBinaryStream()

  • setCharacterStream()

  • setNCharacterStream()

  • setUnicodeStream()

Number of Encodings Per Connection

Connector/J supports a single character encoding between the client and the server, and any number of character encodings for data returned by the server to the client in ResultSets.

Setting the Character Encoding

For Connector/J 8.0.25 and earlier: The character encoding between the client and the server is automatically detected upon connection (provided that the Connector/J connection properties characterEncoding and connectionCollation are not set). The encoding on the server is specified using the system variable character_set_server (for more information, see Server Character Set and Collation), and the driver automatically uses the encoding. For example, to use the 4-byte UTF-8 character set with Connector/J, configure the MySQL server with character_set_server=utf8mb4, and leave characterEncoding and connectionCollation out of the Connector/J connection string. Connector/J will then autodetect the UTF-8 setting. To override the automatically detected encoding on the client side, use the characterEncoding property in the connection URL to the server.

For Connector/J 8.0.26 and later: There are two phases during the connection initialization in which the character encoding and collation are set.

  • Pre-Authentication Phase: In this phase, the character encoding between the client and the server is determined by the settings of the Connector/J connection properties, in the following order of priority:

  • Post-Authentication Phase: In this phase, the character encoding between the client and the server for the rest of the session is determined by the settings of the Connector/J connection properties, in the following order of priority:

    This means Connector/J needs to issue a SET NAMES Statement to change the character set and collation that were established in the pre-authentication phase only if passwordCharacterEncoding is set, but its setting is different from that of connectionCollation, or different from that of characterEncoding (when connectionCollation is not set), or different from utf8mb4 (when both connectionCollation and characterEncoding are not set).

Custom Character Sets and Collations

For Connector/J 8.0.26 and later only: To support the use of custom character sets and collations on the server, set the Connector/J connection property detectCustomCollations to true, and provide the mapping between the custom character sets and the Java character encodings by supplying the customCharsetMapping connection property with a comma-delimited list of custom_charset:java_encoding pairs (for example: customCharsetMapping=charset1:UTF-8,charset2:Cp1252).

MySQL to Java Encoding Name Translations

Use Java-style names when specifying character encodings. The following table lists MySQL character set names and their corresponding Java-style names:

Table 3.24 MySQL to Java Encoding Name Translations

MySQL Character Set Name Java-Style Character Encoding Name
ascii US-ASCII
big5 Big5
gbk GBK
sjis SJIS or Cp932
cp932 Cp932 or MS932
gb2312 EUC_CN
ujis EUC_JP
euckr EUC_KR
latin1 Cp1252
latin2 ISO8859_2
greek ISO8859_7
hebrew ISO8859_8
cp866 Cp866
tis620 TIS620
cp1250 Cp1250
cp1251 Cp1251
cp1257 Cp1257
macroman MacRoman
macce MacCentralEurope

For 8.0.12 and earlier: utf8

For 8.0.13 and later: utf8mb4

UTF-8
ucs2 UnicodeBig

Notes

For Connector/J 8.0.12 and earlier: In order to use the utf8mb4 character set for the connection, the server MUST be configured with character_set_server=utf8mb4; if that is not the case, when UTF-8 is used for characterEncoding in the connection string, it will map to the MySQL character set name utf8, which is an alias for utf8mb3.

For Connector/J 8.0.13 and later:

  • When UTF-8 is used for characterEncoding in the connection string, it maps to the MySQL character set name utf8mb4.

  • If the connection option connectionCollation is also set alongside characterEncoding and is incompatible with it, characterEncoding will be overridden with the encoding corresponding to connectionCollation.

  • Because there is no Java-style character set name for utfmb3 that you can use with the connection option charaterEncoding, the only way to use utf8mb3 as your connection character set is to use a utf8mb3 collation (for example, utf8_general_ci) for the connection option connectionCollation, which forces a utf8mb3 character set to be used, as explained in the last bullet.

Warning

Do not issue the query SET NAMES with Connector/J, as the driver will not detect that the character set has been changed by the query, and will continue to use the character set configured when the connection was first set up.