Documentation Home
Connectors and APIs Manual
Download this Manual

Connectors and APIs Manual  /  ...  /  Using Character Sets and Unicode

4.5.6 Using Character Sets and Unicode

All strings sent from the JDBC driver to the server are converted automatically from native Java Unicode form to the client character encoding, including all queries sent using Statement.execute(), Statement.executeUpdate(), and Statement.executeQuery(), as well as all PreparedStatement and CallableStatement parameters, excluding parameters set using setBytes(), setBinaryStream(), setAsciiStream(), setUnicodeStream(), and setBlob().

Number of Encodings Per Connection

Connector/J supports a single character encoding between client and server, and any number of character encodings for data returned by the server to the client in ResultSets.

Setting the Character Encoding

The character encoding between client and server is automatically detected upon connection (provided that the Connector/J connection properties characterEncoding and connectionCollation are not set). You specify the encoding on the server using the system variable character_set_server (for more information, see Server Character Set and Collation). The driver automatically uses the encoding specified by the server. For example, to use the 4-byte UTF-8 character set with Connector/J, configure the MySQL server with character_set_server=utf8mb4, and leave characterEncoding and connectionCollation out of the Connector/J connection string. Connector/J will then autodetect the UTF-8 setting.

To override the automatically detected encoding on the client side, use the characterEncoding property in the connection URL to the server. Use Java-style names when specifying character encodings. The following table lists MySQL character set names and their corresponding Java-style names:

Table 4.7 MySQL to Java Encoding Name Translations

MySQL Character Set NameJava-Style Character Encoding Name
asciiUS-ASCII
big5Big5
gbkGBK
sjisSJIS or Cp932
cp932Cp932 or MS932
gb2312EUC_CN
ujisEUC_JP
euckrEUC_KR
latin1Cp1252
latin2ISO8859_2
greekISO8859_7
hebrewISO8859_8
cp866Cp866
tis620TIS620
cp1250Cp1250
cp1251Cp1251
cp1257Cp1257
macromanMacRoman
macceMacCentralEurope

For 8.0.12 and earlier: utf8

For 8.0.13 and later: utf8mb4

UTF-8
ucs2UnicodeBig

Notes

For Connector/J 8.0.12 and earlier: In order to use the utf8mb4 character set for the connection, the server MUST be configured with character_set_server=utf8mb4; if that is not the case, when UTF-8 is used for characterEncoding in the connection string, it will map to the MySQL character set name utf8, which is an alias for utf8mb3.

For Connector/J 8.0.13 and later:

  • When UTF-8 is used for characterEncoding in the connection string, it maps to the MySQL character set name utf8mb4.

  • If the connection option connectionCollation is also set alongside characterEncoding and is incompatible with it, characterEncoding will be overridden with the encoding corresponding to connectionCollation.

  • Because there is no Java-style character set name for utfmb3 that you can use with the connection option charaterEncoding, the only way to use utf8mb3 as your connection character set is to use a utf8mb3 collation (for example, utf8_general_ci) for the connection option connectionCollation, which forces a utf8mb3 character set to be used, as explained in the last bullet.

Warning

Do not issue the query SET NAMES with Connector/J, as the driver will not detect that the character set has been changed by the query, and will continue to use the character set configured when the connection was first set up.