MySQL has a very flexible character set support as documented in Character Sets, Collations, Unicode. The list of character sets along with the names and IDs of their default collations can be queried as follows:
mysql> SELECT CHARACTER_SET_NAME, COLLATION_NAME, ID
FROM INFORMATION_SCHEMA.COLLATIONS
WHERE IS_DEFAULT = 'yes'
ORDER BY ID;
+--------------------+---------------------+-----+
| CHARACTER_SET_NAME | COLLATION_NAME | ID |
+--------------------+---------------------+-----+
| big5 | big5_chinese_ci | 1 |
| dec8 | dec8_swedish_ci | 3 |
| cp850 | cp850_general_ci | 4 |
| hp8 | hp8_english_ci | 6 |
| koi8r | koi8r_general_ci | 7 |
| latin1 | latin1_swedish_ci | 8 |
| latin2 | latin2_general_ci | 9 |
| swe7 | swe7_swedish_ci | 10 |
| ascii | ascii_general_ci | 11 |
| ujis | ujis_japanese_ci | 12 |
| sjis | sjis_japanese_ci | 13 |
| hebrew | hebrew_general_ci | 16 |
| tis620 | tis620_thai_ci | 18 |
| euckr | euckr_korean_ci | 19 |
| koi8u | koi8u_general_ci | 22 |
| gb2312 | gb2312_chinese_ci | 24 |
| greek | greek_general_ci | 25 |
| cp1250 | cp1250_general_ci | 26 |
| gbk | gbk_chinese_ci | 28 |
| latin5 | latin5_turkish_ci | 30 |
| armscii8 | armscii8_general_ci | 32 |
| utf8 | utf8_general_ci | 33 |
| ucs2 | ucs2_general_ci | 35 |
| cp866 | cp866_general_ci | 36 |
| keybcs2 | keybcs2_general_ci | 37 |
| macce | macce_general_ci | 38 |
| macroman | macroman_general_ci | 39 |
| cp852 | cp852_general_ci | 40 |
| latin7 | latin7_general_ci | 41 |
| cp1251 | cp1251_general_ci | 51 |
| utf16 | utf16_general_ci | 54 |
| utf16le | utf16le_general_ci | 56 |
| cp1256 | cp1256_general_ci | 57 |
| cp1257 | cp1257_general_ci | 59 |
| utf32 | utf32_general_ci | 60 |
| binary | binary | 63 |
| geostd8 | geostd8_general_ci | 92 |
| cp932 | cp932_japanese_ci | 95 |
| eucjpms | eucjpms_japanese_ci | 97 |
| gb18030 | gb18030_chinese_ci | 248 |
| utf8mb4 | utf8mb4_0900_ai_ci | 255 |
+--------------------+---------------------+-----+
The following table shows a few common character sets.
Number |
Hex |
Character Set Name |
---|---|---|
8 |
0x08 |
latin1_swedish_ci |
33 |
0x21 |
utf8_general_ci |
63 |
0x3f |
binary |
Protocol::CharacterSet
A character set is defined in the protocol as a integer.
Fields
charset_nr (2) -- number of the character set and collation
This “charset” ID is actually a collation ID. A
given collation implies the character set and the names of both
can be seen in the INFORMATION_SCHEMA
COLLATIONS
table. For example:
mysql> SELECT CHARACTER_SET_NAME, COLLATION_NAME
FROM INFORMATION_SCHEMA.COLLATIONS
WHERE ID = 255;
+--------------------+--------------------+
| CHARACTER_SET_NAME | COLLATION_NAME |
+--------------------+--------------------+
| utf8mb4 | utf8mb4_0900_ai_ci |
+--------------------+--------------------+