MySQL uses character sets when storing information, both to
ensure that the information is stored (and returned) in the
correct format and for the purposes of collation and
sorting. Each character set supports one or more collations,
so these are collectively known as Collation
Sets, rather than character sets.
Character sets are recorded against individual tables and returned
as part of the field data. For example, the
MYSQL_FIELD data type definition includes the
field charsetnr:
typedef struct st_mysql_field {
  char *name;                 /* Name of column */
  char *org_name;             /* Original column name, if an alias */
  char *table;                /* Table of column if column was a field */
  char *org_table;            /* Org table name, if table was an alias */
  char *db;                   /* Database for table */
  char *catalog;              /* Catalog for table */
  char *def;                  /* Default value (set by mysql_list_fields) */
  unsigned long length;       /* Width of column (create length) */
  unsigned long max_length;   /* Max width for selected set */
  unsigned int name_length;
  unsigned int org_name_length;
  unsigned int table_length;
  unsigned int org_table_length;
  unsigned int db_length;
  unsigned int catalog_length;
  unsigned int def_length;
  unsigned int flags;         /* Div flags */
  unsigned int decimals;      /* Number of decimals in field */
  unsigned int charsetnr;     /* Character set */
  enum enum_field_types type; /* Type of field. See mysql_com.h for types */
} MYSQL_FIELD;
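As a rough illustration of how a client might use charsetnr, the sketch below tests whether a column carries binary rather than character data. It assumes that collation ID 63 identifies the binary character set, which is the value the C API documentation gives for this check. demo_field is a hypothetical cut-down stand-in for MYSQL_FIELD so that the example compiles without the MySQL headers, and field_is_binary() is an illustrative helper, not part of the client library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cut-down stand-in for MYSQL_FIELD: only the members used here. */
typedef struct demo_field {
  const char *name;        /* Name of column */
  unsigned int charsetnr;  /* Collation ID reported for the column */
} demo_field;

/* Assumption: collation ID 63 is the "binary" character set. */
enum { DEMO_BINARY_COLLATION_ID = 63 };

/* Returns true when the field holds binary rather than character data. */
static bool field_is_binary(const demo_field *field)
{
  assert(field != NULL);  /* precondition: caller passes a valid field */
  return field->charsetnr == DEMO_BINARY_COLLATION_ID;
}
```

With the real client library, the same comparison would be applied to the charsetnr member of a MYSQL_FIELD returned for a result set.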
Character set and collation information is specific to a server
version and installation, and are generated automatically from the
sql/share/charsets/Index.xml file in the source
distribution.
You can obtain a list of the available collations, together with
the character set each one belongs to, by running SHOW COLLATION
or by running a query on the
INFORMATION_SCHEMA.COLLATIONS table. A sample of
the information from that table has been provided here for
reference.
Collation Id | Charset | Collation | Default | Sortlen
64 |  |  |  | 1
32 |  |  | Yes | 1
65 |  |  |  | 1
11 |  |  | Yes | 1
84 |  |  |  | 1
1 |  |  | Yes | 1
63 |  |  | Yes | 1
66 |  |  |  | 1
44 |  |  |  | 1
34 |  |  |  | 2
26 |  |  | Yes | 1
50 |  |  |  | 1
14 |  |  |  | 1
52 |  |  |  | 1
23 |  |  |  | 1
51 |  |  | Yes | 1
67 |  |  |  | 1
57 |  |  | Yes | 1
58 |  |  |  | 1
29 |  |  |  | 1
59 |  |  | Yes | 1
80 |  |  |  | 1
4 |  |  | Yes | 1
81 |  |  |  | 1
40 |  |  | Yes | 1
68 |  |  |  | 1
36 |  |  | Yes | 1
96 |  |  |  | 1
95 |  |  | Yes | 1
69 |  |  |  | 1
3 |  |  | Yes | 1
98 |  |  |  | 1
97 |  |  | Yes | 1
85 |  |  |  | 1
19 |  |  | Yes | 1
86 |  |  |  | 1
24 |  |  | Yes | 1
87 |  |  |  | 1
28 |  |  | Yes | 1
93 |  |  |  | 1
92 |  |  | Yes | 1
70 |  |  |  | 1
25 |  |  | Yes | 1
71 |  |  |  | 1
16 |  |  | Yes | 1
72 |  |  |  | 1
6 |  |  | Yes | 1
73 |  |  |  | 1
37 |  |  | Yes | 1
74 |  |  |  | 1
7 |  |  | Yes | 1
75 |  |  |  | 1
22 |  |  | Yes | 1
47 |  |  |  | 1
15 |  |  |  | 1
48 |  |  |  | 1
49 |  |  |  | 1
5 |  |  |  | 1
31 |  |  |  | 2
94 |  |  |  | 1
8 |  |  | Yes | 1
77 |  |  |  | 1
27 |  |  |  | 1
2 |  |  |  | 4
21 |  |  |  | 1
9 |  |  | Yes | 1
78 |  |  |  | 1
30 |  |  | Yes | 1
79 |  |  |  | 1
20 |  |  |  | 1
42 |  |  |  | 1
41 |  |  | Yes | 1
43 |  |  |  | 1
38 |  |  | Yes | 1
53 |  |  |  | 1
39 |  |  | Yes | 1
88 |  |  |  | 1
13 |  |  | Yes | 1
82 |  |  |  | 1
10 |  |  | Yes | 1
89 |  |  |  | 1
18 |  |  | Yes | 4
90 |  |  |  | 1
138 |  |  |  | 8
139 |  |  |  | 8
145 |  |  |  | 8
134 |  |  |  | 8
146 |  |  |  | 8
129 |  |  |  | 8
130 |  |  |  | 8
140 |  |  |  | 8
144 |  |  |  | 8
133 |  |  |  | 8
131 |  |  |  | 8
143 |  |  |  | 8
141 |  |  |  | 8
132 |  |  |  | 8
142 |  |  |  | 8
135 |  |  |  | 8
136 |  |  |  | 8
137 |  |  |  | 8
128 |  |  |  | 8
35 |  |  | Yes | 1
91 |  |  |  | 1
12 |  |  | Yes | 1
83 |  |  |  | 1
202 |  |  |  | 8
203 |  |  |  | 8
209 |  |  |  | 8
198 |  |  |  | 8
210 |  |  |  | 8
193 |  |  |  | 8
194 |  |  |  | 8
204 |  |  |  | 8
208 |  |  |  | 8
197 |  |  |  | 8
195 |  |  |  | 8
207 |  |  |  | 8
205 |  |  |  | 8
196 |  |  |  | 8
206 |  |  |  | 8
199 |  |  |  | 8
200 |  |  |  | 8
201 |  |  |  | 8
192 |  |  |  | 8
33 |  |  | Yes | 1
Note that it is the collation ID, not the character set ID, that
is used to identify the unique combination of character set and
collation. Thus, when requesting character set information using
one of the character set functions in
mysys/charset.c, such as
get_charset(), different IDs may return the
same base character set, but a different collation set.
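To make the point about IDs concrete, here is a minimal sketch of a lookup keyed by collation ID. It is not the code in mysys/charset.c: demo_charset_info, demo_collations, demo_get_charset() and demo_same_charset() are hypothetical names, and the five entries are assumed from the IDs found in stock MySQL builds (your installation may differ, since this information is installation-specific).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a collation registry entry. */
typedef struct demo_charset_info {
  unsigned int number;  /* collation ID */
  const char *csname;   /* character set name */
  const char *name;     /* collation name */
} demo_charset_info;

/* Assumed sample entries; real servers hold many more. */
static const demo_charset_info demo_collations[] = {
  {  8, "latin1", "latin1_swedish_ci" },
  { 47, "latin1", "latin1_bin" },
  { 33, "utf8",   "utf8_general_ci" },
  { 83, "utf8",   "utf8_bin" },
  { 63, "binary", "binary" },
};

/* Looks up an entry by collation ID; returns NULL if the ID is unknown. */
static const demo_charset_info *demo_get_charset(unsigned int id)
{
  size_t count = sizeof(demo_collations) / sizeof(demo_collations[0]);
  for (size_t i = 0; i < count; i++) {
    if (demo_collations[i].number == id)
      return &demo_collations[i];
  }
  return NULL;
}

/* Returns non-zero when two entries share the same base character set. */
static int demo_same_charset(const demo_charset_info *a,
                             const demo_charset_info *b)
{
  assert(a != NULL && b != NULL);  /* precondition: both entries exist */
  return strcmp(a->csname, b->csname) == 0;
}
```

In this sketch, IDs 8 and 47 resolve to different entries that share the latin1 character set but carry different collation names, which is the situation the paragraph above describes.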
The following functions provide an internal interface to the collation and character set information, enabling you to access the information by name or ID:
static uint get_collation_number_internal(const char *name)
uint get_collation_number(const char *name)
uint get_charset_number(const char *charset_name, uint cs_flags)
const char *get_charset_name(uint charset_number)
static CHARSET_INFO *get_internal_charset(uint cs_number, myf flags)
CHARSET_INFO *get_charset(uint cs_number, myf flags)
CHARSET_INFO *get_charset_by_name(const char *cs_name, myf flags)
CHARSET_INFO *get_charset_by_csname(const char *cs_name,
                                    uint cs_flags,
                                    myf flags)
The table below details the functions, the key argument that is supplied, and the return value.
Function | Supplied Argument | Return Value
get_collation_number_internal() | Collation name | Collation ID
get_collation_number() | Collation name | Collation ID
get_charset_number() | Character set name | Collation ID
get_charset_name() | Collation ID | Character set name
get_internal_charset() | Collation ID | Character datatype
get_charset() | Collation ID | Character datatype
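The by-name and by-ID lookups summarized above can be sketched against a toy registry. This is only an approximation of the behaviour, written under several assumptions: every demo_* name is hypothetical, the DEMO_CS_* bits merely stand in for the server's cs_flags values, the sample IDs are those of stock MySQL builds, and returning 0 or NULL for an unknown name or ID is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits standing in for the server's cs_flags values. */
enum { DEMO_CS_BINSORT = 1, DEMO_CS_PRIMARY = 2 };

typedef struct demo_collation {
  unsigned int number;  /* collation ID */
  const char *csname;   /* character set name */
  const char *name;     /* collation name */
  unsigned int state;   /* DEMO_CS_* bits describing the collation */
} demo_collation;

/* Assumed sample entries. */
static const demo_collation demo_registry[] = {
  {  8, "latin1", "latin1_swedish_ci", DEMO_CS_PRIMARY },
  { 47, "latin1", "latin1_bin",        DEMO_CS_BINSORT },
  { 33, "utf8",   "utf8_general_ci",   DEMO_CS_PRIMARY },
  { 83, "utf8",   "utf8_bin",          DEMO_CS_BINSORT },
};
#define DEMO_COUNT (sizeof(demo_registry) / sizeof(demo_registry[0]))

/* Collation name -> collation ID; 0 if the name is unknown. */
static unsigned int demo_collation_number(const char *name)
{
  assert(name != NULL);
  for (size_t i = 0; i < DEMO_COUNT; i++)
    if (strcmp(demo_registry[i].name, name) == 0)
      return demo_registry[i].number;
  return 0;
}

/* Character set name + flags -> ID of the first matching collation. */
static unsigned int demo_charset_number(const char *csname,
                                        unsigned int cs_flags)
{
  assert(csname != NULL);
  for (size_t i = 0; i < DEMO_COUNT; i++)
    if (strcmp(demo_registry[i].csname, csname) == 0 &&
        (demo_registry[i].state & cs_flags) != 0)
      return demo_registry[i].number;
  return 0;
}

/* Collation ID -> character set name; NULL if the ID is unknown. */
static const char *demo_charset_name(unsigned int number)
{
  for (size_t i = 0; i < DEMO_COUNT; i++)
    if (demo_registry[i].number == number)
      return demo_registry[i].csname;
  return NULL;
}
```

Note how the character set name alone is not enough to pick an ID in this sketch: the flags argument decides whether the default or the binary collation of the set is returned.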
An example of using the collation/character set functions is
available in extra/charset2html.c, which
outputs an HTML version of the internal collation set table.
