This section discusses the procedure for adding a character set to MySQL. The proper procedure depends on whether the character set is simple or complex:
If the character set does not need special string collating routines for sorting and does not need multibyte character support, it is simple.
If the character set needs either of those features, it is complex.
For example, czech is a complex character set.
To use the following instructions, you must have a MySQL source
distribution. In the instructions,
MYSET represents the name of the
character set that you want to add.
Add a <charset> element for MYSET to the sql/share/charsets/Index.xml file. Use the existing contents in the file as a guide to adding new contents. A partial listing for the latin1 <charset> element follows:
<charset name="latin1">
  <family>Western</family>
  <description>cp1252 West European</description>
  ...
  <collation name="latin1_swedish_ci" id="8" order="Finnish, Swedish">
    <flag>primary</flag>
    <flag>compiled</flag>
  </collation>
  <collation name="latin1_danish_ci" id="15" order="Danish"/>
  ...
  <collation name="latin1_bin" id="47" order="Binary">
    <flag>binary</flag>
    <flag>compiled</flag>
  </collation>
  ...
</charset>
The <charset> element must list all the collations for the character set. These must include at least a binary collation and a default (primary) collation. The default collation is often named using a suffix of general_ci (general, case insensitive). It is possible for the binary collation to be the default collation, but usually they are different. The default collation should have a primary flag. The binary collation should have a binary flag.
You must assign a unique ID number to each collation, chosen from the range 1 to 254. To find the maximum of the currently used collation IDs, use this query:
SELECT MAX(ID) FROM INFORMATION_SCHEMA.COLLATIONS;
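The maximum ID alone does not show whether lower IDs in the range are still free. As a rough illustration, the following C sketch picks the lowest unused ID between 1 and 254 from a list of IDs already in use. The function name lowest_free_collation_id and the idea of passing the used IDs as an array are assumptions made for this example; they are not part of the MySQL source.

```c
#include <stdbool.h>
#include <stddef.h>

/* Valid collation IDs, per the range given in the text: 1 through 254. */
enum { MIN_COLLATION_ID = 1, MAX_COLLATION_ID = 254 };

/*
 * Hypothetical helper (not part of MySQL): return the lowest ID in
 * [1, 254] that does not appear in used_ids, or 0 if every ID is taken.
 * IDs outside the valid range are ignored.
 */
int lowest_free_collation_id(const int *used_ids, size_t count)
{
    bool taken[MAX_COLLATION_ID + 1] = { false };

    /* Mark every in-range ID that is already assigned. */
    for (size_t i = 0; i < count; i++) {
        int id = used_ids[i];
        if (id >= MIN_COLLATION_ID && id <= MAX_COLLATION_ID)
            taken[id] = true;
    }

    /* Scan upward for the first unmarked ID. */
    for (int id = MIN_COLLATION_ID; id <= MAX_COLLATION_ID; id++) {
        if (!taken[id])
            return id;
    }
    return 0; /* no free ID in the range */
}
```

The list of used IDs could be taken from the ID column of INFORMATION_SCHEMA.COLLATIONS. A return value of 0 means the range is exhausted.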
This step depends on whether you are adding a simple or complex character set. A simple character set requires only a configuration file, whereas a complex character set requires a C source file that defines collation functions, multibyte functions, or both.
For a simple character set, create a configuration file that describes the character set properties. Create this file in the sql/share/charsets directory. You can use a copy of latin1.xml as the basis for this file. The syntax for the file is very simple:
Comments are written as ordinary XML comments.
Words within <map> array elements are separated by arbitrary amounts of whitespace.
Each word within <map> array elements must be a number in hexadecimal format.
The <map> array element for the <ctype> element has 257 words. The other <map> array elements after that have 256 words. See Section 10.3.1, “Character Definition Arrays”.
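The word-count rules above can be checked mechanically before rebuilding. The following C sketch counts the whitespace-separated hexadecimal words in the text content of one <map> element and compares the count with the expected length. The function names count_hex_words and map_has_expected_length are hypothetical, and the sketch assumes the text between the opening and closing <map> tags has already been extracted into a string.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: count whitespace-separated hexadecimal words in
 * the text of a <map> element. Returns -1 if any word contains a
 * character that is not a hexadecimal digit.
 */
long count_hex_words(const char *text)
{
    long count = 0;
    const char *p = text;

    for (;;) {
        /* Skip any amount of whitespace between words. */
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            return count;

        /* A word must consist only of hexadecimal digits. */
        if (!isxdigit((unsigned char)*p))
            return -1;
        while (isxdigit((unsigned char)*p))
            p++;
        if (*p != '\0' && !isspace((unsigned char)*p))
            return -1;
        count++;
    }
}

/*
 * Returns nonzero if the map has the expected number of words:
 * 257 for the <ctype> map, 256 for every other map.
 */
int map_has_expected_length(const char *text, int is_ctype)
{
    return count_hex_words(text) == (is_ctype ? 257 : 256);
}
```

A check like this catches a dropped or duplicated word, which would otherwise shift every later entry in the array.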
For each collation listed in the <charset> element for the character set in Index.xml, the configuration file must contain an element that defines the character ordering.
For a complex character set, create a C source file that describes the character set properties and defines the support routines necessary to properly perform operations on the character set:
Create the file in the strings directory. Look at one of the existing ctype-*.c files (such as ctype-big5.c) to see what needs to be defined, including the names that the arrays in your file must have. These arrays correspond to the arrays for a simple character set. See Section 10.3.1, “Character Definition Arrays”.
For each collation listed in the <charset> element for the character set in Index.xml, the C source file must provide an implementation of the collation.
If the character set requires string collating functions, see Section 10.3.2, “String Collating Support for Complex Character Sets”.
If the character set requires multibyte character support, see Section 10.3.3, “Multi-Byte Character Support for Complex Character Sets”.
Modify the configuration information. Use the existing configuration information as a guide to adding information for MYSET. The example here assumes that the character set has default and binary collations, but more lines are needed if MYSET has additional collations.
“Register” the collations for the new character set.
Add these lines to the “declaration” section:
extern CHARSET_INFO my_charset_MYSET_general_ci;
extern CHARSET_INFO my_charset_
Add these lines to the “registration” section:
If the character set uses a C source file, edit strings/Makefile.am and add the file to each variable definition that lists the existing ctype-*.c files.
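If the character set uses a C source file, also edit libmysql/Makefile.shared, using the existing entries as a guide.
Add MYSET, in alphabetic order, to one of the existing lists of character set names.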
Add MYSET to CHARSETS_COMPLEX. This is needed even for simple character sets, or configure will not recognize the new character set.
Add MYSET to the first case control structure. Omit the USE_MB and USE_MB_IDENT lines for 8-bit character sets:
MYSET, 1, [Define to enable charset MYSET])
AC_DEFINE([USE_MB], 1, [Use multi-byte character routines])
AC_DEFINE(USE_MB_IDENT, 1)
;;
Add MYSET to the second case control structure:
Reconfigure, recompile, and test.