Several character set and collation system variables relate to a client's interaction with the server. Some of these have been mentioned in earlier sections:
The server character set and collation can be determined
from the values of the
character_set_server and
collation_server system variables.
The character set and collation of the default database can
be determined from the values of the
character_set_database and
collation_database system variables.
Additional character set and collation system variables are involved in handling traffic for the connection between a client and the server. Every client has connection-related character set and collation system variables.
Consider what a “connection” is: It's what you make when you connect to the server. The client sends SQL statements, such as queries, over the connection to the server. The server sends responses, such as result sets, over the connection back to the client. This leads to several questions about character set and collation handling for client connections, each of which can be answered in terms of system variables:
What character set is the statement in when it leaves the client?
The server takes the character_set_client
system variable to be the character set in which statements
are sent by the client.
What character set should the server translate a statement to after receiving it?
For this, the server uses the
character_set_connection and
collation_connection system variables. It
converts statements sent by the client from
character_set_client to
character_set_connection (except for
string literals that have an introducer such as
_latin1 or _utf8).
collation_connection is important for
comparisons of literal strings. For comparisons of strings
with column values, collation_connection
does not matter because columns have their own collation,
which has a higher collation precedence.
What character set should the server translate to before shipping result sets or error messages back to the client?
The character_set_results system variable
indicates the character set in which the server returns
query results to the client. This includes result data such
as column values, and result metadata such as column names.
You can fine-tune the settings for these variables, or you can depend on the defaults (in which case, you can skip the rest of this section).
There are two statements that affect the connection character sets:
SET NAMES 'charset_name' SET CHARACTER SETcharset_name
SET NAMES indicates what character set the
client will use to send SQL statements to the server. Thus,
SET NAMES 'cp1251' tells the server
“future incoming messages from this client are in
character set cp1251.” It also
specifies the character set that the server should use for
sending results back to the client. (For example, it indicates
what character set to use for column values if you use a
SELECT statement.)
A SET NAMES '
statement is equivalent to these three statements:
x'
SET character_set_client =x; SET character_set_results =x; SET character_set_connection =x;
Setting character_set_connection to
x also sets
collation_connection to the default collation
for x. It is not necessary to set
that collation explicitly. To specify a particular collation for
the character sets, use the optional COLLATE
clause:
SET NAMES 'charset_name' COLLATE 'collation_name'
SET CHARACTER SET is similar to SET
NAMES but sets
character_set_connection and
collation_connection to
character_set_database and
collation_database. A SET CHARACTER
SET statement is
equivalent to these three statements:
x
SET character_set_client =x; SET character_set_results =x; SET collation_connection = @@collation_database;
Setting collation_connection also sets
character_set_connection to the character set
associated with the collation (equivalent to executing
SET character_set_connection =
@@character_set_database). It is not necessary to set
character_set_connection explicitly.
When a client connects, it sends to the server the name of the
character set that it wants to use. The server uses the name to
set the character_set_client,
character_set_results, and
character_set_connection system variables. In
effect, the server performs a SET NAMES
operation using the character set name.
With the mysql client, it is not necessary to
execute SET NAMES every time you start up if
you want to use a character set different from the default. You
can add the --default-character-set option
setting to your mysql statement line, or in
your option file. For example, the following option file setting
changes the three character set variables set to
koi8r each time you invoke
mysql:
[mysql] default-character-set=koi8r
If you are using the mysql client with
auto-reconnect enabled (which is not recommended), it is
preferable to use the charset command rather
than SET NAMES. For example:
mysql> charset utf8
Charset changed
The charset command issues a SET
NAMES statement, and also changes the default
character set that is used if mysql
reconnects after the connection has dropped.
Example: Suppose that column1 is defined as
CHAR(5) CHARACTER SET latin2. If you do not
say SET NAMES or SET CHARACTER
SET, then for SELECT column1 FROM
t, the server sends back all the values for
column1 using the character set that the
client specified when it connected. On the other hand, if you
say SET NAMES 'latin1' or SET
CHARACTER SET latin1 before issuing the
SELECT statement, the server converts the
latin2 values to latin1
just before sending results back. Conversion may be lossy if
there are characters that are not in both character sets.
If you do not want the server to perform any conversion of
result sets, set character_set_results to
NULL:
SET character_set_results = NULL;
ucs2 cannot be used as a client character
set, which means that it does not work for SET
NAMES or SET CHARACTER SET.
To see the values of the character set and collation system variables that apply to your connection, use these statements:
SHOW VARIABLES LIKE 'character_set%'; SHOW VARIABLES LIKE 'collation%';
You must also consider the environment within which your MySQL application executes. For example, if you will send statements using UTF-8 text taken from a file that you create in an editor, you should edit the file with the locale of your environment set to UTF-8 so that the file's encoding is correct and so that the operating system handles it correctly. For a script that executes in a Web environment, the script must handle the character encoding properly for its interaction with the MySQL server, and it must generate pages that correctly indicate the encoding so that browsers know now to display the content of the pages.

User Comments
This example are usable for russian users who want to have windows-1251 encoding on the site and koi8-r encoding into the database:
set CHARACTER SET cp1251_koi8
If you are wondering why -despite all UTF8 settings- you still don't get non-ASCII characters right, it might be the case that the _connection_ character set is still standard latin1.
To change the connection charset permanently to UTF-8, add the following line in the [mysqld] section:
[mysqld]
init-connect='SET NAMES utf8'
The other way to let MySQL know what connection charset you intend to use is per-connection based. After a connection is established (with host, name, password), add the following two lines in your application:
SET NAMES utf8;
SET CHARACTER_SET utf8;
The last hint is given most of the time, but not everybody is happy to change every application (esp. when some lazy add-on and extension programmers use their own connection stuff instead of the (PHP) application.
If you are wondering why -despite all UTF8 settings- you still don't get non-ASCII characters right, it might be the case that:
1. you are using mysqlimport to insert data from files with utf8 characters;
AND
2. you have created a Database with character set latin1 (this is the default!) and not with character set utf8.
3. You have created a Table with character set utf8.
Even if you use the --default-character-set=utf8 option for mysqlimport this doesn't work! Mysqlimport is only using the value of 'character_set_database' as character set and in this case it is 'latin1'.
PHP/mysql connections seem to be made by default in latin1, so if you are experiencing problems with characters just set everything you can to UTF-8 (collations, charsets, html page encoding) and just after connecting to da database send the following sql query to Mysql:
SET NAMES 'utf8'
This should be done once every time you connect to mysql.
This example shows how to configure mysqld server to use pure utf8 for server's character set and collation instead of default latin1. This will help to correctly store non-latin character data in db (cyrillic сharacters for example and russian language in particularly).
Go to the [mysqld] section in my.cnf and add two strings:
collation_server=utf8_unicode_ci
character_set_server=utf8
You can also add
skip-character-set-client-handshake
to enforce using of utf8 encoding in db.
Recently I have encountered the same problems mentioned in comments regarding UTF-8 collation and proper appearance of symbols in a webpage (specifically in Georgian and Russian). The very key to my troubles turned up the following: regardless all the efforts to set some default parameters, you still need to send the query SET NAMES 'utf-8' to the server every time after you select a database and before you select from the table:
mysql_query("SET NAMES 'utf-8'");
The thing is that you set character_set_client and character_set_results to utf-8 by executing this query. In addition, you have to deal with some little tasks of defining utf-8 collations and charsets inside your database that are clearly explained in comments above.
If you think "set names utf8" for each connection is too trouble, you can modify my.cnf to solve the problem forever. In my.cnf, add the line "default-character-set=utf8" in both [mysqld] and [client] sections:
[client]
default-character-set=utf8
[mysqld]
default-character-set=utf8
The mysql will use utf8 after you restart it.
I tried all the above hints but still couldn´t get it to work (using mySql and PHP 5 server)... ´till I found and tried also this:
add to the .htaccess file this single line:
AddDefaultCharset UTF-8
Still not sure why this was important to the whole, but now it works as planned! Thanks!
I've spent quite a lot of time trying to make MySQL 4.1 and PHP working with my tables in cp1251 charset.
Adding on my local PC the following line to the my.ini solved the problem but unfortunately I can't do the same on my hosting provider space
default-character-set=cp1251
So after trying quite a lot of combinations finally I've found that I need to add only one query set just after connecting to the database in my php script
mysql_query ('SET NAMES CP1251');
Please note if you run another query like
mysql_query ('SET CHARACTER SET CP1251');
just after first one it will reset character_set_connection option and all your symbols most likely will be converted to the question marks as it happens for me many times before.
For Web applications, don't forget about HTTP and some HTML tags. <FORM> tag and <META> have encoding values, additionally to HTTP protocol.
Avoid unnecessary conversions, use the same/compatible character sets.
"Reversed Question Mark" is a special Unicode symbol for codepoints not having representation in a specific encoding scheme.
HTTP Response Header:
Content-Type: text/html; charset=windows-1251
Usually (for most browsers and search engines) HTTP is enough and you don't need additional HTML:
<meta http-equiv="Content-Language" content="ru">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">
I am still using ascii for some fields (UTF-8 at MySQL requires 3 bytes!?)...
I did all the suggestions above (editing my.cnf, creating database and tables with utf8 and SET NAMES for every session from PHP) but still I was not able to get Japanese characters to work.
I solved the problem by opening my all html and php files and saving them as utf8.
So please remember you must have application files in utf8 as well.
I hope this helps to someone.
I managed to fix my problem with french accents on
http://www.millionnairedeluxe.com
by choosing utf8_bin as MySQL connection collation and MySQL charset: UTF-8 Unicode (utf8)
to import my database to dreamhost.
But I don't know why utf8_bin works but utf8_unicode_ci and utf8_general_ci didn't work ?!
Hope it can help.
In case you would store UTF-8 characters in a non UTF-8 supporting database (< MySQL 4.1.2), you would have to increase for example your varchar(20) field by the factor 3.
UTF-8 characters can take up to 3 bytes in space and with some special characters up to 4 bytes.
Note that this does not apply to a database which supports UTF-8.
When you change the charset of a table, every column is explicitly set to take the *old* charset - even if the columns in your tables all took the default charset! You will need to issue ALTER TABLE statements for every column, or dump the database to file and replace all the character set references. I did the latter with three text replacements:
Change table-specific charsets
Replace: " DEFAULT CHARSET=latin1"
With: " DEFAULT CHARSET=utf8"
Remove any column-specific charsets
Delete all occurrences of: " character set latin1"
Change all char's to varchar's because char has to store 3 bytes/character in utf8 while varchar can store 1 for short characters
Replace: " char("
With: " varchar("
You might want to search for "latin1" and see if anything else needs to be changed. Then follow the instructions above and everything will be fine. Hopefully this will save someone else a day's worth of trouble.
Hello,
in my case I used mysql_query("SET NAMES `utf8` COLLATE `utf8_general_ci`") and went well =)
best regards,
Thiago
If you don't want to change all your tables and their fields collation manualy you can use some code like this, it worked well for me. don't forget to change your charset/collation everywhere in the code:
<?php
$db = mysqli_connect('localhost','username','password', 'db');
if(mysqli_connect_errno())
{
echo "Cannot connect to the database - incorrect details";
}
$sql = 'SHOW TABLES';
if ( !( $result = $db->query( $sql ) ) ) {
echo '<span style="color: red;">Get SHOW TABLE - SQL Error: <br>' . "</span>\n";
}
while ( $tables = $result->fetch_row() ) {
echo $tables[0];
# Loop through all tables in this database
$table = $tables[key($tables)];
if ( !( $result2 = $db->query("ALTER TABLE ".$table." COLLATE utf8_general_ci") ) ) {
echo '<span style="color: red;">UTF SET - SQL Error: <br>' . "</span>\n";
break;
}
print "$table changed to UTF-8 successfully.<br>\n";
# Now loop through all the fields within this table
if ( !($result2 = $db->query("SHOW COLUMNS FROM ".$table) ) ) {
echo '<span style="color: red;">Get Table Columns Query - SQL Error: <br>' . "</span>\n";
break;
}
while ( $column = $result2->fetch_assoc() )
{
$field_name = $column['Field'];
$field_type = $column['Type'];
# Change text based fields
$skipped_field_types = array('char', 'text', 'enum', 'set');
foreach ( $skipped_field_types as $type )
{
if ( strpos($field_type, $type) !== false )
{
$sql4 = "ALTER TABLE $table CHANGE `$field_name` `$field_name` $field_type CHARACTER SET utf8 COLLATE utf8_general_ci";
$result4 = $db->query($sql4);
echo "---- $field_name changed to UTF-8 successfully.<br>\n";
}
}
}
echo "<hr>\n";
}
$result->free();
$result2->free();
$result4->free();
?>
In my process of adjusting the charset variables to correctly display the Chinese characters,I found client and server not only interact with each other, but also interact with windows OS,although I don't know how and when. When I use java application to write Chinese characters in utf8 charset to the database, even if I set all charset variables to utf8, the characters can not display correctly. Only after I set some charset to gb2312, the standard national Chinese charset, the display is right. I guess that is because the default charset of Windows OS in Chinese version is gb2312, and in between the interaction of client and server, Windows also make some charset translations in the process.
Thus when you still fail to display the correct characters after trying various combinations of charset variables, try the code below:
SET NAMES the_default_charset_of_your_local_OS
Hope this can do some help!
I have prepared a little hand-drawn diagram on this (while debugging problems with xplanner) which might be of some use. It's been scanned into a 850KB PDF:
http://public.m-plify.net/mysql/MySQL_Charset_Handling.pdf
Add your own comment.