WL#3759: Optimize identifier conversion in client-server protocol
Affects: Server-5.5
—
Status: Complete
Since 4.1, we use utf8 to store identifiers on disk, in memory, for lookups, for comparison, and so on. The move to utf8 was a consequence of introducing support for multiple character sets under the same server, in the same database, in the same table, or even in the same SQL statement: the character set of identifiers must be a superset of all supported character sets.

Tests with the "valgrind --cachegrind" profiler detected some performance degradation between mysqld versions 4.0 and 4.1. The source of the slowdown is the latin1->utf8->latin1 identifier conversion.

A test client program using the latin1 client character set sent "SELECT a FROM t1" 100000 times against an empty heap table:

  CREATE TABLE t1 (a int NOT NULL) TYPE=HEAP;

Version 4.1 generated 1,813,494,123 work units on the mysqld side, while version 4.0 produced only 1,393,268,776. 4.0 was roughly 25% faster for this kind of client application.

There were 420,225,347 extra work units, the most important being:

  74,502,664  sql_string.cc:String::copy()
  43,901,866  ctype-utf8.c:my_utf8_uni
  42,902,112  ctype-latin1.c:my_wc_mb_latin1
  34,800,812  protocol.cc:Protocol::store_string_aux
  21,600,676  charset.c:my_charset_same
   3,000,060  ctype-utf8.c:my_ismbchar_utf8
   6,000,000  protocol.cc:Protocol::send_fields

This is because utf8->latin1 conversion is done during Protocol::send_fields().

This is a very unpleasant performance degradation, especially for users who want only a single character set (as in 4.0).

WL#1898 proposes to compile a "light" version of mysqld with a single character set, which means that no character set conversion is necessary at all, and performance should return closer to that of 4.0.

However, even in the "full" version we can improve performance significantly. In many cases "full featured" conversion is not really necessary. For example, the test program used only pure ASCII identifiers, which are encoded identically in utf8 and latin1.

We can optimize the code for cases like utf8->latin1, and even for some multi-byte character sets, for example utf8->gbk.
Typical conversion scenarios and the ways of their optimization
===============================================================

1. Conversion from utf8 to an 8bit character set:
   quickly copy a sequence of leading 7bit (ASCII) values
   until the end of the string - then exit,
   or until an 8bit value is met - then switch to the loop
   "get utf8 character -> put 8bit character".

2. Conversion from utf8 to an ASCII-based multi-byte character set
   (i.e. with mbminlen=1):
   quickly copy a sequence of 7bit (ASCII) values
   until the end of the string - then exit,
   or until an 8bit value is met - then switch to the loop
   "get utf8 character -> put multi-byte character".

3. Conversion from utf8 to a non-ASCII-based multi-byte character set
   (e.g. with mbminlen>1, like ucs2):
   it will use the traditional (non-optimized) loop
   "get utf8 character -> put multi-byte character".

Other optimization ideas
========================

- mb_wc_quick():
  A new function can be added to MY_CHARSET_HANDLER. It will work almost like mb_wc, but won't check whether the destination string has enough space, assuming that the caller allocated enough space for the destination string before calling the conversion routines. It will be faster than mb_wc. This is to optimize conversion of non-ASCII characters in identifiers.

- Cache data in THD:
  Some data can be cached in the THD structure whenever thd->variables.character_set_results is changed. The cached data can contain pointers to functions, for example identifier_to_client_converter(), or some flags, for example:
  "bool quickly_do_ascii_characters_for_identifiers"
  This needs further investigation.

- Four bytes at once:
  Copying of the "leading pure ASCII part" can be implemented using a "copy four bytes at once" approach. This is possible on i386 platforms, because the i386 processor allows non-aligned data to be cast to a 32-bit integer. This needs further investigation, because it may slow down conversion of short identifiers with a length of 1 to 3 bytes.

- Assembler:
  Copying of the "leading pure ASCII part" can be written in assembler, at least for i386.
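Scenario 1 combined with the "four bytes at once" idea can be sketched roughly as follows. This is only an illustration, not the server code: the function names copy_leading_ascii(), decode_utf8() and utf8_to_latin1() are hypothetical, std::string stands in for the network buffer, latin1 is treated as plain ISO-8859-1 (code points 0..255), and characters that cannot be represented are replaced with '?'. The unaligned 32-bit load is done through memcpy so that the sketch is not tied to i386.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: appends the leading pure-ASCII run of `from` to `to`
// and returns the number of bytes copied. It handles four bytes at a time
// while none of them has the high bit set, then finishes byte by byte.
static std::size_t copy_leading_ascii(const char *from, std::size_t length,
                                      std::string &to) {
  std::size_t i = 0;
  while (length - i >= 4) {
    std::uint32_t word;
    std::memcpy(&word, from + i, 4);      // portable unaligned load
    if (word & 0x80808080u) break;        // at least one byte is >= 0x80
    to.append(from + i, 4);
    i += 4;
  }
  while (i < length && static_cast<unsigned char>(from[i]) < 0x80) {
    to.push_back(from[i]);
    ++i;
  }
  return i;
}

// Simplified decoder for 1- to 3-byte UTF-8 sequences.
// Returns the number of bytes consumed, or 0 on malformed input.
static std::size_t decode_utf8(const unsigned char *s, std::size_t avail,
                               std::uint32_t &cp) {
  if (avail == 0) return 0;
  if (s[0] < 0x80) { cp = s[0]; return 1; }
  if (s[0] >= 0xC2 && s[0] <= 0xDF && avail >= 2 && (s[1] & 0xC0) == 0x80) {
    cp = ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
    return 2;
  }
  if ((s[0] & 0xF0) == 0xE0 && avail >= 3 &&
      (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
    cp = ((s[0] & 0x0Fu) << 12) | ((s[1] & 0x3Fu) << 6) | (s[2] & 0x3Fu);
    return cp >= 0x800 ? 3 : 0;           // reject overlong encodings
  }
  return 0;
}

// Scenario 1: fast copy of the ASCII prefix, then the traditional loop
// "get utf8 character -> put 8bit character" for the rest of the string.
std::string utf8_to_latin1(const char *from, std::size_t length) {
  assert(from != nullptr || length == 0);
  std::string to;
  to.reserve(length);                     // latin1 never needs more bytes
  std::size_t i = copy_leading_ascii(from, length, to);
  const unsigned char *s = reinterpret_cast<const unsigned char *>(from);
  while (i < length) {
    std::uint32_t cp = 0;
    std::size_t n = decode_utf8(s + i, length - i, cp);
    if (n == 0) { to.push_back('?'); ++i; continue; }   // malformed byte
    to.push_back(cp <= 0xFF ? static_cast<char>(cp) : '?');
    i += n;
  }
  return to;
}
```

For a pure ASCII identifier such as "t1" the slow loop is never entered. Note that for identifiers shorter than four bytes only the byte-by-byte part of the prefix copy runs, which is the case the "four bytes at once" item flags as needing measurement.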
1. A new method for the "Protocol" class will be added:

     bool Protocol::net_store_data(const char *from, uint length,
                                   CHARSET_INFO *from_cs, CHARSET_INFO *to_cs)

   It stores text with character set conversion. It will quickly copy ASCII-compatible leading characters, as described in scenarios 1 and 2.

2. Protocol::store_string_aux() will be extended:

   2a. It will detect the cases when fast conversion is possible and call the new method.

   2b. If quick conversion is not possible, it will use the old slow method, using "convert" as intermediate storage for the converted data:

     {
       uint dummy_errors;
       return convert->copy(from, length, fromcs, tocs, &dummy_errors) ||
              net_store_data(convert->ptr(), convert->length());
     }
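The decision in step 2a could look roughly like the sketch below. It is an assumption made for illustration only: the worklog does not spell out the exact test, CharsetSketch is a hypothetical stand-in for CHARSET_INFO holding only the fields needed here, and choose_store_path() and StorePath are invented names. The rule it applies is the one implied by scenarios 1 to 3: the fast path is taken when both character sets have mbminlen=1, and the slow path otherwise.

```cpp
#include <string>

// Hypothetical, simplified stand-in for CHARSET_INFO.
struct CharsetSketch {
  std::string name;
  unsigned mbminlen;  // minimum number of bytes per character
  unsigned mbmaxlen;  // maximum number of bytes per character
};

// The three ways a string could be stored in this sketch.
enum class StorePath { NoConversion, FastAsciiPrefix, SlowViaBuffer };

// Assumed rule: identical character sets need no conversion; ASCII-based
// pairs (mbminlen == 1 on both sides) can use the fast ASCII-prefix copy;
// everything else goes through the intermediate "convert" buffer.
StorePath choose_store_path(const CharsetSketch &from_cs,
                            const CharsetSketch &to_cs) {
  if (from_cs.name == to_cs.name) return StorePath::NoConversion;
  if (from_cs.mbminlen == 1 && to_cs.mbminlen == 1)
    return StorePath::FastAsciiPrefix;
  return StorePath::SlowViaBuffer;
}
```

With these assumptions, utf8->latin1 and utf8->gbk select the fast path, while utf8->ucs2 falls back to the slow one.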
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.