Definitions of mb_wc (multibyte to wide character, i.e., effectively “parse a UTF-8 character”) functions for UTF-8 (both three- and four-byte).
template<bool RANGE_CHECK, bool SUPPORT_MB4>
static int my_mb_wc_utf8_prototype (my_wc_t *pwc, const uchar *s, const uchar *e)

static int my_mb_wc_utf8mb3 (my_wc_t *pwc, const uchar *s, const uchar *e)
    Parses a single UTF-8 character from a byte string.

static int my_mb_wc_utf8mb4 (my_wc_t *pwc, const uchar *s, const uchar *e)
    Parses a single UTF-8 character from a byte string.

template<bool RANGE_CHECK, bool SUPPORT_MB4>
static ALWAYS_INLINE int my_mb_wc_utf8_prototype (my_wc_t *pwc, const uchar *s, const uchar *e)

int my_mb_wc_utf8mb3_thunk (const CHARSET_INFO *cs, my_wc_t *pwc, const uchar *s, const uchar *e)
    A thunk to be able to use my_mb_wc_utf8mb3 in MY_CHARSET_HANDLER structs.

int my_mb_wc_utf8mb4_thunk (const CHARSET_INFO *cs, my_wc_t *pwc, const uchar *s, const uchar *e)
    A thunk to be able to use my_mb_wc_utf8mb4 in MY_CHARSET_HANDLER structs.

The mb_wc functions are available in three forms: as inline functions, as C-style thunks that fit into MY_CHARSET_HANDLER, and as functors.
The functors exist so that you can specialize a template on them and get them inlined, instead of having to call them through the function pointer in MY_CHARSET_HANDLER. mb_wc is in itself so cheap (the most common case is just a single byte load and a predictable compare) that the call overhead in a tight loop is significant, and these routines tend to take up a lot of CPU time when sorting.
Typically, at the outermost level, you'd simply compare cs->cset->mb_wc with my_mb_wc_utf8mb3_thunk and my_mb_wc_utf8mb4_thunk and, if one of them matches, instantiate your function with the corresponding functor class. If neither matches, you can use Mb_wc_through_function_pointer, which calls through the function pointer as usual. (It caches the function pointer for you, which is typically faster than looking it up every time; the compiler cannot always figure out on its own that the pointer doesn't change.)
The Mb_wc_* classes should be passed by value, not by reference, since they are never larger than two pointers (and usually hold no data at all).
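The dispatch pattern described above can be sketched roughly as follows. This is a self-contained illustration, not the actual MySQL code: the `CHARSET_INFO` and `MY_CHARSET_HANDLER` structs are reduced stand-ins, and every `toy_`/`Toy_` name is hypothetical. The decoder is also simplified, assuming a return value of bytes consumed or 0 on any error, with no overlong or surrogate checks.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, reduced stand-ins for the real MySQL types.
using my_wc_t = unsigned long;
using uchar = unsigned char;
struct CHARSET_INFO;
using mb_wc_fn = int (*)(const CHARSET_INFO *, my_wc_t *, const uchar *,
                         const uchar *);
struct MY_CHARSET_HANDLER { mb_wc_fn mb_wc; };
struct CHARSET_INFO { const MY_CHARSET_HANDLER *cset; };

static inline bool is_cont(uchar b) { return (b & 0xc0) == 0x80; }

// Simplified decoder (assumption: returns bytes consumed, 0 on error).
template <bool SUPPORT_MB4>
static inline int toy_mb_wc(my_wc_t *pwc, const uchar *s, const uchar *e) {
  if (s >= e) return 0;
  const uchar c = s[0];
  if (c < 0x80) { *pwc = c; return 1; }  // the common single-byte case
  if (c < 0xc2) return 0;                // stray continuation or overlong lead
  if (c < 0xe0) {
    if (e - s < 2 || !is_cont(s[1])) return 0;
    *pwc = (static_cast<my_wc_t>(c & 0x1f) << 6) | (s[1] & 0x3f);
    return 2;
  }
  if (c < 0xf0) {
    if (e - s < 3 || !is_cont(s[1]) || !is_cont(s[2])) return 0;
    *pwc = (static_cast<my_wc_t>(c & 0x0f) << 12) |
           (static_cast<my_wc_t>(s[1] & 0x3f) << 6) | (s[2] & 0x3f);
    return 3;
  }
  if (SUPPORT_MB4 && c < 0xf5) {
    if (e - s < 4 || !is_cont(s[1]) || !is_cont(s[2]) || !is_cont(s[3]))
      return 0;
    *pwc = (static_cast<my_wc_t>(c & 0x07) << 18) |
           (static_cast<my_wc_t>(s[1] & 0x3f) << 12) |
           (static_cast<my_wc_t>(s[2] & 0x3f) << 6) | (s[3] & 0x3f);
    return 4;
  }
  return 0;
}

// C-style thunks: same decoder, but with the handler's signature.
int toy_utf8mb3_thunk(const CHARSET_INFO *, my_wc_t *pwc, const uchar *s,
                      const uchar *e) { return toy_mb_wc<false>(pwc, s, e); }
int toy_utf8mb4_thunk(const CHARSET_INFO *, my_wc_t *pwc, const uchar *s,
                      const uchar *e) { return toy_mb_wc<true>(pwc, s, e); }

// Stateless functors: the call can be inlined into the template below.
struct Toy_mb_wc_utf8mb3 {
  int operator()(my_wc_t *pwc, const uchar *s, const uchar *e) const {
    return toy_mb_wc<false>(pwc, s, e);
  }
};
struct Toy_mb_wc_utf8mb4 {
  int operator()(my_wc_t *pwc, const uchar *s, const uchar *e) const {
    return toy_mb_wc<true>(pwc, s, e);
  }
};

// Fallback functor: caches the function pointer once (two pointers of state).
class Toy_mb_wc_through_function_pointer {
 public:
  explicit Toy_mb_wc_through_function_pointer(const CHARSET_INFO *cs)
      : m_fn(cs->cset->mb_wc), m_cs(cs) {}
  int operator()(my_wc_t *pwc, const uchar *s, const uchar *e) const {
    return m_fn(m_cs, pwc, s, e);
  }
 private:
  mb_wc_fn m_fn;
  const CHARSET_INFO *m_cs;
};

// The hot loop, templated on the functor, which is taken by value.
template <class Mb_wc>
std::size_t toy_count_chars_tmpl(Mb_wc mb_wc, const uchar *s, const uchar *e) {
  assert(s <= e);
  std::size_t n = 0;
  my_wc_t wc;
  while (s < e) {
    const int len = mb_wc(&wc, s, e);
    if (len <= 0) break;  // stop at the first undecodable sequence
    s += len;
    ++n;
  }
  return n;
}

// Outermost level: compare the handler's pointer once, then instantiate.
std::size_t toy_count_chars(const CHARSET_INFO *cs, const uchar *s,
                            const uchar *e) {
  if (cs->cset->mb_wc == toy_utf8mb4_thunk)
    return toy_count_chars_tmpl(Toy_mb_wc_utf8mb4(), s, e);
  if (cs->cset->mb_wc == toy_utf8mb3_thunk)
    return toy_count_chars_tmpl(Toy_mb_wc_utf8mb3(), s, e);
  return toy_count_chars_tmpl(Toy_mb_wc_through_function_pointer(cs), s, e);
}
```

The pointer comparison happens once per call to `toy_count_chars`, so the per-character work in the loop is either an inlined decoder or a call through the cached pointer.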