MySQL 8.4.4
Source Code Documentation
Definitions of mb_wc (multibyte to wide character, i.e., effectively “parse a UTF-8 character”) functions for UTF-8 (both three- and four-byte).
#include <cstdint>
#include <string.h>
#include "my_compiler.h"
#include "my_config.h"
#include "mysql/strings/m_ctype.h"
Classes
struct | Mb_wc_utf8mb3 | Functor that converts a UTF-8 multibyte sequence (up to three bytes) to a wide character. |
struct | Mb_wc_utf8mb4 | Functor that converts a UTF-8 multibyte sequence (up to four bytes) to a wide character. |
class | Mb_wc_through_function_pointer | Functor that uses a function pointer to convert a multibyte sequence to a wide character. |
Functions
template<bool RANGE_CHECK, bool SUPPORT_MB4> static int | my_mb_wc_utf8_prototype (my_wc_t *pwc, const uint8_t *s, const uint8_t *e) |
static int | my_mb_wc_utf8mb3 (my_wc_t *pwc, const uint8_t *s, const uint8_t *e) | Parses a single UTF-8 character from a byte string. |
static int | my_mb_wc_utf8mb4 (my_wc_t *pwc, const uint8_t *s, const uint8_t *e) | Parses a single UTF-8 character from a byte string. |
template<bool RANGE_CHECK, bool SUPPORT_MB4> static ALWAYS_INLINE int | my_mb_wc_utf8_prototype (my_wc_t *pwc, const uint8_t *s, const uint8_t *e) |
int | my_mb_wc_utf8mb3_thunk (const CHARSET_INFO *cs, my_wc_t *pwc, const uint8_t *s, const uint8_t *e) | A thunk to be able to use my_mb_wc_utf8mb3 in MY_CHARSET_HANDLER structs. |
int | my_mb_wc_utf8mb4_thunk (const CHARSET_INFO *cs, my_wc_t *pwc, const uint8_t *s, const uint8_t *e) | A thunk to be able to use my_mb_wc_utf8mb4 in MY_CHARSET_HANDLER structs. |
Definitions of mb_wc (multibyte to wide character, i.e., effectively “parse a UTF-8 character”) functions for UTF-8 (both three- and four-byte).
These are available in three forms: as inline functions, as C-style thunks (so that they can fit into MY_CHARSET_HANDLER), and as functors.
The functors exist so that you can specialize a class on them and get them inlined instead of having to call them through the function pointer in MY_CHARSET_HANDLER. mb_wc is in itself so cheap (the most common case is just a single byte load and a predictable compare) that the call overhead in a tight loop is significant, and these routines tend to take up a lot of CPU time when sorting.
Typically, at the outermost level, you'd simply compare cs->cset->mb_wc with my_mb_wc_{utf8mb3,utf8mb4}_thunk, and if it matches, instantiate your function with the corresponding class. If it doesn't match, you can use Mb_wc_through_function_pointer, which calls through the function pointer as usual. (It caches the function pointer for you, which is typically faster than looking it up every time, since the compiler cannot always figure out on its own that it doesn't change.)
The Mb_wc_* classes should be passed by value, not by reference, since they are never larger than two pointers (and usually hold no data at all).
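The dispatch pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Charset`, `wide_char`, `MbWcFn`, `decode_ascii`, `InlineDecoder`, `PointerDecoder`, and `count_chars` are hypothetical stand-ins, not the real MySQL types, and the decoder handles ASCII only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for my_wc_t, CHARSET_INFO and the mb_wc pointer type.
using wide_char = unsigned long;
struct Charset;
using MbWcFn = int (*)(const Charset *, wide_char *, const uint8_t *,
                       const uint8_t *);
struct Charset {
  MbWcFn mb_wc;
};

// Simplified inline decoder (ASCII only): returns bytes consumed, 0 on failure.
static inline int decode_ascii(wide_char *pwc, const uint8_t *s,
                               const uint8_t *e) {
  if (s >= e || *s >= 0x80) return 0;
  *pwc = *s;
  return 1;
}

// Thunk: same work, but with the function-pointer signature (cs is unused).
int decode_ascii_thunk(const Charset *, wide_char *pwc, const uint8_t *s,
                       const uint8_t *e) {
  return decode_ascii(pwc, s, e);
}

// Stateless functor: no data members, so the call can be inlined.
struct InlineDecoder {
  int operator()(wide_char *pwc, const uint8_t *s, const uint8_t *e) const {
    return decode_ascii(pwc, s, e);
  }
};

// Fallback functor: caches the function pointer and charset (two pointers).
class PointerDecoder {
 public:
  explicit PointerDecoder(const Charset *cs) : m_fn(cs->mb_wc), m_cs(cs) {}
  int operator()(wide_char *pwc, const uint8_t *s, const uint8_t *e) const {
    return m_fn(m_cs, pwc, s, e);
  }

 private:
  MbWcFn m_fn;
  const Charset *m_cs;
};

// Hot loop, templated on the functor; the functor is passed by value.
template <class Decoder>
std::size_t count_chars_impl(Decoder decode, const uint8_t *s,
                             const uint8_t *e) {
  assert(s <= e);
  std::size_t n = 0;
  wide_char wc = 0;
  while (s < e) {
    const int len = decode(&wc, s, e);
    if (len <= 0) break;  // stop at the first byte that cannot be decoded
    s += len;
    ++n;
  }
  return n;
}

// Outermost level: compare the pointer once, then pick the instantiation.
std::size_t count_chars(const Charset *cs, const uint8_t *s,
                        const uint8_t *e) {
  if (cs->mb_wc == decode_ascii_thunk) {
    return count_chars_impl(InlineDecoder(), s, e);
  }
  return count_chars_impl(PointerDecoder(cs), s, e);
}
```

The pointer comparison happens once per call to `count_chars`, not once per character, so the loop body in the fast path contains no indirect call.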
static inline int my_mb_wc_utf8mb3 (my_wc_t *pwc, const uint8_t *s, const uint8_t *e)
Parses a single UTF-8 character from a byte string.
pwc | [out] The parsed character, if any. |
s | The string to read from. |
e | The end of the string; will not read past this. |
int my_mb_wc_utf8mb3_thunk (const CHARSET_INFO *cs, my_wc_t *pwc, const uint8_t *s, const uint8_t *e)
A thunk to be able to use my_mb_wc_utf8mb3 in MY_CHARSET_HANDLER structs.
cs | Unused. |
pwc | [output] The parsed character, if any. |
s | The string to read from. |
e | The end of the string; will not read past this. |
static int my_mb_wc_utf8mb4 (my_wc_t *pwc, const uint8_t *s, const uint8_t *e)
Parses a single UTF-8 character from a byte string.
The difference between this and my_mb_wc_utf8mb3 is that this function can also handle four-byte UTF-8 characters.
pwc | [out] The parsed character, if any. |
s | The string to read from. |
e | The end of the string; will not read past this. |
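To make the three-byte versus four-byte distinction concrete, here is a minimal, self-contained sketch of a bounded UTF-8 decoder templated on a `SUPPORT_MB4` flag. It is not the MySQL implementation: `decode_utf8`, `wide_char`, `kInvalid`, and `kTruncated` are hypothetical names, the return convention is an assumption made for this example, and the sketch always checks against `e` (the real prototype additionally takes a `RANGE_CHECK` parameter). Rejecting overlong encodings and surrogates is a choice made here, not a statement about what the real routines do.

```cpp
#include <cassert>
#include <cstdint>

using wide_char = unsigned long;  // hypothetical stand-in for my_wc_t

// Assumed return convention for this sketch only:
// > 0: number of bytes consumed; kInvalid: malformed; kTruncated: hit the end.
constexpr int kInvalid = 0;
constexpr int kTruncated = -1;

static inline bool is_continuation(uint8_t b) { return (b & 0xC0) == 0x80; }

template <bool SUPPORT_MB4>
int decode_utf8(wide_char *pwc, const uint8_t *s, const uint8_t *e) {
  assert(pwc != nullptr);
  if (s >= e) return kTruncated;
  const uint8_t c = s[0];

  if (c < 0x80) {  // one byte (ASCII): the common, cheap case
    *pwc = c;
    return 1;
  }
  if (c < 0xC2) return kInvalid;  // stray continuation byte or overlong lead

  if (c < 0xE0) {  // two bytes
    if (e - s < 2) return kTruncated;
    if (!is_continuation(s[1])) return kInvalid;
    *pwc = (wide_char(c & 0x1F) << 6) | wide_char(s[1] & 0x3F);
    return 2;
  }

  if (c < 0xF0) {  // three bytes
    if (e - s < 3) return kTruncated;
    if (!is_continuation(s[1]) || !is_continuation(s[2])) return kInvalid;
    const wide_char wc = (wide_char(c & 0x0F) << 12) |
                         (wide_char(s[1] & 0x3F) << 6) |
                         wide_char(s[2] & 0x3F);
    if (wc < 0x800) return kInvalid;                    // overlong encoding
    if (wc >= 0xD800 && wc <= 0xDFFF) return kInvalid;  // surrogate range
    *pwc = wc;
    return 3;
  }

  if (SUPPORT_MB4 && c < 0xF5) {  // four bytes, only in the "mb4" variant
    if (e - s < 4) return kTruncated;
    if (!is_continuation(s[1]) || !is_continuation(s[2]) ||
        !is_continuation(s[3])) {
      return kInvalid;
    }
    const wide_char wc = (wide_char(c & 0x07) << 18) |
                         (wide_char(s[1] & 0x3F) << 12) |
                         (wide_char(s[2] & 0x3F) << 6) |
                         wide_char(s[3] & 0x3F);
    if (wc < 0x10000 || wc > 0x10FFFF) return kInvalid;  // overlong or too big
    *pwc = wc;
    return 4;
  }

  return kInvalid;  // four-byte lead in the three-byte variant, or bad lead
}
```

With this sketch, the bytes `E2 82 AC` (U+20AC) decode to three bytes under both `decode_utf8<false>` and `decode_utf8<true>`, while `F0 9F 98 80` (U+1F600) decodes only under `decode_utf8<true>` and is reported as invalid by the three-byte variant.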
int my_mb_wc_utf8mb4_thunk (const CHARSET_INFO *cs, my_wc_t *pwc, const uint8_t *s, const uint8_t *e)
A thunk to be able to use my_mb_wc_utf8mb4 in MY_CHARSET_HANDLER structs.
cs | Unused. |
pwc | [output] The parsed character, if any. |
s | The string to read from. |
e | The end of the string; will not read past this. |