MySQL 8.0.40
Source Code Documentation
This class represents the character input stream consumed during lexical analysis.
#include <sql_lexer_input_stream.h>
Public Member Functions
  Lex_input_stream (uint grammar_selector_token_arg)
      Constructor.
  bool init (THD *thd, const char *buff, size_t length)
      Object initializer.
  void reset (const char *buff, size_t length)
      Prepare the Lex_input_stream instance state for handling the next SQL statement.
  void set_echo (bool echo)
      Set the echo mode.
  void save_in_comment_state ()
  void restore_in_comment_state ()
  void skip_binary (int n)
      Skip binary data in the input stream.
  unsigned char yyGet ()
      Get a character, and advance in the stream.
  unsigned char yyGetLast () const
      Get the last character accepted.
  unsigned char yyPeek () const
      Look at the next character to parse, but do not accept it.
  unsigned char yyPeekn (int n) const
      Look ahead at some character to parse.
  void yyUnget ()
      Cancel the effect of the last yyGet() or yySkip().
  void yySkip ()
      Accept a character, by advancing the input stream.
  void yySkipn (int n)
      Accept multiple characters at once.
  char * yyUnput (char ch)
      Put a character back into the stream, canceling the effect of the last yyGet() or yySkip().
  char * cpp_inject (char ch)
      Inject a character into the pre-processed stream.
  bool eof () const
      End of file indicator for the query text to parse.
  bool eof (int n) const
      End of file indicator for the query text to parse.
  const char * get_buf () const
      Get the raw query buffer.
  const char * get_cpp_buf () const
      Get the pre-processed query buffer.
  const char * get_end_of_query () const
      Get the end of the raw query buffer.
  void start_token ()
      Mark the stream position as the start of a new token.
  void restart_token ()
      Adjust the starting position of the current token.
  const char * get_tok_start () const
      Get the token start position, in the raw buffer.
  const char * get_cpp_tok_start () const
      Get the token start position, in the pre-processed buffer.
  const char * get_tok_end () const
      Get the token end position, in the raw buffer.
  const char * get_cpp_tok_end () const
      Get the token end position, in the pre-processed buffer.
  const char * get_ptr () const
      Get the current stream pointer, in the raw buffer.
  const char * get_cpp_ptr () const
      Get the current stream pointer, in the pre-processed buffer.
  uint yyLength () const
      Get the length of the current token, in the raw buffer.
  const char * get_body_utf8_str () const
      Get the utf8-body string.
  uint get_body_utf8_length () const
      Get the utf8-body length.
  void body_utf8_start (THD *thd, const char *begin_ptr)
      Called from the parser in order to 1) indicate to the lexer that a utf8 representation of this statement (the utf8 body) is needed; 2) determine the beginning of the body.
  void body_utf8_append (const char *ptr)
      Append the unprocessed part of the pre-processed buffer, up to the given pointer (ptr), to the utf8 body and set m_cpp_utf8_processed_ptr to ptr.
  void body_utf8_append (const char *ptr, const char *end_ptr)
      Append the unprocessed part of the pre-processed buffer, up to the given pointer (ptr), to the utf8 body and set m_cpp_utf8_processed_ptr to end_ptr.
  void body_utf8_append_literal (THD *thd, const LEX_STRING *txt, const CHARSET_INFO *txt_cs, const char *end_ptr)
      Convert the specified text literal to utf8 and append the result to the utf8 body.
  uint get_lineno (const char *raw_ptr) const
  void add_digest_token (uint token, Lexer_yystype *yylval)
  void reduce_digest_token (uint token_left, uint token_right)
  bool is_partial_parser () const
      True if this scanner tokenizes a partial query (partition expression, generated column expression, etc.).
  void warn_on_deprecated_charset (const CHARSET_INFO *cs, const char *alias) const
      Outputs warnings on deprecated charsets in complete SQL statements.
  void warn_on_deprecated_collation (const CHARSET_INFO *collation) const
      Outputs warnings on deprecated collations in complete SQL statements.
  bool text_string_is_7bit () const
Public Attributes
  THD * m_thd
      Current thread.
  uint yylineno
      Current line number.
  uint yytoklen
      Length of the last token parsed.
  Lexer_yystype * yylval
      Interface with Bison: value of the last token parsed.
  int lookahead_token
      LALR(2) resolution: lookahead token.
  Lexer_yystype * lookahead_yylval
      LALR(2) resolution: value of the lookahead token.
  bool skip_digest
      Skip adding the current token to the digest, since it has already been added.
  const CHARSET_INFO * query_charset
  enum my_lex_states next_state
      Current state of the lexical analyser.
  const char * found_semicolon
      Position of ';' in the stream, to delimit multiple queries.
  uchar tok_bitmap
      Token character bitmaps, to detect 7bit strings.
  bool ignore_space
      SQL_MODE = IGNORE_SPACE.
  bool stmt_prepare_mode
      true if we're parsing a prepared statement: in this mode we should allow placeholders.
  bool multi_statements
      true if we should allow multi-statements.
  enum_comment_state in_comment
      State of the lexical analyser for comments.
  enum_comment_state in_comment_saved
  const char * m_cpp_text_start
      Starting position of the TEXT_STRING or IDENT in the pre-processed buffer.
  const char * m_cpp_text_end
      Ending position of the TEXT_STRING or IDENT in the pre-processed buffer.
  const CHARSET_INFO * m_underscore_cs
      Character set specified by the character-set-introducer.
  sql_digest_state * m_digest {nullptr}
      Current statement digest instrumentation.
  const int grammar_selector_token
      The synthetic first token to prepend to the token stream.
Private Attributes
  char * m_ptr
      Pointer to the current position in the raw input stream.
  const char * m_tok_start
      Starting position of the last token parsed, in the raw buffer.
  const char * m_tok_end
      Ending position of the previous token parsed, in the raw buffer.
  const char * m_end_of_query
      End of the query text in the input stream, in the raw buffer.
  const char * m_buf
      Beginning of the query text in the input stream, in the raw buffer.
  size_t m_buf_length
      Length of the raw buffer.
  bool m_echo
      Echo the parsed stream to the pre-processed buffer.
  bool m_echo_saved
  char * m_cpp_buf
      Pre-processed buffer.
  char * m_cpp_ptr
      Pointer to the current position in the pre-processed input stream.
  const char * m_cpp_tok_start
      Starting position of the last token parsed, in the pre-processed buffer.
  const char * m_cpp_tok_end
      Ending position of the previous token parsed, in the pre-processed buffer.
  char * m_body_utf8
      UTF8-body buffer created during parsing.
  char * m_body_utf8_ptr
      Pointer to the current position in the UTF8-body buffer.
  const char * m_cpp_utf8_processed_ptr
      Position in the pre-processed buffer.
This class represents the character input stream consumed during lexical analysis.
In addition to consuming the input stream, this class performs some comment pre-processing, by filtering out-of-band special text out of the query input stream.
Two buffers, with pointers into each, are maintained in parallel. The 'raw' buffer is the original query text, which may contain out-of-band comments. The 'cpp' buffer (for 'comments pre-processor') is the pre-processed buffer; it contains only the query text that should be seen once out-of-band data is removed.
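The raw/cpp pairing can be illustrated with a deliberately simplified model. MiniInputStream and strip_comments below are hypothetical names, not MySQL code; the sketch assumes plain, non-nested /* ... */ comments and simply drops them, whereas the real lexer also deals with version comments, optimizer hints and other cases. It only shows the core idea: every raw character is consumed exactly once, and it is copied to the cpp buffer only while echo is on.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, simplified model of the raw/cpp buffer pair (not MySQL code).
class MiniInputStream {
 public:
  explicit MiniInputStream(const std::string &query) : m_raw(query) {}

  bool eof() const { return m_ptr >= m_raw.size(); }

  // Look n characters ahead without consuming anything; '\0' past the end.
  char peek(std::size_t n = 0) const {
    return m_ptr + n < m_raw.size() ? m_raw[m_ptr + n] : '\0';
  }

  // Consume one raw character; copy it to the cpp buffer only in echo mode.
  char get() {
    assert(!eof());
    char c = m_raw[m_ptr++];
    if (m_echo) m_cpp.push_back(c);
    return c;
  }

  void set_echo(bool echo) { m_echo = echo; }
  const std::string &cpp_text() const { return m_cpp; }

 private:
  std::string m_raw;      // "raw" buffer: the original query text
  std::string m_cpp;      // "cpp" buffer: text left after pre-processing
  std::size_t m_ptr = 0;  // current position in the raw buffer
  bool m_echo = true;     // copy consumed characters to m_cpp?
};

// Drops every non-nested /* ... */ comment from the query text.
std::string strip_comments(const std::string &query) {
  MiniInputStream in(query);
  while (!in.eof()) {
    if (in.peek() == '/' && in.peek(1) == '*') {
      in.set_echo(false);  // comment text is consumed but not copied
      in.get();            // '/'
      in.get();            // '*'
      while (!in.eof() && !(in.peek() == '*' && in.peek(1) == '/')) in.get();
      if (!in.eof()) {     // consume the closing "*/"
        in.get();
        in.get();
      }
      in.set_echo(true);
    } else {
      in.get();
    }
  }
  return in.cpp_text();
}
```

For example, strip_comments("SELECT /* hi */ 1") yields "SELECT  1": the raw text keeps the comment, while the cpp text only keeps what was echoed.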
Lex_input_stream::Lex_input_stream (uint grammar_selector_token_arg) [inline, explicit]
Constructor.
Parameters:
  grammar_selector_token_arg: See grammar_selector_token.
void Lex_input_stream::add_digest_token (uint token, Lexer_yystype *yylval)
void Lex_input_stream::body_utf8_append (const char *ptr)
Appends the unprocessed part of the pre-processed buffer, up to the given pointer (ptr), to the utf8 body and sets m_cpp_utf8_processed_ptr to ptr.
Parameters:
  ptr: Pointer in the pre-processed buffer that marks the end of the chunk to append to the utf8 body.
void Lex_input_stream::body_utf8_append (const char *ptr, const char *end_ptr)
Appends the unprocessed part of the pre-processed buffer, up to the given pointer (ptr), to the utf8 body and sets m_cpp_utf8_processed_ptr to end_ptr.
The idea is that some tokens in the pre-processed buffer (like character set introducers) should be skipped.
Example:
  CPP buffer: SELECT 'str1', _latin1 'str2';
  m_cpp_utf8_processed_ptr points at "SELECT ...".
  To skip "_latin1", make the following call:
    body_utf8_append(<pointer to "_latin1 ...">, <pointer to " 'str2'...">)
Parameters:
  ptr: Pointer in the pre-processed buffer that marks the end of the chunk to append to the utf8 body.
  end_ptr: Pointer in the pre-processed buffer to which m_cpp_utf8_processed_ptr is set at the end of the operation.
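The bookkeeping in the example above can be sketched with a single "processed" pointer. Utf8Body and its start/append methods are hypothetical stand-ins for body_utf8_start() and the two body_utf8_append() overloads; character set conversion, which the real code performs for literals, is left out entirely.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the utf8-body bookkeeping (not MySQL code): the body
// is built from the pre-processed (cpp) buffer, and m_processed marks how far
// that buffer has already been consumed. Charset conversion is omitted.
class Utf8Body {
 public:
  // Like body_utf8_start(): the body begins at begin_ptr.
  void start(const char *begin_ptr) {
    m_body.clear();
    m_processed = begin_ptr;
  }

  // Append [processed, ptr) and continue from ptr.
  void append(const char *ptr) { append(ptr, ptr); }

  // Append [processed, ptr) but continue from end_ptr, so that the text in
  // [ptr, end_ptr) (for example a character set introducer) is skipped.
  void append(const char *ptr, const char *end_ptr) {
    assert(m_processed != nullptr && m_processed <= ptr && ptr <= end_ptr);
    m_body.append(m_processed, ptr);
    m_processed = end_ptr;
  }

  const std::string &body() const { return m_body; }

 private:
  std::string m_body;
  const char *m_processed = nullptr;
};
```

With the cpp buffer "SELECT 'str1', _latin1 'str2';", calling start() at its beginning, then append(intro, intro + 7) where intro points at "_latin1", then append() at the end of the buffer produces "SELECT 'str1',  'str2';". Both spaces survive because only the seven characters of the introducer are skipped.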
void Lex_input_stream::body_utf8_append_literal (THD *thd, const LEX_STRING *txt, const CHARSET_INFO *txt_cs, const char *end_ptr)
Converts the specified text literal to utf8 and appends the result to the utf8 body.
Parameters:
  thd: Thread context.
  txt: Text literal.
  txt_cs: Character set of the text literal.
  end_ptr: Pointer in the pre-processed buffer to which m_cpp_utf8_processed_ptr is set at the end of the operation.
void Lex_input_stream::body_utf8_start (THD *thd, const char *begin_ptr)
Called from the parser in order to: 1) indicate to the lexer that a utf8 representation of this statement (the utf8 body) is needed; 2) determine the beginning of the body.
Parameters:
  thd: Thread context.
  begin_ptr: Pointer to the start of the body in the pre-processed buffer.
char * Lex_input_stream::cpp_inject (char ch) [inline]
Inject a character into the pre-processed stream.
Note: this function is used to inject a space in place of a multi-character C comment. Because N characters are replaced by a single one, no boundary checks are performed here.
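Why no boundary check is needed can be seen in a small sketch. collapse_comments is a hypothetical function, not MySQL code, and it assumes non-nested /* ... */ comments. It writes into a buffer the size of the raw text and replaces each comment with one space, the way cpp_inject(' ') is described above; the assertion checks the invariant that makes the missing bounds check safe: the write position never overtakes the read position.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch (not MySQL code): replaces each /* ... */ comment by a
// single space, writing into a buffer no larger than the input. A comment is
// at least two characters long ("/*" even when unterminated) and produces
// exactly one output character, so writes never overtake reads.
std::string collapse_comments(const std::string &raw) {
  std::vector<char> cpp(raw.size() + 1);  // pre-processed buffer
  char *cpp_ptr = cpp.data();             // write position, like m_cpp_ptr
  std::size_t i = 0;                      // read position in the raw text
  while (i < raw.size()) {
    if (raw.compare(i, 2, "/*") == 0) {
      std::size_t close = raw.find("*/", i + 2);
      i = (close == std::string::npos) ? raw.size() : close + 2;
      *cpp_ptr++ = ' ';  // inject one space in place of the whole comment
    } else {
      *cpp_ptr++ = raw[i++];
    }
    // Invariant: characters written <= characters read.
    assert(static_cast<std::size_t>(cpp_ptr - cpp.data()) <= i);
  }
  return std::string(cpp.data(),
                     static_cast<std::size_t>(cpp_ptr - cpp.data()));
}
```

Searching for "*/" from i + 2 means "/*/" does not count as a complete comment, matching C comment rules.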
bool Lex_input_stream::eof () const [inline]
End of file indicator for the query text to parse.
bool Lex_input_stream::eof (int n) const [inline]
End of file indicator for the query text to parse.
Parameters:
  n: number of characters expected.
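One plausible reading of eof(n), assumed here rather than confirmed by this page, is that it returns true when the character at offset n from the current position does not exist, so that yyPeekn(n) is safe exactly when eof(n) is false. Under that assumption, multi-character lookahead is guarded as below; Cursor, peekn and at_comment_start are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical cursor over a raw query (not MySQL code).
class Cursor {
 public:
  explicit Cursor(std::string text) : m_text(std::move(text)) {}

  // True when no character remains at the current position.
  bool eof() const { return m_pos >= m_text.size(); }

  // Assumed semantics: true when the character at offset n does not exist,
  // i.e. peekn(n) would read past the end. eof(0) is equivalent to eof().
  bool eof(std::size_t n) const { return m_pos + n >= m_text.size(); }

  char peekn(std::size_t n) const {
    assert(!eof(n));
    return m_text[m_pos + n];
  }

  void skip() { ++m_pos; }

 private:
  std::string m_text;
  std::size_t m_pos = 0;
};

// Does the cursor sit on "/*"? !eof(1) guarantees both peekn(0) and peekn(1).
bool at_comment_start(const Cursor &c) {
  return !c.eof(1) && c.peekn(0) == '/' && c.peekn(1) == '*';
}
```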
uint Lex_input_stream::get_body_utf8_length () const [inline]
Get the utf8-body length.
const char * Lex_input_stream::get_body_utf8_str () const [inline]
Get the utf8-body string.
const char * Lex_input_stream::get_buf () const [inline]
Get the raw query buffer.
const char * Lex_input_stream::get_cpp_buf () const [inline]
Get the pre-processed query buffer.
const char * Lex_input_stream::get_cpp_ptr () const [inline]
Get the current stream pointer, in the pre-processed buffer.
const char * Lex_input_stream::get_cpp_tok_end () const [inline]
Get the token end position, in the pre-processed buffer.
const char * Lex_input_stream::get_cpp_tok_start () const [inline]
Get the token start position, in the pre-processed buffer.
const char * Lex_input_stream::get_end_of_query () const [inline]
Get the end of the raw query buffer.
uint Lex_input_stream::get_lineno (const char *raw_ptr) const
const char * Lex_input_stream::get_ptr () const [inline]
Get the current stream pointer, in the raw buffer.
const char * Lex_input_stream::get_tok_end () const [inline]
Get the token end position, in the raw buffer.
const char * Lex_input_stream::get_tok_start () const [inline]
Get the token start position, in the raw buffer.
bool Lex_input_stream::init (THD *thd, const char *buff, size_t length)
Object initializer.
Performs initialization of the Lex_input_stream instance. Must be called before use.
Return values:
  false: OK
  true: Error
Basically, this allocates a buffer for the pre-processed query. The buffer should be large enough to hold a multi-statement query. The allocation is done once, in Lex_input_stream::init(), in order to prevent memory pollution when the server is processing large multi-statement queries.
bool Lex_input_stream::is_partial_parser () const [inline]
True if this scanner tokenizes a partial query (partition expression, generated column expression, etc.).
void Lex_input_stream::reset (const char *buffer, size_t length)
Prepare the Lex_input_stream instance state for handling the next SQL statement.
It should be called between two statements in a multi-statement query. The operation resets the input stream to the beginning-of-parse state, but does not reallocate m_cpp_buf.
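The "allocate once in init(), reuse in reset()" pattern can be sketched as follows. StmtStream is a hypothetical class, not MySQL code, and its members only loosely mirror the documented ones. The cpp buffer is sized for the whole multi-statement text in init(); every later statement is a suffix of that text, so reset() can rewind all pointers without reallocating.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical sketch (not MySQL code): one cpp-buffer allocation in init(),
// reused by reset() for each following statement of a multi-statement query.
class StmtStream {
 public:
  // Returns false on success and true on error, like the documented init().
  bool init(const char *buff, std::size_t length) {
    m_cpp_buf.reset(new (std::nothrow) char[length + 1]);
    if (!m_cpp_buf) return true;
    m_capacity = length + 1;
    reset(buff, length);
    return false;
  }

  // Rewind to the beginning-of-parse state; the cpp buffer is kept as is.
  void reset(const char *buff, std::size_t length) {
    assert(length < m_capacity);  // a later statement never exceeds the first
    m_buf = buff;
    m_ptr = buff;
    m_end_of_query = buff + length;
    m_cpp_ptr = m_cpp_buf.get();
  }

  const char *get_buf() const { return m_buf; }
  const char *cpp_buf() const { return m_cpp_buf.get(); }
  const char *cpp_ptr() const { return m_cpp_ptr; }
  std::size_t remaining() const {
    return static_cast<std::size_t>(m_end_of_query - m_ptr);
  }

 private:
  std::unique_ptr<char[]> m_cpp_buf;  // allocated once, in init()
  std::size_t m_capacity = 0;
  const char *m_buf = nullptr;
  const char *m_ptr = nullptr;
  const char *m_end_of_query = nullptr;
  char *m_cpp_ptr = nullptr;
};
```

For "SELECT 1;SELECT 2" (17 characters), init() sizes the buffer for the whole text, and reset(q + 9, 8) moves on to "SELECT 2" while cpp_buf() keeps returning the same pointer.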
void Lex_input_stream::restart_token () [inline]
Adjust the starting position of the current token.
This is used to compensate for leading whitespace.
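A minimal sketch of how start_token(), restart_token() and yyLength() can interact when a scanner skips leading whitespace. TokenScanner and next_word are hypothetical; the sketch assumes yyLength() measures from the (possibly restarted) token start to the current raw position, which matches "length of the current token, in the raw buffer" but is not spelled out on this page.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical scanner (not MySQL code) with raw-buffer token bookkeeping.
class TokenScanner {
 public:
  explicit TokenScanner(std::string text)
      : m_text(std::move(text)),
        m_ptr(m_text.c_str()),
        m_end(m_text.c_str() + m_text.size()),
        m_tok_start(m_ptr),
        m_tok_end(m_ptr) {}

  // New token: start and end both move to the current position.
  void start_token() { m_tok_start = m_tok_end = m_ptr; }
  // Only the start moves, e.g. past whitespace consumed since start_token().
  void restart_token() { m_tok_start = m_ptr; }
  // Length of the current token, in the raw buffer.
  std::size_t yyLength() const {
    assert(m_tok_start <= m_ptr);
    return static_cast<std::size_t>(m_ptr - m_tok_start);
  }
  const char *get_tok_end() const { return m_tok_end; }

  // Returns the next whitespace-separated word, or "" at end of input.
  std::string next_word() {
    start_token();
    while (m_ptr < m_end && std::isspace(static_cast<unsigned char>(*m_ptr)))
      ++m_ptr;
    restart_token();  // without this, yyLength() would count the blanks
    while (m_ptr < m_end && !std::isspace(static_cast<unsigned char>(*m_ptr)))
      ++m_ptr;
    m_tok_end = m_ptr;
    return std::string(m_tok_start, yyLength());
  }

 private:
  std::string m_text;  // declared first, so it is initialized before pointers
  const char *m_ptr;
  const char *m_end;
  const char *m_tok_start;
  const char *m_tok_end;
};
```

On "  SELECT   a" the first word is "SELECT" with yyLength() == 6; dropping the restart_token() call would make it 8.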
void Lex_input_stream::restore_in_comment_state () [inline]
void Lex_input_stream::save_in_comment_state () [inline]
void Lex_input_stream::set_echo (bool echo) [inline]
Set the echo mode.
When echo is true, characters parsed from the raw input stream are copied to the pre-processed buffer. When false, characters parsed are silently ignored (not copied).
Parameters:
  echo: the echo mode.
void Lex_input_stream::skip_binary (int n) [inline]
Skip binary data in the input stream.
Parameters:
  n: number of bytes to accept.
void Lex_input_stream::start_token () [inline]
Mark the stream position as the start of a new token.
bool Lex_input_stream::text_string_is_7bit () const [inline]
void Lex_input_stream::warn_on_deprecated_charset (const CHARSET_INFO *cs, const char *alias) const [inline]
Outputs warnings on deprecated charsets in complete SQL statements.
Parameters:
  [in] cs: The character set/collation to check for deprecation.
  [in] alias: The name/alias of cs.
void Lex_input_stream::warn_on_deprecated_collation (const CHARSET_INFO *collation) const [inline]
Outputs warnings on deprecated collations in complete SQL statements.
Parameters:
  [in] collation: The collation to check for deprecation.
unsigned char Lex_input_stream::yyGet () [inline]
Get a character, and advance in the stream.
unsigned char Lex_input_stream::yyGetLast () const [inline]
Get the last character accepted.
uint Lex_input_stream::yyLength () const [inline]
Get the length of the current token, in the raw buffer.
unsigned char Lex_input_stream::yyPeek () const [inline]
Look at the next character to parse, but do not accept it.
unsigned char Lex_input_stream::yyPeekn (int n) const [inline]
Look ahead at some character to parse.
Parameters:
  n: offset of the character to look up.
void Lex_input_stream::yySkip () [inline]
Accept a character, by advancing the input stream.
void Lex_input_stream::yySkipn (int n) [inline]
Accept multiple characters at once.
Parameters:
  n: the number of characters to accept.
void Lex_input_stream::yyUnget () [inline]
Cancel the effect of the last yyGet() or yySkip().
Note that the echo mode should not change between calls to yyGet / yySkip and yyUnget. The caller is responsible for ensuring that.
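The reason the echo mode must stay fixed between a get and the matching unget shows up in a small model. EchoStream is hypothetical, not MySQL code; it assumes unget rewinds the cpp position only when echo is on, mirroring the fact that get advanced it only when echo was on. If echo flips in between, the raw and cpp positions fall out of step.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical model (not MySQL code): get/unget on parallel raw and cpp
// positions, where the cpp side moves only while echo is on.
class EchoStream {
 public:
  explicit EchoStream(std::string raw)
      : m_raw(std::move(raw)), m_cpp(m_raw.size(), '\0') {}

  char yyGet() {
    assert(m_ptr < m_raw.size());
    char c = m_raw[m_ptr++];
    if (m_echo) {
      assert(m_cpp_ptr < m_cpp.size());
      m_cpp[m_cpp_ptr++] = c;  // echo: copy to the pre-processed buffer
    }
    return c;
  }

  void yyUnget() {
    assert(m_ptr > 0);
    --m_ptr;
    if (m_echo) {  // assumes echo has the same value as at the yyGet()
      assert(m_cpp_ptr > 0);
      --m_cpp_ptr;
    }
  }

  void set_echo(bool echo) { m_echo = echo; }
  std::size_t raw_pos() const { return m_ptr; }
  std::size_t cpp_pos() const { return m_cpp_ptr; }

 private:
  std::string m_raw;            // declared before m_cpp, initialized first
  std::string m_cpp;
  std::size_t m_ptr = 0;
  std::size_t m_cpp_ptr = 0;
  bool m_echo = true;
};
```

Calling yyGet() with echo on, then set_echo(false), then yyUnget() leaves the raw position at 0 but the cpp position at 1: the echoed character is never taken back.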
char * Lex_input_stream::yyUnput (char ch) [inline]
Puts a character back into the stream, canceling the effect of the last yyGet() or yySkip().
Note that the echo mode should not change between calls to unput, get, or skip from the stream.
const char * Lex_input_stream::found_semicolon
Position of ';' in the stream, to delimit multiple queries.
This delimiter is in the raw buffer.
const int Lex_input_stream::grammar_selector_token
The synthetic first token to prepend to the token stream.
This token value tricks the parser into simulating multiple starting points. Currently the grammar is aware of 4 such synthetic tokens:
bool Lex_input_stream::ignore_space
SQL_MODE = IGNORE_SPACE.
enum_comment_state Lex_input_stream::in_comment
State of the lexical analyser for comments.
enum_comment_state Lex_input_stream::in_comment_saved
int Lex_input_stream::lookahead_token
LALR(2) resolution: lookahead token.
Value of the next token to return, if any, or -1 if no token was parsed in advance. Note: 0 is a legal token, and represents YYEOF.
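The -1 / 0 convention matters for any one-token pushback buffer. TokenSource below is a hypothetical sketch, not the MySQL lexer: it reserves -1 as "nothing parsed in advance", so that a buffered 0 (YYEOF) is returned like any other token instead of being mistaken for an empty buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical token source (not MySQL code) with one token of lookahead.
// -1 means "nothing parsed in advance"; 0 is a real token (YYEOF).
class TokenSource {
 public:
  explicit TokenSource(std::vector<int> tokens) : m_tokens(std::move(tokens)) {}

  // Look at the next token without consuming it.
  int peek() {
    if (m_lookahead == -1) {
      m_lookahead = scan();
      assert(m_lookahead >= 0);  // -1 is reserved for "empty"
    }
    return m_lookahead;
  }

  // Consume the next token, using the buffered one first if present.
  int next() {
    if (m_lookahead != -1) {
      int token = m_lookahead;
      m_lookahead = -1;
      return token;
    }
    return scan();
  }

 private:
  // Stand-in for the real scanner; returns 0 (YYEOF) once input runs out.
  int scan() { return m_pos < m_tokens.size() ? m_tokens[m_pos++] : 0; }

  std::vector<int> m_tokens;
  std::size_t m_pos = 0;
  int m_lookahead = -1;
};
```

Had 0 been used as the "empty" marker, a peeked YYEOF would be forgotten and the next call would scan again.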
Lexer_yystype * Lex_input_stream::lookahead_yylval
LALR(2) resolution: value of the lookahead token.
char * Lex_input_stream::m_body_utf8 [private]
UTF8-body buffer created during parsing.
char * Lex_input_stream::m_body_utf8_ptr [private]
Pointer to the current position in the UTF8-body buffer.
const char * Lex_input_stream::m_buf [private]
Beginning of the query text in the input stream, in the raw buffer.
size_t Lex_input_stream::m_buf_length [private]
Length of the raw buffer.
char * Lex_input_stream::m_cpp_buf [private]
Pre-processed buffer.
char * Lex_input_stream::m_cpp_ptr [private]
Pointer to the current position in the pre-processed input stream.
const char * Lex_input_stream::m_cpp_text_end
Ending position of the TEXT_STRING or IDENT in the pre-processed buffer.
NOTE: this member must be used within the MYSQLlex() function only.
const char * Lex_input_stream::m_cpp_text_start
Starting position of the TEXT_STRING or IDENT in the pre-processed buffer.
NOTE: this member must be used within the MYSQLlex() function only.
const char * Lex_input_stream::m_cpp_tok_end [private]
Ending position of the previous token parsed, in the pre-processed buffer.
const char * Lex_input_stream::m_cpp_tok_start [private]
Starting position of the last token parsed, in the pre-processed buffer.
const char * Lex_input_stream::m_cpp_utf8_processed_ptr [private]
Position in the pre-processed buffer.
The query text from m_cpp_buf up to m_cpp_utf8_processed_ptr is converted to the UTF8-body.
sql_digest_state * Lex_input_stream::m_digest {nullptr}
Current statement digest instrumentation.
bool Lex_input_stream::m_echo [private]
Echo the parsed stream to the pre-processed buffer.
bool Lex_input_stream::m_echo_saved [private]
const char * Lex_input_stream::m_end_of_query [private]
End of the query text in the input stream, in the raw buffer.
char * Lex_input_stream::m_ptr [private]
Pointer to the current position in the raw input stream.
THD * Lex_input_stream::m_thd
Current thread.
const char * Lex_input_stream::m_tok_end [private]
Ending position of the previous token parsed, in the raw buffer.
const char * Lex_input_stream::m_tok_start [private]
Starting position of the last token parsed, in the raw buffer.
const CHARSET_INFO * Lex_input_stream::m_underscore_cs
Character set specified by the character-set-introducer.
NOTE: this member must be used within the MYSQLlex() function only.
bool Lex_input_stream::multi_statements
true if we should allow multi-statements.
enum my_lex_states Lex_input_stream::next_state
Current state of the lexical analyser.
const CHARSET_INFO * Lex_input_stream::query_charset
bool Lex_input_stream::skip_digest
Skip adding the current token to the digest, since it has already been added.
Usually we calculate a digest token by token at the top-level function of the lexer: MYSQLlex(). However, some complex ("hintable") tokens break that data flow: for example, "SELECT /*+ HINT(t) */" is a single token from the main parser's point of view, and we add the "SELECT" keyword to the digest buffer right after the lex_one_token() call, but "/*+ HINT(t) */" is a sequence of separate tokens from the hint parser's point of view, and we add those tokens to the digest buffer inside the lex_one_token() call. Thus, the usual data flow adds the tokens from the "/*+ HINT(t) */" string first, and only then appends the "SELECT" keyword token to that stream: "/*+ HINT(t) */ SELECT". This is not acceptable, since we use the digest buffer to restore query strings in their normalized forms, so the order of added tokens is important. Thus, we add tokens of "hintable" keywords to the digest buffer right in the hint parser and skip adding them at the caller with the help of the skip_digest flag.
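The ordering fix described above can be modelled with plain strings. DigestLexer is a hypothetical sketch, not MySQL code: its lex() stands in for MYSQLlex() and its lex_one_token() for the real function of that name. When a hint follows the keyword, the inner function records the keyword before the hint tokens and raises the flag, and the outer function clears the flag instead of adding the keyword a second time.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model (not MySQL code) of the ordering problem that
// skip_digest solves. Tokens are plain strings here.
class DigestLexer {
 public:
  // Stand-in for the top-level MYSQLlex(): records the token in the digest
  // unless lex_one_token() has already done so.
  void lex(const std::string &keyword, const std::vector<std::string> &hint) {
    assert(!m_skip_digest);  // the flag applies to one token only
    std::string token = lex_one_token(keyword, hint);
    if (m_skip_digest)
      m_skip_digest = false;
    else
      m_digest.push_back(token);
  }

  const std::vector<std::string> &digest() const { return m_digest; }

 private:
  // Stand-in for lex_one_token(): for a "hintable" keyword followed by a
  // hint comment, the keyword is added before the hint tokens, and the
  // caller is told not to add it again.
  std::string lex_one_token(const std::string &keyword,
                            const std::vector<std::string> &hint) {
    if (!hint.empty()) {
      m_digest.push_back(keyword);
      m_digest.insert(m_digest.end(), hint.begin(), hint.end());
      m_skip_digest = true;
    }
    return keyword;
  }

  std::vector<std::string> m_digest;
  bool m_skip_digest = false;
};
```

Lexing "SELECT" with the hint tokens "/*+", "HINT", "(", "t", ")", "*/" and then "a" yields the digest in query order, with "SELECT" first.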
bool Lex_input_stream::stmt_prepare_mode
true if we're parsing a prepared statement: in this mode we should allow placeholders.
uchar Lex_input_stream::tok_bitmap
Token character bitmaps, to detect 7bit strings.
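Assuming tok_bitmap is built by OR-ing together the bytes of the current token (this page only says it is used "to detect 7bit strings"), the check behind text_string_is_7bit() reduces to testing the high bit. BitmapToken and is_7bit are hypothetical names used for illustration only.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch (not MySQL code): OR every byte of the token into a
// bitmap; if the high bit (0x80) never gets set, every byte was in the 7-bit
// ASCII range.
class BitmapToken {
 public:
  void start_token() { m_tok_bitmap = 0; }
  void accept(unsigned char c) { m_tok_bitmap |= c; }
  bool text_string_is_7bit() const { return !(m_tok_bitmap & 0x80); }

 private:
  unsigned char m_tok_bitmap = 0;
};

bool is_7bit(const std::string &token) {
  BitmapToken t;
  t.start_token();
  for (char ch : token) t.accept(static_cast<unsigned char>(ch));
  return t.text_string_is_7bit();
}
```

A single OR per byte is cheaper than testing each character separately, and a single check at the end of the token is enough.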
uint Lex_input_stream::yylineno
Current line number.
Lexer_yystype * Lex_input_stream::yylval
Interface with bison, value of the last token parsed.
uint Lex_input_stream::yytoklen
Length of the last token parsed.