WL#5967: Use Bison 'locations' for accessing statement text elements
Affects: Server-5.7
—
Status: Complete
MySQL uses a set of dummy rules to access elements of the text string that constitutes the current query. These rules, named 'remember_name' and 'remember_end', return pointers to the start and the end of the current token:

    select_item:
        remember_name expr remember_end select_alias
        {
          ...
          $2->set_name($1, (uint) ($3 - $1), thd->charset());
        }

Bison has a built-in mechanism called 'locations' that is designed for error reporting, but which can also replace remember_name/remember_end. Using locations, the new syntax will be:

    select_item:
        expr select_alias
        {
          ...
          $1->set_name(@1.start, (uint) (@1.end - @1.start), thd->charset());
        }

The parser also overuses the lexical scanner's internals: Lex_input_stream class functions such as get_ptr(), get_tok_start(), get_tok_end(), get_cpp_ptr(), get_cpp_tok_start() etc. Most of these calls are confusing because their results depend on the current lookahead state, among other things. Bison locations are a natural and much more flexible replacement for them.

References: http://www.gnu.org/software/bison/manual/html_node/Locations.html

Note: this is a pure refactoring WL; no existing functionality should be changed [at this stage].
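As an illustration of the location-based form, the following C sketch extracts the text covered by a location record, which is what the set_name() call above does with @1. The Loc type and the loc_text() helper are hypothetical names invented for this example; they are not server code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parser location: a half-open byte range
   [start, end) inside the query text. */
typedef struct Loc {
    const char *start;  /* first byte of the covered text */
    const char *end;    /* first byte after the covered text */
} Loc;

/* Copy the text covered by loc into out, truncating to fit and always
   NUL-terminating. Returns the number of bytes copied. */
static size_t loc_text(Loc loc, char *out, size_t out_size)
{
    assert(loc.end >= loc.start);
    size_t len = (size_t)(loc.end - loc.start);
    if (out_size == 0)
        return 0;
    if (len >= out_size)
        len = out_size - 1;
    memcpy(out, loc.start, len);
    out[len] = '\0';
    return len;
}
```

For the query text "SELECT a + b AS total FROM t", a location with start at offset 7 and end at offset 12 yields the 5-byte name "a + b", with no helper nonterminals around the expression.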
Location support is provided by the YYLTYPE structure. Its default implementation is:

    typedef struct YYLTYPE
    {
      int first_line;
      int first_column;
      int last_line;
      int last_column;
    } YYLTYPE;

1. Remove the unneeded line/column-related fields and extend this structure with pointers to the start and end of the current token in the preprocessed and raw buffers:

    typedef struct YYLTYPE
    {
      char *start;     // token start in the preprocessed buffer
      char *end;       // the 1st byte after the token in the preprocessed buffer
      char *raw_start; // token start in the raw buffer
      char *raw_end;   // ...
    } YYLTYPE;

   Other layouts of the YYLTYPE structure were tried, but no performance impact was seen.

   YYLTYPE is 16 bytes long on 32-bit platforms and 32 bytes long on 64-bit platforms. Initially, the parser allocates stack space for YYINITDEPTH (100) YYLTYPE structures. Then, if the statement is long, the reallocated stack capacity grows by 1000 structures, up to MY_YACC_MAX (32000) structures for really huge queries. Thus, the YYLTYPE stack size varies from 1600 bytes to 512000 bytes on 32-bit platforms and from 3200 bytes to 1024000 bytes on 64-bit platforms.

   On 32-bit platforms sizeof(YYLTYPE) == sizeof(YYSTYPE) (see %union), so all numbers are the same for both the "location" and the "value" stacks. Since the size of the "value" (YYSTYPE) stack is not a problem for the server, the new "location" stack is most likely not a problem either.

   Note: The non-preprocessed (raw) buffer contains the input string as is, exactly as it was received in the packet. The preprocessed buffer contains the preprocessor's output. The preprocessor filters out:

   a. regular comments,
   b. the "/*!", "/*!Mmmdd" (where M.mm.dd is a release number) and "*/" marks of conditional comments,
   c. bodies of conditional comments whose release number is greater than the current one.

   "Raw buffer" pointers are necessary for SP processing, since the parser is expected to save an SP's body together with its comments.

2. Add support for the YYLTYPE structure stack (like we already have for state and semantic action stacks).

3. Replace the use of the various get_tok_* etc. methods in sql_yacc.yy with operations on the locations structure:

   3.1. If $N refers to the "remember_name" nonterminal, then replace it with @(N+1).start.

   3.2. If $N refers to the "remember_end" nonterminal, then replace it with @(N-1).end.

   3.3. Remove all "remember_name" and "remember_end" nonterminal entries and rules.

   3.4. Replace YY_TOKEN_START --> yylloc.raw_start.

   3.5. YY_TOKEN_END replacements:

        rule: x1 x2 ... xN { YY_TOKEN_END --> @N.raw_end }
        rule: { YY_TOKEN_END --> @0.raw_end }

   3.6. get_cpp_tok_start() replacements:

        rule: ... x { get_cpp_tok_start() --> yylloc.start }
        rule: x1 x2 ... xN { get_cpp_tok_start() + strlen(xN) --> @N.end }

   3.7. get_cpp_ptr() replacements:

        rule: { get_cpp_ptr() --> @0.end }
        rule: { ... get_cpp_ptr() ... } x --> rule: { ... } x { @2.start }

   3.8. get_tok_start() replacement:

        rule: x1 x2 ... xN { get_tok_start() --> @N.raw_start }

   3.9. get_tok_end() replacement:

        rule: x1 x2 ... xN { get_tok_end() --> @N.raw_end }

   3.10. get_ptr() replacement:

        rule: x1 x2 ... xN { get_ptr() --> @N.raw_end }

4. Remove unnecessary Lex_input_stream::get_tok_star
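The location stack in step 2 also needs a rule for deriving the location of a reduced rule from the locations of its right-hand-side symbols. Bison's default rule for this (YYLLOC_DEFAULT) works on the line/column fields, so a custom YYLTYPE needs its own. The C sketch below shows one plausible way to do that for the four-pointer layout of step 1, and checks the stack-size arithmetic. The helpers span_location() and location_stack_bytes() are hypothetical and assumed for illustration; the worklog does not show the actual macro.

```c
#include <assert.h>
#include <stddef.h>

/* Location record with the layout proposed in step 1. */
typedef struct YYLTYPE {
    char *start;      /* token start in the preprocessed buffer */
    char *end;        /* first byte after the token, preprocessed buffer */
    char *raw_start;  /* token start in the raw buffer */
    char *raw_end;    /* first byte after the token, raw buffer */
} YYLTYPE;

/* Stack depths quoted in the text above. */
enum { YYINITDEPTH = 100, MY_YACC_MAX = 32000 };

/* Hypothetical helper: location of a rule with n right-hand-side symbols.
   rhs[0] is the symbol preceding the rule (@0); rhs[1..n] are @1..@n.
   A non-empty rule spans from the start of @1 to the end of @n;
   an empty rule collapses to the end of @0. */
static YYLTYPE span_location(const YYLTYPE *rhs, int n)
{
    YYLTYPE cur;
    assert(n >= 0);
    if (n > 0) {
        cur.start     = rhs[1].start;
        cur.end       = rhs[n].end;
        cur.raw_start = rhs[1].raw_start;
        cur.raw_end   = rhs[n].raw_end;
    } else {
        cur.start     = cur.end     = rhs[0].end;
        cur.raw_start = cur.raw_end = rhs[0].raw_end;
    }
    return cur;
}

/* Hypothetical helper: bytes needed for a location stack of the given depth. */
static size_t location_stack_bytes(size_t entry_size, size_t depth)
{
    return entry_size * depth;
}
```

With 16-byte entries, location_stack_bytes() gives 1600 bytes at YYINITDEPTH and 512000 bytes at MY_YACC_MAX; with 32-byte entries it gives 3200 and 1024000 bytes, matching the figures in step 1. The empty-rule branch is what makes expressions such as @0.end in steps 3.5 and 3.7 meaningful.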