WL#5967: Use Bison 'locations' for accessing statement text elements

Affects: Server-5.7   —   Status: Complete   —   Priority: Medium

MySQL uses a set of dummy rules to access elements of the text string
that constitutes the current query. These rules, named 'remember_name' and
'remember_end', return pointers to the start and the end of the current token,
respectively.

 remember_name expr remember_end select_alias
   $2->set_name($1, (uint) ($3 - $1), thd->charset());

Bison has a built-in mechanism called 'locations' that is designed for error 
reporting, but which is also usable for replacing remember_name/remember_end.

Using locations, the new syntax will be:

 expr select_alias
   $1->set_name(@1.start, (uint) (@1.end - @1.start), thd->charset());
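
To illustrate the idea outside the grammar file, here is a minimal sketch of how a
start/end pointer pair identifies the text of a symbol. The Location type and the
text_of() and locate() helpers are hypothetical names used only for this
illustration; they are not part of the server code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the location of one grammar symbol: a pair of
// pointers into the query text (field names follow the start/end naming above).
struct Location {
  const char *start;  // first byte of the symbol's text
  const char *end;    // first byte after the symbol's text
};

// Returns the source text covered by a location, i.e. the same byte range
// that a (start, end - start) pair describes.
std::string text_of(const Location &loc) {
  return std::string(loc.start, static_cast<std::size_t>(loc.end - loc.start));
}

// Builds the location of the byte range [offset, offset + length) of a query.
Location locate(const char *query, std::size_t offset, std::size_t length) {
  return Location{query + offset, query + offset + length};
}
```

For the query "SELECT a + 1 FROM t", locate(query, 7, 5) covers the expression
"a + 1", which is the text an unaliased select item would take as its name.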

Also, the parser overuses the lexical scanner's internals: Lex_input_stream class
functions such as get_ptr(), get_tok_start(), get_tok_end(), get_cpp_ptr(),
get_cpp_tok_start() etc.
Most of these function calls are confusing (because their results depend on the
current lookahead state etc.), and Bison locations are a natural and much more
flexible replacement for them.


Note: this is a pure refactoring WL, no existing functionality should be changed
[at this stage].
Location support is provided by the YYLTYPE structure.  Its default implementation
is:

     typedef struct YYLTYPE
     {
       int first_line;
       int first_column;
       int last_line;
       int last_column;
     } YYLTYPE;

1. Remove the unneeded line/column-related fields and extend this structure with
pointers to the start and end of the current token in the preprocessed and raw
buffers:

     typedef struct YYLTYPE
     {
       char *start; // token start in the preprocessed buffer
       char *end;   // the 1st byte after the token in the preprocessed buffer
       char *raw_start; // token start in the raw buffer
       char *raw_end;   // ...
     } YYLTYPE;

Other layouts of the YYLTYPE structure were attempted, but no performance impact
was seen.
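
Since the line/column fields are gone, Bison's default location-merging action
(the YYLLOC_DEFAULT macro) no longer fits and has to be redefined for the new
fields. The sketch below expresses the usual merging rule as a plain function
rather than as the macro itself; the Loc type and merge_locations() are
hypothetical names, and the code is an illustration of Bison's documented default
behaviour applied to this layout, not the server's actual definition.

```cpp
#include <cstddef>

// Same layout as the extended YYLTYPE above (const added for the sketch).
struct Loc {
  const char *start;
  const char *end;
  const char *raw_start;
  const char *raw_end;
};

// rhs[1..n] are the locations of the rule's right-hand-side symbols and
// rhs[0] is the location of the symbol just before the rule (what @0 means).
Loc merge_locations(const Loc *rhs, std::size_t n) {
  Loc cur;
  if (n > 0) {
    // Non-empty rule: span from the first symbol's start to the last one's end.
    cur.start = rhs[1].start;
    cur.end = rhs[n].end;
    cur.raw_start = rhs[1].raw_start;
    cur.raw_end = rhs[n].raw_end;
  } else {
    // Empty rule: a zero-length location right after the preceding symbol.
    cur.start = cur.end = rhs[0].end;
    cur.raw_start = cur.raw_end = rhs[0].raw_end;
  }
  return cur;
}
```

With this rule, the location of a nonterminal always covers the text of all the
symbols it was reduced from, which is what makes expressions such as
(@1.end - @1.start) meaningful for whole expressions and not only for tokens.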

YYLTYPE is 16 bytes long on 32-bit platforms and 32 bytes long on 64-bit
platforms.
Initially, the parser allocates stack space for YYINITDEPTH (100) YYLTYPE
structures. Then, if the statement is long, the stack is reallocated and its
capacity grows by 1000 structures at a time, up to MY_YACC_MAX (32000)
structures for really huge queries.
Thus, the YYLTYPE stack size varies from 1600 bytes to 512000 bytes on
32-bit platforms and from 3200 bytes to 1024000 bytes on 64-bit platforms.
On 32-bit platforms sizeof(YYLTYPE) == sizeof(YYSTYPE) (see %union), so all the
numbers are the same for both the "location" and the "value" stacks.
Since the "value" (YYSTYPE) stack size is not a problem for the server,
the new "location" stack is unlikely to be a problem either.
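
The stack size figures above follow from simple arithmetic: four pointers per
structure, multiplied by the stack depth. A small sketch that reproduces them,
assuming 4-byte pointers on 32-bit platforms and 8-byte pointers on 64-bit ones
(the constant and function names are hypothetical, chosen for this illustration):

```cpp
#include <cstddef>

constexpr std::size_t kInitDepth = 100;    // YYINITDEPTH
constexpr std::size_t kMaxDepth = 32000;   // MY_YACC_MAX

// The extended YYLTYPE holds four pointers and nothing else.
constexpr std::size_t location_size(std::size_t pointer_size) {
  return 4 * pointer_size;
}

// Bytes used by a location stack of the given depth.
constexpr std::size_t stack_bytes(std::size_t depth, std::size_t pointer_size) {
  return depth * location_size(pointer_size);
}

// 32-bit platforms: 16-byte structures.
static_assert(stack_bytes(kInitDepth, 4) == 1600, "initial stack, 32-bit");
static_assert(stack_bytes(kMaxDepth, 4) == 512000, "maximum stack, 32-bit");

// 64-bit platforms: 32-byte structures.
static_assert(stack_bytes(kInitDepth, 8) == 3200, "initial stack, 64-bit");
static_assert(stack_bytes(kMaxDepth, 8) == 1024000, "maximum stack, 64-bit");
```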

  The non-preprocessed (raw) buffer contains the input string as is,
  as it was received in the packet.
  The preprocessed buffer contains the result of the preprocessor's work.
  The preprocessor filters out:
    a. regular comments,
    b. the "/*!", "/*!Mmmdd" (where M.mm.dd is a release number) and "*/" marks
       of conditional comments,
    c. bodies of conditional comments whose release number is greater
       than the current one.
  "Raw buffer" pointers are necessary for SP processing, since the parser
  is expected to save the SP's body together with its comments.
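
The difference between the two buffers can be illustrated with a deliberately
simplified sketch of filtering rules a, b and c. This is not the server's
preprocessor: the preprocess() function is a hypothetical helper, it assumes the
version is written as exactly five digits, it simply drops filtered text, and it
ignores quoted strings and the "--" and "#" comment styles.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Simplified illustration of rules a, b and c above.
std::string preprocess(const std::string &raw, unsigned long current_version) {
  std::string out;
  std::size_t i = 0;
  while (i < raw.size()) {
    if (raw.compare(i, 2, "/*") != 0) {  // ordinary byte: copy it as is
      out += raw[i++];
      continue;
    }
    const std::size_t close = raw.find("*/", i + 2);
    const std::size_t stop = (close == std::string::npos) ? raw.size() : close;
    std::size_t body = i + 2;
    bool keep = false;                   // rule a: regular comments are dropped
    if (body < stop && raw[body] == '!') {
      ++body;                            // rule b: drop the "/*!" mark itself
      keep = true;
      std::size_t digits = 0;
      while (digits < 5 && body + digits < stop &&
             std::isdigit(static_cast<unsigned char>(raw[body + digits]))) {
        ++digits;
      }
      if (digits == 5) {                 // "/*!Mmmdd" form
        const unsigned long version = std::stoul(raw.substr(body, 5));
        body += 5;
        keep = version <= current_version;  // rule c: drop newer-version bodies
      }
    }
    if (keep) out.append(raw, body, stop - body);
    i = (close == std::string::npos) ? raw.size() : close + 2;
  }
  return out;
}
```

For example, with current_version 50700 the raw text "a/*!99999b*/c" becomes
"ac" in the preprocessed buffer, while "a/*!50600b*/c" becomes "abc". The raw
buffer keeps the original text in both cases, which is why SP bodies are taken
from the raw pointers.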

2. Add support for the YYLTYPE structure stack (as we already have for the state
and semantic action stacks).

3. Replace the use of the various get_tok_* etc methods in sql_yacc.yy with
operations on the locations structure:

3.1. If $N refers to the "remember_name" nonterminal, then replace it with

3.2. If $N refers to the "remember_end" nonterminal, then replace it with

3.3. Remove all "remember_name" and "remember_end" nonterminal entries and

3.4. Replace YY_TOKEN_START --> yylloc.raw_start.

3.5. YY_TOKEN_END replacements:

  rule: x1 x2 ... xN
    YY_TOKEN_END --> @N.raw_end

    YY_TOKEN_END --> @0.raw_end

3.6. get_cpp_tok_start() replacements:

  rule: ... x
    get_cpp_tok_start() --> yylloc.start

  rule: x1 x2 ... xN
    get_cpp_tok_start() + strlen(xN) --> @N.end

3.7. get_cpp_ptr() replacements:

    get_cpp_ptr() --> @0.end

  rule: { ... get_cpp_ptr() ...  } x
  rule: { ... } x { @2.start }

3.8. get_tok_start() replacement:

  rule: x1 x2 ... xN
    get_tok_start() --> @N.raw_start

3.9. get_tok_end() replacement:

  rule: x1 x2 ... xN
    get_tok_end() --> @N.raw_end

3.10. get_ptr() replacement:

  rule: x1 x2 ... xN
    get_ptr() --> @N.raw_end

4. Remove unnecessary Lex_input_stream::get_tok_star