WL#7199: True bottom-up server parser: common framework for the refactoring

Affects: Prototype Only   —   Status: Complete

This is a design WL for the common bottom-up parser refactoring framework.

This worklog implements one step of the overall task: "Refactor MySQL server
parser to build the AST in a natural bottom-up way".

One of the design goals is to have larger syntax rules in the future, leading
to fewer (but larger) intermediate nodes in the parse tree. The argument is
that this will give fewer node allocations and hopefully also a faster
grammar, as there are fewer rules altogether.

No functional requirements.

Non-functional requirements:
* no query result changes;
* no performance degradation.

Goal of this WL: provide base framework classes and a methodology to transform
the current SQL parser to a pure bottom-up parser by small, separate and safe
steps.

Secondary goals:

* grammar simplifications:
  ** de-duplication of similar rules,
  ** removal of useless intermediate rules that
     a) have a single RHS and
     b) the RHS consists of a single non-terminal symbol;
* reduction of the number of shift/reduce grammar conflicts.

How the new grammar looks after the "bottom-up" refactoring:

* No use of semantic actions at the beginning or in the middle of
  rule right-hand sides (RHS). I.e. at most one semantic action is
  allowed per RHS, and only at the end of the RHS.
* The only things allowed in a semantic action are:
  ** memory allocation of a parse tree node,
  ** (very rare) feedback to the scanner,
  ** limited read-only access to things like
     thd->variables.character_set_client.

We defer and move all other semantic processing out of grammar rules, i.e.:

* most of error messaging,
* everything that depends on the context:
  ** context-dependent creation of Items,
  ** allocating and filling of LEX/SELECT_LEX/SELECT_LEX_UNIT structures,
* saving/restoring intermediate context-related information in LEX and
  Parser_state structures.

Thus, after the refactoring the typical grammar rule looks like this:

        lhs:
            nonterminal1 TERMINAL2 TERMINAL3 nonterminal4 ... nonterminalN
            {
              $$= new(mem_root) X(@$, $1, $4, @4, ... $N);
            }
          | ...
          ...
          ;

where X is the type of the particular parse tree node and "@$" is the Bison
"location" of the RHS's beginning. The same rule with the helper NEW_PTN
macro:

        lhs:
            nonterminal1 TERMINAL2 TERMINAL3 nonterminal4 ... nonterminalN
            {
              $$= NEW_PTN X(@$, $1, $4, @4, ... $N);
            }
          | ...
          ...
          ;

And like this:

        lhs:
             TERMINAL1 { $$= TRIVIAL_CONSTANT1; }
           | TERMINAL2 { $$= TRIVIAL_CONSTANT2; }
           ...
           ;

for trivial cases where allocating parse tree nodes is not reasonable.
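Such a node class "X" can be sketched in plain C++. Note that PT_example, POS and Parse_tree_node_sketch below are illustrative names, not the actual server types: the point is only that the constructor records $1, $4, @$ and @4 and touches no parser context.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the real server types (POS models a Bison
// location; Parse_tree_node_sketch models the node hierarchy root).
struct POS { int begin, end; };
struct Parse_tree_node_sketch {
  virtual ~Parse_tree_node_sketch() {}
};

// What a node type "X" from the rule above might look like: the constructor
// only saves its subtrees and locations; context-dependent work is deferred.
class PT_example : public Parse_tree_node_sketch {
public:
  PT_example(const POS &pos, Parse_tree_node_sketch *opt1,
             Parse_tree_node_sketch *opt4, const POS &pos4)
    : m_pos(pos), m_opt1(opt1), m_opt4(opt4), m_pos4(pos4) {}

  const POS m_pos;                 // @$: location of the whole RHS
  Parse_tree_node_sketch *m_opt1;  // $1
  Parse_tree_node_sketch *m_opt4;  // $4
  const POS m_pos4;                // @4
};
```

The constructor body is intentionally empty apart from member initialization: that is exactly what keeps the semantic action context-free.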

The topmost grammar rule returns the root object of a parse tree back to the
caller via an additional parameter of the MYSQLparse() function:

        %parse-param { class THD *thd }
        %parse-param { Parse_tree_node **tree_root }
        %start topmost_rule
        ...

        topmost_rule:
            query { *tree_root= $1; }
          ;
      
and the current definition of the MYSQLparse function:

        int MYSQLparse(void *thd);

has changed to:

        int MYSQLparse(class THD * thd, Parse_tree_node **tree_root);

with the help of the "%parse-param" Bison feature (the fix for bug#18017820
introduced the use of %parse-param).


The refactored calling code in the parse_sql() function is:

        ...
        Parse_tree_node *parse_tree;
        bool mysql_parse_status= MYSQLparse(thd, &parse_tree) != 0;
        if (!mysql_parse_status)
          mysql_parse_status= parse_tree->contextualize(thd);
        ...

where the "parse_tree->contextualize(thd)" call performs the
contextualization of the parse tree, i.e. its transformation into our current
abstract syntax tree (AST). In other words, this call wraps and subsequently
executes all context-dependent and context-modifying code that we moved out
of the old semantic actions in previous steps of the refactoring.


Transformation
==============

Let's illustrate the idea and the transformation process by examples.

The simplest case
-----------------

        simplest_nonterm:
            TERMINAL { some code that reads and modifies LEX }
          ;

Generally speaking, pure bottom-up parsing requires a purely
context-independent grammar (including semantic actions). The semantic action
above currently depends on the context that "calling" rules set up, so let's
try to transform the grammar to remove that dependency.

This sample rule doesn't reference other nonterminals, so the way to make it
context-independent is obvious: we can lift its semantic action up and apply
it right after every reference to "simplest_nonterm".

Before the transformation:

        simplest_nonterm:
            TERMINAL { simplest_nonterm_code; }
          ;

        caller1:
            smth ... simplest_nonterm
            {
              code-1;
            }
          | smth ... simplest_nonterm TERMINAL1
            {
              code-2;
            }
          | smth ...
            {
              code-3;
            }
            simplest_nonterm
          ;

After the transformation:

        simplest_nonterm:
            TERMINAL
            {}
          ;

        caller1:
            smth ... simplest_nonterm
            {
              simplest_nonterm_code;
              code-1;
            }
          | smth ... simplest_nonterm TERMINAL1
            {
              simplest_nonterm_code;
              code-2;
            }
          | smth ...
            {
              code-3;
            }
            simplest_nonterm
            {
              simplest_nonterm_code;
            }
          ;

Obviously, the grammars above are semantically equivalent, just as inlined
functions are equivalent to their non-inlined counterparts.


More complex case
-----------------

However, if the RHS of the "calling" rule is referencing other non-terminals
after the simplest_nonterm:

        caller2:
            smth ... simplest_nonterm nonterm2
            {
               code-20;
            }
          ;

we have to keep in mind nonterm2's semantic actions: we can't just place
simplest_nonterm_code before code-20, since simplest_nonterm_code may change
the global context that nonterm2 may still depend on.

So, the safer transformation is:

        caller2:
            smth ... simplest_nonterm
            {
              simplest_nonterm_code;
            }
            nonterm2
            {
              code-20;
            }
          ;

Unfortunately, that transformation may be unsafe from the parser generator's
point of view: we are inserting a new semantic action in the middle of an
existing rule, so we may introduce new grammar ambiguities (conflicts) here,
since the new mid-rule semantic action may disable some necessary lookahead.
The best way to work around this problem is to transform "nonterm2" first:

Before:

        nonterm2:
            something_transformed  { nonterm2_code; }
          ;

        simplest_nonterm:
            TERMINAL { simplest_nonterm_code; }
          ;

        caller2:
            smth ... simplest_nonterm nonterm2
            {
               code-20;
            }
          ;

First transformation:

        nonterm2:
            something_transformed {}
          ;

        simplest_nonterm:
            TERMINAL { simplest_nonterm_code; }
          ;

        caller2:
            smth ... simplest_nonterm nonterm2
            {
               nonterm2_code;
               code-20;
            }
          ;

Second transformation:

        nonterm2:
            something_transformed {}
          ;

        simplest_nonterm:
            TERMINAL {}
          ;

        caller2:
            smth ... simplest_nonterm nonterm2
            {
               simplest_nonterm_code;
               nonterm2_code;
               code-20;
            }
          ;

So, the rule is: transform RHS from the rightmost nonterminal to the leftmost
one where applicable. However, this rule doesn't help much in cases like this:

        caller2:
            simplest_nonterm nonterm2 { ... }
          | nonterm2 simplest_nonterm { ... }
          ;

We can still find workarounds there: transform simplest_nonterm and
nonterm2 simultaneously, or make sure that one of them doesn't modify the
context of the other and transform them in a safe order, or something else
like that. OTOH we can use the more general approach described for recursive
grammars (see below).


Recursive case
--------------

However, the grammar is not a tree but a cyclic graph, and we also have many
cases of direct and indirect recursion, where the described method of semantic
action code lifting doesn't work at all:

        // Direct recursion:

        nonterm3:
            nonterm3 '+' nonterm3 { code_31; }
          | TERMINAL3             { code_32; }
          ;

        // Indirect recursion between nonterm4 and nonterm5:

        nonterm4:
            TERMINAL4 nonterm5 { code_40; }
          ;

        nonterm5:
            nonterm4   { code_51; }
          | TERMINAL5  { code_52; }
          ;

        // Upper level:

        caller3:
            nonterm3
          | nonterm4
          ;


The obvious solution to that problem has two steps:
1. build a parse tree: defer the execution of current semantic action code and
   replace it with the allocation of parse tree node objects;
2. when the parsing is done and we have a complete parse tree, "interpret" the
   tree as a program: do a depth-first traversal and execute former semantic
   action code parts in a correct order.

It seems logical to use the Bison value stack to return newly created parse
tree nodes from refactored rules. An intermediate transformation of the
parser:

        // Direct recursion:

        nonterm3:
            nonterm3 '+' nonterm3 { $$= new Node_31($1, $3); }
          | TERMINAL3             { $$= new Node_32; }
          ;

        // Indirect recursion between nonterm4 and nonterm5:

        nonterm4:
            TERMINAL4 nonterm5 { $$= new Node_40($2); }
          ;

        nonterm5:
            nonterm4   { $$= new Node_51($1); }
          | TERMINAL5  { $$= new Node_52; }
          ;

        // Upper level:

        caller3:
            nonterm3
            {
              $1->contextualize();
            }
          | nonterm4
            {
              $1->contextualize();
            }
          ;

Parse tree node classes (Node_31 etc.) have:
a) constructors that save subtrees as member variables for further use,
b) the virtual "contextualize" function that:
   * calls "contextualize" functions on previously saved subtrees (depth-first
     traversal) and
   * executes former semantic actions from the corresponding rule.
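A minimal C++ model of such node classes, reusing the Node_31/Node_32 names from the grammar example above. The real classes derive from Parse_tree_node and take a THD; this simplified sketch records the execution order into a log instead, to show the depth-first "children first, own action last" discipline.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified model of the parse tree node hierarchy from the example.
struct Node_base {
  virtual ~Node_base() {}
  virtual void contextualize(std::vector<std::string> *log) = 0;
};

struct Node_32 : Node_base {   // built for: TERMINAL3
  void contextualize(std::vector<std::string> *log) override {
    log->push_back("code_32");  // former semantic action of the rule
  }
};

struct Node_31 : Node_base {   // built for: nonterm3 '+' nonterm3
  Node_base *left, *right;      // subtrees saved by the constructor
  Node_31(Node_base *l, Node_base *r) : left(l), right(r) {}
  void contextualize(std::vector<std::string> *log) override {
    left->contextualize(log);   // depth-first: children first, ...
    right->contextualize(log);
    log->push_back("code_31");  // ... then the rule's own former action
  }
};
```

For an input like `TERMINAL3 + TERMINAL3` the traversal yields code_32, code_32, code_31, i.e. exactly the order the old bottom-up semantic actions would have fired in.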

This approach does the work with minimal overhead (MEM_ROOT allocations of
small parse tree node objects are cheap). The only problem is the
transformation of massive grammar parts with indirect recursion such as the
"expr" rule: it would be nice to split huge transformations into smaller
reviewable and testable iterations.

The framework introduces a trick to transform such large interlinked grammar
parts step by step: the root class of the parse tree node hierarchy has two
kinds of virtual functions for parse tree contextualization:
1. the regular "contextualize()" function for deferred execution of
   context-sensitive code;
2. the intermediate "contextualize_()" and "contextualizeN()" helper functions
   (usage details of contextualizeN() are described later).

We use intermediate contextualize_() functions to wrap existing action code
and to call the wrapped code immediately in the same old place:

Before transformation:

        nonterm6:
            smth ...
            {
              code_60;
            }
          ;

        nonterm7:
            nonterm6 ...
            {
              code_70;
            }
          ;

        nonterm8:
            nonterm7 ...
            {
              code_80;
            }
          ;

Intermediate stage:

        nonterm6:
            smth ...
            {
              $$= new Node_60(...);
              $$->contextualize_(); // calls code_60
            }
          ;

The contextualize() functions in their intermediate state do nothing: at the
early stage we call them right after the contextualize_() wrappers:

        nonterm6:
            smth ...
            {
              $$= new Node_60(...);
              $$->contextualize_(); // calls code_60
              $$->contextualize(); // does nothing: just a placeholder
            }
          ;

then we lift the contextualize() calls up (this is safe since they actually
do nothing) but keep contextualize_() in the old place:

        nonterm6:
            smth ...
            {
              $$= new Node_60(...);
              $$->contextualize_(); // calls code_60
            }
          ;

        nonterm7:
            nonterm6 ...
            {
              $1->contextualize(); // still does nothing, just a placeholder
              code_70;
            }
          ;
        ...

then we wrap the lifted contextualize() call together with the code of the
upper semantic actions:

        nonterm6:
            smth ...
            {
              $$= new Node_60(...);
              $$->contextualize_(); // calls code_60
            }
          ;

        nonterm7:
            nonterm6 ...
            {
              $$= new Node_70($1, ...);
              $$->contextualize_(); // calls $1->contextualize() and code_70
              $$->contextualize(); // does nothing, just another placeholder
            }
          ;
        ...

... and move newly created placeholder call up too:

        ...

        nonterm7:
            nonterm6 ...
            {
              $$= new Node_70($1, ...);
              $$->contextualize_(); // calls $1->contextualize() and code_70
            }
          ;

        nonterm8:
            nonterm7 ...
            {
              $1->contextualize(); // new place for the placeholder
              code_80;
            }
          ;

... and so on.
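The transitional two-function trick can be modeled in a few lines of C++ (Transitional_node and the log are illustrative, not server types): contextualize_() runs the wrapped old action code immediately at the old place, while contextualize() is an empty placeholder whose call site can be lifted into callers without changing behavior.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Model of the intermediate refactoring state of a node like Node_60.
struct Transitional_node {
  std::vector<std::string> *log;
  explicit Transitional_node(std::vector<std::string> *l) : log(l) {}
  void contextualize_() { log->push_back("code_60"); }  // wrapped old code
  void contextualize() {}  // placeholder: intentionally does nothing
};

// The transitional semantic action behaves exactly like the old one:
// contextualize_() fires code_60 immediately, and the extra placeholder
// call is a no-op, so it is safe to lift it into the calling rule.
inline std::vector<std::string> transitional_action() {
  std::vector<std::string> log;
  Transitional_node n(&log);
  n.contextualize_();  // calls code_60, like the old in-place action
  n.contextualize();   // does nothing yet
  return log;
}
```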

If there is a loop through the rule's RHS, we:
1. remember that rule and modify its action to return a Parse_tree_node
   object (we simply ignore the RHS's nonterminals for the first time);
2. when all nonterminals in the RHS have been converted to parse tree nodes,
   pass them as parameters to the parse tree node object's constructor.

Before the transformation:

        nonterm9:
            nonterm10
            {
              code_90;
            }
            ;

        nonterm10:
            FOO nonterm9
            {
              code_101;
            }
          | BAR
            {
              code_102;
            }
          ;

        caller11:
            nonterm9
            {
              code_110;
            }
          ;

Step 1:

        // Decorate "nonterm9" as parse tree node, but remember that its
        // transformation is incomplete:

        nonterm9:
            nonterm10
            {
              $$= new Node_90; // don't care about nonterm10 in RHS: TBD later
              $$->contextualize_(); // code_90; TODO: call $1->contextualize()
              $$->contextualize(); // does nothing, lift this up on next step
            }
            ;

        nonterm10:
            FOO nonterm9
            {
              code_101;
            }
          | BAR
            {
              code_102;
            }
          ;

        caller11:
            nonterm9
            {
              code_110;
            }
          ;

Step 2:

        // Lift the contextualize() call up from "nonterm9" RHS to "calling"
        // rules:

        nonterm9:
            nonterm10
            {
              $$= new Node_90; // don't care about nonterm10 in RHS: TBD later
              $$->contextualize_(); // code_90; TODO: call $1->contextualize()
            }
            ;

        nonterm10:
            FOO nonterm9
            {
              $2->contextualize();
              code_101;
            }
          | BAR
            {
              code_102;
            }
          ;

        caller11:
            nonterm9
            {
              $1->contextualize();
              code_110;
            }
          ;

Step 3:

        // Transform "nonterm10" rule right hand sides to allocate parse tree
        // nodes:

        nonterm9:
            nonterm10
            {
              $$= new Node_90; // don't care about nonterm10 in RHS: TBD later
              $$->contextualize_(); // code_90; TODO: call $1->contextualize()
            }
            ;

        nonterm10:
            FOO nonterm9
            {
              $$= new Node_101($2);
              $$->contextualize_(); // calls $2->contextualize() and code_101
              $$->contextualize(); // does nothing, lift up this on next step
            }
          | BAR
            {
              $$= new Node_102;
              $$->contextualize_(); // calls code_102
              $$->contextualize(); // does nothing, lift up this on next step
            }
          ;

        caller11:
            nonterm9
            {
              $1->contextualize();
              code_110;
            }
          ;

Step 4:

        // Lift contextualize() calls from nonterm10 back to its "caller"
        // nonterm9 rule:

        nonterm9:
            nonterm10
            {
              $$= new Node_90($1);
              $$->contextualize_(); // call $1->contextualize() and code_90
            }
            ;

        nonterm10:
            FOO nonterm9
            {
              $$= new Node_101($2);
              $$->contextualize_(); // call code_101 and $2->contextualize()
            }
          | BAR
            {
              $$= new Node_102;
              $$->contextualize_(); // call code_102
            }
          ;

        caller11:
            nonterm9
            {
              $1->contextualize();
              code_110;
            }
          ;

Final step:

        // a. Remove all calls to contextualize_() in semantic actions;
        // b. Replace all implementations of contextualize() with bodies of
        //    corresponding contextualize_() functions.

        nonterm9:
            nonterm10
            {
              $$= new Node_90($1);
            }
            ;

        nonterm10:
            FOO nonterm9
            {
              $$= new Node_101($2);
            }
          | BAR
            {
              $$= new Node_102;
            }
          ;

        caller11:
            nonterm9
            {
              $1->contextualize();
              code_110;
            }
          ;

At this step the part of the grammar behind the nonterm9 rule is completely
context-free, and the caller11 rule can perform deferred contextualization.

Of course, this approach works for very long indirect recursion paths like the
"expr" rule too.


Transformation of mid-rule semantic actions
-------------------------------------------

The last remaining complexity in the grammar is the presence of semantic
actions at the beginning and in the middle of RHSs:

        nonterm12:
            { code_120; }
            foo
            { code_121; }
            bar
            { code_122; }
          ;

In the common case we can't just merge the code parts above into a single
contextualize_() call before making "foo" and "bar" truly context-free.
So, if "foo" or "bar" references "nonterm12" in some direct or indirect way,
we have to defer the transformation of "nonterm12" until the complete
transformation of the "nonterm12 <--> foo" and/or "nonterm12 <--> bar"
loops.

If there are a few interlinked cases like that, we have to defer the
transformation of their multi-action rules too and then finally transform
all of them at once.

To make sure that we don't forget anything, we transform such rules with the
help of additional contextualizeN() functions:

        %union {
          ...
          class Node_120 *node_120;
        }

        ...

        nonterm12:
          {
            $$= new Node_120; // save the result into intermediate $1
            $$->contextualize1(); // calls code_120
          }
          foo
          {
            // save "foo" ($2) like constructor usually does and call code_121
            $1->contextualize2($2);
          }
          bar
          {
            $1->contextualize3($4); // save bar ($4) and call code_122
            $$= $1;
            $$->contextualize(); // does nothing, just a placeholder
          }
        ;

Right before the final step of the cyclic grammar part transformation we
modify every involved multi-action rule this way:

1. merge all contextualizeN() functions into a single contextualize_(),
2. move parameter passing from that contextualize_() to the constructor,

so the rules get the desired single-action look:

        nonterm12:
            foo bar
            {
              $$= new Node_120($1, $2);
              $$->contextualize_(); // calls code_120, code_121 and code_122
            }
          ;
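The staged contextualizeN() functions and their final merge can be modeled in C++ (Node_120_sketch, the int-typed foo/bar values and the log are all illustrative): the three mid-rule actions map to contextualize1/2/3, and the merged contextualize_() replays them in order while the constructor-style parameters carry the subtree values.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Model of a multi-action rule node during the transition.
struct Node_120_sketch {
  std::vector<std::string> *log;
  int foo = 0, bar = 0;  // stand-ins for the saved "foo" and "bar" subtrees
  explicit Node_120_sketch(std::vector<std::string> *l) : log(l) {}

  // One helper per original mid-rule action:
  void contextualize1()      { log->push_back("code_120"); }
  void contextualize2(int f) { foo = f; log->push_back("code_121"); }
  void contextualize3(int b) { bar = b; log->push_back("code_122"); }

  // After the merge, the single-action rule calls just this one function;
  // parameter passing later moves to the constructor.
  void contextualize_(int f, int b) {
    contextualize1();
    contextualize2(f);
    contextualize3(b);
  }
};
```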


Uncovered topics
================

1. Error messaging (see LLD for details).

2. Items: the framework helps to reuse Item objects as parse tree nodes where
   applicable (see LLD for details).

3. Typed nonterminals. All examples above show the transformation of untyped
   rules (the type of the RHS is NONE). However, many nonterminals besides
   Items carry a typed value.
   To return that value to "callers" we just embed a typed public member
   variable into the corresponding parse tree node class. We fill that
   variable with a value in the contextualize_/contextualize function.

   Before the transformation:

        %union {
          ...
          Some_type some_type;
        }

        %type <some_type> some_rule

        some_rule:
            ...  { $$= some_value; }
          ;
        
        caller:
            some_rule ... { do something with some_value in $1; }
          ;

   After the transformation:

        %union {
          ...
          class Some_node *some_node;
        }

        %type <some_node> some_rule

        some_rule:
            ...  { $$= new Some_node(...); }
          ;
        
        caller:
            some_rule ...
            {
              $1->contextualize();
              do something with some_value in $1->value;
            }
          ;
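The typed-value pattern can be sketched in a few lines of C++ (Some_node_sketch, Some_type and the literal 42 standing in for "some_value" are all illustrative): callers read the public member only after contextualization has filled it.

```cpp
#include <cassert>

// Illustrative stand-in for the Some_type value carried by the rule.
typedef int Some_type;

// Sketch of a typed parse tree node: the constructor stores nothing
// context-dependent; contextualize() computes the former "$$= some_value;"
// result into a public member that the caller reads afterwards.
struct Some_node_sketch {
  Some_type value = 0;   // typed result, valid only after contextualize()
  bool contextualize() { // returns false on success, like the framework
    value = 42;          // hypothetical some_value computed in context
    return false;
  }
};
```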

Parse tree hierarchy root class
===============================

All parse tree node classes are derived from the Parse_tree_node class:

class Parse_tree_node
{
...
private:
  /*
    False right after the node allocation. The contextualize/contextualize_
    function turns it into true.
  */
#ifndef DBUG_OFF
  bool contextualized;
#endif//DBUG_OFF
  /*
    Intermediate helper variable for debugging the transformation.
    True after a contextualize_() function call, i.e. "true" means that the
    grammar rule that allocates this node is in the intermediate state of
    the refactoring and is still context-sensitive.
  */
  bool transitional; // TODO: remove after the parser refactoring

public:
  /*
    Memory allocation operators are overloaded to use a mandatory MEM_ROOT
    parameter for cheap thread-local allocation.
    Note: we don't process memory allocation errors in refactored semantic
    actions: we defer OOM error processing like other parse errors and
    process them all at the contextualization stage of the resulting parse
    tree.
  */
  static void *operator new(size_t size, MEM_ROOT *mem_root) throw ()
  { return alloc_root(mem_root, size); }
  static void operator delete(void *ptr,size_t size) { TRASH(ptr, size); }
  static void operator delete(void *ptr, MEM_ROOT *mem_root) {}

protected:
  Parse_tree_node()
  {
#ifndef DBUG_OFF
    contextualized= false;
    transitional= false;
#endif//DBUG_OFF
  }

public:
   ...

  /*
    True if the contextualize/contextualize_ function has completed:
  */
#ifndef DBUG_OFF
  bool is_contextualized() const { return contextualized; }
#endif//DBUG_OFF

  /*
    Contextualization function to override in subclasses.
    This class' contextualize() function just checks the call stack for
    overflow, since the contextualization algorithm requires a recursive
    depth-first traversal of a parse tree.
    All overriding "contextualize" functions call this function of the
    hierarchy root (directly or indirectly).
  */
  virtual bool contextualize(THD *thd);

  /**
   Intermediate version of the contextualize() function

   This function is intended to resolve parser grammar loops.

    During the step-by-step refactoring of the parser grammar we wrap each
    context-sensitive semantic action with 3 calls:
    1. Parse_tree_node() context-independent constructor call,
    2. contextualize_() function call to evaluate all context-sensitive
       things from the former context-sensitive semantic action code,
    3. call of the dummy contextualize() function.

    Then we lift the contextualize() function call to outer grammar rules but
    save the contextualize_() function call untouched.

    When all loops in the grammar rules are resolved (i.e. transformed as
    described above) we: a. remove all contextualize_() function calls and b.
    rename all contextualize_() function definitions to contextualize()
    function definitions.

    Note: it's not necessary to transform the whole grammar and remove these
    function calls in one pass: it's possible to transform the grammar
    statement by statement in the way described above.
  */
  virtual bool contextualize_(THD *thd)
  {
    DBUG_ASSERT(!contextualized && !transitional);
    transitional= true;
    contextualized= true;
    return false;
  }

  /**
    my_parse_error() function replacement for deferred reporting of parse
    errors

    @param      thd     current THD
    @param      pos     location of the error in lexical scanner buffers
  */
  void error(THD *thd, const POS &pos) const;
};
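The overloaded operator new above is why node allocation is cheap. A heavily simplified model of the same pattern (Mem_root_sketch and Sketch_node are invented names; the real alloc_root() grows in blocks and this bump allocator uses one fixed buffer) shows the mechanics: one pointer bump per allocation, OOM reported as a null pointer whose handling can be deferred, and no per-node free.

```cpp
#include <cassert>
#include <cstddef>

// Very simplified stand-in for MEM_ROOT: a bump allocator over one buffer.
struct Mem_root_sketch {
  alignas(8) unsigned char buf[4096];
  std::size_t used = 0;
  void *alloc(std::size_t size) {
    std::size_t aligned = (size + 7) & ~std::size_t(7);
    if (used + aligned > sizeof(buf)) return nullptr;  // OOM: error deferred
    void *p = buf + used;
    used += aligned;
    return p;
  }
};

// A node with the overloaded placement operator new, mirroring the
// Parse_tree_node declaration above (simplified, noexcept so a null
// return skips construction instead of crashing).
struct Sketch_node {
  int tag;
  explicit Sketch_node(int t) : tag(t) {}
  static void *operator new(std::size_t size, Mem_root_sketch *root) noexcept {
    return root->alloc(size);
  }
  static void operator delete(void *, Mem_root_sketch *) {}  // matching form
};
```

Usage is `new (&root) Sketch_node(1)`: the whole arena is released at once when the root goes away, which is the MEM_ROOT lifetime model.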


Items
=====

We don't want unnecessary CPU/memory overhead, so we reuse Item objects as
parse tree nodes where applicable. So, we derive the Item hierarchy root
class from the Parse_tree_node class:

class Item : public Parse_tree_node
{
protected:
  /*
    The is_parser_item field distinguishes the way the Item object was
    allocated.

    If we allocate an Item class object (actually, one of its successors)
    directly from the parser grammar, we need to use the context-independent
    Item(POS) constructor (see below). That constructor sets the
    is_parser_item field to true. Other (context-sensitive) Item
    constructors must set the is_parser_item field to false.

    In many Item successor classes context-sensitive constructors share
    the same code with the overridden itemize() function, so if we
    call such a constructor and then apply the itemize() function, we can
    unexpectedly execute that shared code twice.

    With the help of the is_parser_item value we can recognize a
    non-parser-allocated Item object in the itemize() function and return
    immediately with success.
  */
  const bool is_parser_item; // true if allocated directly by the parser
public:
  ...
  /*
    Traditional constructor for Items: don't call it from parse tree nodes!
    Note: this constructor sets the "contextualized" flag to true.
  */
  Item();

  /*
    Extra constructor for parse tree node-aware Item classes.

    This constructor is a copy of the previous constructor excluding
    context-sensitive parts (free_list processing etc.): those parts go to
    the contextualize_/contextualize functions. We do the same when
    converting other Items to parse tree nodes: constructors do mostly
    nothing but save input parameters, and the contextualize/contextualize_
    functions evaluate the rest of the former constructor's work.
  */
  explicit Item(POS);
  ...

  /*
    The grammar rules that return Items in the old grammar still return
    Items in the refactored grammar. Some of these new Items are regular
    Item objects (we allocate those Items only once in the grammar where
    applicable). But some new Items are intermediate parse tree node
    objects, since we are unable to decide which exact Item to allocate
    during the context-free parsing. So we defer the allocation of some
    Item objects till the contextualization stage.
    The common contextualize/contextualize_ function call format is not
    suitable for such intermediate Items: sometimes we have to substitute
    them with other Item objects.
    Thus, we hide the common contextualize/contextualize_ functions and
    redefine them as itemize/itemize_ functions with an extra parameter
    like the one the fix_fields function has.
    So, for parse tree Items we call these functions this way:

      item->itemize(thd, &item);
  */
private:
  /*
    Hide the contextualize*() functions: call/override the itemize()/itemize_()
    in Item class tree instead.
  */
  virtual bool contextualize(THD *thd) { DBUG_ASSERT(0); return true; }
  virtual bool contextualize_(THD *thd) { DBUG_ASSERT(0); return true; }

protected:
  /**
    Helper function to skip itemize()/itemize_() for grammar-allocated items

    @param [out] res    pointer to "this"

    @retval true        can skip itemize()
    @retval false       can't skip: the item is allocated directly by the parser
  */
  bool skip_itemize(Item **res)
  {
    *res= this;
    return !is_parser_item;
  }

public:
  /*
    The same as contextualize()/contextualize_() but with additional parameter
    
    This function finalize the construction of Item objects (see the Item(POS)
    constructor): we can access/change parser contexts from the itemize()
    function.

    @param        thd   current THD
    @param  [out] res   pointer to "this" or to a newly allocated
                        replacement object to use in the Item tree instead

    @retval false       success
    @retval true        syntax/OOM/etc error
  */
  virtual bool itemize(THD *thd, Item **res)
  {
    if (skip_itemize(res))
      return false;
    return super::contextualize(thd);
  }

  /*
    Intermediate version of the itemize() function

    During the step-by-step refactoring of the parser grammar we replace
    each context-sensitive Item() constructor call with 3 calls:
    1. Item(POS) context-independent constructor call,
    2. itemize_() function call to evaluate all context-sensitive things
       from former context-sensitive Item() constructor,
    3. Call of dummy itemize() function.

    Then we lift the itemize() function call to outer grammar rules but
    save the itemize_() function call untouched.

    When all loops in the Item grammar rules are resolved (i.e. transformed
    as described above) we:
    a. remove all itemize_() function calls and
    b. rename all itemize_() function definitions to itemize() function
       definitions.
  */
  virtual bool itemize_(THD *thd, Item **res)
  {
    if (skip_itemize(res))
      return false;
    ...
  }
  ...
};
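The substitution pattern behind `item->itemize(thd, &item)` can be modeled in C++ (Item_sketch, Item_concrete and Item_placeholder are illustrative classes, and the sketch heap-allocates where the server would use a MEM_ROOT): the out-parameter receives either the object itself or a replacement built once the context is known.

```cpp
#include <cassert>
#include <string>

// Sketch of the Item substitution pattern used by itemize().
struct Item_sketch {
  virtual ~Item_sketch() {}
  virtual std::string name() const { return "generic"; }
  // res receives either "this" or a replacement object.
  // Returns false on success, like the framework functions.
  virtual bool itemize(Item_sketch **res) { *res = this; return false; }
};

struct Item_concrete : Item_sketch {
  std::string name() const override { return "concrete"; }
};

// An intermediate parser node: the context says a different Item is
// needed, so itemize() substitutes it via the out-parameter.
struct Item_placeholder : Item_sketch {
  bool itemize(Item_sketch **res) override {
    *res = new Item_concrete;  // leaks in this sketch; MEM_ROOT in reality
    return false;
  }
};
```

The call site mirrors the document's `item->itemize(thd, &item)` shape: after the call the variable may point at a different object than before.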


NEW_PTN macro
=============

The NEW_PTN macro is just a wrapper around the "new (YYTHD->mem_root)"
operator.

Lists
=====

Left-recursive lists are very common in the grammar:

        list:
            smth
            { head_code; }
          | list ',' smth
            { tail_code; }
          ;

so we introduce some helper classes and macros for their refactoring.


Item lists
----------

Typically an Item list has a uniform declaration:

        %union {
          ...
          Item *item;
          List<Item> *item_list;
        }

        %type <item> smth
        %type <item_list> list

        list:
            smth
            {
              $$= new List<Item>;
              $$->push_back($1);
              check for OOM errors;
            }
          | list ',' smth
            {
              $1->push_back($3);
              check for OOM errors;
              $$= $1;
            }
          ;

It seems reasonable not to introduce recursive Parse_tree_node wrappers for
each of the "list" rule branches: we can use List<Item> to save
parser-allocated Items and then itemize those Items in place. (After the
refactoring we are going to replace List<Item> lists with memory-efficient
Mem_root_array objects.)

After the refactoring that list looks this way:

        %union {
          ...
          Item *item;
          PT_item_list *item_list;
        }

        %type <item> smth
        %type <item_list> list

        list:
            smth
            {
              $$= NEW_PTN PT_item_list;
              if ($$ == NULL || $$->push_back($1))
                MYSQL_YYABORT;
            }
          | list ',' smth
            {
              if ($1 == NULL || $1->push_back($3))
                MYSQL_YYABORT;
              $$= $1;
            }
          ;
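A PT_item_list-style wrapper can be sketched in C++ (PT_item_list_sketch and Fake_item are invented stand-ins, with std::vector in place of the server's List/Mem_root_array): push_back() reports errors the way the OOM checks above do, and contextualization itemizes the stored elements in place.

```cpp
#include <cassert>
#include <vector>

// Stand-in for a parser-allocated Item: itemize() marks it processed.
struct Fake_item {
  bool itemized = false;
  bool itemize() { itemized = true; return false; }  // false = success
};

// Sketch of a PT_item_list-style node: stores parser-allocated items and
// itemizes them in place during contextualization.
struct PT_item_list_sketch {
  std::vector<Fake_item *> value;
  bool push_back(Fake_item *item) {  // true = error, like the OOM checks
    if (item == nullptr) return true;
    value.push_back(item);
    return false;
  }
  bool contextualize() {             // itemize every element in place
    for (Fake_item *item : value)
      if (item->itemize()) return true;
    return false;
  }
};
```

This mirrors the refactored grammar above: each branch only pushes into the list, and all context-sensitive Item processing happens in one deferred pass.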