WL#4904: Introduce Value Registers in Item class hierarchy

Affects: Server-9.x — Status: In-Design

Introduction
============

When parsing SQL statements, an Item object tree is created for each
expression. Each node of the tree represents a constant, a field, an
operator, a function call, or similar.  Today, four basic types are
supported for results of expression arithmetics (longlong, double,
my_decimal, and String).  Each Item subclass will need to handle
arguments (subtrees) of all supported types, and be able to deliver
result in all supported types.  Currently, each Item subclass need to
support four methods val_int(), val_real(), val_decimal(), and
val_str().  If a new type (e.g., DATETIME) were to be supported in
Item arithmetics, a new method would have to be added to each class
(either directly or through inheritance.)


Goals
=====

The purpose of this worklog is to introduce a Value object that can
represent a value of any type used in SQL expressions.  This task has the
following primary goals:

1) Have a centralized API to access and convert a single value (data)
   of SQL expressions. 

   Today, logic for converting between types are contained in val_xxx
   methods of Item class hierarchy.  In addition to making it
   difficult to add new types, it also lead to a lot of
   inconsistencies where conversion between types varies depending on
   context.

2) Have a centralized API for metadata operations.

   Today, when we evaluate properties of the expression, rules to
   calculate metadata are scattered across the Item class hierarchy.
   The goal is to move all functionality responsible for evaluation of
   result metadata to a central location, separated from the Item
   hierarchy.

3) Simplify addition of new expression result types to the Item
   hierarchy.

   The goal of this task is to have a single place to extend when
   adding support for a new expression result type, by separating this
   functionality from the Item class hierarchy.

In addition to the above primary goals, there are some secondary
quality goals:

4) No introduction of unnecessary copying and conversions of data in
   the execution layer. 

5) All changes to functional behavior must documented.

This task is also relevant for a few long-term MySQL goals that are
beyond the scope of this work.  However, we should keep in mind that
our design should not make these goals more difficult to achieve:

6) Avoiding unnecessary copying of data when passing values to/from
   the storage engine. 

7) Make the execution tree reusable from different connections.


Field Class Hierarchy
=====================

The goals discussed for the Item class hierarchy above, will also be
applied for the Field class hierarchy, which has similar issues with
respect to duplication of code for conversion between types.  Also,
since type metadata for expressions are often derived from field
metadata, it makes sense to also let a centralized API for metadata
operations handle the Field hierarchy.


Value Registers
===============

WL#4760 describes a design based on immutable Value objects that where
returned by value (e.g., "Value Item::value()").  In order to
reduce the cost copying a Value object, it only contained references
to larger values like string values.  This created issues with respect
to ownership of data that could not be easily solved (see WL#4760 for
more details).

An alternative to pass around immutable values, is to pass around
a reference to a storage location for a piece of data, i.e. similar to
the concept of a computer register.  For example, Item could in order
to evaluate an expression provide a method "Item::eval(Register*)",
and store the result of the evaluation in the register provided by the
caller.  This approach has some advantages:

  - It is clear that the caller has ownership to the data.

  - In most cases, the caller will know in what form it wants the
    result. Through metadata associated with the passed in register,
    the caller can specify the desired type etc. for the result.  This
    may avoid materializing intermediate results in other forms during
    the evaluation.

  - Registers can be reused for different values.  Hence, it does not
    require allocation of a new object for each value that is produced.

  - The caller may pre-allocate buffers to be used for the result and
    avoid dynamic allocation of memory during the evaluation.

This task will explore the use of Value registers.

Requirements
============

A walk-through of some examples of SQL statements revealed the
following set of requirements.
(Ref.
https://intranet.mysql.com/secure/mailarchive/mail.php?folder=4&mail=29772) It
should be possible to:

Req 1) Construct a value for a parser constant - avoiding extra
       copying, if possible.  E.g., "SELECT "long text";

Req 2) Store an instance of class Value in the protocol, avoiding
       extra copying and unnecessary switch (type), if possible.
       (Under "protocol" here we meant the client-server protocol, and
       class Protocol in particular.)

Req 3) Get a value object from the storage engine, and later on store
       it in the protocol.

Req 4) Evaluate the result metadata and store it in the protocol (send
       it to the client) *before evaluating a value*.

Req 5) Know the resulting type of the expression before evaluating the
       expression.  The logic that is responsible for that operation
       needs to be moved to a central location.

Req 6) Have a central location for type data conversion
       functionality. This functionality should be accessed through an
       API. 

Req 7) Have a central location for the implementation of
       operations. Since we would like to simplify addition of
       new expression result types, implementation of e.g.
       the operation "+" must be moved to a central location.
       (Other similar operations are: +, -, /, *, mod, div, <, >, <=,
        >=, <>, |, ||, &, &&, ^, etc.)

Req 8) To change expression result metadata between executions of a
       statement.  Expression result metadata is constant during a
       single execution, but may change between executions since
       parameter types as in "SELECT ?, ?;" may change from execution
       to execution.

       Note: This requirement affects where we can store expression
             metadata and where we allocate memory for it.

	     We would like to avoid storing result metadata in the
	     Item objects, since one of our (non-primary) goals is to
	     make the execution tree reusable from different
	     connections, and Sqltype_metadata will be specific to
	     each connection.

	     We would also like to avoid storing metadata in the Value
	     object, since we may want to create and manipulate with
	     instances of metadata objects prior to evaluation of the
	     expression, i.e., perhaps, before we have instances of
	     class Value.

Req 9) Convert a Value to int, char, string, etc.
         
       There are times when we will need to convert a value into a
       builtin C++-type:

         1) We need to convert the result of an expression to
	    TRUE/FALSE in WHERE and HAVING clauses. NULL is false.

	 2) We need to convert the result an expression to unsigned
	    longlong in LIMIT clause.

	 3) We need to be able to store the result of an expression in
	    the Protocol, therefore Value needs to publish a
	    send(Protocol *) method, or an analogue.

	 4) Today, we use val_str(), val_int() and other val_* methods
  	    of class Field currently in simplistic engines, such as
  	    CSV or ARCHIVE, to retrieve the data from table->record[].


SUBTASKS
========

1) Centralized API for metadata operations (WL#5081)
   (Requirements 4 and 5 above)
2) Value register implementation (WL#5082)
   (Req. 1, 6, and 9)
3) Introduce Value registers in Field classes (WL# TBD)
   (Req. 3) 
4) Introduce Value registers in Item classes (WL# TBD)
   (Req. 8)
5) Introduce Value registers in Protocol (WL# TBD)
   (Req. 2)
6) Centralized API for operations on Values (WL# TBD)
   (Req. 7)

More subtasks to come ...