WL#4760: Introduce Value Object

Affects: Server-9.x   —   Status: In-Design

Introduce a Value object class to be used by the Item and Field classes.

The main value of this task is to centralize the data conversion between types.
This will ease the addition of new types (e.g., Timestamp) since data
conversion methods will no longer have to be added for each Item/Field class.
Also, by changing the Item/Field classes and their associates to pass Value
objects around instead, the amount of type conversion may be reduced, since
conversion is not done until it is actually needed.

See also WL#4154
NOTE: The design described in this work log has been abandoned.  The
      immutable Value objects contain references to objects allocated
      outside the Value objects (e.g., String objects).  As the prototype
      showed, this created issues with respect to ownership of data.  

      A redesign based on Value registers is described in WL#4904, and
      the work will be continued in the context of that worklog.  The
      basic principles from this work log (e.g., separating values and
      type metadata from the Item/Field classes) will be carried over.


The Value class is intended to be used by Item and Field class hierarchies.
Each class will have a value() method that returns a Value object that
represents the value of the object.  The Value class will contain
methods to convert the value to the currently supported basic types
(int, double, decimal, and string).


INTERFACE
=========

The Value class interface should look something like this:

class Value {
public:
  // Constructors
  Value(String*, const Sqltype_metadata&);
  Value(longlong, const Sqltype_metadata&);   // Matches the longlong storage
  Value(double, const Sqltype_metadata&);
  Value(my_decimal*, const Sqltype_metadata&);
  Value(const Sqltype_metadata&);  // For null values

  bool is_null() const;  // Metadata is stored in the object

  // Data conversion methods
  longlong to_int() const;
  String* to_String(String*) const;  // No alloc, write into provided String
  my_decimal* to_decimal(my_decimal*) const;
  double to_real() const;
};
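The comment on to_String() says that conversion writes into a String supplied by the caller rather than allocating one. The sketch below illustrates only that convention; it assumes std::string as a stand-in for the server's String class and uses a hypothetical integer-only Value without metadata.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

typedef long long longlong;

// Hypothetical stand-in: an integer-only Value without metadata,
// with std::string playing the role of the server's String class.
class Value {
public:
  explicit Value(longlong v) : int_val(v) {}

  // Writes into the caller's buffer and returns it.  Value itself never
  // allocates a string object; when buf already has enough capacity,
  // assign() typically reuses it.
  std::string* to_String(std::string* buf) const {
    char tmp[24];  // enough for any 64-bit integer plus sign and NUL
    int len = std::snprintf(tmp, sizeof tmp, "%lld", int_val);
    buf->assign(tmp, static_cast<std::size_t>(len));
    return buf;
  }

private:
  longlong int_val;
};

// Caller side: one buffer is reused for every row.
inline std::size_t total_length(const longlong* rows, int n) {
  std::string buf;
  std::size_t total = 0;
  for (int i = 0; i < n; i++)
    total += Value(rows[i]).to_String(&buf)->size();
  return total;
}
```

The caller owns the buffer's lifetime, which fits the requirement below that Value objects stay cheap to create.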


CLASS RESPONSIBILITIES
======================

The following classes are involved:

Item
    Manage computations and conversions. 

Field
    Store and read values from record fields. Hides storage engine
    specific issues. 

Value
    Represent a Value and do type-specific conversions.

Sqltype_metadata 
    Hold metadata associated with a value of a specific type. Know how
    to convert between types.


METADATA
========

In order to correctly convert values to the four basic types, some
extra information that is currently stored in the Item/Field objects
will be needed.  This metadata will be extracted from the Item/Field
classes and put in a separate object that can be passed by reference
to the Value constructors.  Since not all metadata is relevant for all
types of Field/Item objects (e.g., numeric fields need different
metadata from string fields), a hierarchy of metadata structs will be
created to save space.


OTHER REQUIREMENTS
==================

This work touches a central piece of MySQL's query infrastructure.
This means that performance is very important, and it follows that:
    - One should try to limit the number of allocated Value objects
    - Polymorphism (i.e., virtual methods) should be avoided

PLAN
====

The work will be done in the following steps:

1. Implement Value class with associated unit test.
2. Introduce a value() method in all Field classes.  The approach will be to
   return the Value object by value.  E.g. if the val_xxx methods today
   do "return result;", the value method will do "return
   Value(result);".
3. Change val_xxx methods to call value().to_xxx().
4. Eliminate val_xxx use within Field classes.
5. Repeat steps 2.-4. for the Item classes.
6. Remove external use of val_xxx methods.
7. Remove val_xxx methods.  
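Steps 2 and 3 can be illustrated with a toy field class.  The sketch assumes simplified stand-ins (a minimal integer-only Value, a Numeric_metadata holding only unsigned_flag, and a hypothetical Toy_field_tiny modelled on Field_tiny); after step 3 the old val_xxx() accessors become one-line wrappers around value().

```cpp
#include <cassert>

typedef long long longlong;

// Hypothetical stand-in for the metadata object.
struct Numeric_metadata {
  bool unsigned_flag;
};

// Hypothetical minimal Value holding only an integer.
class Value {
public:
  Value(longlong v, const Numeric_metadata& m) : int_val(v), meta(m) {}
  longlong to_int() const { return int_val; }
  double to_real() const { return static_cast<double>(int_val); }
  const Numeric_metadata& metadata() const { return meta; }

private:
  longlong int_val;
  const Numeric_metadata& meta;
};

// Hypothetical one-byte integer field in the spirit of Field_tiny.
class Toy_field_tiny {
public:
  Toy_field_tiny(unsigned char* p, bool is_unsigned) : ptr(p) {
    num_meta.unsigned_flag = is_unsigned;
  }

  // Step 2: return the stored value as a Value object, by value.
  Value value() const {
    int tmp = num_meta.unsigned_flag ? (int) ptr[0]
                                     : (int) ((signed char*) ptr)[0];
    return Value((longlong) tmp, num_meta);
  }

  // Step 3: the old accessors just forward to value().to_xxx().
  longlong val_int() const { return value().to_int(); }
  double val_real() const { return value().to_real(); }

private:
  unsigned char* ptr;
  Numeric_metadata num_meta;
};
```

Steps 4, 6 and 7 would then replace the remaining val_xxx() callers with direct value() calls and finally delete the wrappers.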

In order to partition this work into smaller tasks, steps 5 through 7
will be covered by WL#4904.

Time plan for steps 1 through 4:
Until Week 10: Create Value class and unit test
Week 11,12: Introduce Value object in Field classes (steps 2 and 3 above).
Week 13: Add data structure for storing metadata needed by Value object
Week 14: Stockholm meeting, Eliminate val_xxx use within Field classes (Step 4)
Week 15: Vacation
Week 16-17: Eliminate val_xxx use within Field classes (Step 4)
            CommunityOne presentation
Week 18-21: Finalize work:  Architectural review, patch clean-up, review.

Estimated completion date: June 1 (2009)

References
==========

Classical Value object design pattern:
http://c2.com/cgi/wiki?ValueObject
Follow the links for additional documentation and discussion of pros and cons.

Double dispatch design pattern:
http://en.wikipedia.org/wiki/Double_dispatch

Falcon's implementation of Value:
http://bazaar.launchpad.net/~mysql/mysql-server/mysql-6.0-falcon/annotate/head%3A/storage/falcon/Value.h

Note that Falcon's Value is not a Value in the classical sense (as defined
in the Portland Pattern Repository).

PHP's implementation of language variables; the entry point is zval in
php/Zend/zend.h

Value CLASS
===========

The public interface is shown in the HLS (see INTERFACE above).

The Value class will contain the following private data:

  const Sqltype_metadata& metadata;

  union
  {
    longlong int_val;
    double real_val;
    my_decimal* decimal_val;
    String* string_val;
  } val;

metadata refers to the metadata object that contains the information
that is needed for type conversions, including the type of the value
(more info below).  

val contains the actual value, which can be stored in four different
formats (longlong, double, my_decimal and String).  For a given Value
object, metadata.type indicates which format is used.  As given by
the interface, there is one constructor for each of the value formats;
its parameters are the value and the associated metadata object.  Note
that for decimal and string, only pointers to the objects are stored.
It is assumed that the creators of Value objects (Field and Item
classes) ensure that the referenced values live at least as long as
the Value object itself.  There will also be a constructor with only a
metadata parameter, which creates an object that represents a NULL
value.
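A minimal sketch of the private representation and constructors follows.  It assumes a simplified Sqltype_metadata holding only a type tag, a hypothetical Value_type enum, std::string standing in for String, and no decimal support; the string_ptr() accessor is a hypothetical helper for illustration.

```cpp
#include <cassert>
#include <string>

typedef long long longlong;

// Hypothetical type tags; the real enum is still to be designed.
enum Value_type { VALUE_NULL, VALUE_INT, VALUE_REAL, VALUE_STRING };

// Hypothetical, simplified metadata: only the type tag.
struct Sqltype_metadata {
  Value_type type;
};

class Value {
public:
  Value(longlong v, const Sqltype_metadata& m) : metadata(m) { val.int_val = v; }
  Value(double v, const Sqltype_metadata& m) : metadata(m) { val.real_val = v; }
  // Only the pointer is stored: the creator owns the string and must
  // keep it alive for as long as this Value is used.
  Value(std::string* s, const Sqltype_metadata& m) : metadata(m) { val.string_val = s; }
  // NULL value: the metadata's type tag alone says "no value".
  explicit Value(const Sqltype_metadata& m) : metadata(m) { val.int_val = 0; }

  bool is_null() const { return metadata.type == VALUE_NULL; }
  Value_type type() const { return metadata.type; }
  // Hypothetical helper; valid only when type() == VALUE_STRING.
  const std::string* string_ptr() const { return val.string_val; }

private:
  const Sqltype_metadata& metadata;
  union {
    longlong int_val;
    double real_val;
    std::string* string_val;
  } val;
};
```

With both a longlong and a double constructor, a plain int argument is ambiguous (both conversions have the same rank), which is why Field_tiny::value() casts to longlong before constructing the Value.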

There will be four to_xxx() methods for converting between the four
basic formats.  Each conversion method will contain a switch on the
type for converting to the desired format.  The type is determined
from the metadata passed to the Value object in its constructor.
(TODO: Add example).
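As a sketch of what such a switch could look like, assuming the same simplified stand-ins (hypothetical Value_type tags, std::string for String, decimal omitted), to_int() and to_String() each dispatch on metadata.type.  The conversion rules used here (truncating doubles, std::strtoll on leading digits, NULL as 0 or a null pointer) are illustrative assumptions, not the server's conversion semantics.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

typedef long long longlong;

enum Value_type { VALUE_NULL, VALUE_INT, VALUE_REAL, VALUE_STRING };
struct Sqltype_metadata { Value_type type; };

class Value {
public:
  Value(longlong v, const Sqltype_metadata& m) : metadata(m) { val.int_val = v; }
  Value(double v, const Sqltype_metadata& m) : metadata(m) { val.real_val = v; }
  Value(std::string* s, const Sqltype_metadata& m) : metadata(m) { val.string_val = s; }
  explicit Value(const Sqltype_metadata& m) : metadata(m) { val.int_val = 0; }

  longlong to_int() const {
    switch (metadata.type) {
    case VALUE_INT:    return val.int_val;
    case VALUE_REAL:   return (longlong) val.real_val;  // truncates toward zero
    case VALUE_STRING: return std::strtoll(val.string_val->c_str(), nullptr, 10);
    case VALUE_NULL:   break;
    }
    return 0;  // NULL converts to 0 in this sketch
  }

  std::string* to_String(std::string* buf) const {
    char tmp[32];
    switch (metadata.type) {
    case VALUE_INT:
      std::snprintf(tmp, sizeof tmp, "%lld", val.int_val);
      buf->assign(tmp);
      return buf;
    case VALUE_REAL:
      std::snprintf(tmp, sizeof tmp, "%g", val.real_val);
      buf->assign(tmp);
      return buf;
    case VALUE_STRING:
      return val.string_val;  // already a string: no copy needed
    case VALUE_NULL:
      break;
    }
    return nullptr;  // NULL has no string representation here
  }

private:
  const Sqltype_metadata& metadata;
  union { longlong int_val; double real_val; std::string* string_val; } val;
};
```

As with the server's val_str() convention, the returned pointer may be the stored string rather than the caller's buffer, so callers should use the return value.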

There is some concern that the switch will be unnecessary overhead in many
scenarios since the actual type is already known by the caller.  Hence, adding
get_xxx() methods will be considered; these return the value without checking
that it is of the assumed type (except for an assert in debug mode).
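A sketch of such unchecked accessors, assuming the same simplified stand-ins: get_int() and get_real() skip the switch and only verify the type with assert(), which compiles away when NDEBUG is defined.  In the server, DBUG_ASSERT would presumably play this role.

```cpp
#include <cassert>

typedef long long longlong;

enum Value_type { VALUE_NULL, VALUE_INT, VALUE_REAL };
struct Sqltype_metadata { Value_type type; };

class Value {
public:
  Value(longlong v, const Sqltype_metadata& m) : metadata(m) { val.int_val = v; }
  Value(double v, const Sqltype_metadata& m) : metadata(m) { val.real_val = v; }

  // The caller guarantees the type; it is checked only in debug builds.
  longlong get_int() const {
    assert(metadata.type == VALUE_INT);
    return val.int_val;
  }
  double get_real() const {
    assert(metadata.type == VALUE_REAL);
    return val.real_val;
  }

private:
  const Sqltype_metadata& metadata;
  union { longlong int_val; double real_val; } val;
};
```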

Falcon also contains a Value class, and during prototyping, some
problems have been caused by this name collision (e.g., the Falcon Value
destructor was called on server Value objects).  There are several
alternatives on how to avoid such issues:

  1. Call our Value class something else (e.g., My_value)
  2. Rename Falcon's Value class
  3. Use a namespace to protect server names from storage engines.

The proposal is alternative 3; use a namespace.  (Value is too good a
name to let storage engines have it).  The namespace is currently
called mysql, but a less general name should probably be considered
(maybe something package related).
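A minimal illustration of alternative 3, assuming the namespace name mysql mentioned above and a hypothetical storage-engine class that is also called Value.  The two classes are distinct types with distinct mangled names, so the linker presumably can no longer mix up their member functions, such as destructors.

```cpp
#include <cassert>

// A storage engine's own class, declared at global scope.
class Value {
public:
  int engine_tag() const { return 1; }
};

namespace mysql {
// The server's Value lives in its own namespace, so it is a separate
// type with its own constructors and destructor.
class Value {
public:
  int server_tag() const { return 2; }
};
}  // namespace mysql

inline int server_code() {
  mysql::Value v;  // unambiguously the server class
  return v.server_tag();
}
```

Server source files could add a using-declaration (using mysql::Value;) to keep the short name, as long as they do not also include the engine's header.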


METADATA
========

Metadata that is needed for type conversion will be stored separately
from the Item/Field classes in order to be able to pass such metadata
to Value objects.  A struct, Sqltype_metadata, acts as the base class
for the metadata and contains metadata that is needed for all data
types (e.g., the type of the value).

Since not all metadata is relevant for all data types, "substructs" of
Sqltype_metadata are used so that the union of all metadata needed by
all data types does not have to be stored.  For fields there will
be two different metadata types in use, Numeric_metadata and
String_metadata.  Note that in order to simplify the metadata
hierarchy, Numeric_metadata will contain a precision field which is
only relevant for decimal data, and String_metadata will contain a
TYPELIB pointer which is only relevant for enums.
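A sketch of the struct hierarchy follows.  Only unsigned_flag, precision and the TYPELIB pointer come from the text; the type tag names are hypothetical and TYPELIB is replaced by a minimal stand-in.  Code holding a base-class reference downcasts according to the type tag with static_cast, so no virtual methods are needed.

```cpp
#include <cassert>

enum Value_type { VALUE_NULL, VALUE_INT, VALUE_DECIMAL, VALUE_STRING, VALUE_ENUM };

// Minimal stand-in for the server's TYPELIB (a list of enum names).
struct Typelib_standin {
  unsigned count;
  const char** type_names;
};

// Metadata needed by all types.
struct Sqltype_metadata {
  Value_type type;
};

// Numeric types; precision is only meaningful for decimals.
struct Numeric_metadata : Sqltype_metadata {
  bool unsigned_flag;
  unsigned precision;
};

// String types; typelib is only meaningful for enums.
struct String_metadata : Sqltype_metadata {
  const Typelib_standin* typelib;
};

// Hypothetical helper: the downcast is chosen by the type tag.
// Enum indexes are 1-based; out-of-range indexes yield "".
inline const char* enum_name(const Sqltype_metadata& meta, unsigned index) {
  assert(meta.type == VALUE_ENUM);
  const String_metadata& s = static_cast<const String_metadata&>(meta);
  return (index >= 1 && index <= s.typelib->count)
             ? s.typelib->type_names[index - 1] : "";
}
```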

How type conversion is done may be different for different field types
even if the basic representation (int, real, decimal, string) is the
same.  E.g., timestamp values and enum values are represented as
longlong values, but the conversion to string is different from
integer values.  Hence, the type information stored in the metadata
needs to be more granular.  One alternative would be to use
enum_field_types (i.e., SQL types), as defined by Field::real_type(),
to represent the type of a value.  However, that is a finer
granularity than actually needed, since it distinguishes between types
that should be treated the same (e.g., integers of different sizes).
This is a problem since it would increase the number of case labels
in the switch statements that do the type conversion.  To overcome
this, a new enum will be created for representing the type of a
Value.

A separate type value will be used to represent null values.  This
should be OK since the original type of a NULL value is not relevant
for type conversions.  Hence, it should not be necessary to store
information about NULL values separately.
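A sketch of the coarser enum, with hypothetical tag names.  Several SQL types collapse into one Value type, types that share a longlong representation but convert to strings differently (enums, timestamps) keep their own tags, and NULL gets a tag of its own.  Sql_type_standin is a reduced stand-in for enum_field_types, reusing a few of its enumerator names.

```cpp
#include <cassert>

// Reduced stand-in for enum_field_types.
enum Sql_type_standin {
  MYSQL_TYPE_TINY, MYSQL_TYPE_SHORT, MYSQL_TYPE_LONG, MYSQL_TYPE_LONGLONG,
  MYSQL_TYPE_DOUBLE, MYSQL_TYPE_NEWDECIMAL, MYSQL_TYPE_VARCHAR,
  MYSQL_TYPE_ENUM, MYSQL_TYPE_TIMESTAMP, MYSQL_TYPE_NULL
};

// Hypothetical Value type tags: one per distinct conversion behaviour.
enum Value_type {
  VALUE_NULL,      // used for NULL values, whatever the column type
  VALUE_INT,       // all integer sizes share one tag
  VALUE_REAL,
  VALUE_DECIMAL,
  VALUE_STRING,
  VALUE_ENUM,      // stored as longlong, converted to string via TYPELIB
  VALUE_TIMESTAMP  // stored as longlong, formatted as date/time
};

inline Value_type value_type_of(Sql_type_standin t) {
  switch (t) {
  case MYSQL_TYPE_TINY:
  case MYSQL_TYPE_SHORT:
  case MYSQL_TYPE_LONG:
  case MYSQL_TYPE_LONGLONG:   return VALUE_INT;
  case MYSQL_TYPE_DOUBLE:     return VALUE_REAL;
  case MYSQL_TYPE_NEWDECIMAL: return VALUE_DECIMAL;
  case MYSQL_TYPE_VARCHAR:    return VALUE_STRING;
  case MYSQL_TYPE_ENUM:       return VALUE_ENUM;
  case MYSQL_TYPE_TIMESTAMP:  return VALUE_TIMESTAMP;
  case MYSQL_TYPE_NULL:       break;
  }
  return VALUE_NULL;
}
```

The conversion switches then need one case per Value_type instead of one per SQL type.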


DIVISION OF RESPONSIBILITIES BETWEEN Value AND Sqltype_metadata
===============================================================

Since value and metadata are separated into different classes, it
becomes a design question where the type conversion should be placed:
with the value or with the metadata.  There are several alternatives:

1. Do all type conversion in the Value class.  This means that each
   to_xxx() method will do a switch on metadata.type and do
   conversions when necessary.  In order to access the necessary
   metadata, one will need to downcast the metadata reference to the
   metadata subtype used for this type.

2. Do type conversions that do not need metadata (except type) in the
   value class and forward all other type conversions to the metadata
   class.  This way, the overhead of an extra function call will only
   occur when metadata is needed.

3. Do all type conversions within the metadata class, regardless of
   whether extra metadata is needed.  This way, the type conversion
   will be more uniform.  If type conversion is not needed, metadata
   methods will not be called.

One advantage of alternatives 2 and 3 is that type conversions are
encapsulated together with the metadata that describes how the
conversion should be done.  Also, one will be free to resolve the
type either with a switch statement or with virtual methods.  (For
most type conversions, the overhead of a virtual method call will not
be significant.)  Another advantage is that the Value::to_xxx methods
will be smaller and better suited for inlining.  (Actually, this could
also be achieved within the Value class.)

On the other hand, doing the type conversion in a metadata class
means exposing the representation of values to the metadata.

Currently, the implementor is leaning towards alternative 2, but he may
easily be convinced to change his mind.
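A sketch of alternative 2, assuming stand-ins similar to those above (hypothetical Value_type tags, std::string for String, a minimal TYPELIB stand-in).  Value::to_String() converts plain integers itself, since that needs only the type, and forwards enum values, which need the TYPELIB, to a hypothetical String_metadata::enum_to_string() after a static_cast chosen by the type tag.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

typedef long long longlong;

enum Value_type { VALUE_NULL, VALUE_INT, VALUE_ENUM };

struct Typelib_standin { unsigned count; const char** type_names; };

struct Sqltype_metadata { Value_type type; };

struct String_metadata : Sqltype_metadata {
  const Typelib_standin* typelib;

  // Conversion that needs extra metadata lives with that metadata.
  std::string* enum_to_string(longlong index, std::string* buf) const {
    if (index >= 1 && (unsigned long long) index <= typelib->count)
      buf->assign(typelib->type_names[index - 1]);
    else
      buf->clear();
    return buf;
  }
};

class Value {
public:
  Value(longlong v, const Sqltype_metadata& m) : metadata(m), int_val(v) {}

  std::string* to_String(std::string* buf) const {
    switch (metadata.type) {
    case VALUE_INT: {  // needs only the type: handled in Value
      char tmp[32];
      std::snprintf(tmp, sizeof tmp, "%lld", int_val);
      buf->assign(tmp);
      return buf;
    }
    case VALUE_ENUM:   // needs the TYPELIB: forwarded to the metadata
      return static_cast<const String_metadata&>(metadata)
          .enum_to_string(int_val, buf);
    case VALUE_NULL:
      break;
    }
    return nullptr;
  }

private:
  const Sqltype_metadata& metadata;
  longlong int_val;
};
```

The extra function call is paid only on the VALUE_ENUM path, which is the trade-off alternative 2 describes.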


CHANGES TO Field CLASSES
========================

1. Introduce value() methods to return Value objects. The methods will
   look like the current val_xxx method (without type conversion),
   except that what is currently returned will be passed to a Value
   constructor.  Example from Field_tiny:

    Value Field_tiny::value()
    {
      ASSERT_COLUMN_MARKED_FOR_READ;
      int tmp= num_meta.unsigned_flag
               ? (int) ptr[0] : (int) ((signed char*) ptr)[0];
      return Value((longlong) tmp, num_meta);
    }     

   (num_meta is the metadata object; see step 3 below.)

2. Remove val_xxx() methods.  Replace calls to such methods with calls
   to value().to_xxx().  However, one should also consider whether the
   to_xxx() call is actually needed at this point, or whether the Value
   object could be used instead (e.g., returning a Value object instead
   of an integer).

3. Move metadata that is needed for type conversion from Field
   classes to Sqltype_metadata and its subclasses (as described
   above).  The direct subclasses of Field (Field_num, Field_str, and
   Field_bit) will embed an object of (a subclass of) Sqltype_metadata
   that contains the metadata that is relevant for this type of data.
   Field::metadata is set to point to this object.

4. Since Field::metadata is protected, get methods will be added to
   access the fields that were previously public.

5. Constructors must set metadata.type to the corresponding type.
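Steps 3 to 5 might look roughly as follows.  The sketch assumes heavily simplified stand-ins: a hypothetical Toy_field base with a protected metadata pointer, a Toy_field_num that embeds a Numeric_metadata, and a field_length() getter; placing field_length in the base metadata struct is an assumption made for illustration.

```cpp
#include <cassert>

enum Value_type { VALUE_NULL, VALUE_INT, VALUE_REAL };

struct Sqltype_metadata {
  Value_type type;
  unsigned field_length;  // assumed location of the former public member
};

struct Numeric_metadata : Sqltype_metadata {
  bool unsigned_flag;
};

class Toy_field {
public:
  // Step 4: get methods replace direct access to former public members.
  unsigned field_length() const { return metadata->field_length; }
  Value_type value_type() const { return metadata->type; }

protected:
  explicit Toy_field(Sqltype_metadata* m) : metadata(m) {}
  Sqltype_metadata* metadata;  // step 3: points into the subclass
};

class Toy_field_num : public Toy_field {
public:
  Toy_field_num(unsigned len, bool is_unsigned, Value_type t)
      : Toy_field(&num_meta) {  // only the address is taken here
    // Step 5: the constructor sets the type and the other metadata.
    num_meta.type = t;
    num_meta.field_length = len;
    num_meta.unsigned_flag = is_unsigned;
  }
  // A copy would point at the original's num_meta, so forbid copying.
  Toy_field_num(const Toy_field_num&) = delete;
  Toy_field_num& operator=(const Toy_field_num&) = delete;

  bool is_unsigned() const { return num_meta.unsigned_flag; }

private:
  Numeric_metadata num_meta;  // step 3: embedded metadata object
};
```

Because the base class stores a pointer into the derived object, copying such a field would need special care; the sketch simply disallows it.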


CHANGES TO OTHER CLASSES
========================

1. Replace calls to val_xxx() methods with calls to value().to_xxx().
   However, one should also consider whether the to_xxx() call is
   actually needed at this point, or whether the Value object could be
   used instead (e.g., returning a Value object instead of an integer).

2. Change access to previously public Field members that have been
   moved to metadata (e.g., field_length) to use the new access
   methods instead.

MISSING TESTING (Moved to WL#5050)
==================================

The following val_xxx methods are not covered by the main test suite.  Tests for
these methods should, if possible, be added:

Field_double::val_int()
Field_float::val_int()
  More test cases will be added to the main.cast test to test these methods.

Field_blob::val_int()
Field_varstring::val_int()
  No way has been found to activate these methods. 

So far, no way has been found to instantiate Field_null with SQL.
Field_date and Field_decimal are obsolete and will not be tested.