WL#4904: Introduce Value Registers in Item class hierarchy
Affects: Server-9.x — Status: In-Design — Priority: Medium
Introduction ============ When parsing SQL statements, an Item object tree is created for each expression. Each node of the tree represents a constant, a field, an operator, a function call, or similar. Today, four basic types are supported for results of expression arithmetics (longlong, double, my_decimal, and String). Each Item subclass will need to handle arguments (subtrees) of all supported types, and be able to deliver result in all supported types. Currently, each Item subclass need to support four methods val_int(), val_real(), val_decimal(), and val_str(). If a new type (e.g., DATETIME) were to be supported in Item arithmetics, a new method would have to be added to each class (either directly or through inheritance.) Goals ===== The purpose of this worklog is to introduce a Value object that can represent a value of any type used in SQL expressions. This task has the following primary goals: 1) Have a centralized API to access and convert a single value (data) of SQL expressions. Today, logic for converting between types are contained in val_xxx methods of Item class hierarchy. In addition to making it difficult to add new types, it also lead to a lot of inconsistencies where conversion between types varies depending on context. 2) Have a centralized API for metadata operations. Today, when we evaluate properties of the expression, rules to calculate metadata are scattered across the Item class hierarchy. The goal is to move all functionality responsible for evaluation of result metadata to a central location, separated from the Item hierarchy. 3) Simplify addition of new expression result types to the Item hierarchy. The goal of this task is to have a single place to extend when adding support for a new expression result type, by separating this functionality from the Item class hierarchy. In addition to the above primary goals, there are some secondary quality goals: 4) No introduction of unnecessary copying and conversions of data in the execution layer. 5) All changes to functional behavior must documented. This task is also relevant for a few long-term MySQL goals that are beyond the scope of this work. However, we should keep in mind that our design should not make these goals more difficult to achieve: 6) Avoiding unnecessary copying of data when passing values to/from the storage engine. 7) Make the execution tree reusable from different connections. Field Class Hierarchy ===================== The goals discussed for the Item class hierarchy above, will also be applied for the Field class hierarchy, which has similar issues with respect to duplication of code for conversion between types. Also, since type metadata for expressions are often derived from field metadata, it makes sense to also let a centralized API for metadata operations handle the Field hierarchy. Value Registers =============== WL#4760 describes a design based on immutable Value objects that where returned by value (e.g., "Value Item::value()"). In order to reduce the cost copying a Value object, it only contained references to larger values like string values. This created issues with respect to ownership of data that could not be easily solved (see WL#4760 for more details). An alternative to pass around immutable values, is to pass around a reference to a storage location for a piece of data, i.e. similar to the concept of a computer register. For example, Item could in order to evaluate an expression provide a method "Item::eval(Register*)", and store the result of the evaluation in the register provided by the caller. This approach has some advantages: - It is clear that the caller has ownership to the data. - In most cases, the caller will know in what form it wants the result. Through metadata associated with the passed in register, the caller can specify the desired type etc. for the result. This may avoid materializing intermediate results in other forms during the evaluation. - Registers can be reused for different values. Hence, it does not require allocation of a new object for each value that is produced. - The caller may pre-allocate buffers to be used for the result and avoid dynamic allocation of memory during the evaluation. This task will explore the use of Value registers.
Requirements ============ A walk-through of some examples of SQL statements revealed the following set of requirements. (Ref. https://intranet.mysql.com/secure/mailarchive/mail.php?folder=4&mail=29772) It should be possible to: Req 1) Construct a value for a parser constant - avoiding extra copying, if possible. E.g., "SELECT "long text"; Req 2) Store an instance of class Value in the protocol, avoiding extra copying and unnecessary switch (type), if possible. (Under "protocol" here we meant the client-server protocol, and class Protocol in particular.) Req 3) Get a value object from the storage engine, and later on store it in the protocol. Req 4) Evaluate the result metadata and store it in the protocol (send it to the client) *before evaluating a value*. Req 5) Know the resulting type of the expression before evaluating the expression. The logic that is responsible for that operation needs to be moved to a central location. Req 6) Have a central location for type data conversion functionality. This functionality should be accessed through an API. Req 7) Have a central location for the implementation of operations. Since we would like to simplify addition of new expression result types, implementation of e.g. the operation "+" must be moved to a central location. (Other similar operations are: +, -, /, *, mod, div, <, >, <=, >=, <>, |, ||, &, &&, ^, etc.) Req 8) To change expression result metadata between executions of a statement. Expression result metadata is constant during a single execution, but may change between executions since parameter types as in "SELECT ?, ?;" may change from execution to execution. Note: This requirement affects where we can store expression metadata and where we allocate memory for it. We would like to avoid storing result metadata in the Item objects, since one of our (non-primary) goals is to make the execution tree reusable from different connections, and Sqltype_metadata will be specific to each connection. We would also like to avoid storing metadata in the Value object, since we may want to create and manipulate with instances of metadata objects prior to evaluation of the expression, i.e., perhaps, before we have instances of class Value. Req 9) Convert a Value to int, char, string, etc. There are times when we will need to convert a value into a builtin C++-type: 1) We need to convert the result of an expression to TRUE/FALSE in WHERE and HAVING clauses. NULL is false. 2) We need to convert the result an expression to unsigned longlong in LIMIT clause. 3) We need to be able to store the result of an expression in the Protocol, therefore Value needs to publish a send(Protocol *) method, or an analogue. 4) Today, we use val_str(), val_int() and other val_* methods of class Field currently in simplistic engines, such as CSV or ARCHIVE, to retrieve the data from table->record. SUBTASKS ======== 1) Centralized API for metadata operations (WL#5081) (Requirements 4 and 5 above) 2) Value register implementation (WL#5082) (Req. 1, 6, and 9) 3) Introduce Value registers in Field classes (WL# TBD) (Req. 3) 4) Introduce Value registers in Item classes (WL# TBD) (Req. 8) 5) Introduce Value registers in Protocol (WL# TBD) (Req. 2) 6) Centralized API for operations on Values (WL# TBD) (Req. 7) More subtasks to come ...
Copyright (c) 2000, 2016, Oracle Corporation and/or its affiliates. All rights reserved.