WL#10358: Implement table value constructors: VALUES

Affects: Server-8.0   —   Status: Complete

At least two of major databases support the standard syntax for table value constructors (TVC): DB2 and MS SQL.

Example 1:

 INSERT ... VALUES (1, 2), (3, 4);

Example 2:

 SELECT ... FROM (VALUES (1, 2), (3, 4), (SELECT x, y FROM t1)) AS t2(a, b);

While we have a partial support for the same in the INSERT statement, we have to emulate TVC in other contexts with unions:

Updated example 2:

 SELECT ... FROM (      SELECT 1 AS a, 2 AS b
                  UNION SELECT 3, 4
                  UNION SELECT x, y FROM t1) t;

It would be nice to implement the standard TVC syntax shortcut to use instead of awkward emulations with unions.

Example 3:

 VALUES(1,'one'),(2,'two');

The above is equivalent to a UNION in the current syntax:

 SELECT 1, 'one'
 UNION
 SELECT 2, 'two';

See also: BUG#77639 for this feature request.

Note: The current MySQL server has a conflicting non-standard feature: the VALUES() function, see WL#6312. That function is a syntactical showstopper for the current WL.

Workaround: the standard also accepts a verbose form of table value constructors:

 VALUES ROW(1, 2),...

This syntax form should not conflict with the existent non-standard VALUES() "function", so we can implement "VALUES ROW(...) ..." in parallel with WL#6312.

F-1
Allow <table value constructor> clause inside a <simple table> clause, similar to how a <query specification> clause is allowed.
F-2
A <table value constructor> consists of one or more <row value constructor> clauses. Each <row value constructor> clause must start with the keyword ROW, due to an existing conflict with the VALUES(<expression>) clause.
F-3
The special syntax for INSERT statements (INSERT ... VALUES ... and INSERT ... SELECT) shall remain as-is. The existing VALUES and SELECT parts of the syntax will not be replaced with a <table value constructor> clause.
INSERT ... VALUES ROW
will be accepted and will be converted into the same semantic structure as
INSERT ... VALUES
by PT_insert::make_cmd().
F-4
The number of columns in a <row value constructor> within a <table value constructor> must be between 1 and 4096. An exception:
INSERT ... VALUES ROW
allows empty ROW objects (ie an empty set of parentheses). The latter is for consistency with the existing
INSERT ... VALUES ()
syntax.
F-5
The number of <row value constructor>s in a <table value constructor> must be at least 1 and is limited upwards by available memory for preparation.
F-6
Allow <explicit table> clause as valid syntax. The syntax
"TABLE <table name>"
is equivalent to the <query specification> clause
"SELECT * FROM <table name>".
The <explicit table> clause is allowed inside a <simple table> clause and in INSERT statements where it can be used similar to a <query specification> (a SELECT clause).
F-7
A <table value constructor> without outer references and non-deterministic function references can be considered as a constant table during evaluation.
F-8
A <table value constructor> with outer references or non-deterministic function references is re-evaluated when the <query expression> that owns the <derived table> that contains the <table value constructor> starts executing. NOTE: Supporting outer references is a stretch goal.
F-9
The reserved word NULL is accepted as a value in a ROW object. The data type of this specific NULL value is MYSQL_TYPE_NULL when used standalone and BINARY(0) when used as a source for a CREATE TABLE statement.
F-10
The columns of a <table value constructor> are named "column_0", "column_1", etc. According to the SQL standard, naming of such columns is implementation dependent. These column names are propagated as column to query expressions, and may also be used to name columns in CREATE TABLE statements when such query expressions are used as source.
F-11
The reserved word DEFAULT is accepted as a value in a ROW object, but only when a is used as a source in an INSERT statement. The behaviour of DEFAULT should mirror that of the existing INSERT .. VALUES() syntax.
NF-1
There shall be no general performance degradations by this worklog.
NF-2
There are no specific performance requirements for statements containing a <table value constructor> clause.

Contents


Parsing

Parsing the new syntax should be straightforward.

The <explicit table> syntax simply generates a SELECT_LEX object with empty WHERE clause, HAVING clause, GROUP BY clause, ORDER BY clause, WINDOW clause, LIMIT clause, and with the specific table as the single source table. For this clause, resolving, optimization and execution is unchanged.

The <table value constructor> syntax generates a SELECT_LEX object with the row values list attached to it. A new parse tree node, PT_table_value_constructor, is added. PT_table_value_constructor is a subclass of PT_query_primary, and additionally contains a row_value_list of the parsed row values list.

Parsing of the old INSERT .. VALUES remains as-is. INSERT .. VALUES ROW is parsed as <table value constructor>, but is converted into the same semantic structure as INSERT .. VALUES in PT_insert::make_cmd().

Resolving

A new resolver function, SELECT_LEX::prepare_values(), is implemented to resolve the <table value constructor> SELECT_LEX object. All Item objects within the table value constructor are resolved as normal with Item::fix_fields.

Resolved <table value constructor> objects that contain more than one row are handled by a new item, Item_values_column. The intention is threefold:

 - The type of a column must be aggregated across all rows, much like what is
   already done for UNION. Item_values_column is a subclass of Item_type_holder
   and combines types with Item_type_holder::join_types().
 - Properties regarding outer references, non-deterministic functions and
   const-for-execution items are collected. Item_values_column::used_tables()
   returns the contents of a new member variable, m_aggregated_used_tables,
   which is summed during resolving with
   Item_values_column::join_used_tables(), similarly to
   Item_type_holder::join_types().
 - A reference to the currently output contents of a column during iteration is
   contained in m_value_ref. This is explained further in the section on
   execution.

Resolving of m_having_cond is duplicated verbatim from SELECT_LEX::prepare(). PREPARE statements containing IN subquery predicate statements may rewrite the SELECT_LEX of the <table value construtor> to use HAVING, requiring it to be resolved on the second preparation.

Optimization

It is currently not known whether any optimization must take place for a table value constructor.

It may be reasonable to treat the table value constructor as a materialized derived table and build an index over it. Doing this may enable more efficient join strategies. This can be more useful when the table value constructor comprises a fairly large table.

Execution

A new iterator, TableValueConstructorIterator, is implemented for scanning the rows of <table value constructor>. The iterator is passed the row values list, as well as the the field list of the JOIN object of the SELECT_LEX that contains the TVC.

The iterator must output multiple rows without being materialized, and does not scan any tables. An indirection is implemented with Item_values_column, as the executor outputs what is contained in join->fields (either directly, or indirectly through ConvertItemsToCopy).

join->fields is thus filled with the Item_values_column objects that are created during SELECT_LEX::prepare_values. TableValueConstructorIterator::Read() will fetch the next row by changing the contents of each Item_values_column::m_value_ref to the next row contained in its row values list.