In addition to real columns, which are physically stored, allow virtual columns which are computed when referenced. For example, if you have FirstName and SecondName columns, you may add a computed column FullName as CONCAT(FirstName," ",SecondName). According to Peter Zaitsev: "Some customers asked about this feature, and it is rather useful generally allowing more simple syntax than VIEWS in some cases."
There's no immediate need to change the original title
"Computed virtual columns as MS [i.e. Microsoft]
SQL server has". But we might choose the term
"generated column" rather than "computed column",
and such columns are not necessarily virtual,
and the recommendations in this document assume we will
choose standard SQL as a model, rather than SQL Server.
Think of the Holy Roman Empire, which wasn't holy,
wasn't Roman, and wasn't an empire. Titles aren't specifications.
SQL Server
----------
SQL Server 2008 allows:
<computed_column_definition> ::=
column_name AS computed_column_expression
[ PERSISTED [ NOT NULL ] ]
where PERSISTED means "store calculation results when inserting
or updating".
Microsoft's example:
"
CREATE TABLE mytable
( low int, high int, myavg AS (low + high)/2 ) ;
"
There are some restrictions and additional clauses, see
SQL Server 2008 "CREATE TABLE" syntax:
http://msdn.microsoft.com/en-us/library/ms174979.aspx
Oracle
------
Oracle 11g allows:
column [datatype] [GENERATED ALWAYS] AS (column_expression)
[VIRTUAL]
[ inline_constraint [inline_constraint]... ]
For details see Oracle 11g CREATE TABLE statement:
http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/statements_7002.htm
DB2
---
DB2 9.1 allows:
GENERATED ALWAYS AS (generation-expression)
See DB2 UDB 9.1 CREATE TABLE statement:
http://www.informatik.uni-bonn.de/~tb/Lehre/ws01/vRDBMS/db2s0/frame3.htm#sqls0621
Standard SQL
------------
SQL:2008 non-core (optional) feature T175 "Generated columns" allows:
GENERATED ALWAYS AS (generation expression)
For example:
CREATE TABLE t (a INT, b INT, c INT GENERATED ALWAYS AS (a + b));
Here the value of the generated column c is derived from a and b.
GENERATED and ALWAYS are non-reserved words. It's interesting that if we
made GENERATED ALWAYS into [GENERATED ALWAYS], that is, if we made the
words optional, as Oracle does, then we'd still be able to parse correctly.
Since Oracle and DB2 are both following the standard (with slight variations
as usual), and since the standard specification is clearer than vendor manuals,
this specification will follow the standard specification wherever practical.
Terminology
-----------
Consider the statement:
CREATE TABLE t (Column_1 CHAR(5) GENERATED ALWAYS AS (RAND()+5), Column_2 CHAR(5));
Column_1 is a "generated column". A column which is defined with a
GENERATED ALWAYS clause is a generated column.
Column_2 is a "base column". Any column which is not a generated column is a
base column.
The clause (RAND()+5), including the parentheses, is a "generation expression".
The clause GENERATED ALWAYS AS (RAND()+5) is a "generation clause".
A "virtual column" is a column which is not stored, it must be calculated
whenever the row is accessed.
VIRTUAL
-------
The SQL standard does not have a way to specify whether a generated column
is virtual or not. This is common; the standard often ignores storage
considerations. We'd be better off if we could distinguish, since there are
some advantages to generated not-virtual columns (we can guarantee that
you'll get the same value twice even if generation expressions are
non-deterministic), and some advantages to generated virtual columns
(they require little or no storage space).
So we want to add a non-standard bit to the syntax to say "it's [not] virtual".
The Oracle syntax "[GENERATED ALWAYS] AS (expression) VIRTUAL" would be
dangerous, because in Oracle it doesn't mean anything if you leave out
the word VIRTUAL, it still will be virtual.
We could try "GENERATED VIRTUALLY AS (expression)" or some such thing,
but that appears to deviate too much from the standard.
The SQL Server syntax "AS expression PERSISTED" has an unEnglish air.
So let's say that the MySQL syntax for a generation clause is
GENERATED ALWAYS AS (generation expression) [[NOT] VIRTUAL]
and VIRTUAL is the default. That is, you have to say NOT VIRTUAL explicitly
if you want the generated values to actually be stored in the database.
The functionality is deliverable with triggers, so expect gripes about
redundancy.
Column definition details
-------------------------
In standard SQL, a column definition looks like (omitting irrelevancies):
<column definition> ::=
<column name>
[ <data type> ]
[ <default clause> | <identity column specification> | <generation clause> ]
[ <column constraint definition>... ]
[ <collate clause> ]
In MySQL, a column definition currently looks like this:
column_definition:
data_type [NOT NULL | NULL] [DEFAULT default_value]
[AUTO_INCREMENT] [UNIQUE [KEY] | [PRIMARY] KEY]
[COMMENT 'string'] [reference_definition]
If we follow the standard rules for analogous MySQL clauses, then:
The data type is optional. If it's specified, then the generation expression
must have a result that's assignable to the column, but that's okay, MySQL
is very loose about what's assignable. If data type is not specified,
then the generation expression result decides what the data type is.
For example:
CREATE TABLE t (s1 GENERATED ALWAYS AS (5)); /* data type is INTEGER */
CREATE TABLE t (s1 BIGINT GENERATED ALWAYS AS (5)); /* conversion happens */
CREATE TABLE t (s1 TINYINT GENERATED ALWAYS AS (555555)); /* uh-oh. */
In the last example, we must be able to assign, which means we must always
be able to truncate, so we cannot pay attention to sql_mode strictness.
The default clause and identity column specification (in MySQL the analogue
is auto_increment) cannot be specified along with generation clause.
These are examples of syntax errors:
CREATE TABLE t (s1 INT AUTO_INCREMENT GENERATED ALWAYS AS (5), KEY(s1));
CREATE TABLE t (s1 INT GENERATED ALWAYS AS (5) DEFAULT 5);
The constraint clauses should follow the generation clause. But we can't move
"NOT NULL | NULL" or "reference definition" (which are both constraints)
from their established places. It appears then that the best position for
a generation clause is after data type, and before NULL | NOT NULL.
The collate clause, in MySQL, must be associated with a data type clause.
So, in MySQL, a column definition in future, if and only if it includes
a generation clause, looks like this:
column_definition:
[data_type]
GENERATED ALWAYS AS (value expression) [[NOT] VIRTUAL]
[NOT NULL | NULL]
[UNIQUE [KEY] | [PRIMARY] KEY]
[COMMENT 'string'] [reference_definition]
This column_definition variant is legal in CREATE TABLE, in ALTER TABLE
... ADD, in ALTER TABLE ... MODIFY, and in ALTER TABLE ... CHANGE.
Indexes
-------
Recommendation: don't allow generated virtual columns in indexes.
That's properly a job for the implementor of WL#1075 "Add support for functional
indexes".
Since PRIMARY and UNIQUE and REFERENCES specifications depend on indexes,
it follows that table definitions like this are illegal:
CREATE TABLE t (s1 INT GENERATED ALWAYS AS (5) NOT NULL UNIQUE); /* illegal */
On the other hand, a generated not-virtual column does not require a
functional index, and so a column defined with "GENERATED ALWAYS AS
(generation expression) NOT VIRTUAL" may be in indexes, and in unique- or
foreign-key definitions.
Sometimes this might mean that it's impractical to ALTER a column from
NOT VIRTUAL to VIRTUAL.
What can be in the generation expression
----------------------------------------
A generation expression may contain a literal, a built-in function, a function
defined with CREATE FUNCTION, an operator, or a reference to any base column
within the same table. That's all. No subqueries. No references to any
generated column. No parameters. No variables ... well, maybe certain
variables, we'll see.
If the expression contains a function, that function should be scalar and
deterministic. That is the standard requirement, and it's reasonable,
especially where we allow indexing. However, we don't really enforce
determinism. All we do is check whether a user-defined function is defined
with DETERMINISTIC. For replication purposes, we'll need to ensure that
generation expressions that contain RAND() etc. will have a deterministic
RAND() value in the binlog.
If the expression contains a function defined with CREATE FUNCTION, then
the function, and any functions it invokes, may not reference any tables,
may not change persistent data, must be checked for appropriate privileges
at time of column access, Properly speaking the function should be created
with CREATE FUNCTION ... DETERMINISTIC NO SQL, but we won't require those keywords.
Assignments
-----------
For both INSERT and UPDATE, the standard allows assigning DEFAULT. Nothing else.
For example:
INSERT INTO t (generated_column) VALUES (DEFAULT); /* legal */
UPDATE t SET generated_column = DEFAULT; /* legal */
INSERT INTO t (generated_column, base_column) VALUES (NULL, NULL); /* illegal */
UPDATE t SET generated_column = base_column; /* illegal */
Since REPLACE works like INSERT, we also allow REPLACE ... VALUES (DEFAULT);
We will allow the following non-standard syntax:
CREATE TABLE t (s1 GENERATED ALWAYS AS (5)) AS SELECT 6;
That is, for a situation where we can't avoid an apparent assignment to
a generated column, we allow the syntax but ignore the source value.
The same consideration applies for INSERT ... SELECT.
The order of assignments has to be irrelevant. Therefore, for this situation:
CREATE TABLE t (base_column INT, generated_column BLOB GENERATED ALWAYS AS
(base_column));
INSERT INTO t VALUES (5, DEFAULT);
UPDATE t SET s2 = DEFAULT, s1 = 6;
the value of t.s2 will be 6. Hmm, there's a contradiction with what's said
elsewhere, eh?
For view-definition purposes, consider a generated column to be "updatable",
since it is possible to assign to it, although the only thing you can assign is
DEFAULT.
ALTER TABLE ... DROP EXPRESSION
-------------------------------
The standard optional feature F385 "Drop column generation expression clause"
allows:
ALTER TABLE t ALTER COLUMN generated_column DROP EXPRESSION;
This turns the generated column into a base column.
We won't implement this. It's sufficient to say
ALTER TABLE t MODIFY generated_column data_type;
and the generation clause, since it's not explicitly specified, disappears.
This sort of ALTER TABLE statement may change a column from virtual to
not-virtual, and therefore will be slow and will require more storage.
Dropping / altering / renaming objects that the generation expression depends on
--------------------------------------------------------------------------------
Before allowing the renaming/altering of a base column, it's now
necessary to check: is this column referenced in any generation expression?
If so, ALTER is illegal.
ALTER ... DROP base_column has a cascade effect, that is, it causes dropping
of any generated columns that depend on the base column.
Before allowing the dropping of a function, it's now
necessary to check: is this function, directly or indirectly, referenced
in any generation expression?
If so, DROP is illegal.
Foreign keys
------------
It is legal to use not-virtual generated columns in parent or child tables.
For example:
CREATE TABLE t1 (s1 INT GENERATED ALWAYS AS (5), PRIMARY KEY (s1));
CREATE TABLE t2 (s1 INT GENERATED ALWAYS AS (5) REFERENCES t1 (s1));
Since foreign-key checking needs indexes, and since we don't allow indexes
for not-virtual columns, this works only for not-virtual generated columns.
A generated column may not be in a foreign-key reference that includes
ON UPDATE CASCADE, or ON UPDATE|DELETE SET NULL, or ON UPDATE|DELETE SET DEFAULT.
Probably we won't support foreign keys with generated columns until we have
WL#148 Foreign keys: Implement Foreign Keys (all storage engines).
Triggers
--------
A trigger may not refer to NEW.column_name or OLD.column_name if column_name
is a generated column. This may not be what other DBMSs do.
Partitions
----------
Recommendation: allow partitioning by generated columns, for example:
CREATE TABLE t (s1 INT, s2 INT GENERATED ALWAYS AS (ABS(s1))
PARTITION BY LIST (s2) (PARTITION p1 VALUES IN (1);
This cannot be used to work around any partitioning limitation.
For example, the MySQL Reference Manual has a section
"Partitioning Limitations Relating to Functions"
http://dev.mysql.com/doc/refman/5.1/en/partitioning-limitations-functions.html
which does not say you can use UPPER. Therefore this is illegal:
CREATE TABLE t (s1 CHAR(1), s2 CHAR(1) GENERATED ALWAYS AS (UPPER(s1)))
PARTITION BY KEY(s2);
It might be objected that a generated not-virtual column does not have to be
subjected to the partitioning limitations relating to functions.
When is the generation expression evaluated?
--------------------------------------------
A generation expression is evaluated immediately after a BEFORE trigger is
evaluated. Therefore it is evaluated before a constraint check. For example:
CREATE TABLE t (s1 INT,
s2 GENERATED ALWAYS AS (WHEN s1 IS NULL THEN 5 ELSE s1 END));
CREATE TRIGGER t_bu BEFORE UPDATE ON t FOR EACH ROW SET s1 = NULL;
CREATE VIEW v AS SELECT s1 FROM t WHERE s2 = 5 WITH CHECK OPTION;
INSERT INTO v VALUES (NULL); /* results left as an exercise for the reader */
There is no evaluation at CREATE/ALTER time. Thus some niladic functions,
such as CURRENT_USER, will always depend on an evaluation at the time we
access the row. This is somewhat different from the way we handle
CURRENT_USER for views.
Query rewrite
-------------
Suppose we create a table thus:
CREATE TABLE t (s1 INT, s2 GENERATED ALWAYS AS (s1+5));
We expect that like "SELECT s2 ..." will have exactly the same effect as
"SELECT (s1+5) ...". Since s2 is not stored, the query may be rewritten
with the simple rule: replace column-name with (generation-expression).
However, this should only be exposed when necessary.
For example, after
CREATE VIEW v AS SELECT s2 FROM t;
SHOW CREATE VIEW v;
The user will see "s2", not "s1+5", for the view definition.
CREATE TABLE ... LIKE
---------------------
The standard allows:
CREATE TABLE t1 LIKE t2 { INCLUDING | EXCLUDING } GENERATED;
The default is EXCLUDING GENERATED.
Generated columns are only copied if one specifies INCLUDING GENERATED.
INFORMATION_SCHEMA
------------------
INFORMATION_SCHEMA.COLUMNS has two new columns:
IS_GENERATED VARCHAR 'NEVER' or 'ALWAYS'
GENERATION_EXPRESSION VARCHAR NULL or generation expression
We also need a way to indicate our non-standard VIRTUAL usage.
There is no proposal here to change DESCRIBE or SHOW FIELDS.
Timestamp
---------
Ordinarily, with many caveats and special cases, the default value for
a MySQL TIMESTAMP column may be the current date and time, as if we'd said
"column_name TIMESTAMP GENERATED ALWAYS AS (CURRENT_TIMESTAMP)". Since
we long ago passed the maximum number of rules that mortals can keep
in their heads about timestamps, we'll just disallow all generation
expressions other than (CURRENT_TIMESTAMP).
Potential Benefits
------------------
A specification is not the place to tout or propagandize.
Maximally, one can point out that, although one can get usually get
the same benefits or more by using views, a generated column is in
a base table, and therefore: it can be used for partitions and indexes
if it's not-virtual, it can be be defined without needing CREATE VIEW
privileges, and maybe someday for some storage engines you can get
statistics that could be useful with optimizing.
References
----------
dev-private thread "Re: WL#411 Computed virtual columns as MS SQL server has"
[ mysql intranet ] /secure/mailarchive/mail.php?folder=4&mail=25144
Contribution "MySQL virtual columns preview":
http://forge.mysql.com/wiki/MySQL_virtual_columns_preview
BUG#46491 Patch: Virtual columns (WL#411)
http://bugs.mysql.com/bug.php?id=46491
Despite the title, this feature request refers to the Contribution
"MySQL virtual columns preview", which does not follow WL#411.
