WL#2866: Materialized Views (reduced support)

Affects: Server-7.1   —   Status: Assigned

In September 2005, at a meeting in Chicago, there 
was sentiment for materialized views, Oracle-like, 
but without refreshing. This preliminary description 
is taken from an old worklog task, WL#941 "Views". 
 
Feature: Permanent Materialized Views  
-------------------------------------  
  
This feature goes by different names: AST or MQT in DB2, Indexed Views  
in SQL Server, and Materialized Views in Oracle (an older Oracle  
incarnation was "Snapshots"). The idea is: a view (usually a grouped  
view or join view) can be materialized permanently, so that  
queries of summary data will be quick. In other words, the "temporary  
table" that we produce with the temporary-table algorithm isn't a  
temporary table, it's a permanent table.  
  
It's hard to maintain the summary if anyone changes the base table.  
I am proposing: we would not "refresh" permanent materialized views      
frequently -- instead, we would allow them to become stale. When the               
server restarts, the permanent materialized views empty. When a      
permanent materialized view is queried for the first time, it is                
populated.      
               
The above proposal is unsophisticated. Our competitors allow periodic      
refresh, and can update the permanent materialized view by checking  
log entries. At one time I believed that there would be a standard,
but "CREATE MATERIALIZED VIEW" isn't in SQL:2008.
      
A comment from a major customer is: "Usually database server restarts
never or restarts once within x months. Therefore it is of no use to
wait for the next restart for refreshing the view. On the other hand:          
the first application preparing the view will not like      
to wait for its preparation." At a meeting with that customer,        
the decisions were:      
(a) this is not relevant for the main product, and not        
    needed in the data warehouse environment      
(b) "it is a permanent temporary table that is crontabbed",        
(c) "this is not really a view",      
(d) Permanent Materialized Views should be materialized      
   (persistent) forever,      
(e) There should be options for resynching, as part of the  
    VIEW definition.
Support creation of materialized views, with only the bare 
minimum -- no automatic refresh, no query rewrite.

Syntax, stated simply
---------------------

CREATE [IGNORE] MATERIALIZED VIEW 
 materialized_view_name 
 [(column_list)]
 [table_options]
 [partition_options]
 AS select_statement ;

Example: 
CREATE MATERIALIZED VIEW mv AS SELECT * FROM t1; 
The effect is like "CREATE TABLE mv AS SELECT * FROM t1;", 
except that the new object 'mv' is considered to be a view 
(although it's really more like a base table). 

Syntax, with detail notes
-------------------------

CREATE
                                   /* Note 1 */
 [IGNORE]                          /* Note 2 */
 MATERIALIZED VIEW                 /* Note 3 */
 materialized_view_name            /* Note 4 */
 [(column_list)]                   /* Note 5 */
 [table_options]                   /* Note 6 */
 [partition_options]               /* Note 7 */
 AS                                /* Note 9 */
 select_statement ;                /* Note 10 */

Note 1: The syntax in this area differs from CREATE VIEW:
no "CREATE OR REPLACE MATERIALIZED VIEW ..." (see also WL#3129)
no "CREATE ALGORITHM = temptable MATERIALIZED VIEW ..."
no "CREATE MATERIALIZED VIEW ... AS SELECT ... WITH CHECK OPTION"
no "CREATE DEFINER=current_user MATERIALIZED VIEW ..."
no "CREATE SQL SECURITY DEFINER MATERIALIZED VIEW ...".
We also disallow options that are possible with CREATE TABLE:
no "CREATE TEMPORARY MATERIALIZED VIEW ..."
no "CREATE MATERIALIZED VIEW IF NOT EXISTS ..."
(but see also WL#3129 "Consistent clauses in CREATE and DROP").

Note 2: See the later section "The IGNORE Clause".

Note 3: MATERIALIZED (with a Z) is the only legal spelling.
MATERIALIZED is not a new reserved word.

Note 4: materialized_view_name is the name of a [viewed]
table so it follows the same rules. It can be qualified.
Its maximum length is 64 Unicode characters. It's in the
same namespace as base tables and views, so if there's a
base table named 'x', you can't create a materialized view
named 'x'. (This is also true in Oracle.)

Note 5: the "column_list" is a comma-delimited list of
column names, comparable to "CREATE VIEW ... (column_list)".
But we allow these MySQL-specific additional clauses
which are comparable to clauses in CREATE TABLE:
* COMMENT 'string'
* [COLUMN_FORMAT {FIXED|DYNAMIC|DEFAULT}]
* [STORAGE {DISK|MEMORY}]

Note 6: The "[table_options]" clause is like the "[table_options]" 
clause in CREATE TABLE. See the description of CREATE TABLE: 
http://dev.mysql.com/doc/mysql/en/create-table.html
Originally the thought was that [table_options] would come
after "AS SELECT ...", but that might be harder to parse, and
people are used to "CREATE TABLE ... [table_options] AS SELECT ...".
Certainly the table_option "ENGINE=" makes sense, for example
"CREATE MATERIALIZED VIEW v (column1) ENGINE=FALCON AS SELECT 5;".
Other sensible table options might be
AVG_ROW_LENGTH, CHECKSUM, COMMENT, DATA DIRECTORY, DELAY_KEY_WRITE,
INDEX DIRECTORY, MAX_ROWS, MIN_ROWS, PACK_KEYS, ROW_FORMAT, TABLESPACE.
But the implementor may choose to ignore any or all table options.
 
Note 7: The [partition_options] clause is accepted.
This may introduce a form of constraint, causing
either CREATE MATERIALIZED VIEW or REFRESH TABLE
to fail.

Note 8: It's "AS SELECT" not "[AS] SELECT". AS is not optional.

Note 9: The select_statement rules are the same as for
"CREATE VIEW ... AS SELECT". Since this is like an
ALGORITHM=TEMPTABLE view, there are no restrictions.
Oracle would require a selection including a primary key.

Example showing every legal clause:

CREATE IGNORE MATERIALIZED VIEW test.`Materialized View`
(column1, column2) ENGINE=FALCON AS SELECT * FROM base_table;
 
Privileges 
---------- 
 
Required privileges: same as for views. See the manual description: 
http://dev.mysql.com/doc/mysql/en/create-view.html 
 
(This differs from Oracle. Oracle requires two privileges: 
CREATE [ANY] MATERIALIZED VIEW and CREATE [ANY] TABLE.) 

TODO note: Peter will look into 'definer' semantics.
The privilege required for REFRESH probably must
be more than SELECT on materialized_view_name, it
should be SELECT on underlying objects unless
DEFINER is user.
 
Other statements that may refer to materialized views directly 
-------------------------------------------------------------- 

These statements will work: 
 
CREATE INDEX index_name ON materialized_view_name ...; 
 
DROP INDEX index_name ON materialized_view_name ...; 
 
DROP VIEW materialized_view_name; 
(This differs from Oracle. Oracle has a statement 
"DROP MATERIALIZED VIEW".) 
 
GRANT SELECT ON materialized_view_name ...; 
 
REVOKE SELECT ON materialized_view_name ...; 

SELECT ... FROM materialized_view_name; 
(All forms of SELECT will work. The materialized view 
looks just like any 'base table', internally.) 

CREATE VIEW ... AS SELECT ... FROM materialized_view_name;

LOCK TABLES ... materialized_view_name ...;
(LOCK TABLE materialized_view_name READ prelocks only the
materialized_view_name. LOCK TABLE materialized_view_name
WRITE prelocks all the tables involved in the view definition.)

These statements will not work:
CREATE TABLE (... FOREIGN KEY ... REFERENCES materialized_view_name ...)
HANDLER materialized_view_name OPEN ... 
RENAME TABLE materialized_view_name ...
But these statements will work:
ALTER TABLE materialized_view_name ... (for
partition changes and renaming),
ANALYZE, CHECK TABLE, CHECKSUM, OPTIMIZE, REPAIR.
 
And all forms of data-change statement will not work! 
INSERT INTO materialized_view_name ... -- no! 
UPDATE materialized_view_name ... -- no! 
DELETE FROM materialized_view_name ... -- no! 
LOAD DATA ... materialized_view_name ... -- no! 
TRUNCATE materialized_view_name ... no! 
REPLACE materialized_view_name ... no! 

It is illegal to refer to a materialized view in any 
data-change statement, directly or indirectly 
(that is, you can't refer to a materialized view in a function 
or stored procedure or trigger which an update/insert/delete 
uses). The intent of this restriction is to avoid replication. 
Also, data-change effects would be cancelled during refresh. 

So this is a difference from DB2, which allows updatable
materialized views, or in DB2 dialect "user-maintainable MQTs".

Refresh 
------- 
 
We allow materialized views to get out of date (preferred 
word: "stale"). It is possible to bring a single one back 
up to date with: 

REFRESH [IGNORE] [MATERIALIZED] VIEW materialized_view_name; 
 
Privileges required: SELECT on materialized_view_name. 
 
Example: 
REFRESH VIEW mv; 
The effect is like "DELETE FROM mv;" followed by 
"INSERT INTO mv SELECT view_select_clause". Thus 
the refreshing is always complete. There are no 
tricks for keeping a materialized view up to date 
by changing it when the underlying tables change. 
 
If materialized_view_2 is based on materialized_view_1, 
then "REFRESH MATERIALIZED VIEW materialized_view_2" does not guarantee 
that materialized_view_2 is up to date, because 
materialized_view_1 might also be out of date. 
 
Refreshing can happen according to a schedule, if you 
use CREATE EVENT.
 
Oracle has no REFRESH statement. Oracle has clauses in the 
CREATE MATERIALIZED VIEW statement for refreshing, 
including scheduled refreshes or "on commit" refreshes.
DB2 has a REFRESH TABLE statement but we decided not to copy DB2
http://publib.boulder.ibm.com/infocenter/db2luw/v8//index.jsp?topic=/com.ibm.db2.udb.doc/core/c0009318.htm

REFRESH is not a "data manipulation statement".

The IGNORE Clause
-----------------

It is possible to cause an error due to partitioning.
For example:
CREATE TABLE t (s1 INT);
INSERT INTO t VALUES (1),(2),(3);
CREATE MATERIALIZED VIEW mv
  PARTITION BY LIST (s1) (PARTITION P1 VALUES IN (1,3))
  AS SELECT * FROM t;
The result is:
ERROR 1526 (HY000): Table has no partition for value 2
But we have decided that partition errors can be
ignorable, see WL#4103 "Define IGNORE". So, for both
CREATE MATERIALIZED VIEW and REFRESH, we allow IGNORE.
The effect would be: the row is skipped, there is no error.

Since there is a syntax "ALTER IGNORE TABLE" (with the
word IGNORE preceding the word TABLE), the syntax is
CREATE IGNORE MATERIALIZED VIEW and REFRESH IGNORE MATERIALIZED VIEW
rather than
CREATE MATERIALIZED VIEW IGNORE and REFRESH MATERIALIZED VIEW IGNORE.

If one says CREATE IGNORE, then that is a characteristic
of the table which affects subsequent statements.
The metadata will show that the view was created with IGNORE.
REFRESH TABLE will have the same effect as REFRESH IGNORE TABLE.

ALTER [IGNORE] TABLE materialized_view_name is also legal.

Query Rewrite 
------------- 

We won't do query rewrite.

Although "query rewrite" is the more common term, MySQL
will use "materialized view rewrite", due to majority vote
during the March 2008 Stockholm meeting (in favour Timour
and Konstantin and Mikael, against Igor and Peter).
This will have significance for later worklog tasks.

Metadata 
-------- 
 
We will treat materialized views like views, therefore 
SHOW CREATE VIEW and SHOW CREATE TABLE will show materialized views, 
SHOW [OPEN] TABLES will show materialized views, 
SELECT * FROM INFORMATION_SCHEMA.VIEWS will show materialized views. 

In INFORMATION_SCHEMA.TABLES, TABLE_TYPE = 'MATERIALIZED VIEW'.
And INFORMATION_SCHEMA.TABLES.UPDATE_TIME = time of last
refresh, since no other form of update is possible.

In INFORMATION_SCHEMA.VIEWS, the CHECK_OPTION value is always 
'NONE' and IS_UPDATABLE is always 'NO' because materialized 
views are not updatable. We will want three more columns: 
* VIEW_TYPE CHAR(12).
Possible values: 'MATERIALIZED' or 'ORDINARY' 
* LAST_REFRESH_TIME TIMESTAMP.
Possible values: the timestamp of the CREATE 
MATERIALIZED VIEW statement, or the timestamp of the 
last REFRESH statement. Perhaps this isn't necessary, 
since you can say "CREATE MATERIALIZED VIEW ... AS 
SELECT CURRENT_TIMESTAMP ..." if you want.
* IGNORABLE CHAR(3).
Possible values: 'YES' or 'NO'.

There is no way to find out whether the materialized view 
is up to date, because we don't have triggers or timestamps 
of-last-change for every database object and variable that 
the materialized view might refer to.

This differs from Oracle, which would have separate metadata
tables for materialized versus ordinary views, for example
ALL_MVIEWS.
http://download.oracle.com/docs/cd/B28359_01/server.111/b28320/statviews_1140.htm#sthref1243

Backup and Replication
----------------------

Since you can't update a materialized view:
* There will be no effect on replication.
  REFRESH MATERIALIZED VIEW won't be replicated.
* BACKUP DATABASE won't backup materialized view contents.

It is true that materialized view contents will be
different on different machines, because the SELECT
is not deterministic. But that happens with any view.

It is true that in the Oracle world one of the uses of
materialized views is selective replication. But that
is not our interest for this worklog item.

Examples
--------

CREATE TABLE t (s1 INT);
INSERT INTO t VALUES (1);
INSERT INTO t VALUES (2);
INSERT INTO t VALUES (3);
CREATE MATERIALIZED VIEW mv1 AS SELECT * FROM t;
CREATE VIEW v (view_column_1) AS SELECT COUNT(*) FROM t;
CREATE IGNORE MATERIALIZED VIEW mv4 AS SELECT * FROM t,v;
CREATE MATERIALIZED VIEW mv5
  (column1,
   column2 COMMENT 'comment',
   column3 FORMAT DEFAULT,
   column4 STORAGE DISK)
  ENGINE=MEMORY
  PARTITION BY KEY(column1)
  AS
  SELECT 1,2,3,4,COUNT(view_column_1)
  FROM v,information_schema.tables
  GROUP BY column1, column2;
ALTER MATERIALIZED VIEW mv5 REMOVE PARTITIONING;
CREATE INDEX i ON mv5 (column2);
DROP INDEX i ON mv5;
GRANT SELECT on mv5 TO joe@localhost;
REVOKE SELECT ON mv5 FROM joe@localhost;
ANALYZE TABLE mv5;
CHECK TABLE mv5;
CHECKSUM TABLE mv5;
REPAIR TABLE mv5;
INSERT INTO t VALUES (4);
/*
  Now materialized views may be out of date,
  because they refer ultimately to table t,
  and table t has a new row. So refresh them.
  There are 5 separate statements because we
  didn't allow "REFRESH TABLE table [,table...];".
*/
REFRESH MATERIALIZED VIEW mv1;
REFRESH MATERIALIZED VIEW mv2;
REFRESH MATERIALIZED VIEW mv3;
REFRESH MATERIALIZED VIEW mv4;
REFRESH MATERIALIZED VIEW mv5;
SELECT * FROM INFORMATION_SCHEMA.VIEWS WHERE table_name = 'mv5';
SELECT * FROM mv5;
DROP VIEW mv5;

References 
---------- 

Since the request was for "materialized views" we'll follow 
Oracle for most things, but some study of the DB2 and SQL Server 
features could be profitable. 
 
"CREATE MATERIALIZED VIEW" (Oracle11g)
http://download-uk.oracle.com/docs/cd/B28359_01/server.111/b28286/statements_6002.htm#i2063793

"Materialized Query Tables" (DB2 v8.2) 
http://publib.boulder.ibm.com/infocenter/db2help/index.jsp?topic=/com.ibm.db2.udb.doc/core/c0009318.htm

"Improving Performance with SQL Server 2005 Indexed Views" 
http://www.microsoft.com/technet/prodtechnol/sql/2005/ipsql05iv.mspx 

"Refactoring MySQL to Support Materialized Views"
http://www.cs.uwaterloo.ca/~hsmit/reports/cs746_report.pdf