WL#3701: Updatable UNION ALL Views

Affects: Server-7.0 — Status: Un-Assigned

Description
High Level Architecture

After "CREATE VIEW v AS SELECT ... UNION ALL SELECT ...",
v may be an updatable view.

An updatable view is a view which can be the subject of
a data-change statement such as UPDATE or DELETE.
Some updatable views are also "insertable views" or
"insertable-into views", which can be the subject of
statements such as INSERT, and possibly REPLACE.
Restrictions exist and are described in WL#941 "Views".

Some UNION ALL views are theoretically updatable, and
can be updated in DB2 and SQL Server (not Oracle 11g).
(Ordinary UNION views without UNION ALL have implied
DISTINCT clauses so are by definition non-updatable.)  

The minimal feature
-------------------

The minimal feature would be like MySQL's MRG_MYISAM.

Example #1

CREATE TABLE Part1 (emp_id INT, emp_name CHAR(10), emp_joined DATE);
CREATE TABLE Part2 (emp_id INT, emp_name CHAR(10), emp_joined DATE);
CREATE VIEW Union_view AS      
  SELECT emp_id, emp_name, emp_joined FROM Part1               
  UNION ALL      
  SELECT emp_id, emp_name, emp_joined FROM Part2;

The above example shows a UNION ALL view of two base tables.
SELECT from Union_view already works.
UPDATE or DELETE of Union_view could be transposed to an UPDATE
or DELETE of Part1 followed by an UPDATE of Part2.
INSERT of Union_view could be transposed to an INSERT
of either Part1 or Part2, depending on a new CREATE VIEW
clause "INSERT_METHOD = {NO | FIRST | LAST}".

We could have arbitrary restrictions such as:
* Part1 and Part2 must be base tables with the  same definition
* Part1 and Part2 must have no references to each other caused
  by triggers or foreign keys
* Union_view must have a select list which refers to all
  columns in Part1 and Part2 and no other columns and no
  other clauses

This would deliver the same functionality as MySQL's MRG_MYISAM
(MERGE) tables
http://dev.mysql.com/doc/refman/5.0/en/merge-storage-engine.html
with the advantage that the syntax is familiar and extendable.
But the underlying code would not necessarily be any better.

The regular feature
-------------------

The regular feature would be like what DB2 and SQL Server
can do.
               
Example #2  
               
CREATE TABLE Part1 (      
  emp_id INT PRIMARY KEY, emp_name CHAR(10), emp_joined DATE);              
      
CREATE TABLE Part2 (               
  emp_id INT PRIMARY KEY, emp_name CHAR(10), emp_joined DATE);      
               
CREATE VIEW Union_view AS      
  SELECT emp_id, emp_name, emp_joined FROM Part1 WHERE emp_id < 500               
  UNION ALL      
  SELECT emp_id, emp_name, emp_joined FROM Part2 WHERE emp_id >=500;           
      
In the example above, there are two tables which have the same               
definitions but different primary-key ranges -- the hallmarks      
of "the poor man's horizontal partition". There is a rule,
not visible in the CREATE statements, that Part1 and Part2 do
not overlap. Actually Part1 and Part2 should have CHECK clauses
to enforce this, but MySQL wouldn't support CHECK clauses (WL#929).  
               
Now consider what we could do with:      
INSERT INTO Union_view VALUES (391,'Joe',DATE '1999-04-02')               
      
We can see that this row must go into table Part1, rather than table                
Part2, because the emp_id value is less than 500. So the statement      
becomes:               
INSERT INTO Part1 VALUES (391,'Joe',DATE '1999-04-02');      
  
Clearly, then, Union_view is an insertable view. Now consider:      
UPDATE Union_view                
  SET emp_joined = DATE '1994-04-05' WHERE emp_id = 1234;      
               
We can see that this is an update of table Part2, because the emp_id      
value is >= 500. So the statement becomes:               
UPDATE Part2      
  SET Part2.emp_joined = DATE '1994-04-05' WHERE emp_id = 1234;               
      
What if the UPDATE statement doesn't have such a WHERE clause?               
Consider:      
UPDATE Union_view                
  SET emp_name = 'Sasha' WHERE emp_joined = '1911-11-11';      
               
This has to be done by splitting into two parts as we do for SELECTs:      
UPDATE Part1                
  SET emp_name = 'Sasha' WHERE emp_joined = '1911-11-11';      
               
UPDATE Part2      
  SET emp_name = 'Sasha' WHERE emp_joined = '1911-11-11';               
      
Worst of all, what if the UPDATE statement changes primary-key
column values?               
Consider:      
UPDATE Union_view SET emp_id = '123' WHERE emp_id = '1234';               
      
This means the row must migrate from Part2 to Part1, so this might               
work:      
INSERT INTO Part1 SELECT * FROM Part2 WHERE emp_id = '1234';     
      
UPDATE Part1 SET emp_id = '123';               
      
DELETE FROM Part2 WHERE emp_id = '1234';              
      
This is getting complex. It would be simpler to call the statement               
illegal.      
               
So we can support updatable UNION ALL views with these restrictions:      
(a) each branch of the union is for a separate table, to ease               
parallelism      
(b) all separate tables have the same definitions, including a                
primary key (Joerg Bruehe suggests that columns needn't have the
same data types, merely assignment-compatible data types)
(c) each branch of the union contains a WHERE clause with the                
primary key      
(d) the branches' WHERE clauses are mutually exclusive               
(e) an UPDATE which changes the primary key is illegal.      
               
If we include "union number" information in the matchlist of retrieved      
rows, then we can lift restriction (d).               
      
These restrictions aren't nice, but other vendors' manuals        
show that they have some restrictions too.      
(SQL Server) "It must be clear which data belongs to which partition."      
(DB2) "The underlying table or view of each branch must be in a
separate hierarchy (i.e., a view may not have multiple branches
with their underlying tables or views in the same hierarchy)."               
               
In C.J. Date's book "An Introduction To Database Systems" (Chapter 17)      
there is a discussion of a solution which would be less restrictive.               

A few years after this specification was written (as part of
WL#941 Views), MySQL introduced a partitioning feature.
Partitioning has some advantages which overlap the advantages
of UNION ALL views; in fact the SQL Server manual, when discussing
UNION ALL views (in a section named "Partition views"), recommends
partitioning be considered instead. However, partitioning doesn't
have the same characteristics as MySQL's MERGE tables, see WL#5910.

There was a discussion in a dev-private thread with the suggestion
that WL#929 CHECK constraints should precede this task.
http://vilje01.norway.sun.com/mailarchive/mail.php?folder=4&mail=17784