MySQL :: Derived Tables in MySQL 5.7

In MySQL 5.7, we have enhanced the optimizer so that derived tables and views are handled more consistently.

Just to recap: A derived table is a subquery that can take the place of a table in the FROM clause of an SQL statement. A derived table that is embedded in the query is sometimes called an unnamed derived table. It is also referred to as simply a “subquery in FROM clause”.

It is also possible to replace a derived table with a view. This is why views sometimes are called named derived tables.

These two query sets are thus equivalent:

CREATE VIEW v1 AS SELECT * FROM t1;
SELECT * FROM v1 JOIN t2 USING (a);

1 2	CREATE VIEW v1 AS SELECT * FROM t1; SELECT * FROM v1 JOIN t2 USING (a);

SELECT * FROM (SELECT * FROM t1) AS dt1 JOIN t2 USING (a);

1	SELECT * FROM (SELECT * FROM t1) AS dt1 JOIN t2 USING (a);

For a long time, there has been a difference in the handling of views and derived tables: While it has been possible to merge views into the outer query, a derived table would always be materialized and accessed as a temporary table in the outer query.

This is costly for two reasons: creating and reading the materialized temporary table takes time, and it prohibits pushing down conditions from the outer query to the derived table.

In MySQL 5.7, we have consolidated how views and derived tables are handled: Both are resolved when the query they are used in is resolved. (Resolving means the process of looking up table and column names in the dictionary, checking that queries are semantically correct, etc.) We also decide whether to merge or materialize the derived table/view at the same place. In fact, this is almost the same place as where subqueries are transformed into semi-join operations, so the decision of whether we should do a semi-join transform, merge a derived table or view, or apply both transforms, is now taken when the containing query is resolved.

The criteria for when to merge a derived table is the same as for views, merging is not supported if the subquery contains:

UNION clause
GROUP BY
DISTINCT
Aggregation
LIMIT or OFFSET
an assignment to user variables

We can show this with EXPLAIN. The query

SELECT * FROM (SELECT * FROM t1) AS dt1 JOIN t2 USING (a);

1	SELECT * FROM (SELECT * FROM t1) AS dt1 JOIN t2 USING (a);

is explained like this in 5.6:

| 1 | PRIMARY | t2 | ALL | NULL | NULL | NULL | NULL | 1 | Using where |
| 1 | PRIMARY | <derived2>| ref | <auto_key0> | <auto_key0> | 5 | test.t2.a | 2 | Using index |
| 2 | DERIVED | t1 | ALL | NULL | NULL | NULL | NULL | 1 | NULL |

and like this in 5.7:

| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using where |

1 2	\| 1 \| SIMPLE \| t1 \| NULL \| ALL \| NULL \| NULL \| NULL \| NULL \| 1 \| 100.00 \| NULL \| \| 1 \| SIMPLE \| t2 \| NULL \| ALL \| NULL \| NULL \| NULL \| NULL \| 1 \| 100.00 \| Using where \|

How to control whether to merge or materialize

A user has always been able to determine whether to merge or materialize a view by specifying the ALGORITHM clause when creating the view: ALGORITHM=MERGE will merge the view into the outer query if possible, whereas ALGORITHM=TEMPTABLE will always materialize the view. There is also a third option: If no ALGORITHM value is specified, the view is handled similar to ALGORITHM=MERGE.

Since derived tables are specified directly in the query, we cannot specify an algorithm when creating them. We have therefore added a new optimizer switch named derived_merge that can be used to control whether to merge or materialize.
optimizer_switch=’derived_merge=on’ will merge the derived table if possible, optimizer_switch=’derived_merge=off’ will materialize the derived table always.

The optimizer_switch is also applicable to views: If no ALGORITHM is specified for the view, the decision is taken according to the value of derived_merge.

ORDER BY in derived table or view

ORDER BY in the subquery/view is handled differently in 5.7 compared to 5.6: It is ignored unless the outer query is non-aggregated and contains only one table reference (i.e the reference to this derived table or view). It is also ignored if the outer query itself contains an ORDER BY clause.

In this set of statements:

CREATE VIEW v1 AS SELECT * FROM t1 WHERE x > 100 ORDER BY y;
SELECT * FROM v1;

1 2	CREATE VIEW v1 AS SELECT * FROM t1 WHERE x > 100 ORDER BY y; SELECT * FROM v1;

the ORDER BY specification is propagated to the outer query, since v1 is the only table reference in the SELECT statement, the query is not aggregated and does not contain an ORDER BY clause.

The old behavior was doubtful: Propagating ORDER BY to an aggregated outer query is rarely possible. If the outer query is a join query, it is also hard to imagine that it is useful to propagate the ORDER BY to the outer query (if the query contains two view references, both with ORDER BY, which of them should be used?). But we kept the behavior for the simple case of a single non-aggregated table, since this is a quite common practice in existing applications.

Derived tables inside views:

Previously, it was not possible to create a view that contained a derived table in its FROM clause, but this restriction has been lifted in 5.7.

Example:

CREATE VIEW v AS
SELECT t1.a, t1.b, dt.c
FROM t1 JOIN (SELECT * FROM t2) AS dt ON t1.a=dt.a;

SELECT * FROM v WHERE c > 10;

CREATE VIEW v AS

SELECT t1.a, t1.b, dt.c

FROM t1 JOIN (SELECT * FROM t2) AS dt ON t1.a=dt.a;

SELECT * FROM v WHERE c > 10;

Here is the explain for this query in 5.7:

| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using where |

1 2	\| 1 \| SIMPLE \| t1 \| NULL \| ALL \| NULL \| NULL \| NULL \| NULL \| 1 \| 100.00 \| NULL \| \| 1 \| SIMPLE \| t2 \| NULL \| ALL \| NULL \| NULL \| NULL \| NULL \| 1 \| 100.00 \| Using where \|

For use in data change statements, we have chosen to make derived tables readonly when used as part of a join view. This has a few consequences:

A join view (a view over two or more tables that are joined) cannot be used in a DELETE statement, so there is no change here.
A view containing a derived table cannot be used as a target table in an INSERT statement.
UPDATE generally allows to update any updatable tables of a join view. We have copied this behavior for UPDATE of views containing derived tables.

The view created above cannot be inserted into, since it contains a derived table:

INSERT INTO v(a, b, c) VALUES (1, 10, 100);
ERROR 1471 (HY000): The target table v of the INSERT is not insertable-into

1 2	INSERT INTO v(a, b, c) VALUES (1, 10, 100); ERROR 1471 (HY000): The target table v of the INSERT is not insertable-into

The view cannot be deleted from, since it is a join view:

DELETE FROM v WHERE a > 0;
ERROR 1395 (HY000): Can not delete from join view 'test.v'

1 2	DELETE FROM v WHERE a > 0; ERROR 1395 (HY000): Can not delete from join view 'test.v'

The view can be updated, as long as the non-derived part of it is updated. This UPDATE statement is accepted:

UPDATE v SET b= b + 1 WHERE a > 0;

1	UPDATE v SET b= b + 1 WHERE a > 0;

But this statement is invalid, since the column “c” to be updated is from the derived table:

UPDATE v SET c= c + 1 WHERE a > 0;
ERROR 1288 (HY000): The target table t2 of the UPDATE is not updatable

1 2	UPDATE v SET c= c + 1 WHERE a > 0; ERROR 1288 (HY000): The target table t2 of the UPDATE is not updatable

This was feature request #12755 (Subquery in FROM clause of views).

Derived tables in select list of views

In earlier versions of MySQL, having a subquery in the select list of a view meant that the view had to be materialized. This restriction has been lifted in 5.7.

CREATE VIEW v AS
SELECT a, (SELECT COUNT(*) FROM t2) AS c
FROM t1;

CREATE VIEW v AS

SELECT a, (SELECT COUNT(*) FROM t2) AS c

FROM t1;

This was explained in 5.6 as:

| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 2 | NULL |
| 2 | DERIVED | t1 | ALL | NULL | NULL | NULL | NULL | 1 | NULL |
| 3 | SUBQUERY | t2 | ALL | NULL | NULL | NULL | NULL | 1 | NULL |

In 5.7 we can see the view is merged into the outer query:

| 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 3 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |

1 2	\| 1 \| PRIMARY \| t1 \| NULL \| ALL \| NULL \| NULL \| NULL \| NULL \| 1 \| 100.00 \| NULL \| \| 3 \| SUBQUERY \| NULL \| NULL \| NULL \| NULL \| NULL \| NULL \| NULL \| NULL \| NULL \| Select tables optimized away \|

This was feature request #60417 (Slow views by using subquery as column)

Refactored resolver order

Earlier versions of MySQL analyzed derived tables and views when tables were opened. We continue to refactor MySQL, and an important part of that is to analyze, optimize and execute queries in strict phases. As a natural part of that we now resolve derived tables and view references in “natural” order, by analyzing them when looking at table references for the outer query.

Refactored column privilege checking

Analysis of how we do column privilege check revealed a complex algorithm and a few corner case bugs. We have therefore tightened the column privilege checks:

Column privileges are not checked when resolving the derived table/view, but rather when the outer query is resolved and we know the context of the derived table.
Previously, the required privileges for columns were assigned to a GRANT_INFO object connected to a table. But different columns require different privileges (a column in a WHERE clause requires SELECT privilege, whereas a column that is target for insertion requires INSERT privilege). In 5.7, the column privilege is always passed as argument to the privilege checking function, giving better control to the caller.
Due to the above refactoring, we could eliminate the field GRANT_INFO::orig_want_privilege, and GRANT_INFO::want_privilege is only used in debug build for consistency checking.

Refactored read_set/write_set processing

Each table involved in a query has registered one set of columns that is read and one set of columns that are written, called read_set and write_set respectively.

We have consolidated the setting of these values into a function TABLE::mark_column_used() and a walker Item::mark_field_in_map(). Like for privilege checking, we delay setting of columns into read_set and write_set until when the outer query is resolved. Thus, we know the columns from the derived table that are actually used in the outer query and need not copy unused columns from the storage engine.

Example:

CREATE TABLE t1(a INTEGER, b BLOB);
CREATE VIEW v1 AS SELECT * FROM t1;
SELECT a FROM v1;

CREATE TABLE t1(a INTEGER, b BLOB);

CREATE VIEW v1 AS SELECT * FROM t1;

SELECT a FROM v1;

With the current way of setting read_set, it is simple to make sure we only retrieve the “a” column and not the “b” blob.

Preparing for future extensions

Since derived tables now are resolved within proper context, it will become simpler to implement outer references in derived tables. More importantly, implementing LATERAL derived tables may also become simpler, since these are depending heavily on references to tables from the embedding outer query.