WL#883: Non-recursive WITH clause (Common Table Expression)
Affects: Server-8.0
—
Status: Complete
WHAT ==== Allow statements like WITH query_name AS (subquery) SELECT * FROM query_name; i.e. some sort of statement-local temporary table. WHY? ==== 1) Ok, a derived table already achieves that, with SELECT * FROM (subquery) AS query_name; but a derived table cannot be referenced more than once, so if you want to join it with itself, you must duplicate it: SELECT * FROM (subquery) AS query_name1, (same subquery) as query_name2; and you get *two independent derived tables* (query_name1 and query_name2), thus two temporary tables, and two evaluations of the subquery (one evaluation to fill query_name1, one to fill query_name2). On the opposite, a CTE can be referenced more than once: WITH query_name AS (subquery) SELECT * FROM query_name as query_name1, query_name as query_name2; As it's two references to *one CTE*, this allows the optimizer to create one single temporary table (query_name), and do one single evaluation of the subquery to fill it: less disk usage, less CPU usage ; then query_name1 and query_name2 just "point" to query_name and read it. 2) slightly easier to read: with a derived table, you have to first locate the derived table's definition deeply nested in the top query, and then you can go back up and start reading the top query. With a CTE, you read from top to bottom in one pass. 3) A CTE can reference another CTE , which a derived table cannot do; this allows to build a tmp result over other tmp results WITH res1 AS (subquery1), res2 AS (subquery using res1), res3 AS (subquery using res1) SELECT ... FROM res2, res3; 4) A complete example: # One row per day, with amount sold on that day: create table sales_days(day_of_sale DATE, amount INT); # Data: insert into sales_days values ('2015-01-02', 100), ('2015-01-05', 200), ('2015-02-02', 10), ('2015-02-10', 100), ('2015-03-02', 10), ('2015-03-18', 1); # Find best and worst month: with # first CTE: one row per month, with amount sold on all days of month sales_by_month(month,total) as (select month(day_of_sale), sum(amount) from sales_days where year(day_of_sale)=2015 group by month(day_of_sale)), # second CTE: best month best_month(month, total, award) as (select month, total, "best" from sales_by_month where total=(select max(total) from sales_by_month)), # 3rd CTE: worst month worst_month(month, total, award) as (select month, total, "worst" from sales_by_month where total=(select min(total) from sales_by_month)) # Now show best and worst: select * from best_month union all select * from worst_month; +-------+-------+-------+ | month | total | award | +-------+-------+-------+ | 1 | 300 | best | | 3 | 11 | worst | +-------+-------+-------+ 5) It is a prerequisite for "recursive WITH" (WL#3634) which is much requested. This WL is about the non-recursive. BUT CAN'T I DO THE SAME WITH EXISTING MEANS? ============================================ - with "create temporary table" to hold the tmp result? But: you'll have more round-trips (create temporary, send query, drop temporary), you'll need the privilege to create a tmp table, and you'll face BUG#10327 (your table cannot be referenced twice). - with "create view" to hold the tmp result? But you'll have more round-trips, need a privilege to create the view, and multiple references will cause multiple materializations. If you forget a DROP, the next CREATE will fail. You'll need to handle name conflicts with other sessions. Creating a view will alter the DD, which can be slow due to durability constraints. - with "create table" to hold the tmp result? Same problems. DO OTHERS HAVE CTEs? ==================== yes: standard SQL99, DB2, SQL Server, Oracle, SQLite, PostgreSQL. MISC ==== Bugs: BUG#16244 with 36 subscribers and BUG#32174 with 4 subscribers.
F-1 support the WITH syntax: - at the start of the SELECT, UPDATE, DELETE statement, - in subqueries used by any statement, - within general query expressions used by INSERT SELECT, REPLACE SELECT, CREATE VIEW and CREATE TABLE SELECT. F-2 Allow a list of CTEs per query expression. F-3 Allow a CTE to be built depending on another CTE. F-4 A CTE should not contain any outer reference F-5 CTEs, derived tables, views should accept an optional list of column names; for views, those names should be visible in SHOW CREATE VIEW and INFORMATION_SCHEMA.COLUMNS. To support this, a column VIEW_COLUMN_NAMES is added to the TABLES table of the Data Dictionary; this will prevent on-the-fly upgrades from 8.0.0 to 8.0.1, which isn't supported per Geir's decision (reported in email from DmitryL to Guilhem on 2016-10-26). F-6 merge() and no_merge() hints should apply to CTEs; if a CTE is referenced multiple times in a query, the hints should allow to selectively merge (or not) certain references only. (FYI those hints have been introduced for views and derived tables by WL#9307). F-7 if a CTE is materialized by a query it should be materialized once for this query, even if that query references it several times. F-8 in error messages and in the manual, CTEs will be spelled "Common Table Expression" F-9 in EXPLAIN, it should be clear that a CTE is materialized only once (see F-7). F-10 Like with derived tables,: - MySQL should recognize functional dependencies in (non-recursive) CTEs. - EXPLAIN WITH ... SELECT ... should not fill the tmp table of a materialized CTE - relevant indexes should be automatically added to the materialized CTE if the Optimizer thinks that they will speed up the top query's access to the CTE. NF-1 performance shouldn't be lower than when using a derived table
WL#8083: Introduce <query expression> parser rule
WL#8907: Parser refactoring: merge all SELECT rules into one.
WL#9307: Enabling merging a derived table or view through a optimizer hint
WL#8907: Parser refactoring: merge all SELECT rules into one.
WL#9307: Enabling merging a derived table or view through a optimizer hint
In SQL2011: optional feature T121, description inparagraph. "WITH cte AS (SELECT ...) SELECT * FROM cte" is similar to CREATE VIEW cte AS SELECT ...; SELECT * FROM cte; DROP VIEW cte; That is, WITH introduces an internal temporary view which can be referenced in the FROM clause of a SELECT statement, and which disappears when the SELECT ends. It's a statement-local view, it is implicitely created and dropped, and requires no create/drop privileges. "Query name" is the standard term. Microsoft and IBM use the name "common table expressions" ("CTE"), Oracle uses "subquery factoring clause". Our preferred term in this WL, in the documentation, in error messages, in blogging: CTE. Below, statements like "we won't support it" mean "we won't support it in first released versions, but design won't prevent any future implementation, which we may consider later". Syntax ====== SQL2011's corresponds, in MySQL, to "a UNION" (possibly degenerated to a single SELECT query block); it has the form: ::= [ ] where is a single query block or a UNION of query blocks. Such can then serve as a standalone SELECT statement, or as input for a subquery, a derived table's subquery, a cursor... The can be prefixed with a : ::= WITH ::= [ { }... ] ::= [ ] AS represents one Common Table Expression. And can be used as a table reference in . Example: WITH cte AS (SELECT column1 FROM Table1) SELECT column1 FROM cte; We call the part inside "AS(...)" the "subquery of the CTE". Parentheses after AS are required. A CTE has no schema name prefix when defined or referenced (i.e. no "mydb.cte"). There can be at most one WITH clause per . It can list several CTEs: WITH t1 AS (SELECT ...), t2 AS (SELECT ...) SELECT * FROM t1,t2; t2's subquery can reference t1; not the other way around. t1's subquery cannot reference itself (or then it's recursive which is WL#3634). The final query expression body can reference t1/t2 directly, and/or in a subquery or derived table: WITH t1 AS (SELECT ...), t2 AS (SELECT ...) SELECT (SELECT * FROM t1)...; t1 can be referenced multiple times in the query expression body which has the WITH clause: WITH t1 AS (SELECT ...) SELECT * FROM t1, t1 AS t2; WITH t1 AS (SELECT ...) SELECT * FROM t1 WHERE a IN (SELECT * FROM t1 AS t2); The same CTE name may not appear more than once in the list of a WITH clause: no WITH t1 AS ..., t1 AS ... rule is starting with SELECT so doesn't have WITH, so WITH followed by WITH on the same level is illegal, for example: WITH t1 AS (SELECT * FROM t0) WITH t2 AS (SELECT * FROM t1) SELECT * FROM t2; is illegal: after the "WITH t1 AS (SELECT * FROM t0)", as there is no comma, we expect , which shouldn't start with WITH. Column lists. There is an optional clause [ ] which names columns of the CTE (without this, columns get their names from the SELECT list of the CTE's subquery). For example WITH q (column1) AS (SELECT a FROM t1) SELECT column1 FROM q; In the Standard, such column naming is supported for derived tables and views too. This WL adds support to CTEs, derived tables and views. A subquery is a , a view's definition contains a , a cursor's definition contains a , so a WITH clause inside them is legal, examples: SELECT * FROM t1 WHERE t1.column1 = (WITH q AS (SELECT ...) SELECT ... FROM q); # WITH in scalar subquery INSERT INTO t1 WITH q AS (SELECT ...) SELECT ... FROM q; # WITH in INSERT SELECT CREATE TABLE t1 AS WITH q AS (SELECT ...) SELECT ... FROM q; # WITH in CREATE TABLE CREATE VIEW v AS WITH q AS (SELECT ...) SELECT ... FROM q; DECLARE c CURSOR FOR WITH q AS (SELECT ...) SELECT ... FROM q; We also support WITH as a prefix to the UPDATE syntax, and to the DELETE syntax. PG and SQL Server allow that, Oracle and SQL2011 do not. This allows to update/delete a base table or view based on some criteria which references a CTE. The CTE is read-only, like a derived table. Examples: Single-table UPDATE: WITH q as (SELECT ...) UPDATE t SET t.a=(SELECT q.b FROM q); Multi-table UPDATE: WITH q as (SELECT ...) UPDATE t, q SET t.a=q.b WHERE...; Single-table DELETE: WITH q as (SELECT ...) DELETE FROM t WHERE t.a IN (SELECT * FROM q); Multi-table DELETE: WITH q as (SELECT ...) DELETE t FROM t,q WHERE ...; Behaviour ========= Name conflicts: following the existing rules of derived tables, CTEs hide base tables and named temporary tables (those created by 'CREATE TEMPORARY') having the same name. The order of lookup of a table reference located in a query block, to identify the table, is: 1. Look for derived table in current query block 2. Look for CTE in current query block 3. Look for derived table in outer query block 4. Look for CTE in outer query block ... n-1 Look for derived table in outermost query block n. Look for CTE in outer-most query block. Giving priority to the derived table over the CTE (i.e. step 1 before 2) is logical, as it comes from the FROM which is a part of the i.e. closer to the SELECT list than is; the Oracle DBMS does that. If in the subquery of a CTE there is ORDER BY without LIMIT: ORDER BY is pointless, then we do as for derived tables: if we merge this CTE, remove ORDER BY silently unless the CTE is the only table in the FROM clause of the outer query block. WITH and functions. Consider the sequence: CREATE TABLE t1(a INT); CREATE FUNCTION f () RETURNS INT RETURN (SELECT COUNT(*) FROM t1); WITH t1 AS (SELECT * FROM t2) SELECT f() FROM t3; When the time comes to evaluate function f(), in "SELECT COUNT(*) FROM t1", is t1 the CTE that was defined in the query block that causes the invocation of f(), or the base table? Answer: the base table. Otherwise, the caller may influence strongly the behaviour of the function by making it use another table than what the function's author intended; moreover, a function operates in a database defined at creation time, and CTEs have no base table. With 'CREATE TEMPORARY' tables, things are different, but that won't apply to CTEs. Like with derived tables, MySQL recognizes functional dependencies in (non-recursive) CTEs. When to materialize or merge the CTE? Oracle has a "/*+ materialize */" hint, an indication that the CTE should be materialized: WITH t2 AS SELECT /*+ materialize */ * FROM t1 SELECT * FROM t2 a, t2 b WHERE a.s1 = b.s1; In WL#9307 such hint is implemented and affects derived tables and views (see WL#9307 for more details); in this WL we make it apply to CTEs too, as in: with qn as (select * from t1) select /*+ no_merge(qn) */ * from qn; When the user hasn't specified any hint, the default behaviour is that we do for CTEs like for views and derived tables in 5.7 i.e. by default we most often merge. Recursive CTEs are always materialized, as any derived table containing UNION. If materialized, a CTE is materialized only once (the optimizer trace contains "reusing_tmp_table" as indication). See the LLD for a proposed new EXPLAIN output for materialized CTEs. In CREATE ALGORITHM=TEMPTABLE/MERGE VIEW AS WITH (subquery) SELECT ...; the ALGORITHM clause does not influence the merging/materialization of the underlying CTE, only that of the "..." part. Like with derived tables: - EXPLAIN WITH ... SELECT ... does not fill the tmp table of a materialized CTE - relevant indexes are automatically added to the materialized CTE if the Optimizer thinks that they will speed up the top query's access to the CTE. The storage engine hosting the tmp table is MEMORY or InnoDB. MyISAM is not supported (as in 8.0 there is a plan to remove MyISAM for any internal tmp table): if internal_tmp_disk_storage_engine=myisam, any attempt to use a query name which results in a materialized MyISAM table returns an error. Maximum size of a materialized CTE's result: exactly like for a materialized derived table: influenced by @@tmp_table_size and @@max_heap_table_size. If @@big_tables=1, table is created in InnoDB; otherwise it's created in MEMORY and if it grows too big it is converted to InnoDB. http://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_tmp_table_size Unsupported queries =================== In the standard, outer references are allowed in the CTE's subquery and in the derived table's subquery: SELECT * FROM t1 WHERE t1.a IN (WITH qn AS (SELECT t1.a) SELECT * FROM qn); We never supported outer references in derived table's subquery; this WL won't support outer references in CTE's subquery. Perhaps later. PG supports WITH( ) like: WITH foo AS (DELETE FROM U ... RETURNING *) INSERT INTO T SELECT * FROM foo; (delete rows from U and store them in T). SQL2011 has something similar, as it allows, in the subquery defining a WITH (and only there, per 7.15 Syntax rule 1), a to be a : the subquery then executes a DELETE/INSERT/UPDATE: INSERT INTO T WITH foo AS (SELECT * FROM OLD TABLE (DELETE FROM ...)) SELECT * FROM foo; We won't support it. Workaround: start a transaction, then, in WITH, just SELECT the rows: INSERT INTO T WITH foo AS (SELECT * FROM ...)) SELECT * FROM foo; then DELETE them in a follow-up statement using the same WHERE clause as the subquery of WITH. Requirements on storage engine used for materializing the CTE ============================================================= If a query has more than one reference to a certain materialized CTE, the references are readers of the temporary table. They can do any type of read access (table scan, index scan/lookup on any index, index merge...). It is critical that a reader is not disturbed by the other readers. Modifications to MEMORY and Innodb storage engines ================================================== See WL#3634. Modifications to the data dictionary ==================================== As this WL allows to specify a column name list for views (and derived tables and CTEs), and views are persistent objects, it adds a column to the relevant table in the DD, storing the list of column names for the view. This particular change must be pushed to 8.0 before the DD format is frozen in 8.0. References ========== Feature requests for WITH: BUG#16244 BUG#20712 BUG#23156 BUG#48610 BUG#60940 "Oracle SQL-99 WITH clause" http://www.dba-oracle.com/t_sql99_with_clause.htm "SQL Server - Using Common Table Expressions" http://msdn2.microsoft.com/en-us/library/ms190766.aspx https://msdn.microsoft.com/en-us/library/ms175972.aspx http://www.postgresql.org/docs/9.5/static/queries-with.html DB2 syntax http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/r0000879.htm
See the HLS for a definition of a CTE. Abbreviations: DT = derived table. NONRECURSIVE ============ (I) in parsing: - When seeing the WITH syntax, a PT_with_clause is created: a list of PT_common_table_expr-s, each element containing a name (string), a query_expression (=subquery) PT_subquery (non-contextualized), the raw text of that expression ("SELECT etc"), and an optional list of explicitely specified column names; this accounts for the syntax portion: WITH some_cte(colname1, colname2) AS (SELECT etc) See sql_yacc.yy, rule "with_clause:" . - when a table reference is seen in FROM of a query block of a query expression, add_table_to_list() tries to match it with some name of a with_list_element visible to that query expression (i.e. in same nesting level or in outer level); if it succeeds, it replaces the table name with a derived table whose defining subquery is the PT_subquery found in the PT_common_table_expr. If later a second reference to the CTE is found, the same process is followed except that the defining subquery for this 2nd reference is not the same PT_subquery as for the first: instead it is produced by reparsing the raw text of the subquery (found in the PT_common_table_expr). This is so that every derived table has its own PT_subquery. See add_table_to_list() which calls find_common_table_expr(). - After the reference has been made a derived table, its defining PT_subquery is then contextualized. See find_common_table_expr(). - If find_common_table_expr() can't match, it looks at the PT_common_table_expr-s of the outer query expressions, if any. See find_common_table_expr(). (II) table opening, early privilege checking: Nothing special as references are derived tables: uses the normal procedure of derived tables: underlying tables are opened and checked. (III) resolution: - resolve_derived(): using the normal derived table resolution process - merge_derived(): using the normal derived table merging process; if column names were provided they are used as names for field_translator-s (See create_field_translation()). - setup_materialized_derived(): using a modified derived table materialization process (called here "shared materialization"): multiple references to the same CTE, if materialized, share the same tmp table: multiple TABLE_LISTs and TABLEs, but one TABLE_SHARE, one physical instance in the engine, one single set of materialized rows, shared between all references even in different query blocks. Note that this sharing is within the bounds of a single execution and of a single connection: shared table is still dropped at end of execution. In detail: * setup_materialized_derived() creates TABLE and TABLE_SHARE except if a TABLE_SHARE has already been created for another DT materializing this same CTE, in which case we simply create a new TABLE based on the existing TABLE_SHARE (using open_table_from_share()). * if column names were provided they are used as column names for the tmp table. (IV) optimization (if materialized): - "Derived key creation" code is modified to fit with "shared materialization": every reference to the CTE, in any query block, can add keys it thinks are appropriate. They all get added to the single tmp table. See the long comment at the start of add_derived_key(). (V) execution: - create_derived(): modified as follows: * if we find another TABLE (of another DT materializing this same CTE) which has already been in create_derived() (i.e. has is_created()==true), it means the table exists in the engine: we simply open a "handler" (ha_heap/innodb) to the created table by calling open_tmp_table() on our TABLE_SHARE and set is_created() to true. * otherwise, we do the normal create_derived() job. - materialize_derived(): modified as follows: * if we find another TABLE (of another DT materializing this same CTE) which has already been materialized (i.e. has materialized== true), we just set our TABLE::materialized to true, as our TABLE can have access to the already-materialized rows. * otherwise, we do the normal materialize_derived() job. * if this materialization job meets the size limit of MEMORY engine, as usual the MEMORY table is converted to INNODB; this implies also closing/reopening the other instances. (VI) cleanup Tmp tables for materialized CTEs are destroyed in SELECT_LEX::cleanup(); derived tables and views are, too (the list THD::derived_tables is removed). (VII) In Opt trace. CTE is shown just like view/DT. Optimizer trace shows when there is shared materialization ("reusing_tmp_table" + only one "creating_tmp_table") (VIII) in EXPLAIN Like a view/DT, a merged CTE is not shown as a node, only its underlying tables appear in the top query's plan. The rest of this paragraph applies to a materialized CTE. For a DT, where the "materialized_from_subquery" node follows the reference. CTEs are similar, but for a CTE which is referenced multiple times, there is thus no duplication of "materialized_from_subquery" nodes (it would give the impression that the subquery is executed multiple times, and clutter the screen). Only one reference to the CTE has a complete "materialized_from_subquery" with the description of the plan of its subquery; other references have a reduced "materialized_from_subquery". See "INTENDED EXPLAIN OUTPUT" below for an example. The same applies to EXPLAIN TRADITIONAL: other references' subqueries are not shown. (IX) printing - SELECT_LEX_UNIT::print() is augmented to print its WITH clause. RECURSIVE ========= Terminology: in with QN(col) as ( select 1 union all select col+1 from QN where col<10) ^XX select * from col; ^YY XX is a recursive reference to QN, YY is a nonrecursive reference to QN. "select 1" is the nonrecursive member of UNION; "select col+1" is the recursive member of UNION. (I) in parsing. - recursiveness is detected - a nonrecursive reference to a recursive CTE is handled as a nonrecursive CTE: replaced with a derived table. - during contextualization of that derived table, when find_common_table_expr() spots a recursive reference it replaces it with a "dummy" derived table (defined by the dummy subquery "(select 0)"). (II) table opening and early privilege checking: this recursive reference is automatically transparent (it's a derived table without any underlying table). (III) in resolution of the defining subquery of the CTE: - resolve_derived() checks the form of the query expression (non-recursive members must precede recursive members, etc - it calls prepare() on the subquery, which - a) resolves non-recursive members - b) creates the TABLE (or copies an existing one - "shared materialization" applies too) (a recursive CTE is always materialized as it contains a UNION), in setup_materialized_derived_tmp_table() - c) turns the dummy derived tables into copies of that TABLE (created by open_table_from_share()). - d) resolves recursive members (which is finally possible as the recursive table references are now set up: for example their column types are now known per (b)). (IV) In Optimization - in TABLE_LIST::fetch_number_of_rows(), the estimated count of rows in a recursive reference is set to the count of rows returned by the non-recursive query blocks as estimated by the Optimizer; and it's always at least two rows to be not be treated as a constant table (it will "grow" during execution) - as the CTE contains a union, its estimated_rowcount is the sum of the estimation of union's members; as at least one such member involves a join over a recursive reference (of row count >=2, see above), this estimated_rowcount is generally >=2. Unless the member's WHERE condition is known to always be false, in which case the CTE may have estimated_rowcount<=1 and be a const table for the outer query. - the recursive reference in a query block is declared to be a dependency of all other tables, so it goes first in plan (which is necessary for the algorithm to work). - the recursive reference is read with table scan, in the same order as rows were inserted into it. - derived keys can exist on the CTE, can be used by nonrecursive references, but not to access recursive references (the access method of recursive references must remain table scan). (V) In execution (select_lex_unit::execute): - every write to the tmp table increases a counter C which is the count of all rows existing in the tmp table (accessible through unit->query_result()->row_count()) - the start value of C (0, initially) is cached into OLD_C - non-recursive members are executed, fill the tmp table of the derived table. C thus holds the count of non-recursive rows. For example, let's say those rows are '1' and '2' (C=2). Then there is a loop: - the first recursive query block executes; as soon as a result row '3' is calculated it is appended to the tmp table; so the tmp table is '1-2-3'; this new row is visible to the query block (loop-back); after processing '2' the query block sees '3'; the query block thus produces more rows, etc, until it produces nothing. For example assume it has produced '3', '4' and '5': C is 5 now. - the same is done for the 2nd, 3rd (etc) recursive block; say they produced '6', so C is 6 - the new count NEW_C is set to C (6) and is compared to OLD_C (2): - if NEW_C == OLD_C: terminate. Otherwise, there are more rows to process: set OLD_C to NEW_C and loop. Note that then the next loop iteration resumes the reads where they had left. So they will directly come to new rows. - because the recursive reference is first in the join plan, this algorithm works. - most of the logic of the algorithm is put in a class Recursive_executor which drives what happens in select_lex_unit::execute(). - If the tmp table is converted from MEMORY to INNODB, for recursive references (readers) having an already open read cursor, we re-position their cursor onto the same INNODB row as the MEMORY row it was on; this is possible as we manage a count of rows-read-so-far for every read cursor: qep_tab::m_fetched_rows). It is natural to ask why, above, if NEW_C>OLD_C "there are more rows to process" - didn't we already hit EOF in the recursive query block, so didn't we read all rows already? Most likely we did, but there are two cases which require this repetition: - when there are two or more recursive query blocks QB1 and QB2, the execution of QB1 produced rows, until QB1 hit EOF, then control passed to QB2, the execution of QB2 produced rows, then QB1 should be run again to see if it should produce rows from the QB2-produced rows. And then QB2 should be run again. And so on. That's the case in our example: QB1 hit EOF, but QB2 later added a row '6', which QB1 must look at. - even if there's a single recursive query block: if join buffering is used on the non-recursive references, rows are first cached, production of rows is delayed to after sub_select() has hit EOF, then the join buffer is flushed and rows are written to the tmp table so the query block needs to run again to see those rows. - if we are not in these two cases, then the repetition is indeed superfluous, but it does not hurt much: sub_select( ) simply hits EOF immediately, then NEW_C==OLD_C, and so it terminates. (VI) cleanup. - as for non-recursive CTE. (VII) In EXPLAIN and opt trace. If @@optimizer_trace_features has repeated_subselect=off, don't trace the 2nd execution of the recursive members, and following. OTHER RELATED CHANGES Ability to specify a list of column names for a CTE, and a DT is added: WITH qn(column names) AS (select...) or FROM (select...) dt(column names) This already existed for views: CREATE VIEW v(column_names) AS ... but during development of this WL it was found that the existing implementation for views had a bug so it was changed to use the same impl as the new one for CTE/DT. As a consequence: - the list of column names for the view is given a place in the data dictionary (a new column VIEW_COLUMN_NAMES). - SHOW CREATE VIEW is adapted. - Column name management in sql_view.cc is reworked: if column names are provided, check_duplicate_names() should search for duplicates in those names, not in the names of selected expressions. - LEX::view_list isn't needed anymore and is removed. MERGE/NO_MERGE hint is made to apply to CTEs. Added end_psi_batch_mode_if_started(): used when the tmp table overflows and gets converted from HEAP to INNODB: closing HEAP requires ending the mode; which is later re-enabled on innodb. QEP_TAB::materialized moves to TABLE (makes it easier for shared materialization, a table can consult its siblings' materialization status). TABLE_SHARE::ref_count was so far not used for materialized tmp tables, now it is; it allows storage engines to detect that such table is involved in shared materialization, which they need to know in order to take some concurrency-related measures). To create a new PT_subquery, a new token GRAMMAR_SELECTOR_CTE is introduced (inspired by GRAMMAR_SELECTOR_GCOL). TABLE_SHARE::uniques was unused, is removed. A new helper in class TABLE: /// Save names of materialized table @see reset_name_temporary void save_name_temporary() { view_db.str= db; view_db.length= db_length; view_name.str= table_name; view_name.length= table_name_length; } HEAP engine has optimizations when it knows the table is used only by this thread and with one single reference: a) automatically drops itself on close b) doesn't link it in any list so any other open() doesn't see it and creates a new table. This pre-existed and is triggered by open flag HA_OPEN_INTERNAL_TABLE. But for a materialized CTE, we need (a) and not (b); HEAP code is modified to look at TABLE_SHARE::ref_count. InnoDB uses that distinction too. InnoDB gets a few changes from Jimmy so that an internal tmp table can have one writer and multiple readers which don't disturb each other and which see the newly written rows. INTENDED EXPLAIN OUTPUT explain format=json with qn as (select * from t limit 2) select * from qn as qn1, qn as qn2; What we do NOT want to show, is that the plan for materializing "qn" appears twice, like this: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "21.40" }, "nested_loop": [ { "table": { "table_name": "qn1", "access_type": "ALL", "materialized_from_subquery": { "using_temporary_table": true, "dependent": false, "cacheable": true, "query_block": { "select_id": 2, "cost_info": { "query_cost": "1.20" }, "table": { "table_name": "t", "access_type": "ALL", "rows_examined_per_scan": 1, "rows_produced_per_join": 1, "filtered": "100.00", "cost_info": { "read_cost": "1.00", "eval_cost": "0.20", "prefix_cost": "1.20", "data_read_per_join": "8" }, "used_columns": [ "a" ] } } } } }, { "table": { "table_name": "qn2", "access_type": "ALL", "materialized_from_subquery": { "using_temporary_table": true, "dependent": false, "cacheable": true, "query_block": { "select_id": 3, "cost_info": { "query_cost": "1.20" }, "table": { "table_name": "t", "access_type": "ALL", "rows_examined_per_scan": 1, "rows_produced_per_join": 1, "filtered": "100.00", "cost_info": { "read_cost": "1.00", "eval_cost": "0.20", "prefix_cost": "1.20", "data_read_per_join": "8" }, "used_columns": [ "a" ] } } } } } ] } } It should rather be like this: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "21.40" }, "nested_loop": [ { "table": { "table_name": "qn1", "access_type": "ALL", "rows_examined_per_scan": 2, "rows_produced_per_join": 2, "filtered": "100.00", "cost_info": { "read_cost": "10.10", "eval_cost": "0.40", "prefix_cost": "10.50", "data_read_per_join": "32" }, "used_columns": [ "a" ], "materialized_from_subquery": { "using_temporary_table": true, "dependent": false, "cacheable": true, "query_block": { "select_id": 2, "cost_info": { "query_cost": "1.40" }, "table": { "table_name": "t", "access_type": "ALL", "rows_examined_per_scan": 2, "rows_produced_per_join": 2, "filtered": "100.00", "cost_info": { "read_cost": "1.00", "eval_cost": "0.40", "prefix_cost": "1.40", "data_read_per_join": "16" }, "used_columns": [ "a" ] } } } } }, { "table": { "table_name": "qn2", "access_type": "ALL", "rows_examined_per_scan": 2, "rows_produced_per_join": 4, "filtered": "100.00", "using_join_buffer": "Block Nested Loop", "cost_info": { "read_cost": "10.10", "eval_cost": "0.80", "prefix_cost": "21.40", "data_read_per_join": "64" }, "used_columns": [ "a" ], "materialized_from_subquery": { "sharing_temporary_table_with": { "select_id": 2 } } } } ] } } Note the second "materialized_from_subquery" which tells the user it's "sharing_temporary_table_with" the CTE which has query block select#2. And EXPLAIN TRADITIONAL should be: +----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+---------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+---------------------------------------+ | 1 | PRIMARY | | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | NULL | | 1 | PRIMARY | | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | Using join buffer (Block Nested Loop) | | 2 | DERIVED | t | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | NULL | +----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+---------------------------------------+ Notice that there is a single DERIVED row, which is the first reference to the CTE. The second reference is not explained. PRIMARY SELECT is said to read twice.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.