WL#3634: Recursive WITH (Common Table Expression)
Affects: Server-8.0 — Status: Complete
WHAT? ===== Support recursive CTE. Example: Numbers from 1 to 10: with recursive qn as (select 1 as a union distinct select 1+a from qn where a<10) select * from qn; prints 1 to 10. Fibonacci numbers: with recursive qn as (select 1 as n, 1 as un, 1 as unp1 union all select 1+n, unp1, un+unp1 from qn where n<10) select * from qn; n un unp1 1 1 1 2 1 2 3 2 3 4 3 5 5 5 8 6 8 13 7 13 21 8 21 34 9 34 55 10 55 89 # 89 is the 10th Fibonacci number Hierarchy traversal like parts explosions or ancestors-descendants; here's a company's org chart: CREATE TABLE EMPLOYEES ( ID INT PRIMARY KEY, NAME VARCHAR(100), MANAGER_ID INT, INDEX (MANAGER_ID), FOREIGN KEY (MANAGER_ID) REFERENCES EMPLOYEES(ID) ); INSERT INTO EMPLOYEES VALUES (333, "Yasmina", NULL), # Yasmina is the CEO (as her manager_id is NULL) (198, "John", 333), # John is 198 and reports to 333 (=Yasmina) (692, "Tarek", 333), (29, "Pedro", 198), (4610, "Sarah", 29), (72, "Pierre", 29), (123, "Adil", 692); # Give me the org chart with the management chain for every employee (= the path from CEO to employee): WITH RECURSIVE EMPLOYEES_EXTENDED(ID, NAME, PATH) AS ( SELECT ID, NAME, CAST(ID AS CHAR(200)) FROM EMPLOYEES WHERE MANAGER_ID IS NULL UNION ALL SELECT S.ID, S.NAME, CONCAT(M.PATH, ",", S.ID) FROM EMPLOYEES_EXTENDED M JOIN EMPLOYEES S ON M.ID=S.MANAGER_ID ) SELECT * FROM EMPLOYEES_EXTENDED ORDER BY PATH; ID NAME PATH 333 Yasmina 333 198 John 333,198 29 Pedro 333,198,29 4610 Sarah 333,198,29,4610 # Sarah->Pedro->John->Yasmina 72 Pierre 333,198,29,72 692 Tarek 333,692 123 Adil 333,692,123 Except by creating a large stored procedure, it is not possible to achieve this result without this feature. It is similar in idea to Oracle's CONNECT BY. DO OTHERS HAVE IT? ================== yes: standard SQL99, DB2, SQL Server, Oracle, SQLite, PostgreSQL. MISC ==== Slides 40-59 of http://fr.slideshare.net/MarkusWinand/modern-sql In SQL2011 it is optional feature T131/132 described in paragraph
. It is possible to emulate this with a generic SP: http://guilhembichot.blogspot.co.uk/2013/11/with-recursive-and-mysql.html but this SP has to create persistent tables, requiring privileges, requiring proper cleanup, etc.
F-1 support the WITH RECURSIVE syntax F-2 Recursive query blocks should have a "recursive" mark in EXPLAIN F-3 Same reference practice as for non-recursive CTEs, with the only addition that a recursive CTE is allowed to reference itself. For example, mutually recursive CTEs are not supported (t1 referencing t2 referencing t1).
See WL#883 for necessary background. Syntax ====== Coverage in this description will only be about the necessary addition for WITH that enables recursion. This is for a minimal requirement -- recursive queries can be very complex. A WITH clause is, as in WL#883, a list of CTEs. One or more of those CTEs may reference itself in its subquery: it's a "recursive CTE". A WITH clause may define a mix of non-recursive CTEs and recursive CTEs; if it has recursive ones it must have the RECURSIVE keyword. The elementary description, with one single (recursive) CTE, is: WITH RECURSIVE cte AS (SELECT ... FROM table_name /* "seed" SELECT */ UNION [ALL | DISTINCT] SELECT ... FROM cte,table_name) /* "recursive" SELECT */ SELECT ... FROM cte; The parenthesized "(SELECT ... UNION ALL SELECT ...)" has two query blocks. The first is the "non-recursive" (also known as "anchor" or "seed" in some other products), which is what one "starts with". The second is the "recursive" one, which explains how the DBMS navigates from the seed's result. The "recursive" one references the CTE. The "non-recursive" doesn't. Execution does like this: - non-recursive query block is evaluated, result goes into an internal tmp table - if no rows, exit - (A): recursive query block is evaluated over the tmp table's lastly inserted rows, and it produces new rows which are appended to the tmp table (if UNION ALL; only distinct not-already-there rows if UNION DISTINCT) - if the last step didn't produce new rows, exit - goto (A) This algorithm is good to traverse a hierarchy: the non-recursive query block returns the top parent, recursive query block's first run returns children of top parent, second run returns children of those children, etc, until no more offspring. The following constraints are from the SQL standard, or exist in important DBMSs; to simplify our implementation some constraints are added and marked with star-ed notes: - the CTE's subquery is a chain of unioned query blocks, which can be split in two parts: a head which contains no reference to the CTE, and a tail which has references to the CTE. The head's query blocks are named "non-recursive query blocks"; the tail's query blocks are named "recursive query blocks". - It is forbidden to have RQBi UNION DISTINCT RQBj UNION ALL RQBk where RQBx are consecutive recursive query blocks (*) - If there is more than one recursive query block, in one iteration they all operate on the data produced by the previous iteration, and their results are concatenated. - a recursive query block cannot have aggregate functions, GROUP BY - a recursive query block cannot have ORDER BY, LIMIT, SELECT DISTINCT (**) - the query expression of the CTE cannot have a global ORDER BY, LIMIT (***) - a recursive query block must reference the CTE only once, this reference must be in the FROM clause, not in any subquery or derived table. For a recursive query block, the execution plan should access the CTE first, so the reference to the CTE should neither be on the right (left) side of a LEFT (RIGHT) JOIN, nor forced by join hints (STRAIGHT_JOIN) to be non-first. - types of the CTE's result columns are inferred from the types of the union of non-recursive query blocks only (recursive query blocks are ignored), and they are all NULLable. - Notes: (*) Note that Oracle/DB2/MSSQL don't allow UNION DISTINCT between recursive query blocks, which is a stronger limitation. (**) I could neither see the meaning of those in a recursive query block, nor find examples on the web. (***) In examples on the web, the ordering is usually done in the outer query. LIMITing in the subquery could have the intention to stop the recursion early, but it seems to be instead done by having an increasing "depth level" column in the query. If needed, a workaround is to wrap in a second CTE: WITH RECURSIVE qn AS (SELECT nonrec1 UNION ALL SELECT rec ORDER BY x LIMIT y) SELECT * FROM qn; -> WITH RECURSIVE qn_no_order AS (SELECT nonrec1 UNION ALL SELECT rec), qn AS (SELECT * FROM qn_no_order ORDER BY x LIMIT y) SELECT * FROM qn; More Syntax =========== Optional column name list: in the standard it's mandatory for WITH RECURSIVE (not mandatory for non-recursive WITH). We won't require it, as it seems reasonable if the user instead uses AS alias clauses in his first non-recursive query block: with recursive qn(a) as (select 1 union all select 1+a from qn where a<10) select * from qn; can be written as with recursive qn as (select 1 as a union all select 1+a from qn where a<10) select * from qn; In the list of column names, there must be no duplication. Materialization =============== As any derived table containing UNION, a recursive CTE is always materialized. Even if referenced multiple times, it's materialized once. Unsupported queries =================== First see the starred notes in "Syntax" section. In SQL2011 mutual recursion between N CTEs is possible. It is complex. Oracle and PG don't allow it, we won't. We haven't found any example of someone using this, on the Web. SQL2011 allows a CTE to be dependent on a CTE coming after it in the WITH RECURSIVE clause: with cte1 as (select * from cte2), cte2 as (...)... we don't allow this. cte2 is allowed to depend on cte1 (we thus apply the same rule as for non-recursive WITH).
: in SQL2011 can be added at the end of the WITH clause. It allows 1) to order the CTE's result rows depth-first or breadth-first in the query expression having the WITH clause 2) to mark starts of cycles (to stop infinite recursion). Oracle has this clause. PG doesn't; they give tips in the docs how to emulate that in the query, http://www.postgresql.org/docs/current/static/queries-with.html ; we will do like them. SQL Server only has a MAXRECURSION hint (a number): https://technet.microsoft.com/en-us/library/ms175972%28v=sql.105%29.aspx which is 100 by default. We will not have it: if a query takes too long, the existing max_execution_time will kill it, or a manual KILL QUERY will. It's also possible for the user to stop recursion by having a counter in the recursive query. It's for these reasons that PG devs have refused such hint. Automatic cycle detection: Oracle automatically detects cycles: it throws an error when it sees a row which is the same as a row already present in the result; it does not detect infinite-loop queries not due to a cycle, like those generating an infinite sequence of numbers). PG doesn't detect anything. We won't detect anything. Recursive VIEW: In SQL99 A view can be created with CREATE RECURSIVE VIEW: create recursive view v as select 1 as a union select a+1 from v where a<10; is equivalent to create view v as with recursive qn (select 1 as a union select a+1 from qn where a<10) select * from qn; RECURSIVE view is not merge-able in the outer query and cannot have CHECK OPTION (See 11.32 ). We will not support it; workaround: use the equivalent CREATE VIEW above; it will also be non-mergeable (because based on UNION), and even though it will support CHECK OPTION it will not be insertable/updatable (due to UNION again) so CHECK OPTION will in fact be ineffective. Requirements on storage engine used for materializing the CTE ============================================================= These are in addition to those of WL#883. The table may have indexes. During materialization of the recursive CTE, the readers do table scan only (rnd_init(), then rnd_next() or rnd_pos()+rnd_next()). The writer does: - write_row() - no update - no delete - index_read_map()/index_next_same() (only if UNION DISTINCT with hash_field). It is critical that a reader cursor: (a) is not disturbed by the other readers' rnd_init/rnd_next() (b) is not disturbed by the writer's write_row() or the writer's index lookups. (c) has visibility of rows which the writer has previously written: reader must be able to catch up continuously, and must receive rows in the order they were inserted (otherwise it will hopelessly miss rows or process rows twice). As an exception, when the reader reports EOF, it is not required to see later-written rows. (d) is not disturbed by a conversion from too-large in-memory to on-disk (create_ondisk_from_heap() - thus the on-disk engine must offer a way to re-position the reader's scan on the on-disk row matching the in-mem row where the reader was before the conversion). To help this, the Server is able to tell the number of the in-mem row in insertion order. (e) is able to start a scan both from the first row, and from a certain row indicated with rnd_pos(), then continue with rnd_next(). (f) An implication of (e) is that positions reported by position() must be stable: even if a write happens later, the position of the row mustn't be changed. Modifications to MEMORY and Innodb storage engines ================================================== As the CTE is materialized either in MEMORY or in InnoDB "intrinsic tables", for those two engines to support the requirements above, small code changes are done in the engines. When later MEMORY is replaced by InnoDB "in-memory tables" some changes to the latter may be needed. MEMORY doesn't support (e) of the previous paragraph if a hash_field is used (which happens in practice when there is UNION DISTINCT and the concatenation of columns is >500 bytes); we then use InnoDB. Modifications to the optimizer trace ==================================== The existing flag 'repeated_subselect' in @@optimizer_trace_features is already used to disable tracing of the non-first executions of a subquery; we also make it disable tracing of the non-first executions of a recursive query block. CONNECT BY ========== Equivalence of WITH with Oracle's non-standard CONNECT BY: http://rajeshwaranbtech.blogspot.co.uk/2012/12/recursive-with-clause-to-implement.html References ========== http://docs.oracle.com/database/121/SQLRF/statements_10002.htm#i2077142 https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.sql.ref.doc/doc/r0059217.html https://msdn.microsoft.com/en-us/library/ms175972.aspx https://wiki.postgresql.org/wiki/CTEReadme https://books.google.co.in/books?id=wyhXvU0Eyg0C&pg=PA352&hl=en#v=onepage&q&f=false sums up SQL99 quite well.
see LLD of WL#883.
Copyright (c) 2000, 2020, Oracle Corporation and/or its affiliates. All rights reserved.