MySQL Blog Archive
For the latest blogs go to blogs.oracle.com/mysql
MySQL 8.0 Labs: [Recursive] Common Table Expressions in MySQL (CTEs)

[Update of April 2017: the features discussed here are now also available in the official release 8.0.1 of MySQL.]

[Note: this is the first post in a series; there is also a second post].

The MySQL development team just published a Labs release of the MySQL Server (available under “MySQL Server 8.0.0 Optimizer”).
A prominent feature of this release, which I developed, is [Recursive] Common Table Expressions, also known as

  • [recursive] CTE,
  • [recursive] subquery factoring,
  • WITH [RECURSIVE] clause.

Three years ago I had shown a way to emulate CTEs in a blog post, but what we have now in MySQL is the real thing, no ersatz!

So this is the first of several blog posts which will tour all details about this new feature.

A derived table is a subquery in the FROM clause, in bold font below:

SELECT … FROM (SELECT …) AS derived_table;

Derived tables exist in MySQL since long. It’s accurate to say that non-recursive CTEs are “improved derived tables”. Here is a first example:

We have a CTE cte4, which is built from other CTEs cte3 and cte2 which itself is built from cte1. As one reader of this post (Vasiliy) commented, the same can be achieved with derived tables:

However if you read both queries, the CTE-based one flows more nicely as it defines table after table, “linearly”, and we see the sentences forming as we read from top to bottom, whereas the derived-table-based query looks “tree-like” and “turned inside-out” (see how the first  word, “This “, is deeply nested). CTEs help writing a readable query, which is  a win for future maintenance.

Due to the limited size of a blog post, I’ll cover non-recursive CTEs more in depth in a next post – I hope the example above has wet your appetite. Today, I’ll rather look at recursive CTEs which, in my humble opinion, are even more innovative to traditional SQL than
non-recursive CTEs are.

A recursive CTE is a set of rows which is built iteratively: from an initial set of rows, a process derives new rows, which grow the set, and those new rows are fed into the process again, producing more rows, and so on, until the process produces no more rows.

The simplest possible syntax is to include this

into your SELECT, INSERT, UPDATE, DELETE statement, or inside any SELECT subquery.

Let’s walk through a first example, producing integers from 1 to 10:

If we compare it to the syntax rule, we see that:

  • the CTE is named my_cte
  • the CTE’s definition is included as prefix of SELECT statement SELECT * FROM my_cte
  • the initial set is SELECT 1 AS n so it’s a row containing “1”. The meta-data of this row is used to define the column of my_cte: therefore, my_cte has one column of type INT (the type of “1”), named n due to AS n.
  • the process which takes rows in input and produces new rows in output, is described by
    SELECT 1+n FROM my_cte WHERE n<10
    which means: take rows in my_cte which match n<10, and for each of them,
    produce a row with an incremented n.
  • the definition of my_cte is clearly recursive, as it contains a reference to my_cte (in the FROM clause of the second SELECT). That’s how you can distinguish a recursive CTE from a non-recursive one.

So MySQL will take these steps:

  • iteration 0: create the initial set of rows, noted S0: S0={1},
  • iteration 1: run process on result of iteration 0 (on S0): this produces new set S1={1+1}={2},
  • iteration 2: run process on result of iteration 1 (on S1): this produces S2={1+2}={3},
  • and so on,
  • iteration 9: run process on result of iteration 8 (on S8): produces S9={1+9}={10},
  • iteration 10: run process on result of iteration 9 (on S9): no row of S9 matches n<10, so produce nothing, and that triggers the termination of the loop,
  • the final result in my_cte, which the outer SELECT will see, is the union of S0, S1, … and S9: {1,2,…,9,10}.

The initial set of rows is sometimes referred to as “the non-recursive SELECT”, “the anchor SELECT” or “the seed SELECT”. The SELECT describing the row production process is “the recursive SELECT”; it is “recursive” because it reads my_cte.

Here is a more complete syntax:

Thus:

  • you can define the initial set as a union of several SELECTs
  • you can define the process to produce new rows as several SELECTs whose result is
    union-ed at every iteration.
  • this CTE definition can be followed by another CTE definition, which may use the first CTE.
  • in a single WITH clause you can mix non-recursive CTEs and recursive CTEs (just note that as long as your clause has at least one recursive CTE it must start with the words WITH RECURSIVE).
  • the CTE’s column names, instead of being specified with AS aliases in the first SELECT, can be put right after the CTE’s name: cte_name(n).

Let’s take our 1-to-10 example, reduce it to 1-to-6 to save screen space, name the column using the my_cte(n) syntax, and use the result of my_cte to create a table:

The CTE can also be used in INSERT (and REPLACE):

In UPDATE (single-table and multi-table):

And in DELETE (single-table and multi-table); notice how the CTE is referenced from a subquery:

And in a similar DELETE, where the CTE is defined in the subquery itself, not anymore in front of DELETE:

Pretty flexible, isn’t it?

A word of caution: what would happen if I forgot to type the WHERE n<6 clause? It looks like the query would run forever, as it would always produce new rows (no reason to stop at 6, anymore). But, starting from MySQL 8.0.3, there is an upper bound on the number of allowed iterations (1000), so the query would stop with an error when it has done 1000 iterations; and that bound can be increased or decreased by setting variable @@cte_max_recursion_depth. In older versions, I would have to rely on other means: if using the mysql command-line client, I could stop that by typing Control-C; with another client, I would open another session and use KILL ; or I could put this at the start of my session:

so that the runaway query aborts automatically after 10 seconds, if the WHERE clause wasn’t correct.

That’s all for today. In next posts, I will continue on this topic, but there is already enough here to allow you to experiment with the feature. Before you do, please take note of two things which can save your time:

  1. If you use a recursive CTE to create longer and longer strings (for example by using CONCAT() in the recursive SELECT), remember that the CTE’s column type is determined based on the  non-recursive SELECTs only: so, in the non-recursive SELECTs, make the column wide enough with CAST (… AS CHAR(<length>)), or your long strings may not fit.
  2. Inside the recursive CTE definition (the part in AS (…)), some syntax constraints must be respected (the reason for which will become clear in next posts):
    • all SELECTs must be connected with UNION ALL or UNION DISTINCT 1
    • a recursive SELECT mustn’t contain GROUP BY, aggregate functions
      (like SUM), ORDER BY, LIMIT, DISTINCT (this rule doesn’t apply to the non-recursive/anchor/seed SELECT)
    • a recursive SELECT must reference the CTE only once and only in its
      FROM clause, not in any subquery. Of course, it can additionally reference other tables than the CTE and join them with the CTE, which can be very useful to build hierarchies (for example, if we have a table of bosses and employees and want to answer the question “who reports directly or indirectly to Mrs. X?”). If used in a JOIN like this, the CTE must not be on the right side of a LEFT JOIN.

That’s it! And, as always,

1. MySQL 8.0.1 supports both types of unions; the older Labs release supported only UNION ALL.