MySQL Blog Archive
For the latest blogs go to blogs.oracle.com/mysql
MySQL 8.0 Labs: [Recursive] Common Table Expressions in MySQL (CTEs), Part Two - how to generate series

[Update of April 2017: the features discussed here are now also available in the official release 8.0.1 of MySQL.]

Here is the second in a series of posts about CTEs, a new feature of MySQL 8.0, available in this Labs release. Here is the first post; there is also a third post. The first one ended with:

Inside the recursive CTE definition (the part in AS (…)), some syntax constraints must be respected […]

  • a recursive SELECT mustn’t contain GROUP BY, aggregate functions
    (like SUM), ORDER BY, LIMIT, DISTINCT (this rule doesn’t apply to the non-recursive/anchor/seed SELECT)
  • a recursive SELECT must reference the CTE only once and only in its
    FROM clause, not in any subquery. Of course, it can additionally reference other tables than the CTE and join them with the CTE, which can be very useful to build hierarchies (for example, if we have a table of bosses and employees and want to answer the question “who reports directly or indirectly to Mrs. X?”). If used in a JOIN like this, the CTE must not be on the right side of a LEFT JOIN.

So it is time to give some justifications for these constraints, which come from the SQL standard (except the exclusion of ORDER BY, LIMIT and DISTINCT, which is MySQL-specific, though also present in several well-known RDBMSs). Consider this example, producing numbers from 1 to 10:

The recursive SELECT looks like it’s operating on the set of rows of my_cte, as it selects from my_cte. But in fact, it must act as if it operated on each row, one by one, in turn, in isolation from any other row of my_cte. You have to view the recursive SELECT as a process which, from one row of iteration N, without knowledge of other rows of iteration N, produces some new rows for iteration N+1, and repeats the job for another row of iteration N, until it has processed all rows of iteration N one by one, after which it processes one row of iteration N+1, and so on.

With that rule in mind, let’s realize that:

  • GROUP BY needs all rows to put them into groups,
  • So does an aggregate function like SUM,
  • ORDER BY needs all rows to sort them,
  • LIMIT needs to eliminate rows based on the count of other rows,
  • DISTINCT needs to eliminate rows based on the existence of other rows,
  • Referencing my_cte twice in the FROM clause needs a row from my_cte and another row from my_cte, to join them,
  • Referencing my_cte in the FROM clause and in a subquery needs a row from my_cte for FROM and other rows of my_cte to evaluate the subquery,
  • Referencing my_cte in the right side of a LEFT JOIN needs all rows of my_cte, to know if at least one joins with the left side or if a NULL complement must be made.

Given that the recursive SELECT has access to only one row at a time, it is clear why all cases above cannot be allowed in a recursive SELECT.

This SQL rule has two advantages: first, it gives database systems the freedom to generate rows in any order of steps; second, it forbids quite a few unreasonable queries.

To see how this influences the patterns of queries, let’s now build a recursive query to generate Fibonacci numbers, which are 1, then 1, then 1+1 (=2), then 1+2 (=3), then 2+3 (=5), then 3+5 (=8) – the (N+2)th number is the sum of the (N+1)th and the (N)th.

A first approach would be: put the two first numbers, 1 and 1, in a seed SELECT:

then generate all other numbers with

My intention is: grab the (N)th and (N+1)th numbers, which are already in my_cte, and sum them; so: grab a row from my_cte, containing the (N)th number, and a row from my_cte, containing the (N+1)th number, and sum them; to grab the two rows at the same time I use a SQL JOIN, but I wonder how to make sure the JOIN pairs the right numbers. I need to add, as ON condition, something like “the row of cte_n_plus_1 must be the number right after the row of cte_n”. Otherwise I’ll also be summing the 5-th and the 9-th numbers, or whatever! So I need to add ordinal numbers to the table. Ok, so let’s add ordinal numbers to the seed SELECT:

and now we can write this recursive SELECT:

Alas, this won’t work, as it’s referencing my_cte twice in the FROM clause, which is forbidden as it violates the rule which says: “handle one row at a time”. Thus, I have to change my query so that all necessary information to calculate a new number is found in one row and doesn’t require two rows; for that, I’ll just store the N-th and (N+1)th number in one row:

and use this recursive SELECT:

Putting it all together, and adding a condition to stop around 500:

That’s it; the f column contains the desired Fibonacci sequence.

While we saw above that a recursive CTE can’t be joined with itself, it can perfectly be joined with other tables. Let’s use that to produce all 4-digit bit strings:

A first, non-recursive CTE, digits, holds the two building blocks 0 and 1. Then the recursive CTE, strings, starts with an empty string, then concatenates the result with the 0 and 1 of digits, and so on, so the expected result is: empty string, 0, 1, 00, 01, 10, 11, 000, etc, up to 1111.

The real result is ERROR 1406 (22001): Data too long for column ‘s’ at row 2.

At this point you may want to read towards the end of the previous blog post (search for “longer and longer”) which mentions that the type of column strings.s is determined from the non-recursive SELECT only i.e. SELECT ” AS s : and the type of the empty string () is CHAR(0)… Inserting ‘0’ into CHAR(0) would cause a character truncation, so: error!

A detail though: it fails because I’m using strict mode, which is the default starting from MySQL 5.7, and because strict mode hates truncation. My statement is a SELECT, and strict mode usually sends only warnings if a truncation happens during SELECT (whereas it sends errors during INSERT/UPDATE/DELETE), but we think it makes sense that SELECT become as strict as other statements in the future, and WITH RECURSIVE is one such case, so: error!

Back to our problem, I just have to make the column wider with a CAST and now it works:

With the examples above, you now have material to “program” many number-generating or string-generating queries. In the next post, we will see how to use recursive queries to explore hierarchical data (X-is-a-child-of-Y).

And, as always: