MySQL Blog Archive
For the latest blogs go to blogs.oracle.com/mysql
Support for LATERAL derived tables added to MySQL 8.0.14

In the just-released MySQL 8.0.14 I added a feature called LATERAL derived tables.

The manual describes the syntax and has examples of how the feature can be used to find greatest values in tables. In the present post I’m going to consider a different problem solved by LATERAL: let’s say that we have a bunch of nodes, and want to make a “random graph”, by connecting every node to other nodes.

So we start with a table of nodes:

create table nodes(id int);

which will fill with 20 nodes, using a recursive Common Table Expression (another feature which I had added to MySQL 8.0.1 🙂 ) :

insert into nodes
with recursive cte(n) as
(
  select 1
  union all
  select n+1 from cte where n<20
)
select * from cte;

Now, let’s create the random edges on our graph. They are directional (have a “from” and a “to”).

create table edges (from_id int, to_id int);

For each origin node, let’s pick up two random target nodes (two… for now), and connect them to the origin node.

If we do this with an ordinary derived table storing the two target nodes:

insert into edges(from_id, to_id)
select origin_nodes.id, target_nodes.id
from nodes as origin_nodes,
(
  select id
  from nodes
  order by rand()
  limit 2
) as target_nodes;

this derived table is calculated (materialized) once and only once, at start of execution of the INSERT query. In pseudo-code, this does:

select two random target nodes, store into target_nodes
for each row R in origin_nodes:
  join R with target_nodes
  insert the result into edges

So all nodes get connected to the same two target nodes chosen at the start:

That wasn’t the desired result; so scratch that:
delete from edges;

Before throwing the previous SELECT query away, let’s take note of its execution plan with EXPLAIN:

To solve our problem, we need something which is re-calculated for each origin node, like in this pseudo-code:

for each row R in origin_nodes:
  select two random target nodes, store into target_nodes
  join R with target_nodes
  insert the result into edges

To achieve this, we need to tell MySQL that the selection of two random target nodes is something that depends on the current origin node, so that it’s repeated every time. So we make the target nodes artificially depend on the origin node:

insert into edges(from_id, to_id)
select origin_nodes.id, target_nodes.id
from nodes as origin_nodes,
     (
       select id
       from nodes
       order by rand()+0*origin_nodes.id
       limit 2
     ) as target_nodes;

This 0*origin_nodes.id won’t change values of the ORDER BY clause, but it
makes this clause, and thus the derived table target_nodes, depend on
the current row of origin_nodes.

But ordinary derived tables are not allowed to depend on previous tables
of the FROM clause, so when we run the above query we get:

ERROR 1054 (42S22): Unknown column 'origin_nodes.id' in 'order clause'

So we make it a LATERAL derived table, which by definition is: a derived table
allowed to depend on previous tables of the FROM clause.

insert into edges(from_id, to_id)
select origin_nodes.id, target_nodes.id
from nodes as origin_nodes,
     LATERAL (
               select id
               from nodes
               order by rand()+0*origin_nodes.id
               limit 2
             ) as target_nodes;

Now it works as desired, target nodes vary:

And EXPLAIN shows:

Note the new indications:
DEPENDENT DERIVED: the derived table depends on another table;
Rematerialize on the line of origin_nodes: every time we read a row from origin_nodes, MySQL re-materializes the derived table derived2 (which is the MySQL-internal name for the derived table target_nodes).

Let’s make our problem more complicated: so far we connected each origin node to two random target nodes; what if I want the number of target nodes to be random too? Say, connect each node to a number of random nodes between zero and four?

Looking at our previous successful query:

insert into edges(from_id, to_id)
select origin_nodes.id, target_nodes.id
from nodes as origin_nodes,
     lateral (
               select id
               from nodes
               order by rand()+0*origin_nodes.id
               limit 2
             ) as target_nodes;

we now have to make the “2” in LIMIT 2 a random value between 0 and 4.
Alas, LIMIT isn’t allowed to vary so we’re going to give up on LIMIT and do it slightly differently. To build the 0-to-4 target nodes we will

  • select all nodes as targets,
  • number them in random order (using the ROW_NUMBER window function),
  • filter them out based on their row_number, thanks to a WHERE clause
    which compares the row_number to a random maximum value between 0 and 4.

delete from edges;
insert into edges(from_id, to_id)
select origin_nodes.id, target_nodes.id
from nodes as origin_nodes,
lateral (
          select id,
        row_number() over (order by rand()+0*origin_nodes.id) as rn
          from nodes
          limit 4
        ) as target_nodes
where target_nodes.rn<(rand()*5);

As you can see, for input node of id 1 we have three target nodes (id 2, 4, 8), for input node 2 we have one (id 20), for 3 we have one, for 4 we have two, … Each input node has a varying number of target nodes, as desired.

Summing up, in MySQL we now have LATERAL, so every time you think “for each row let’s do this” you can probably find a solution with a LATERAL derived table.

I hope you’ll find this feature useful. Thank you for using MySQL!