MySQL 8.0 introduces Common Table Expressions (CTE). My colleague Guilhem has written several blog posts on how to use CTEs , and you can also read about it in the MySQL 8.0 Reference Manual. In this blog post, I will focus on how using a CTE instead of a view or a derived table can improve performance.
As usual, my example query will be a query from the DBT-3 benchmark. This time, we will look at Query 15, the Top Supplier Query:
1
2
3
4
5
|
SELECT s_suppkey, s_name, s_address, s_phone, total_revenue FROM supplier, revenue0 WHERE s_suppkey = supplier_no AND total_revenue = (SELECT MAX(total_revenue) FROM revenue0) ORDER BY s_suppkey; |
This query is referring a view, revenue0
, twice; both in the FROM clause and in a subquery in the WHERE clause. The view is defined as follows:
1
2
3
4
5
6
|
CREATE VIEW revenue0(supplier_no , total_revenue) AS SELECT l_suppkey, SUM(l_extendedprice * (1 - l_discount)) FROM lineitem WHERE l_shipdate >= '1996-07-01' AND l_shipdate < DATE_ADD('1996-07-01', INTERVAL '90' DAY) GROUP BY l_suppkey; |
This view finds the total revenue for each supplier during a specified 90 day period. Since the view contains a GROUP BY clause, MySQL can not merge the view into the query where it is used. Instead, it will execute the view and store the result in a temporary table. Visual EXPLAIN in MySQL Workbench shows the following query plan for Query 15:
From the diagram we see that the view is executed and materialized twice; once for each reference.
To use a CTE instead of a view, we just put the view definition it in front of the query, using a WITH clause:
1
2
3
4
5
6
7
8
9
10
11
|
WITH revenue0(supplier_no , total_revenue) AS ( SELECT l_suppkey, SUM(l_extendedprice * (1 - l_discount)) FROM lineitem WHERE l_shipdate >= '1996-07-01' AND l_shipdate < DATE_ADD('1996-07-01', INTERVAL '90' DAY) GROUP BY l_suppkey ) SELECT s_suppkey, s_name, s_address, s_phone, total_revenue FROM supplier, revenue0 WHERE s_suppkey = supplier_no AND total_revenue = (SELECT MAX(total_revenue) FROM revenue0) ORDER BY s_suppkey; |
Visual EXPLAIN shows that the CTE is now only materialized once:
In other words, MySQL will use a single temporary table for multiple references to a CTE. For Query 15, the materialization is by far the most time consuming part of the execution. Hence, the execution time is almost cut in half by doing the materialization only once: