MySQL Blog Archive
For the latest blogs go to blogs.oracle.com/mysql
MySQL 8.0: Query Optimizer Takes Data Buffering into Account

In earlier versions of MySQL, the query optimizer did not distinguish between data that was cached in the database buffer and data that had to be read from disk. The main reason was that the optimizer had no information about whether a table would have to be (partially) read from disk or already was present in the buffer pool. In MySQL 8.0, this has changed. InnoDB will now provide buffer estimates per table and index.

The default values for the configurable cost constants that were added in MySQL 5.7, have also been changed to reflect different costs of memory and disk access:

CostMySQL 5.7MySQL 8.0
Read a random disk page1.001.00
Read a data page from memory buffer1.000.25
Evaluate query condition0.200.10
Compare keys/rows0.100.05

The settings for these cost constants can be configured by updating the tables mysql.server_cost and mysql.engine_cost. To easier determine what are the default settings for these constants, we have in MySQL 8.0 added a column that will show the default value.

Example: DBT-3 Query 8

To show the benefits of condition filtering, we will look at Query 8 of the DBT-3 benchmark:

Query 8 is called National Market Share Query, and it finds the market share in Europe for French suppliers of a given part type. You do not need to understand this query in detail. The main point is that 8 tables are joined, and that it is important to find an efficient join order for the query to perform well.

Below we show two possible query plans for this query:

Plan A
Plan B

The good thing about Plan A is that it processes the part table early. This is good because the only high selectivity condition in this query is on part type. The drawback of plan A is that it uses a secondary index on the largest table, lineitem, while Plan B only use primary key indexes. Executing these query plans in MySQL 8.0.3 on a DBT-3 scale factor 10 database, gives the following execution times:

 In-memoryDisk-bound
Plan A5.8 secs9 min 47 secs
Plan B77.5 secs3 min 49 secs

We see that when all data is in memory (innodb_buffer_pool_size=32G), Plan A is definitely the best plan. However, when the buffer pool is smaller (innodb_buffer_pool_size=1G), Plan B is better. This is because when using Plan A in a disk-bound scenario, the use of a secondary index on the lineitem table will give more random disk access. Also, when using a secondary index, the same table page may have to be read from disk multiple times since it may have been evicted from the buffer pool since the previous access.

If we look at what query plan the Query Optimizer will use for Query 8 in different versions of MySQL, we see the following:

 In-memoryDisk-bound
MySQL 5.6Plan B
MySQL 5.7Plan A
MySQL 8.0Plan APlan B

In MySQL 5.6, Plan B was chosen for this query. This changed in MySQL 5.7 since the optimizer started take into account the filtering effect of conditions on non-indexed columns. Changing to Plan A reduced the execution time with over 90% when all data is in memory. However, in the disk-bound scenario, the execution takes 2.5 times longer than in 5.6. In MySQL 8.0 you get the best of both worlds. Plan A will be used when all data is in memory, and Plan B will be used when most data need to be fetched from disk.

Thank you for using MySQL !