WL#8737: IO aware defaults for optimizer cost constants

Affects: Server-8.0   —   Status: Complete

WL#7340 introduced separate cost constants for access to data that is in a
memory buffer and data that needs to be read from a secondary storage device.
To reduce the impact that the worklog could have on query performance
(performance regressions), it was decided to keep the existing default values
for the cost constants in WL#7340 and to move the change of default values to a
separate worklog.

This worklog will change the default values so that different cost constants
are used for accessing data that is in memory and data that needs to be read
from disk.  The changes will be based on experiments with MySQL in both
disk-bound and CPU-bound settings.  However, to reduce the risk of regressions,
the changes will be somewhat conservative.

(Initially this worklog contains text copied from WL#7340 related to adjusting
cost constants.)

User Documentation
==================

* http://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-1.html#mysqld-8-0-1-optimizer
* http://dev.mysql.com/doc/refman/8.0/en/cost-model.html#cost-model-database
  (comment at end of section)

OVERVIEW:
=========

This worklog implements the following main functionality:

1. Adjust the values for the cost constants "io_block_read_cost" and
   "memory_block_read_cost" to take into account that it is cheaper to read a
   block that is in a main memory buffer than to read a block from disk.
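
Both cost constants are stored in the "mysql.engine_cost" table (introduced by
WL#7340). As a minimal sketch, the current values can be inspected and
overridden like this (the value below is an example, not a recommendation):

  -- Inspect the configured cost constants (NULL means "use the default"):
  SELECT cost_name, cost_value FROM mysql.engine_cost;

  -- Override a constant and make the change visible to new sessions:
  UPDATE mysql.engine_cost
     SET cost_value = 2.0
   WHERE cost_name = 'io_block_read_cost';
  FLUSH OPTIMIZER_COSTS;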

DETERMINING DEFAULT VALUES FOR COST CONSTANTS
=============================================

The following alternatives are considered for determining default values for
the new "memory_block_read_cost" and the existing "io_block_read_cost" cost
constants:

 Alt 0. Let "memory_block_read_cost" have the same default value as
        "io_block_read_cost". With this alternative, this worklog will not
        cause any changes in query plans and thus avoid regressions. Advanced
        users can change these values based on their load and our
        recommendations.

 Alt 1. Keep the current default value for "io_block_read_cost" but set a lower
        default value for "memory_block_read_cost" to reflect that accessing
        data that likely is in a memory buffer is cheaper than reading from
        disk.

 Alt 2. Increase the default value for "io_block_read_cost" based on the
        assumption that with the estimates from the handler that data is on
        disk, the current value is too low. Set the default value for
        "memory_block_read_cost" to be lower than the current value for
        "io_block_read_cost".

The decision is to implement alternative 2 (in WL#7340, alternative 0 was
chosen). This is the alternative that will likely provide the best overall
performance improvements, but it is also the alternative with the potential to
produce the most severe performance regressions. Since the guesstimates that
the handler produces for whether a table or index has pages in memory or on
disk are just based on a heuristic, we should be fairly conservative when
setting the new default values for these two cost constants. When the storage
engines produce better estimates, these cost constants should likely be
re-evaluated.

The following strategy will be used for finding new default values for
"memory_block_read_cost" and "io_block_read_cost":

1. Compare the performance between two alternative access methods that are
   highly dependent on the cost of data access but have very different access
   patterns: table scan versus range scan on a secondary index. We will do
   this comparison for:

   a) different cost constant values
   b) settings where all data is either in the InnoDB memory buffer or where
      "all" data needs to be read from disk.

   For these cases, we will use the cost constant value that causes the cost
   model to make the "best" switch between table scan and range scan to
   provide a suggestion for new default values.

2. Use DBT3 to validate that the selected cost constants produce improvements
   but do not introduce regressions. This will be done both for a setting where
   the entire DBT3 database is in memory and where most of it is only on disk.

See the low level design section for the result of this evaluation.

Note: We would like to keep the existing definition of the "unit" for cost
estimates as the "cost of doing one random IO". So if we use alternative 2 and
find a new default value for "io_block_read_cost" that is larger than 1.0
(e.g. 3.0), then we will do a final adjustment of the default values of all
cost constants by dividing each existing default value by the new value for
"io_block_read_cost" (e.g. divide all default cost constant values by 3.0).

Determining candidates for new default values for cost constants:
=================================================================

To test alternative cost constants on a single table, the following table is used:

CREATE TABLE t1 (
  pk INTEGER PRIMARY KEY,
  i1 INTEGER NOT NULL,
  i2 INTEGER NOT NULL,
  i3 INTEGER NOT NULL,
  i4 INTEGER NOT NULL,
  i5 INTEGER NOT NULL,
  i6 INTEGER NOT NULL,
  i7 INTEGER NOT NULL,
  i8 INTEGER NOT NULL,
  c1 CHAR(200),
  INDEX k1 (i1),
  INDEX k2 (i2),
  INDEX k3 (i3),
  INDEX k4 (i4),
  INDEX k5 (i5),
  INDEX k6 (i6),
  INDEX k7 (i7)
) ENGINE=InnoDB;

This table is filled with 10 million records. The average record size is 302
bytes. The total size of the table is about 4.7 GB (2.9 GB data and 1.6 GB
indexes). 
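
The worklog does not state how the table was populated; the following is a
minimal sketch of one way to generate such data, where the value distributions
(and the helper procedure "fill_t1") are assumptions made for illustration:

DELIMITER //
CREATE PROCEDURE fill_t1(IN num_rows INT)
BEGIN
  DECLARE n INT DEFAULT 0;
  WHILE n < num_rows DO
    -- i1..i7 get spread-out values so the secondary indexes are selective;
    -- i8 gets a small domain so "i8 = 2" matches a fixed fraction of rows.
    INSERT INTO t1 VALUES
      (n,
       FLOOR(RAND() * num_rows), FLOOR(RAND() * num_rows),
       FLOOR(RAND() * num_rows), FLOOR(RAND() * num_rows),
       FLOOR(RAND() * num_rows), FLOOR(RAND() * num_rows),
       FLOOR(RAND() * num_rows), FLOOR(RAND() * 10),
       REPEAT('x', 200));
    SET n = n + 1;
  END WHILE;
END //
DELIMITER ;

CALL fill_t1(10000000);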

To get the execution time for a table scan we use the following query:

 SELECT * FROM t1 WHERE i8 = 2;

To get execution times for range scans on a secondary index we use:

 SELECT * FROM t1 WHERE i7 > LOWER_LIMIT and i7 < UPPER_LIMIT and i8=2;

where LOWER_LIMIT and UPPER_LIMIT are set to produce the wanted size of the
range interval. In the tests below this query is run with range intervals from
1 record up to 2 million records (20% of the records in the table).
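
The plan the optimizer picks for a given interval can be verified with EXPLAIN,
and, as one hypothetical way to measure both access methods on either side of
the switch-over point, each method can be forced with index hints (the interval
below is an example):

  EXPLAIN SELECT * FROM t1 WHERE i7 > 0 AND i7 < 900000 AND i8 = 2;

  -- Time the range scan regardless of the optimizer's choice:
  SELECT * FROM t1 FORCE INDEX (k7)
  WHERE i7 > 0 AND i7 < 900000 AND i8 = 2;

  -- Time the table scan:
  SELECT * FROM t1 IGNORE INDEX (k7)
  WHERE i7 > 0 AND i7 < 900000 AND i8 = 2;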

Estimate for "memory_block_read_cost":
======================================

To get a proposal for a default value for the "memory_block_read_cost" cost
constant we use a server with a 36 GB InnoDB database buffer and run the
following tests:

a) Use the single table test:

   The results from running the above range scan query with
   "memory_block_read_cost" values 1.0, 0.5 and 0.2 are presented in the figure
   named scan_range_memory.png. The figure shows the execution times for the
   query as we increase the number of records in the range scan from 0% to 20%
   of the table. 

   From this figure we see:

   - with 1.0 as the value for "memory_block_read_cost" (i.e. the same
     behavior as before this worklog), the optimizer switches from doing a
     range scan to doing a full table scan when the range interval is about 9%
     of the data. When this happens, there is a large increase in execution
     time: from 1.5 seconds to almost 6 seconds.

   - using 0.5 as the value for "memory_block_read_cost", the switch from range
     scan to table scan takes place when the range scan reads about 15% of the
     data. Here too there is an increase in execution time, from 2.5 seconds to
     almost 6 seconds.

   - using 0.2 postpones the switch from range scan to table scan to beyond 20%
     of the records in the table and produces lower execution times for much
     larger range scans.

   Based on this experiment, a default value for "memory_block_read_cost" of
   about 0.2 seems like a good choice for getting a good switch-over point
   from range scan to table scan for this query.

b) Result from running DBT3:

   Using a scale factor 10 database and running with the same values for the
   "memory_block_read_cost" as above, four of the queries showed changes in
   execution times and query plans:

         1.0    0.5    0.2

   Q6    31.4   20.7   21     (34% improvement, range scan instead of table
                               scan)
   Q7    10.4   11.5   16.5   (49% regression, caused by new join order)
   Q12   35.5   23.7   23.5   (34% improvement, range scan instead of table
                               scan)
   Q20    3.17   3.21   6.73  (105% regression, caused by using FirstMatch
                               instead of Materialization)

   This shows that selecting 0.5 gives no regressions in DBT3, with the
   exception of a tiny regression for Q7, while 0.2 causes a large regression
   for Q20.

Based on the above experiments, the suggested initial default value for
"memory_block_read_cost" is 0.5.

Estimate for "io_block_read_cost":
==================================

To get a proposal for a default value for the "io_block_read_cost" cost
constant we use a server with a 1 GB InnoDB database buffer and run the
following tests:

a) Use the single table test:

   The results from running the above range scan query with
   "io_block_read_cost" values 1.0 (baseline), 2.0, 3.0 and 4.0 are presented
   in the figure named scan_range_disk.png. The figure shows the execution
   times for the query as we increase the number of records in the range scan
   from 0% to 20% of the table. 

   From this figure we see:

   - with 1.0 as the value for "io_block_read_cost" (i.e. the current default
     value), the optimizer switches from doing a range scan to doing a full
     table scan when the range interval is about 9% of the data. When this
     happens, there is a large decrease in execution time: from 2400 seconds
     down to about 45 seconds.

   - using 2.0 or 3.0 as the value for "io_block_read_cost", the switch from
     range scan to table scan takes place when the range scan reads about 4%
     of the data. Here too there is a decrease in execution time, from 300
     seconds down to 45 seconds. (Note: the reason for the large difference in
     range scan execution time between 1.0 and 2.0 is that with 1.0 this range
     scan is done with default MRR, while with 2.0 it is done using DS-MRR.)

   - using 4.0 moves the switch from range scan to table scan to about 3% of
     the records in the table and produces lower execution times for the
     query.

   Based on this experiment, a default value for "io_block_read_cost" that is
   higher than the current default seems like a good choice for getting a
   better switch-over point from range scan to table scan for this query.

b) Result from running DBT3:

   Using a scale factor 10 database and running with the same values
   for the "io_block_read_cost" as above, three of the queries showed
   changes in execution times and query plans. Note that these runs
   are with "mrr_cost_based=off, batched_key_access=on":

         1.0   2.0/3.0   4.0

   Q5    3560     276    276  (92% reduction, improved join order)
   Q8    1118     627    634  (44% reduction, improved join order)
   Q12    207     208    ???  (large regression, query "never" finished)

   This shows that selecting 2.0 or 3.0 gives no regressions in DBT3 while 4.0
   causes a large regression for one of the queries.
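
   For reference, the optimizer switch configuration mentioned above
   corresponds to a session setting along these lines:

     SET optimizer_switch = 'mrr_cost_based=off,batched_key_access=on';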

Based on the above experiments, the proposal is to change the default value for
"io_block_read_cost" from 1.0 to 2.0.