WL#8737: IO aware defaults for optimizer cost constants
Affects: Server-8.0
Status: Complete
WL#7340 introduced separate cost constants for access to data that is in a
memory buffer and data that needs to be read from a secondary storage device.
To reduce the impact that that worklog would have on query performance
(performance regressions), it was decided to keep the existing default values
for the cost constants in WL#7340 and move changing the default values to a
separate worklog.

This worklog changes the default values so that different cost constants are
used for the cost of accessing data that is in memory and data that needs to
be read from disk. The changes are based on experiments with MySQL in both
disk-bound and CPU-bound settings. However, to reduce the risk of
regressions, the changes will be somewhat conservative.

(Initially this worklog contains text copied from WL#7340 related to
adjusting cost constants.)

User Documentation
==================

* http://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-1.html#mysqld-8-0-1-optimizer
* http://dev.mysql.com/doc/refman/8.0/en/cost-model.html#cost-model-database
  (comment at end of section)
OVERVIEW:
=========

This worklog implements the following main functionality:

1. Adjust the values for the cost constants "io_block_read_cost" and
   "memory_block_read_cost" to take into account that it is cheaper to read a
   block that is in a main memory buffer than to read it from disk.

DETERMINING DEFAULT VALUES FOR COST CONSTANTS
=============================================

The following alternatives are considered for determining default values for
the new "memory_block_read_cost" and the existing "io_block_read_cost" cost
constant:

Alt 0. Let "memory_block_read_cost" have the same default value as
       "io_block_read_cost". With this alternative, this worklog will not
       cause any changes in query plans and thus avoids regressions. Advanced
       users can change these values based on their load and our
       recommendations.

Alt 1. Keep the current default value for "io_block_read_cost" but set a
       lower default value for "memory_block_read_cost" to reflect that
       accessing data that is likely in a memory buffer is cheaper than
       reading from disk.

Alt 2. Increase the default value for "io_block_read_cost" based on the
       assumption that, when the handler estimates that data is on disk, the
       current value is too low. Set the default value for
       "memory_block_read_cost" to be lower than the current value for
       "io_block_read_cost".

The decision is to implement alternative 2 (in WL#7340 alternative 0 was
chosen). This is the alternative that will likely provide the best overall
performance improvements, but it is also the alternative with the potential
to produce the most severe performance regressions. Since the guesstimates
that the handler produces for whether a table or index has pages in memory or
on disk are just based on a heuristic, we should be fairly conservative when
setting the new default values for these two cost constants. When the storage
engines produce better estimates, these cost constants should likely be
re-evaluated.

The following strategy will be used for finding new default values for
"memory_block_read_cost" and "io_block_read_cost":

1. Compare the performance between two alternative access methods that are
   highly dependent on the cost of data access but have very different access
   patterns: table scan versus range scan on a secondary index. We will do
   this comparison for:

   a) different cost constant values

   b) settings where all data is either in the InnoDB memory buffer or where
      "all" data needs to be read from disk.

   For these cases, we will use the cost constant value that causes the cost
   model to make the "best" switch between table scan and range scan as a
   suggestion for new default values.

2. Use DBT3 to validate that the selected cost constants produce improvements
   but do not introduce regressions. This will be done both for a setting
   where the entire DBT3 database is in memory and for one where most of it
   is only on disk.

See the low level design section for the result of this evaluation.

Note: We would like to keep the existing definition of the "unit" for cost
estimates: the "cost of doing one random IO". So if we follow alternative 2
and find a new default value for "io_block_read_cost" that is larger than 1.0
(e.g. 3.0), then we will do a final adjustment of the default values of all
cost constants by dividing each existing default value by the new value for
"io_block_read_cost" (e.g. divide all default cost constant values by 3.0).
How these cost constants can be inspected and changed on a running server is
sketched below.
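For reference, these cost constants are stored in the "engine_cost" table in
the "mysql" database (part of the configurable cost model available since
MySQL 5.7). The following is a minimal sketch of how alternative values can
be tried out on a running server; the values 2.0 and 0.5 are illustrative
candidates taken from the experiments below, not final defaults:

-- Inspect the current engine cost constants.
SELECT cost_name, cost_value FROM mysql.engine_cost;

-- Illustrative candidate values: raise the cost of reading a block from
-- disk and lower the cost of reading a block that is in a memory buffer.
UPDATE mysql.engine_cost SET cost_value = 2.0
WHERE cost_name = 'io_block_read_cost';

UPDATE mysql.engine_cost SET cost_value = 0.5
WHERE cost_name = 'memory_block_read_cost';

-- Make the server re-read the cost tables; sessions started after the
-- flush use the new values.
FLUSH OPTIMIZER_COSTS;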
Determining candidates for new default values for cost constants:
==================================================================

To test alternative cost constants on a single table, the following table is
used:

CREATE TABLE t1 (
  pk INTEGER PRIMARY KEY,
  i1 INTEGER NOT NULL,
  i2 INTEGER NOT NULL,
  i3 INTEGER NOT NULL,
  i4 INTEGER NOT NULL,
  i5 INTEGER NOT NULL,
  i6 INTEGER NOT NULL,
  i7 INTEGER NOT NULL,
  i8 INTEGER NOT NULL,
  c1 CHAR(200),
  INDEX k1 (i1),
  INDEX k2 (i2),
  INDEX k3 (i3),
  INDEX k4 (i4),
  INDEX k5 (i5),
  INDEX k6 (i6),
  INDEX k7 (i7)
) ENGINE=InnoDB;

This table is filled with 10 million records. The average record size is 302
bytes. The total size of the table is about 4.7 GB (2.9 GB data and 1.6 GB
indexes).

To get the execution time for a table scan we use the following query:

SELECT * FROM t1 WHERE i8 = 2;

To get execution times for range scans on a secondary index we use:

SELECT * FROM t1 WHERE i7 > LOWER_LIMIT AND i7 < UPPER_LIMIT AND i8 = 2;

where LOWER_LIMIT and UPPER_LIMIT are set to produce the wanted size of the
range interval. In the tests below this query is run with range intervals
from 1 record up to 2 million records (20% of the records in the table). A
sketch for observing which access method the optimizer chooses is given
below.

Estimate for "memory_block_read_cost":
======================================

To get a proposal for a default value for the "memory_block_read_cost" cost
constant we use a server with a 36 GB InnoDB database buffer and run the
following tests:

a) Use the single table test:

   The results from running the above range scan query with
   "memory_block_read_cost" values 1.0, 0.5 and 0.2 are presented in the
   figure named scan_range_memory.png. The figure shows the execution times
   for the query as we increase the number of records in the range scan from
   0% to 20% of the table. From this figure we see:

   - With 1.0 as the value for "memory_block_read_cost" (i.e. the same
     behavior as before this worklog), the optimizer switches from range scan
     to full table scan when the range interval is about 9% of the data. When
     this happens, we have a large increase in execution time: from 1.5
     seconds to almost 6 seconds.

   - Using 0.5 as the value for "memory_block_read_cost", the switch from
     range scan to table scan takes place when the range scan reads about 15%
     of the data. Also here we have an increase in the execution time, from
     2.5 seconds to almost 6 seconds.

   - Using 0.2 postpones the switch from range scan to table scan to beyond
     20% of the records in the table and produces lower execution times for
     much larger range scans.

   Based on this experiment, a default value for "memory_block_read_cost" of
   about 0.2 seems like a good choice for getting a good switch point from
   range scan to table scan for this query.

b) Result from running DBT3:

   Using a scale factor 10 database and running with the same values for
   "memory_block_read_cost" as above, four of the queries showed changes in
   execution times and query plans:

          1.0    0.5    0.2
   Q6     31.4   20.7   21    (34% improvement, range scan instead of
                               table scan)
   Q7     10.4   11.5   16.5  (49% regression, caused by new join order)
   Q12    35.5   23.7   23.5  (34% improvement, range scan instead of
                               table scan)
   Q20    3.17   3.21   6.73  (105% regression, caused by using FirstMatch
                               instead of Materialization)

   This shows that selecting 0.5 gives no regressions in DBT3 (apart from a
   tiny regression for Q7), while 0.2 causes two queries (Q7 and Q20) to
   perform significantly worse.

Based on the above experiments, the suggested initial default value for
"memory_block_read_cost" is 0.5.
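In these experiments the switch point is identified by which access method
the optimizer chooses for the range query. As a sketch, the choice can be
observed with EXPLAIN; the constants 100000 and 200000 stand in for
LOWER_LIMIT and UPPER_LIMIT:

-- Access type "range" on key k7 means the range scan is chosen; access
-- type "ALL" means the optimizer has switched to a full table scan.
EXPLAIN SELECT * FROM t1 WHERE i7 > 100000 AND i7 < 200000 AND i8 = 2;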
Estimate for "io_block_read_cost": ================================== To get a proposal for a default value for the "io_block_read_cost" cost constant we use a server with 1 GB InnoDB database buffer and run the following tests: a) Use the single table test: The results from running the above range scan query with "io_block_read_cost" values 1.0 (baseline), 2.0, 3.0 and 4.0 are presented in the figure named scan_range_disk.png. The figure shows the execution times for the query as we increase the number of records in the range scan from 0% to 20% of the table. From this figure we see: - with 1.0 as the value for "io_block_read_cost" (i.e. this is current default value), we see that the optimizer switches from doing range scan to do full table scan when the range interval is about 9% of the data. When this happens, we have a large decrease in execution time: from 2400 seconds to almost 45 seconds. - using 2.0 and 3.0 as the value for "io_block_read_cost", the switch from range scan to table scan takes place when the range scan reads about 4% of the data. Also here we have a decrease in the execution time, from 300 seconds seconds to 45 seconds. (Note: The reason for the large difference between using 1.0 and 2.0 in the execution time for the range scan is that with 1.0, this range scan is done with default MRR while with 2.0 it is done using DS-MRR). - using 4.0 moves the switch from range scan to table scan to about 3% of the records in the table and produces lower execution times for the query. Based on this experiment, selecting a default value for "io_block_read_cost" that is higher than the current default seems like a good choice for getting a better point from switching from range scans to table scan for this query. b) Result from running DBT3: Using a scale factor 10 database and running with the same values for the "io_block_read_cost" as above, three of the queries showed changes in execution times and query plans. Note that these runs are with "mrr_cost_based=off, batched_key_access=on": 1.0 2.0/3.0 4.0 Q5 3560 276 276 (92% reduction, improved join order) Q8 1118 627 634 (44% reduction, improved join order) Q12 207 208 ??? (large regression, query "never" finished) This shows that selecting 2.0 or 3.0 gives no regressions in DBT3 while 4.0 causes a large regression for one of the queries. Based on the above experiments, the proposal is to change the default value for "io_block_read_cost" from 1.0 to 2.0.
Copyright (c) 2000, 2024, Oracle and/or its affiliates. All rights reserved.