WL#7340: IO aware cost estimate function for data access
Affects: Server-Prototype Only
—
Status: Complete
The optimizer cost model does not have information about whether data is cached in a memory buffer or needs to be read from disk (or fetched over a network in the case of NDB). When estimates are supported by storage engines, the cost models for data access methods should be updated to take into account whether data will be read from disk or already be present in a memory buffer. This worklog extends the optimizer cost model to: -use the estimates from the handler/storage engines about how much of tables and indexes are present in memory -use different cost constants for the cost of accessing data that is in memory and data that needs to be read from disk Note that currently, the estimates about whether data is in memory or needs to be read from disk is just based on heuristics (with the exception of the memory engine). The accuracy of these estimates will be greatly improved when support for these estimates are implemented by the storage engines. User Documentation ================== http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-8.html http://dev.mysql.com/doc/refman/5.7/en/cost-model.html
Functional requirements: ======================== F-1: After having installed or upgraded the MySQL server the mysql.engine_cost table should contain a record for the new cost constant "memory_block_read_cost". F-2: In the mysql.engine_cost table, the cost_value column for "memory_block_read_cost" should have the value NULL. F-3: Changes to the new "memory_block_read_cost" cost constant shall follow the requirements specified in F-3 to F-21 in WL#7315. F-4: Changing the value for "memory_block_read_cost" will influence on the cost of reading data that is in main memory. Lowering the value will make the use of tables in a memory buffer cheaper. This can influence the selected query plan, particularly on choice of access method and join order. F-5: Changing the value for "memory_block_read_cost" will influence the cost estimates presented in EXPLAIN JSON. F-6: Changing the value for "memory_block_read_cost" will influence the cost estimates presented in optimizer trace. Non-functional requirements: ============================ NF-1: Since this worklog may change cost estimates for accessing data and indexes, it will cause changes in query plans. The performance of queries where the query plan is changed may be different. Some queries may run faster, other may run slower.
WL#7168: API for estimates for how much of table and index data that is in memory buffer
WL#7170: InnoDB buffer estimates for tables and indexes
WL#7170: InnoDB buffer estimates for tables and indexes
GOALS: ====== 1. The optimizer cost model should use the estimates from the handler/storage engines about whether data or indexes are in a memory buffer or need to be read from disk (see WL#7168). 2. Introduce different cost constants for the cost of reading data from a memory buffer versus having to read data from disk. Non-goals: a. This worklog will not change the cost calculation functions used by the optimizer. Only the cost estimate for accessing a page will be adjusted to reflect where the page currently is available. Improving the cost calculation formulas will be the topic of future worklogs. b. This worklog will not attempt to do fine-tuning for existing cost constants. Only the cost constants directly related to reading a page will be adjusted. Further tuning will be the topic of future worklogs. OVERVIEW: ========= This worklog implements the following main functionality: 1. A new cost constant named "memory_block_read_cost" is added to the mysql.engine_cost table. This represents the cost of reading a data page that is present in a main memory buffer. This cost constant is similar to the "io_block_read_cost" which represents the cost of reading a data page from disk. 2. Change all cost calculations in the optimizer to take into account the guesstimates about whether pages for a table or an index is in memory or need to be read from disk. These guesstimates are implemented in WL#7168. For pages that likely are in memory, the new cost calculation will use the "memory_block_read_cost" instead of the existing "io_block_read_cost". NEW COST CONSTANT: memory_block_read_cost ========================================= A new cost constant will be added to the engine_cost table: * memory_block_read_cost This will be used in the optimizer cost calculations when calculating the cost of accessing a data or index page that likely is present in a main memory buffer. If the page needs to be read from disk, the existing "io_block_read_cost" cost constant will be used. CHANGES TO COST CALCULATIONS ============================ The cost model calculations that today take into account that data needs to be read from disk use the following function to get the cost for reading the data from disk: double Cost_model_table::io_block_read_cost(blocks) This function uses the "io_block_read_cost" cost constant and assumes that all pages will be read from disk. In cost calculations where data might be in a memory buffer, the use of Cost_model_table::io_block_read_cost() will be replaced by two new functions that will provide cost estimates for reading data from tables and indexes. These functions will take into account whether the pages are likely in a memory buffer or needs to be read from disk: double page_read_cost(pages) double page_read_cost_index(index, pages) These functions will use estimates about how much of the table or index that is in a memory buffer from the handler/storage engine to calculate the number of pages that likely are in memory versus the number of pagess that need to be read from disk. The handler API for providing these estimates was implemented in WL#7168. In the current implementation, these estimates are based on a heuristics (also implemented in WL#7168). These two functions will do the following to calculate the cost of accessing pages: double Cost_model_table::page_read_cost(pages) { in_mem_estimate= handler::table_in_memory_estimate(); pages_in_mem= pages * in_mem_estimate; pages_on_disk= pages - pages_in_mem; cost= pages_in_mem * MEMORY_BLOCK_READ_COST + pages_on_disk * IO_BLOCK_READ_COST; return cost; } In this function, IO_BLOCK_READ_COST and MEMORY_BLOCK_READ_COST are not constants, but refers to the configurable cost constants with the same name. For cost calculations for operations that do not use the storage engine's database buffer (e.g. filesort) we will continue to use the io_block_read_cost() function to get a cost estimate. DETERMINING DEFAULT VALUES FOR COST CONSTANTS ============================================= The following alternatives are considered for determining default values for the new "memory_block_read_cost" and the existing "io_block_read_cost" cost constant: Alt 0. Let "memory_block_read_cost" have the same default value as "io_block_read_cost". With this alternative, this worklog will not cause any changes in query plans and thus avoid regressions. Advanced users can change these values based on their load and our recommendations. Alt 1. Keep the current default value for "io_block_read_cost" but set a lower default value for "memory_block_read_cost" to reflect that accessing data that likely is in a memory buffer is cheaper than reading from disk. Alt 2. Increase the default value for "io_block_read_cost" based on the assumption that with the estimates from the handler that data is on disk, the current value is too low. Set the default value for "memory_block_read_cost" to be lower than the current value for "io_block_read_cost". The decision is to implement alternative 0 in this worklog. This will minimize the risk of introducing performance regressions due to this worklog. Changing of default values for cost constants will be done in WL#8737.
Changes to the engine_cost table: ================================= The new cost constant will be added to the engine_cost table by adding the following SQL statement to the mysql_system_tables.sql script in order to have it initialized during install or upgrade of the MySQL server: INSERT IGNORE INTO engine_cost VALUES ("default", 0, "memory_block_read_cost", DEFAULT, CURRENT_TIMESTAMP, DEFAULT); Extension of the Cost_model_table class: ======================================== The Cost_model_table class will be extended with the following two functions for getting cost estimates for reading pages from tables or indexes: /** Cost of reading a number of random pages from a table. @param pages number of pages to read @return Cost estimate */ double page_read_cost(double pages) const; /** Cost of reading a number of random pages from an index. @param index the index number @param pages number of pagess to read @return Cost estimate */ double page_read_cost_index(uint index, double pages) const; Changes to optimizer cost model code: ===================================== The main change to code for doing cost calculations will be to replace the use of Cost_model_table::io_block_read_cost() with one of the above functions.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.