WL#7168: API for estimates for how much of table and index data that is in memory buffer
Affects: Server-Prototype Only — Status: Complete — Priority: Medium
For storage engines that cache parts of the database in a main memory buffer, the time to read data from a table or index will be very dependent on whether the data is already in the memory buffer or has to be read from disk. In order to take this into account, the code for computing cost estimates for data access needs to know whether data is in memory or will be read from disk. This worklog will extend the handler with a new API for providing estimates for where table and index data is stored. It will also contain a default implementation that will be used if the storage engine does not provide this information. Separate worklogs should be created for the individual storage engines to provide the estimates and for making them available to the handler. The estimates will be used by the cost estimation functions that will be implemented as part of improving the optimizer cost model.
This worklog will not introduce any functional or non-functional changes. It just provides a handler API that storage engines will provide data for. When this new API is taken into use, there will be changes to how the optimizer selects between query plans.
The worklog extends the handler with: a) Data structures where storage engines can provide estimates for how much of data from tables and indexes that are stored in a main memory buffer. b) An API that the cost model can use to access this information. c) A default implementation/heuristic that can be used if the storage engine has not provided this information. Data structures for storing estimates: ====================================== The following data structures will be extended in order for storage engines to store information about how much of tables or indexes that are in a main memory buffer; For estimate of how much of the table that is in a memory buffer: -extend the ha_statistics class with a new variable: /** Estimate for how much of the table that is availabe in a memory buffer. Valid range is [0..1]. If negative means that the storage engine has not supplied any value for it. */ double table_in_mem_estimate; The server will initialize this to -1.0 when creating new handler objects. This will be used for detecting whether the storage engine has provided a value or not. For estimate of how much of each index that is in a memory buffer: -extend the st_key struct with a new variable: /** Estimate for how much of the index data that is currently available in a memory buffer. Valid range is [0..1]. This will be initialize to a negative number. If it still has a negative values when used means that the storage engine has not supplied a value. */ double in_memory_estimate; The server will initialize this to -1.0 when creating new handler objects. This will be used for detecting whether the storage engine has provided a value or not. When the server calls handler::info() with the HA_STATUS_VARIABLE flag set, it will be the storage engine's responsibility to update these variables with correct estimates. handler::info() is called on the start of every query, and it is important for the optimizer that these estimates do not change during the optimization of a query. Note about partition pruning: since handler::info(HA_STATUS_VARIABLE) will be used for retrieving these estimates, this will allow estimates that correctly takes into account partition pruning. Handler API for use to get access to the estimates: =================================================== Two new functions are added to the handler class. These should be used by the cost calculation functions to access the estimates that the storage engines have provided: /** Return an estimate of how much of the table that is currently stored in main memory. This estimate should be the fraction of the table that currently is available in a main memory buffer. The estimate should be in the range from 0.0 (nothing in memory) to 1.0 (entire table in memory). @return The fraction of the table in main memory buffer */ double table_in_memory_estimate() const; /** Return an estimate of how much of the index that is currently stored in main memory. This estimate should be the fraction of the index that currently is available in a main memory buffer. The estimate should be in the range from 0.0 (nothing in memory) to 1.0 (entire index in memory). @param keyno the index to get an estimate for @return The fraction of the index in main memory buffer */ double index_in_memory_estimate(uint keyno) const; Default implementation: ======================= If the storage engine does not implement support for estimating the amount of the data and indexes that is in the memory buffer, the handler will use a default implementation for this. This implementation will be based on the size of the table/index and the size of the memory buffer that the storage engine uses for caching data in (if available). Proposed high level heuristics: a) if the handler knows the size of the main memory buffer, then if the size table or index is less than 10 percent of this buffer, then assume the table/index is cached in memory. b) if the handler does not know the size of the main memory buffer, then if the table or index is less than 10 MB, then assume it is cached in a memory buffer.
An initial working prototype for this API has been implemented. In addition to implement the interface as specified in the High-level specification section, this also implements the needed support in the Memory engine.
Copyright (c) 2000, 2019, Oracle Corporation and/or its affiliates. All rights reserved.