WL#2474: Batched range read handler functions
Affects: Server-6.0
—
Status: Complete
This task is for a refinement of the Multi Read Range interface (WL #2126). The new interface supports usage of pushed predicates and includes a special handler function returning cost estimates for the batched range acccess.
Contents -------- 1. Current MRR handler functions 2. Envisioned implementations of the MRR interface 3. Limitations of the current MRR interface 4. Batched key access method 5. Range sequence interface 6. New MRR handler functions 1. Current MRR handler functions --------------------------------- At present Multi-Range Read (MRR) interface is represented by two virtual functions: virtual int read_multi_range_first( KEY_MULTI_RANGE **found_range_p, KEY_MULTI_RANGE *ranges, uint range_count, bool sorted, HANDLER_BUFFER *buffer); virtual int read_multi_range_next( KEY_MULTI_RANGE **found_range_p); The following type definintions fully specify the KEY_MULTI_RANGE structure used by these functions: enum ha_rkey_function { HA_READ_KEY_EXACT, HA_READ_KEY_OR_NEXT, HA_READ_KEY_OR_PREV, HA_READ_AFTER_KEY, HA_READ_BEFORE_KEY, HA_READ_PREFIX, HA_READ_PREFIX_LAST, HA_READ_PREFIX_LAST_OR_PREV, HA_READ_MBR_CONTAIN, HA_READ_MBR_INTERSECT, HA_READ_MBR_WITHIN, HA_READ_MBR_DISJOINT, HA_READ_MBR_EQUAL }; typedef struct st_key_range { const byte *key; uint length; enum ha_rkey_function flag; } key_range; /* For key ranges */ #define NO_MIN_RANGE 1 #define NO_MAX_RANGE 2 #define NEAR_MIN 4 #define NEAR_MAX 8 #define UNIQUE_RANGE 16 #define EQ_RANG 32 #define NULL_RANGE 64 #define GEOM_FLAG 128 typedef struct st_key_multi_range { key_range start_key; key_range end_key; char *ptr; uint range_flag; } KEY_MULTI_RANGE; A structure of the type KEY_MULTI_RANGE specifies an interval of an index. The first two fields represent two keys for the index as the endpoints of the interval. The last field characterizes the type of the interval: open, closed, half-open etc : 1. (a,b) = { x | a < x < b } 2. [a,b] = { x | a <= x <= b } 3. [a,b) = { x | a <= x < b } 4. (a,b] = { x | a < x <= b } 5. (a,inf) = { x | x > a } 6. [a,inf) = { x | x => a } 7. (-inf,b) = { x | x < b } 8. (-inf,b] = { x | x <= b } 9. (-inf,inf) Thus an interval of the type 3 is specified with range_flag = NEAR_MAX, of the type 9 with range_flag = (NO_MIN_RANGE | NO_MAX_RANGE). A singleton is represented by a structure where the second field is ignored and the range_flag field is equal to EQ_RANGE. The field 'ptr' of a KEY_MULTI_RANGE object currently is not used anywhere. The handler functions read_multi_range_first and read_multi_range_next are iterator functions typical for MySQL handler interface. The first one is used to initiate an iteration process over the data records accessed in a multiple range scan and fetches the first record if there is any. The second function is used to iterate over this set records one by one. At each step of the iteration process the record being accessed are placed into the record buffer attached to the handler. The specifics of MRR handler functions is that they support iteration over a records accessed as result of another iteration ¨C an iteration over a sequence of index ranges. This sequence is represented by an array of index ranges. 2. Envisioned implementations of the MRR interface -------------------------------------------------- MRR interface allows us to process requests for a key access to a table in a batch manner. It makes sense to use batch processing of requests for key access when performing massive operations that require accessing row data through index. Range index scan and equi-join that uses index to the join attributes of the inner table (see a paragraph on Batched Key Access below) are such operations. At present we see two scenarios when such processing brings significant benefits. Scenario A. - a portion of keys is accumulated in a buffer - the keys in the buffer are sorted by their rowid/pk - data records are accessed according to the sorted key sequence. MyISAM/InnoDb can capitalize on this scenario when performing both index range scans and equi-join operations. Scenario B. - a portion of ranges, possibly single key ranges, is accumulated in a buffer on the central node to where the query is submitted - the ranges are sent to the execution nodes that access data records - the accessed records are packed into packages and sent back to the central node - the received packages with data records are placed to a buffer - data records are read from the buffer NDB cluster currently works by this scenario when executing multiple range index scan. It also can benefit from this batch processing when performing an equi-join by an attribute. 3. Limitations of the current MRR interface ------------------------------------------- Let's consider the query SELECT * FROM t WHERE keyp1 >= 1000 AND keyp1 < 2000 AND keyp2=10000 with a compound index defined on (keyp1,keyp2). Current optimizer will construct only one range for this query [{1000,10000}, {2000,MIN_INT}). However, if we have at least 1 and on average 100 index entries for any value of keyp1 from [1000,2000) then scanning this range definitely is not a good choice. More efficient would be to scan the sequence of single point ranges [{1000,10000}],...[{1999,10000}]. To create an array of ranges for this sequence first we have to build 1000 different keys that will take 8*1000 bytes in total. The array of range structures will take additionally sizeof(MULTI_KEY_RANGE)*1000 bytes. Yet, multi-range batch processing not necessarily requires materialization of the whole sequence of ranges. At the scenario A we could build keys one by one in a key buffer and perform an index look-up for each new key. At the scenario B we also don't need all ranges at once. We need them in portions small enough to be placed in the buffer sent to the execution node. A requirement to represent a range exactly by a structure of the KEY_MULTI_RANGE type is another limitation of the current MRR interface. When using batched key access we have an array of single point ranges to be passed to the MRR interface. More exactly we have an array of pointers to the key values of a fixed size representing these ranges. A representation of these ranges by an array of the KEY_MULTI_RANGE structures requires at least sizeof(key_range) extra bytes for each array element. Besides the info about the length and the type of the key (flag) is not needed to be stored in each element of the array as all keys are of the same size and form single point ranges. For the batched key access any space wasted in the memory buffer is unwanted. This space can be used to store more partially joined records. That's why it's desirable to have as compact representation of the ranges as possible. We foresee at least the following possible representation of single point ranges to be used by batched key access method. typedef st_mrr_single_point_ref { char *record_list; char * key; } Here 'record_list' contains a head of list of pointers (or their offsets) to records in the batch buffer with the same key value while 'key' points to this value. Another possible representation uses the structure: typedef st_mrr_single_point_dir { char *record_list; char key[]; }; It makes sense to utilize this representation in the cases when sizeof(key) < size(char *). 4. Batched key access method ---------------------------- We've already mentioned the batched key access method. It can be employed for join operations yielding a significant performance gains. The procedure by which this method works can be presented as follows. 1. Get the next result record of the partial join of tables T1,...,Tk that met the condition pushed for these tables and put it the in a memory buffer after already written records and, possibly, keys. 2. Extract a key from the record. Look for this key in a search structure placed at the end of the buffer. If such key is found then the record is linked to the list of records with this key. Otherwise an element pointing to the key is added to the search structure. At this moment the list of records associated with the key consists only of one lastly read record. If the image of the key cannot be found in the record we put it into buffer after the last record. 3. When the whole buffer has been filled the multi_range_read_init method is called with the sequence of ranges that can be extracted from the heap structure. The call initiates a procedure that accesses records of the next joined table T by keys from the heap structure. 4. When multi_range_read_init finishes the accessed records are ready to be joined to the records in the buffer. To accomplish the join operation the multi_range_read_next method is called repeatedly. At each its call it receives the next accessed record r from table T and a pointer to the list of records containing the keys by which record r has been accessed. The list consists of all matches for record r. The newly joined records either are put to another buffer for the next join operation if there is any, or are sent to the output stream. In scenario A the record r is read from table T just before joining with matching records from the buffer. In scenario B the record is read from internal MRR buffer. The following picture presents the layout of the data in the buffer used by Batched Key Access method. | linked records [and keys] ---> [key K] ^ ... | | | +---->[ * ][record i with key K] | to first record | | with key K ^ +---->[ * ][ record i+1 with key K] | | | | | ... | +------>[ * ][last record with key K] | | [ * ][ * ] <--- search key structure | 5. Range sequence interface ---------------------------- All said above brings us to the idea of exploiting iterators when representing sequences of ranges rather than just static arrays of range structures. To ensure the utmost flexibility we'll represent a class of range sequences through its interface consisting of two member functions 'init' and 'next'. If the MRR interface accepted objects of C++ classes we would define an abstract class RangeSeq with two virtual member functions named accordingly. The objects of the classes derived from this abstract class would serve as concrete range sequences and would be passed as parameters to the multi_range_read_init function. Having to work in the C environment of the handler implementations we cannot do this. Instead of this we define the following structure consisting of two callback functions: typedef struct st_range_seq_if { range_seq (*init)(void *init_params, uint n, uint flags); uint (*next) (range_seq s, KEY_MULTI_RANGE *range); } RANGE_SEQ_IF; where range_seq is defined as follows: typedef void *range_seq; The 'init' member functions of the RANGE_SEQ_IF interface structure shall interpret the first parameter as a pointer to the structure that let us to perform all necessary setup for iteration through the ranges to be scanned. The structure can contain a pointer to an array of materialized representations of ranges to be run through or, alternatively, it can point to a structure by which this ranges can be generated. The specific info placed in this structure is to be used only by another member function ¨C 'next'. The 'init' function is to return a handler of the type range_seq to be passed to the successive calls of the 'next' member function. The latter shall fill in the KEY_MULTI_RANGE structure passed to it as the second parameter. The structure shall specify the range yielded at the next step of iteration trough the range sequence set by 'init' if the sequence has not ended yet. After this it will prepare to be able to fetch info for the following range in the sequence. The second and third parameters the init function could be accessed through the first parameter. By some reasons we prefer separate those parameters from 'init_params' (see below notes on the default implementation of the range sequence interface). Although the 'next' function receives a general range structure it has not to fill all fields of the structure. The parameter 'flags' dictates what fields are to be filled. The flag HA_MRR_SINGLE_POINT says that all fields related to the second key are to be skipped as well as the 'range_flag' field. The flag HA_MRR_FIXED_KEY says that the length of the key is to be skipped. The flag HA_MRR_NO_ASSOCIATION says that the field 'ptr' is not to be used by the MRR interface function and may be ignored by the 'next' function. Otherwise the 'ptr' value of the yielded range is to be remembered by MRR interface in order to be returned by any call of the multi_range_read_next function that reads records accessed by a key from the this range. This allows us to get matches for the records read in the batched key access procedure. The 'next' member function returns 0, if it has succeeded to get the next range. Otherwise it returns 1. 6. New MRR handler functions ------------------------------------- The new multi_read_range interface contains three functions: multi_range_read_init, multi_range_read_next, malti_range_read_info. The synopsis for the virtual functions multi_range_read_init and multi_range_read_next is as follows: int multi_range_read_init( RANGE_SEQ_IF *seq; /* range sequence interface */ void *seq_init_param; /* first param for seq.init */ uint n, /* number of ranges in seq. */ uint mode, /* flags */ HANDLER_BUFFER *buf /* memory buffer to be used */ ) int multi_range_read_next ( char **range_info /* out: info associated range */ ) When working by scenario A a typical implementation of the multi_range_read_init function first calls seq.init with parameters 'seq_init_param', 'n' and 'mode'. This call sets everything for getting ranges to be scanned. Then calling seq.next the method iterates through these ranges and when doing so it gets rowids for keys in these ranges and places them in the buffer 'buf'. When buffer is full the method prepares it for calls of the multi_range_read_next by sorting the rowids in the buffer to optimize the following access to the records referenced by these rowids. After this iterative calls of the multi_range_read_next method will read records with rowids from the buffer and will put them into the buffer table->record[0]. If this record is to be joined with matching records from the first join operand the reference returned in the range_info is to be used. Originally this reference is placed to the range structure by which key the record was accessed. Then this reference is attached to each rowid for this key and is placed to the buffer together with the rowid. When a call of multi_range_read_next discovers that there are no rowids in the buffer first it reads rowids for keys in the remaining ranges and fills the buffer, after this sorts the buffer, takes the top rowid and accesses the corresponding record. The 'mode' parameter can contain not only flags interpreted by seq.init. Additionally it can contain flag HA_MRR_SORTED. This flag is used when the read records must be returned in order compatible with the index used for the ranges. It does not make sense to set this flag if the sequence of ranges generated by the range iterator does not comply with the index order. At the first glance retrieving table records in rowid order cannot comply the requirement to return records in index order unless the index is defined as a clustered primary key. Actually the read records can be put first in a separate piece of memory (a part of that specified by the 'buf' parameter). After this they can be returned by calls of multi_range_read_next in the index order. When working by scenario B a typical implementation of the multi_range_read_init function first also has to call the seq.init that performs initial setting for iteration through ranges. In a general case the iteration itself is triggered by a requirement to form a package of ranges to be sent to remote nodes. After the package has been sent a return package of read records is received. The records are placed into the memory of the recipient and are read by multi_range_read_next calls. When the last of them has been read a new portion is requested. This is repeated until all ranges from the sent package are scanned. After this the next package of ranges is sent to remote nodes. In this scenario the multi_range_read_next call can return not all records that are accessed by keys from the range sequence. If a condition is pushed to the accessed table then only the records that meet this condition are returned to the query node. A condition can be pushed to the table by a call of the cond_push handler function. The 'mode' also can contains information on the type of the join to be performed. For semi-join and anti-join operations we don't need any record to be joined and the buffer table->record[0] has not to be filled. The flag HA_MRR_SEMI_JOIN/HA_MRR_ANTI_JOIN says that the join operation is a semi-join/anti-join. There is a special flag to specify outer join operations as well ¨C HA_MRR_OUTER_JOIN. Here a left outer join operation always is assumed. The flags HA_MRR_SEMI_JOIN, HA_MRR_ANTI_JOIN, HA_MRR_OUTER_JOIN are used only for operations with equality join predicates. Thus if one of this flag is set the ranges in the range sequence must be singletons. When the flag HA_MRR_SEMI_JOIN is set a call of the multi_range_read_next function returns a reference to the list of records associated with the next scanned key for which there is a match in the inner table. This is opposite to the situation when the flag HA_MRR_ANTI_JOIN is set. Here a call of the multi_range_read_next function returns a reference to the list of records associated with the next scanned key for which there is no match in the inner table. In both cases the matched records are not even accessed. With the flag HA_MRR_OUTER_JOIN a call of the multi_range_read_next function returns matched inner records, possibly projected to some fields, when there are matches. They are returned in addition to the list of outer records associated with the matching key. If the there is no matches for a key only a list of records associated with the key is returned. The return code of the multi_range_read_next function says whether there was a match for a key or not. There are some variants of join operation when only the first/last record with a given key is joined. The corresponding flags for these variant are to be added later. When a matched record is found only fields that are referred in the query can be put into the buffer table->record[0]. If all these fields can be read from the index the data record has not to be accessed at all. The flag HA_MRR_INDEX_ONLY indicate this case. In scenario A it's not necessary to allocate a buffer for rowids in this case. If the 'seq' parameter is equal to NULL then it's assumed that seq_init_param is an array of universal range representations of type KEY_MULTI_RANGE and n is number of ranges in this array. This explains why in some cases we can't do without a separate parameter that specifies the number of ranges in the sequence. The synopsis the multi_range_read_info handler function is: int multi_range_read_info( uint keyno, /* index number */ uint n, /* est. number of ranges in seq. */ uint keys, /* est. number of keys in ranges */ uint *bufsz, /* size of the buffer to be used */ uint *flags, /* in/out: flags */ COST_VECT *cost /* est. cost of the MRR access */ ) where COST_VECT is defined as follows: typedef st_cost_vect { double io_count; /* number of I/O * */ double avg_io_cost; /* cost of an average I/O oper. */ double cpu_cost; /* cost of operations in CPU */ double mem_cost; /* cost of used memory */ double import_cost; /* cost of remote operations */ } COST_VECT The multi_range_read_info function is to be called by the optimizer. The function passes the following as input parameters: - the number of the index to be scanned - an estimate of number of ranges in the range sequence - an estimate for the number of keys in these ranges - a suggested size of the external buffer to be used in the MRR access method - suggested modes for the MRR method. The function can return the size of external buffer that it considers a more appropriate. In particular it can return 0 in the 'bufsz' parameter if the MRR handler functions do not use any buffer at all of use an 'internal' buffer allocated by the method itself. If the flag HA_MRR_LIMITS is passed in the 'mode' parameter then the returned size of the buffer cannot be greater than the suggested one. In general the 'mode' parameter should contain all flags to be passed to the multi_range_read_init method. The 'cost' parameter is used to get the cost of the whole multi-range read operation. The cost C is calculated as: C = f1 * io_count * avg_io_cost + f2 * cpu_cost + f3 * mem_cost + f4 * import_cost where f1, f2, f3, f4 are system variables that allow to tune the cost of usage of different resources ¨C CPU, main memory and nework/other servers ¨C with respect to the cost of IO operations. All these components of the cost are supposed to be used by some future variants of the EXPLAIN command. The rules to calculate these costs depend basically on a given implementation of the MRR handler functions. The function multi_range_read_info is supposed to get cost estimates in the cases when the range sequence is unknown yet or too long. If all members of the range sequence are known before the query is executed then another MRR function - multi_range_read_info_const - can be used to get cost estimates. This function is assumed to return better estimates as implementations of this function are allowed to perform index lookups for range endpoints. The synopsis the multi_range_read_info_const handler function is: int multi_range_read_info_const( uint keyno, /* index number */ RANGE_SEQ_IF *seq; /* range sequence interface */ void *seq_init_param;/* first param for seq.init */ uint n, /* number of ranges in seq */ uint *bufsz, /* size of the buffer to be used */ uint *flags, /* in/out: flags */ COST_VECT *cost /* est. cost of the MRR access */ ) The parameters seq, seq_init_param and n are used to walk through the sequence of ranges whose scan is to be estimated. These parameters have the same meaning as in the function multi_range_read_init. The remaining parameters are quite similar to those of the same name used by the function multi_range_read_info.
CONTENTS 1. Interface description 1.1 Coding style-compliant MRR Interface definitions 1.2 Coding style-compliant Range Sequence interface definitions 1.3 Definitions of used HA_MRR_* flags 2. Range sequence interface implementations 2.1 The "array walker" implementation 2.2 SEL_ARG tree walker implementation 3. MRR interface implementations 3.1 The default implementation 3.2 Changes in ha_ndbcluster.cc 4. SQL Layer changes 4.1 Update range analysis code to use the MRR interface 4.2 Update the QUICK_RANGE_SELECT code to use the new MRR interface 5. What will not be implemented 1. Interface description ======================== The MRR interface consists of two parts: * The MRR interface itself MRR interface is implemented by - Generic implementation (converts MRR calls to non-MRR calls) - ha_ndblcuster-specific implementation - [in development] MyISAM/InnoDB/BDB implementation MRR interface is used by - Range access method code - [in development] Batched key access - [possible] Loose Index scan * The Range sequence interface. This interface is part of MRR interface. Range sequence interface is implemented by - range access method optimizer - range access method row retrieval code Range sequence is used by - Any table handler that implements MRR interface 1.1 Coding style-compliant MRR Interface definitions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ typedef struct st_key_range { const byte *key; uint length; } RANGE_ENDPOINT; typedef struct st_key_multi_range { RANGE_ENDPOINT start_key; RANGE_ENDPOINT end_key; enum ha_rkey_function function; char *ptr; uint range_flag; } KEY_RANGE; /* Get cost and other information about MRR scan over currently unknown ranges SYNOPSIS multi_range_read_info() keyno Index number n_ranges Estimated number of ranges (i.e. intervals) in the range sequence. n_rows Estimated total number of records contained within all of the ranges bufsz INOUT IN: Size of the buffer available for use OUT: Size of the buffer that will be actually used, or 0 if buffer is not needed. flags INOUT A combination of HA_MRR_* flags cost OUT Estimated cost of MRR access DESCRIPTION This function obtains the cost and other info about about the MRR scan. The caller has some information about what kinds of ranges will be scanned (e.g. "1000 {keypart1=const} ranges"}, but the constant themselves are assumed not to be known when this function is called. The flags parameter is a combination of those flags: HA_MRR_SORTED, HA_MRR_INDEX_ONLY, HA_MRR_NO_ASSOCIATION, HA_MRR_LIMITS. RETURN 0 - OK, *cost contains cost of the scan, *bufsz and *flags contain scan parameters. other - Error or can't perform the requested scan */ int multi_range_read_info(uint keyno, uint n_ranges, uint n_rows, uint *bufsz, uint *flags, COST_VECT *cost); /* Get cost and other information about MRR scan over a known list of ranges SYNOPSIS multi_range_read_info_const() keyno Index number seq Range sequence to be traversed seq_init_param First parameter for seq->init() n_ranges Number of ranges in the sequence bufsz INOUT IN: Size of the buffer available for use OUT: Size of the buffer that is expected to be actually used, or 0 if buffer is not needed. flags INOUT A combination of HA_MRR_* flags cost OUT Estimated cost of MRR access DESCRIPTION Obtain the cost and other information about an MRR scan when the list of ranges to be scanned is known. NOTE Reasons for having this function: * Let high-roundtrip-cost handlers, like NDB/Federated produce the estimates in one roundtrip. (One might ask, why don't we use condition-pushdown-based system for the purpose of estimating number of records that match the table condition?... good question... in any case there is another reason) * We need a specialized cost-of-MRR-over-constant-ranges function. = It must be different from non-MRR-cost function (since MRR is supposed to be better and have a smaller cost) = It must be different from multi_range_read_info() because here we know the ranges. Therefore, we have this function. RETURN 0 - OK, *cost contains cost of the scan, *bufsz and *flags contain scan parameters. 1 - Error or can't perform the requested scan */ int multi_range_read_info_const(uint keyno, RANGE_SEQ_IF *seq, void *seq_init_param, uint n_ranges, uint *bufsz, uint *flags, COST_VECT *cost); /* Initialize the MRR scan SYNOPSIS multi_range_read_init() seq Range sequence to be traversed seq_init_param First parameter for seq->init() n_ranges Number of ranges in the sequence mode Flags, see the description for details buf INOUT, memory buffer to be used. DESCRIPTION Initialize the MRR scan. This function may do heavyweight scan initialization like row prefetching/sorting/etc (NOTE: but better not do it here as we may not need it, e.g. if we never satisfy WHERE clause on previous tables. For many implementations it would be natural to do such initializations in _next() call) mode is a combination of the following flags: HA_MRR_SORTED, HA_MRR_INDEX_ONLY, HA_MRR_NO_ASSOCIATION NOTES One must have called index_init() before calling this function. Several multi_range_read_init() calls may be made in course of one query. Until WL#2623 is done (see section 3.2), the following will also hold: The caller will guarantee that if "seq->init == mrr_ranges_array_init" then seq_init_param will be an array of KEY_MULTI_RANGE structures (with n_ranges elements). This property will only be used by NDB handler until WL#2623 is done (see section 3.2 again). Buffer memory management is done according to the following scenario: The caller allocates the buffer and provides it to the callee by filling in the members of HANDLER_BUFFER structure (see current code for definition of HANDLER_BUFFER). The callee consumes all or some fraction of the provided buffer space, and sets the HANDLER_BUFFER members accordingly. The callee may use the buffer memory until the next multi_range_read_init() call is made, all records have been read, or until index_end() call is made, whichever comes first. RETURN 0 - OK 1 - Error */ int multi_range_read_init(RANGE_SEQ_IF *seq, void *seq_init_param, uint n_ranges, uint mode, HANDLER_BUFFER *buf); /* Get next record in MRR scan SYNOPSIS multi_range_read_next() range_info OUT If HA_MRR_NO_ASSOCIATION flag is in effect or an error occured, nothing (but the caller is still allowed to store an arbitrary value there) Otherwise, the opaque value associated with the range that contains the returned record. NOTE It should be possible to process range lists of arbitrary length without any kind of overflows. RETURN 0 OK other Error code */ int multi_range_read_next(char **range_info); /* Operation cost vector class */ class COST_VECT { public: double io_count; /* number of I/O */ double avg_io_cost; /* cost of an average I/O oper. */ double cpu_cost; /* cost of operations in CPU */ double mem_cost; /* cost of used memory */ double import_cost; /* cost of remote operations */ double total_cost() { /* f{i} will be #define'd constants */ return f1 * io_count * avg_io_cost + f2 * cpu_cost + f3 * mem_cost + f4 * import_cost; } }; 1.2 Coding style-compliant Range Sequence interface definitions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ typedef void *range_seq; typedef struct st_range_seq_if { /* Initialize the traversal of range sequence SYNOPSIS init() init_params The seq_init_param parameter n_ranges The number of ranges obtained flags HA_MRR_SINGLE_POINT, HA_MRR_FIXED_KEY RETURN An opaque value to be used as RANGE_SEQ_IF::next() parameter */ range_seq (*init)(void *init_params, uint n_ranges, uint flags); /* Get the next range in the range sequence SYNOPSIS next() seq The value returned by RANGE_SEQ_IF::init() range OUT RETURN 0 - Ok, the range structure filled with info about the next range 1 - No more ranges */ uint (*next) (range_seq seq, KEY_RANGE *range); } RANGE_SEQ_IF; 1.3 Definitions of used HA_MRR_* flags ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /* The below two are not used (and not handled) in this milestone of this WL entry because there seems to be no use for them at this stage of implementation. (TODO is that so??) */ #define HA_MRR_SINGLE_POINT n #define HA_MRR_FIXED_KEY n /* Indicates that RANGE_SEQ_IF::next(&range) doesn't need to fill in the 'range' parameter. */ #define HA_MRR_NO_ASSOCIATION n /* The MRR user will provide ranges in key order, and MRR implementation must return rows in key order. */ #define HA_MRR_SORTED n /* Don't retrieve full records */ #define HA_MRR_INDEX_ONLY n /* The passed memory buffer is of maximum possible size, the caller can't assume larger buffer. */ #define HA_MRR_LIMITS n 2. Range sequence interface implementations =========================================== 2.1 The "array walker" implementation ------------------------------------- This implementation will be used by QUICK_RANGE_SELECT code. When a QUICK_RANGE_SELECT is constructed, the list of ranges to be scanned is materialized in an array. This implementation will walk over this array. There will be two RANGE_SEQ_IF-member-compliant functions: range_seq is a pointer to struct { QUICK_RANGE *cur; QUICK_RANGE *end; }; /* Set the cur and end pointers */ mrr_ranges_array_init(); /* Advance over one element in the array */ mrr_ranges_array_next() { if (cur == end) return 1; /* No more ranges */ else { //fill range from *cur cur++; return 0; } } 2.2 SEL_ARG tree walker implementation -------------------------------------- This range sequence interface implementation will traverse the SEL_ARG* tree representation of range list. struct state { SEL_ARG **stack; byte *min_key; byte *max_key; }; init() { /* Initialize the traversal */ } next() { /* Traverse the SEL_ARG tree until we've built the next interval */ } 3. MRR interface implementations ================================ 3.1 The default implementation ------------------------------ This is the only implementation that will be created in scope of this WL entry. The planned implementations are WL#2475 (MyISAM/InnoDB/etc) and WL#2623 (NDB). The purpose of this implementation is to allow the query executioner to always use the MRR interface, no matter if the underlying engine supports MRR or not. In a nutshell, this is "MRR calls to non-MRR calls adapter" Below is a sketch of the functions to be implemented: int handler::multi_range_read_info(uint key_no, uint n_ranges, uint keys, uint *bufsz, uint *flags, COST_VECT *cost) { *bufsz= 0; // No need for buffer /* Produce the same cost as current code does */ if (*flags & HA_MRR_INDEX_ONLY) { /* get_index_only_read_time is implemented in opt_range.cc already */ cost->io_cost= get_index_only_time(key_no, keys); } else cost->io_count= this->read_time(n_ranges, keys); cost->avg_io_cost= 1; // Assume random seeks cost->cpu_cost= cost->import_cost= cost->mem_cost= 0; return 0; } int handler::multi_range_read_info_const(uint keyno, RANGE_SEQ_IF *seq, void *seq_init_param, uint n_ranges, uint *bufsz, uint *flags, COST_VECT *cost) { KEY_RANGE range; range_seq seq_it; ha_rows rows, total_rows; *bufsz= 0; total_rows= 0; seq->init(seq_init_param, n_ranges, flags); while (!seq->next(seq_it, &range)) { if (HA_POS_ERROR == (rows= this->records_in_range(keyno, &range.start_key, &range.end_key))) { /* Can't scan one range => can't do MRR scan at all */ return 1; } total_rows += rows; } cost->io_count= this->read_time(n_ranges, total_rows); cost->avg_io_cost= 1; // Assume random seeks cost->cpu_cost= cost->import_cost= cost->mem_cost= 0; return 0; } For handler::multi_range_read_init() and handler::multi_range_read_next(), adopt the code from handler::read_multi_range_first() and handler::read_multi_range_next(). It should be easy as that code uses simple loops. 3.2 Changes in ha_ndbcluster.cc ------------------------------- The proper changes will be done in WL#2623. Within the scope of this WL entry the following will be done: The ha_ndbcluster::read_multi_range_first and read_multi_range_next functions will be changed to use the new interface. The new functions will not be fully conformant implementations: They'll only handle the case where "seq->init == mrr_ranges_array_init" This limitation is expected to be lifted by implementation of WL#2623. 4. SQL Layer changes ==================== 4.1 Update range analysis code to use the MRR interface ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is partially already done. See section #(SEL_ARG tree) for one thing. 4.2 Update the QUICK_RANGE_SELECT code to use the new MRR interface ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Update the QUICK_*_SELECT to use the new MRR interface. Also remove the QUICK_RANGE class and replace all its uses with uses of KEY_RANGE as these two structures contain identical information. //btw we will not ask the table handler to return the container interval. 5. What will not be implemented =============================== Support for HA_MRR_SEMI_JOIN, ANTI_JOIN or OUTER_JOIN will not be implemented in this WL entry as there is no way to test that functionality.
Copyright (c) 2000, 2025, Oracle Corporation and/or its affiliates. All rights reserved.