WL#2474: Batched range read handler functions
Affects: Server-6.0
—
Status: Complete
This task is for a refinement of the Multi Read Range interface (WL #2126). The new interface supports usage of pushed predicates and includes a special handler function returning cost estimates for the batched range acccess.
Contents
--------
1. Current MRR handler functions
2. Envisioned implementations of the MRR interface
3. Limitations of the current MRR interface
4. Batched key access method
5. Range sequence interface
6. New MRR handler functions
1. Current MRR handler functions
---------------------------------
At present Multi-Range Read (MRR) interface is represented by two virtual
functions:
virtual int read_multi_range_first(
KEY_MULTI_RANGE **found_range_p,
KEY_MULTI_RANGE *ranges,
uint range_count,
bool sorted,
HANDLER_BUFFER *buffer);
virtual int read_multi_range_next(
KEY_MULTI_RANGE **found_range_p);
The following type definintions fully specify the KEY_MULTI_RANGE structure
used by these functions:
enum ha_rkey_function {
HA_READ_KEY_EXACT,
HA_READ_KEY_OR_NEXT,
HA_READ_KEY_OR_PREV,
HA_READ_AFTER_KEY,
HA_READ_BEFORE_KEY,
HA_READ_PREFIX,
HA_READ_PREFIX_LAST,
HA_READ_PREFIX_LAST_OR_PREV,
HA_READ_MBR_CONTAIN,
HA_READ_MBR_INTERSECT,
HA_READ_MBR_WITHIN,
HA_READ_MBR_DISJOINT,
HA_READ_MBR_EQUAL
};
typedef struct st_key_range
{
const byte *key;
uint length;
enum ha_rkey_function flag;
} key_range;
/* For key ranges */
#define NO_MIN_RANGE 1
#define NO_MAX_RANGE 2
#define NEAR_MIN 4
#define NEAR_MAX 8
#define UNIQUE_RANGE 16
#define EQ_RANG 32
#define NULL_RANGE 64
#define GEOM_FLAG 128
typedef struct st_key_multi_range
{
key_range start_key;
key_range end_key;
char *ptr;
uint range_flag;
} KEY_MULTI_RANGE;
A structure of the type KEY_MULTI_RANGE specifies an interval of an index. The
first two fields represent two keys for the index as the endpoints of the
interval.
The last field characterizes the type of the interval:
open, closed, half-open etc :
1. (a,b) = { x | a < x < b }
2. [a,b] = { x | a <= x <= b }
3. [a,b) = { x | a <= x < b }
4. (a,b] = { x | a < x <= b }
5. (a,inf) = { x | x > a }
6. [a,inf) = { x | x => a }
7. (-inf,b) = { x | x < b }
8. (-inf,b] = { x | x <= b }
9. (-inf,inf)
Thus an interval of the type 3 is specified with range_flag = NEAR_MAX, of the
type 9 with range_flag = (NO_MIN_RANGE |
NO_MAX_RANGE).
A singleton is represented by a structure where the second field is ignored and
the range_flag field is equal to EQ_RANGE.
The field 'ptr' of a KEY_MULTI_RANGE object currently is not used anywhere.
The handler functions read_multi_range_first and read_multi_range_next are
iterator functions typical for MySQL handler interface. The first one is used
to initiate an iteration process over the data records accessed in a multiple
range scan and fetches the first record if there is any. The second function is
used to iterate over this set records one by one. At each step of the iteration
process the record being accessed are placed into the record buffer attached to
the handler.
The specifics of MRR handler functions is that they support iteration over a
records accessed as result of another iteration ¨C an iteration over a sequence
of index ranges. This sequence is represented by an array of index ranges.
2. Envisioned implementations of the MRR interface
--------------------------------------------------
MRR interface allows us to process requests for a key access to a table in a
batch manner.
It makes sense to use batch processing of requests for key access when
performing massive operations that require accessing row data through index.
Range index scan and equi-join that uses index to the join attributes of the
inner table (see a paragraph on Batched Key Access below) are such operations.
At present we see two scenarios when such processing brings significant
benefits.
Scenario A.
- a portion of keys is accumulated in a buffer
- the keys in the buffer are sorted by their rowid/pk
- data records are accessed according to the sorted key sequence.
MyISAM/InnoDb can capitalize on this scenario when performing both index range
scans and equi-join operations.
Scenario B.
- a portion of ranges, possibly single key ranges, is accumulated in a buffer
on the central node to where the query is submitted
- the ranges are sent to the execution nodes that access data records
- the accessed records are packed into packages and sent back to the central
node
- the received packages with data records are placed to a buffer
- data records are read from the buffer
NDB cluster currently works by this scenario when executing multiple range
index scan. It also can benefit from this batch processing when performing an
equi-join by an attribute.
3. Limitations of the current MRR interface
-------------------------------------------
Let's consider the query
SELECT * FROM t WHERE keyp1 >= 1000 AND keyp1 < 2000
AND keyp2=10000
with a compound index defined on (keyp1,keyp2).
Current optimizer will construct only one range for this query [{1000,10000},
{2000,MIN_INT}).
However, if we have at least 1 and on average 100 index entries for any value
of keyp1 from [1000,2000) then scanning this range definitely is not a good
choice. More efficient would be to scan the sequence of single point ranges
[{1000,10000}],...[{1999,10000}].
To create an array of ranges for this sequence first we have to build 1000
different keys that will take 8*1000 bytes in total. The array of range
structures will take
additionally sizeof(MULTI_KEY_RANGE)*1000 bytes.
Yet, multi-range batch processing not necessarily requires materialization of
the whole sequence of ranges.
At the scenario A we could build keys one by one in a key buffer and perform an
index look-up for each new key.
At the scenario B we also don't need all ranges at once. We need them in
portions small enough to be placed in the buffer sent to the execution node.
A requirement to represent a range exactly by a structure of the
KEY_MULTI_RANGE type is another limitation of the current MRR interface. When
using batched key access we have an array of single point ranges to be passed
to the MRR interface. More exactly we have an array of pointers to the key
values of a fixed size representing these ranges.
A representation of these ranges by an array of the KEY_MULTI_RANGE structures
requires at least sizeof(key_range) extra bytes for each array element. Besides
the info about the length and the type of the key (flag) is not needed to be
stored in each element of the array as all keys are of the same size and form
single point ranges.
For the batched key access any space wasted in the memory buffer is unwanted.
This space can be used to store more partially joined records. That's why it's
desirable to have as compact representation of the ranges as possible.
We foresee at least the following possible representation of single point
ranges to be used by batched key access method.
typedef st_mrr_single_point_ref
{
char *record_list;
char * key;
}
Here 'record_list' contains a head of list of pointers (or their offsets) to
records in the batch buffer with the same key value while 'key' points to this
value.
Another possible representation uses the structure:
typedef st_mrr_single_point_dir
{
char *record_list;
char key[];
};
It makes sense to utilize this representation in the cases when sizeof(key) <
size(char *).
4. Batched key access method
----------------------------
We've already mentioned the batched key access method. It can be employed for
join operations yielding a significant performance gains. The procedure by
which this method works can be presented as follows.
1. Get the next result record of the partial join of tables T1,...,Tk that met
the condition pushed for these tables and put it the in a memory buffer after
already written records and, possibly, keys.
2. Extract a key from the record. Look for this key in a search structure
placed at the end of the buffer. If such key is found then the record is linked
to the list of records with this key. Otherwise an element pointing to the key
is added to the search structure. At this moment the list of records associated
with the key consists only of one lastly read record. If the image of the key
cannot be found in the record we put it into buffer after the last record.
3. When the whole buffer has been filled the multi_range_read_init method is
called with the sequence of ranges that can be extracted from the heap
structure.
The call initiates a procedure that accesses records of the next joined table T
by keys from the heap structure.
4. When multi_range_read_init finishes the accessed records are ready to be
joined to the records in the buffer. To accomplish the join operation the
multi_range_read_next method is called repeatedly. At each its call it receives
the next accessed record r from table T and a pointer to the list of records
containing the keys by which record r has been accessed. The list consists of
all matches for record r. The newly joined records either are put to another
buffer for the next join operation if there is any, or are sent to the output
stream.
In scenario A the record r is read from table T just before joining with
matching records from the buffer. In scenario B the record is read from
internal MRR buffer.
The following picture presents the layout of the data in the buffer used by
Batched Key Access method.
| linked records [and keys] --->
[key K]
^ ...
| |
| +---->[ * ][record i with key K]
| to first record |
| with key K ^ +---->[ * ][ record i+1 with key K]
| | |
| | ...
| +------>[ * ][last record with key K]
| |
[ * ][ * ] <--- search key structure |
5. Range sequence interface
----------------------------
All said above brings us to the idea of exploiting iterators when representing
sequences of ranges rather than just static arrays of range structures. To
ensure the utmost flexibility we'll represent a class of range sequences
through its interface consisting of two member functions 'init' and 'next'.
If the MRR interface accepted objects of C++ classes we would define an
abstract class RangeSeq with two virtual member functions named accordingly.
The objects of the classes derived from this abstract class would serve as
concrete range sequences and would be passed as parameters to the
multi_range_read_init function. Having to work in the C environment of the
handler implementations we cannot do this. Instead of this we define the
following structure consisting of two callback functions:
typedef struct st_range_seq_if
{
range_seq (*init)(void *init_params, uint n, uint flags);
uint (*next) (range_seq s, KEY_MULTI_RANGE *range);
} RANGE_SEQ_IF;
where range_seq is defined as follows:
typedef void *range_seq;
The 'init' member functions of the RANGE_SEQ_IF interface structure shall
interpret the first parameter as a pointer to the structure that let us to
perform all necessary setup for iteration through the ranges to be scanned. The
structure can contain a pointer to an array of materialized representations of
ranges to be run through or, alternatively, it can point to a structure by
which this ranges can be generated. The specific info placed in this structure
is to be used only by another member function ¨C 'next'. The 'init' function is
to return a handler of the type range_seq to be passed to the successive calls
of the 'next' member function. The latter shall fill in the KEY_MULTI_RANGE
structure passed to it as the second parameter. The structure shall specify the
range yielded at the next step of iteration trough the range sequence set
by 'init' if the sequence has not ended yet. After this it will prepare to be
able to fetch info for the following range in the sequence.
The second and third parameters the init function could be accessed through the
first parameter. By some reasons we prefer separate those parameters
from 'init_params' (see below notes on the default implementation of the range
sequence interface).
Although the 'next' function receives a general range structure it has not to
fill all fields of the structure.
The parameter 'flags' dictates what fields are to be filled.
The flag HA_MRR_SINGLE_POINT says that all fields related to the second key are
to be skipped as well as the 'range_flag' field.
The flag HA_MRR_FIXED_KEY says that the length of the key is to be skipped.
The flag HA_MRR_NO_ASSOCIATION says that the field 'ptr' is not to be used by
the MRR interface function and may be ignored by the 'next' function. Otherwise
the 'ptr' value of the yielded range is to be remembered by MRR interface in
order to be returned by any call of the multi_range_read_next function that
reads records accessed by a key from the this range. This allows us to get
matches for the records read in the batched key access procedure.
The 'next' member function returns 0, if it has succeeded to get the next
range. Otherwise it returns 1.
6. New MRR handler functions
-------------------------------------
The new multi_read_range interface contains three functions:
multi_range_read_init,
multi_range_read_next,
malti_range_read_info.
The synopsis for the virtual functions multi_range_read_init and
multi_range_read_next is as follows:
int multi_range_read_init(
RANGE_SEQ_IF *seq; /* range sequence interface */
void *seq_init_param; /* first param for seq.init */
uint n, /* number of ranges in seq. */
uint mode, /* flags */
HANDLER_BUFFER *buf /* memory buffer to be used */
)
int multi_range_read_next (
char **range_info /* out: info associated range */
)
When working by scenario A a typical implementation of the
multi_range_read_init function first calls seq.init with
parameters 'seq_init_param', 'n' and 'mode'. This call sets everything for
getting ranges to be scanned. Then calling seq.next the method iterates through
these ranges and when doing so it gets rowids for keys in these ranges and
places them in the buffer 'buf'. When buffer is full the method prepares it for
calls of the multi_range_read_next by sorting the rowids in the buffer to
optimize the following access to the records referenced by these rowids. After
this iterative calls of the multi_range_read_next method will read records with
rowids from the buffer and will put them into the buffer table->record[0]. If
this record is to be joined with matching records from the first join operand
the reference returned in the range_info is to be used.
Originally this reference is placed to the range structure by which key the
record was accessed. Then this reference is attached to each rowid for this key
and is placed to the buffer together with the rowid. When a call of
multi_range_read_next discovers that there are no rowids in the buffer first it
reads rowids for keys in the remaining ranges and fills the buffer, after this
sorts the buffer, takes the top rowid and accesses the corresponding record.
The 'mode' parameter can contain not only flags interpreted by seq.init.
Additionally it can contain flag HA_MRR_SORTED. This flag is used when the read
records must be returned in order compatible with the index used for the
ranges. It does not make sense to set this flag if the sequence of ranges
generated by the range iterator does not comply with the index order.
At the first glance retrieving table records in rowid order cannot comply the
requirement to return records in index order unless the index is defined as a
clustered primary key. Actually the read records can be put first in a separate
piece of memory (a part of that specified by the 'buf' parameter). After this
they can be returned by calls of multi_range_read_next in the index order.
When working by scenario B a typical implementation of the
multi_range_read_init function first also has to call the seq.init that
performs initial setting for iteration through ranges. In a general case the
iteration itself is triggered by a requirement to form a package of ranges to
be sent to remote nodes. After the package has been sent a return package of
read records is received. The records are
placed into the memory of the recipient and are read by
multi_range_read_next calls. When the last of them has been read a new portion
is requested. This is repeated until all ranges from the sent package are
scanned. After this the next package of ranges is sent to remote nodes.
In this scenario the multi_range_read_next call can return not all records that
are accessed by keys from the range sequence. If a condition is pushed to the
accessed table then only the records that meet this condition are returned
to the query node.
A condition can be pushed to the table by a call of the cond_push handler
function.
The 'mode' also can contains information on the type of the join to be
performed. For semi-join and anti-join operations we don't need any record to
be joined and the buffer table->record[0] has not to be filled.
The flag HA_MRR_SEMI_JOIN/HA_MRR_ANTI_JOIN says that the join operation is a
semi-join/anti-join.
There is a special flag to specify outer join operations as well ¨C
HA_MRR_OUTER_JOIN. Here a left outer join operation always is assumed.
The flags HA_MRR_SEMI_JOIN, HA_MRR_ANTI_JOIN, HA_MRR_OUTER_JOIN are used only
for operations with equality join predicates. Thus if one of this flag is set
the ranges in the range sequence must be singletons.
When the flag HA_MRR_SEMI_JOIN is set a call of the multi_range_read_next
function returns a reference to the list of records associated with the next
scanned key for which there is a match in the inner table.
This is opposite to the situation when the flag HA_MRR_ANTI_JOIN is set. Here a
call of the multi_range_read_next function returns a reference to the list of
records associated with the next scanned key for which there is no match in the
inner table.
In both cases the matched records are not even accessed.
With the flag HA_MRR_OUTER_JOIN a call of the multi_range_read_next function
returns matched inner records, possibly projected to some fields, when there
are matches. They are returned in addition to the list of outer records
associated with the matching key. If the there is no matches for a key only a
list of records associated with
the key is returned. The return code of the multi_range_read_next function says
whether there was a match for a key or not.
There are some variants of join operation when only the first/last record with
a given key is joined. The corresponding flags for these variant are to be
added later.
When a matched record is found only fields that are referred in the query can
be put into the buffer
table->record[0]. If all these fields can be read from the index the data
record has not to be accessed at all.
The flag HA_MRR_INDEX_ONLY indicate this case.
In scenario A it's not necessary to allocate a buffer for rowids in this case.
If the 'seq' parameter is equal to NULL then it's assumed that seq_init_param
is an array of universal range representations of type KEY_MULTI_RANGE and n is
number of ranges in this array. This explains why in some cases we can't do
without a separate parameter that specifies the number of ranges in the
sequence.
The synopsis the multi_range_read_info handler function is:
int multi_range_read_info(
uint keyno, /* index number */
uint n, /* est. number of ranges in seq. */
uint keys, /* est. number of keys in ranges */
uint *bufsz, /* size of the buffer to be used */
uint *flags, /* in/out: flags */
COST_VECT *cost /* est. cost of the MRR access */
)
where COST_VECT is defined as follows:
typedef st_cost_vect
{
double io_count; /* number of I/O * */
double avg_io_cost; /* cost of an average I/O oper. */
double cpu_cost; /* cost of operations in CPU */
double mem_cost; /* cost of used memory */
double import_cost; /* cost of remote operations */
} COST_VECT
The multi_range_read_info function is to be called by the optimizer. The
function passes the following as input parameters:
- the number of the index to be scanned
- an estimate of number of ranges in the range sequence
- an estimate for the number of keys in these ranges
- a suggested size of the external buffer to be used in the MRR access method
- suggested modes for the MRR method.
The function can return the size of external buffer that it considers a more
appropriate. In particular it can return 0 in the 'bufsz' parameter if the MRR
handler functions do not use any buffer at all of use an 'internal' buffer
allocated by the method itself. If the flag HA_MRR_LIMITS is passed in
the 'mode' parameter then the returned size of the buffer cannot be greater
than the suggested one.
In general the 'mode' parameter should contain all flags to be passed to the
multi_range_read_init method.
The 'cost' parameter is used to get the cost of the whole multi-range read
operation. The cost C is calculated as:
C = f1 * io_count * avg_io_cost + f2 * cpu_cost +
f3 * mem_cost + f4 * import_cost
where f1, f2, f3, f4 are system variables that allow to tune the cost of usage
of different resources ¨C CPU, main memory and nework/other servers ¨C with
respect to the cost of IO operations.
All these components of the cost are supposed to be used by some future
variants of the EXPLAIN command.
The rules to calculate these costs depend basically on a given implementation
of the MRR handler functions.
The function multi_range_read_info is supposed to get cost estimates in the
cases when the range sequence is unknown yet or too long.
If all members of the range sequence are known before the query is executed
then another MRR function - multi_range_read_info_const - can be used to get
cost estimates. This function is assumed to return better estimates as
implementations of this function are allowed to perform index lookups for range
endpoints.
The synopsis the multi_range_read_info_const handler function is:
int multi_range_read_info_const(
uint keyno, /* index number */
RANGE_SEQ_IF *seq; /* range sequence interface */
void *seq_init_param;/* first param for seq.init */
uint n, /* number of ranges in seq */
uint *bufsz, /* size of the buffer to be used */
uint *flags, /* in/out: flags */
COST_VECT *cost /* est. cost of the MRR access */
)
The parameters seq, seq_init_param and n are used to walk through the sequence
of ranges whose scan is to be estimated. These parameters have the same meaning
as in the function multi_range_read_init.
The remaining parameters are quite similar to those of the same name used by
the function multi_range_read_info.
CONTENTS
1. Interface description
1.1 Coding style-compliant MRR Interface definitions
1.2 Coding style-compliant Range Sequence interface definitions
1.3 Definitions of used HA_MRR_* flags
2. Range sequence interface implementations
2.1 The "array walker" implementation
2.2 SEL_ARG tree walker implementation
3. MRR interface implementations
3.1 The default implementation
3.2 Changes in ha_ndbcluster.cc
4. SQL Layer changes
4.1 Update range analysis code to use the MRR interface
4.2 Update the QUICK_RANGE_SELECT code to use the new MRR interface
5. What will not be implemented
1. Interface description
========================
The MRR interface consists of two parts:
* The MRR interface itself
MRR interface is implemented by
- Generic implementation (converts MRR calls to non-MRR calls)
- ha_ndblcuster-specific implementation
- [in development] MyISAM/InnoDB/BDB implementation
MRR interface is used by
- Range access method code
- [in development] Batched key access
- [possible] Loose Index scan
* The Range sequence interface. This interface is part of MRR interface.
Range sequence interface is implemented by
- range access method optimizer
- range access method row retrieval code
Range sequence is used by
- Any table handler that implements MRR interface
1.1 Coding style-compliant MRR Interface definitions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
typedef struct st_key_range
{
const byte *key;
uint length;
} RANGE_ENDPOINT;
typedef struct st_key_multi_range
{
RANGE_ENDPOINT start_key;
RANGE_ENDPOINT end_key;
enum ha_rkey_function function;
char *ptr;
uint range_flag;
} KEY_RANGE;
/*
Get cost and other information about MRR scan over currently unknown ranges
SYNOPSIS
multi_range_read_info()
keyno Index number
n_ranges Estimated number of ranges (i.e. intervals) in the
range sequence.
n_rows Estimated total number of records contained within all
of the ranges
bufsz INOUT IN: Size of the buffer available for use
OUT: Size of the buffer that will be actually used, or
0 if buffer is not needed.
flags INOUT A combination of HA_MRR_* flags
cost OUT Estimated cost of MRR access
DESCRIPTION
This function obtains the cost and other info about about the MRR scan.
The caller has some information about what kinds of ranges will be scanned
(e.g. "1000 {keypart1=const} ranges"}, but the constant themselves are
assumed not to be known when this function is called.
The flags parameter is a combination of those flags: HA_MRR_SORTED,
HA_MRR_INDEX_ONLY, HA_MRR_NO_ASSOCIATION, HA_MRR_LIMITS.
RETURN
0 - OK, *cost contains cost of the scan, *bufsz and *flags contain scan
parameters.
other - Error or can't perform the requested scan
*/
int multi_range_read_info(uint keyno, uint n_ranges, uint n_rows, uint *bufsz,
uint *flags, COST_VECT *cost);
/*
Get cost and other information about MRR scan over a known list of ranges
SYNOPSIS
multi_range_read_info_const()
keyno Index number
seq Range sequence to be traversed
seq_init_param First parameter for seq->init()
n_ranges Number of ranges in the sequence
bufsz INOUT IN: Size of the buffer available for use
OUT: Size of the buffer that is expected to be actually
used, or 0 if buffer is not needed.
flags INOUT A combination of HA_MRR_* flags
cost OUT Estimated cost of MRR access
DESCRIPTION
Obtain the cost and other information about an MRR scan when the list of
ranges to be scanned is known.
NOTE
Reasons for having this function:
* Let high-roundtrip-cost handlers, like NDB/Federated produce the
estimates in one roundtrip. (One might ask, why don't we use
condition-pushdown-based system for the purpose of estimating number of
records that match the table condition?... good question... in any case
there is another reason)
* We need a specialized cost-of-MRR-over-constant-ranges function.
= It must be different from non-MRR-cost function (since MRR is supposed
to be better and have a smaller cost)
= It must be different from multi_range_read_info() because here we know
the ranges.
Therefore, we have this function.
RETURN
0 - OK, *cost contains cost of the scan, *bufsz and *flags contain scan
parameters.
1 - Error or can't perform the requested scan
*/
int multi_range_read_info_const(uint keyno, RANGE_SEQ_IF *seq,
void *seq_init_param, uint n_ranges,
uint *bufsz, uint *flags, COST_VECT *cost);
/*
Initialize the MRR scan
SYNOPSIS
multi_range_read_init()
seq Range sequence to be traversed
seq_init_param First parameter for seq->init()
n_ranges Number of ranges in the sequence
mode Flags, see the description for details
buf INOUT, memory buffer to be used.
DESCRIPTION
Initialize the MRR scan. This function may do heavyweight scan
initialization like row prefetching/sorting/etc (NOTE: but better not do
it here as we may not need it, e.g. if we never satisfy WHERE clause on
previous tables. For many implementations it would be natural to do such
initializations in _next() call)
mode is a combination of the following flags: HA_MRR_SORTED,
HA_MRR_INDEX_ONLY, HA_MRR_NO_ASSOCIATION
NOTES
One must have called index_init() before calling this function. Several
multi_range_read_init() calls may be made in course of one query.
Until WL#2623 is done (see section 3.2), the following will also hold:
The caller will guarantee that if "seq->init == mrr_ranges_array_init"
then seq_init_param will be an array of KEY_MULTI_RANGE structures (with
n_ranges elements). This property will only be used by NDB handler until
WL#2623 is done (see section 3.2 again).
Buffer memory management is done according to the following scenario:
The caller allocates the buffer and provides it to the callee by filling
in the members of HANDLER_BUFFER structure (see current code for
definition of HANDLER_BUFFER).
The callee consumes all or some fraction of the provided buffer space, and
sets the HANDLER_BUFFER members accordingly.
The callee may use the buffer memory until the next multi_range_read_init()
call is made, all records have been read, or until index_end() call is
made, whichever comes first.
RETURN
0 - OK
1 - Error
*/
int multi_range_read_init(RANGE_SEQ_IF *seq, void *seq_init_param,
uint n_ranges, uint mode, HANDLER_BUFFER *buf);
/*
Get next record in MRR scan
SYNOPSIS
multi_range_read_next()
range_info OUT If HA_MRR_NO_ASSOCIATION flag is in effect or an error
occured, nothing (but the caller is still allowed to
store an arbitrary value there)
Otherwise, the opaque value associated with the range
that contains the returned record.
NOTE
It should be possible to process range lists of arbitrary length without
any kind of overflows.
RETURN
0 OK
other Error code
*/
int multi_range_read_next(char **range_info);
/*
Operation cost vector class
*/
class COST_VECT
{
public:
double io_count; /* number of I/O */
double avg_io_cost; /* cost of an average I/O oper. */
double cpu_cost; /* cost of operations in CPU */
double mem_cost; /* cost of used memory */
double import_cost; /* cost of remote operations */
double total_cost()
{
/* f{i} will be #define'd constants */
return f1 * io_count * avg_io_cost + f2 * cpu_cost +
f3 * mem_cost + f4 * import_cost;
}
};
1.2 Coding style-compliant Range Sequence interface definitions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
typedef void *range_seq;
typedef struct st_range_seq_if
{
/*
Initialize the traversal of range sequence
SYNOPSIS
init()
init_params The seq_init_param parameter
n_ranges The number of ranges obtained
flags HA_MRR_SINGLE_POINT, HA_MRR_FIXED_KEY
RETURN
An opaque value to be used as RANGE_SEQ_IF::next() parameter
*/
range_seq (*init)(void *init_params, uint n_ranges, uint flags);
/*
Get the next range in the range sequence
SYNOPSIS
next()
seq The value returned by RANGE_SEQ_IF::init()
range OUT
RETURN
0 - Ok, the range structure filled with info about the next range
1 - No more ranges
*/
uint (*next) (range_seq seq, KEY_RANGE *range);
} RANGE_SEQ_IF;
1.3 Definitions of used HA_MRR_* flags
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/*
The below two are not used (and not handled) in this milestone of this WL
entry because there seems to be no use for them at this stage of
implementation. (TODO is that so??)
*/
#define HA_MRR_SINGLE_POINT n
#define HA_MRR_FIXED_KEY n
/*
Indicates that RANGE_SEQ_IF::next(&range) doesn't need to fill in the
'range' parameter.
*/
#define HA_MRR_NO_ASSOCIATION n
/*
The MRR user will provide ranges in key order, and MRR implementation
must return rows in key order.
*/
#define HA_MRR_SORTED n
/* Don't retrieve full records */
#define HA_MRR_INDEX_ONLY n
/*
The passed memory buffer is of maximum possible size, the caller can't
assume larger buffer.
*/
#define HA_MRR_LIMITS n
2. Range sequence interface implementations
===========================================
2.1 The "array walker" implementation
-------------------------------------
This implementation will be used by QUICK_RANGE_SELECT code. When a
QUICK_RANGE_SELECT is constructed, the list of ranges to be scanned is
materialized in an array. This implementation will walk over this array.
There will be two RANGE_SEQ_IF-member-compliant functions:
range_seq is a pointer to
struct
{
QUICK_RANGE *cur;
QUICK_RANGE *end;
};
/* Set the cur and end pointers */
mrr_ranges_array_init();
/* Advance over one element in the array */
mrr_ranges_array_next()
{
if (cur == end)
return 1; /* No more ranges */
else
{
//fill range from *cur
cur++;
return 0;
}
}
2.2 SEL_ARG tree walker implementation
--------------------------------------
This range sequence interface implementation will traverse the SEL_ARG* tree
representation of range list.
struct state
{
SEL_ARG **stack;
byte *min_key;
byte *max_key;
};
init()
{
/* Initialize the traversal */
}
next()
{
/* Traverse the SEL_ARG tree until we've built the next interval */
}
3. MRR interface implementations
================================
3.1 The default implementation
------------------------------
This is the only implementation that will be created in scope of this WL
entry. The planned implementations are WL#2475 (MyISAM/InnoDB/etc) and
WL#2623 (NDB).
The purpose of this implementation is to allow the query executioner to always
use the MRR interface, no matter if the underlying engine supports MRR or not.
In a nutshell, this is "MRR calls to non-MRR calls adapter"
Below is a sketch of the functions to be implemented:
int
handler::multi_range_read_info(uint key_no, uint n_ranges, uint keys,
uint *bufsz, uint *flags, COST_VECT *cost)
{
*bufsz= 0; // No need for buffer
/* Produce the same cost as current code does */
if (*flags & HA_MRR_INDEX_ONLY)
{
/* get_index_only_read_time is implemented in opt_range.cc already */
cost->io_cost= get_index_only_time(key_no, keys);
}
else
cost->io_count= this->read_time(n_ranges, keys);
cost->avg_io_cost= 1; // Assume random seeks
cost->cpu_cost= cost->import_cost= cost->mem_cost= 0;
return 0;
}
int
handler::multi_range_read_info_const(uint keyno, RANGE_SEQ_IF *seq,
void *seq_init_param, uint n_ranges,
uint *bufsz, uint *flags, COST_VECT *cost)
{
KEY_RANGE range;
range_seq seq_it;
ha_rows rows, total_rows;
*bufsz= 0;
total_rows= 0;
seq->init(seq_init_param, n_ranges, flags);
while (!seq->next(seq_it, &range))
{
if (HA_POS_ERROR == (rows= this->records_in_range(keyno, &range.start_key,
&range.end_key)))
{
/* Can't scan one range => can't do MRR scan at all */
return 1;
}
total_rows += rows;
}
cost->io_count= this->read_time(n_ranges, total_rows);
cost->avg_io_cost= 1; // Assume random seeks
cost->cpu_cost= cost->import_cost= cost->mem_cost= 0;
return 0;
}
For handler::multi_range_read_init() and handler::multi_range_read_next(),
adopt the code from handler::read_multi_range_first() and
handler::read_multi_range_next(). It should be easy as that code uses simple
loops.
3.2 Changes in ha_ndbcluster.cc
-------------------------------
The proper changes will be done in WL#2623. Within the scope of this WL
entry the following will be done:
The ha_ndbcluster::read_multi_range_first and read_multi_range_next functions
will be changed to use the new interface. The new functions will not be fully
conformant implementations: They'll only handle the case where
"seq->init == mrr_ranges_array_init"
This limitation is expected to be lifted by implementation of WL#2623.
4. SQL Layer changes
====================
4.1 Update range analysis code to use the MRR interface
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is partially already done.
See section #(SEL_ARG tree) for one thing.
4.2 Update the QUICK_RANGE_SELECT code to use the new MRR interface
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Update the QUICK_*_SELECT to use the new MRR interface.
Also remove the QUICK_RANGE class and replace all its uses with uses of
KEY_RANGE as these two structures contain identical information.
//btw we will not ask the table handler to return the container interval.
5. What will not be implemented
===============================
Support for HA_MRR_SEMI_JOIN, ANTI_JOIN or OUTER_JOIN will not be implemented
in this WL entry as there is no way to test that functionality.
Copyright (c) 2000, 2025, Oracle Corporation and/or its affiliates. All rights reserved.