range access method
uses a single index to retrieve a subset of table rows that
are contained within one or several index value intervals. It
can be used for a single-part or multiple-part index. The
following sections give descriptions of conditions under which
the optimizer uses range access.
For a single-part index, index value intervals can be
conveniently represented by corresponding conditions in the
WHERE clause, denoted as
rather than “intervals.”
The definition of a range condition for a single-part index is as follows:
BTREEindexes, comparison of a key part with a constant value is a range condition when using the
LIKEcomparisons if the argument to
LIKEis a constant string that does not start with a wildcard character.
“Constant value” in the preceding descriptions means one of the following:
Here are some examples of queries with range conditions in
SELECT * FROM t1 WHERE key_col > 1 AND key_col < 10; SELECT * FROM t1 WHERE key_col = 1 OR key_col IN (15,18,20); SELECT * FROM t1 WHERE key_col LIKE 'ab%' OR key_col BETWEEN 'bar' AND 'foo';
Some nonconstant values may be converted to constants during the optimizer constant propagation phase.
MySQL tries to extract range conditions from the
WHERE clause for each of the possible
indexes. During the extraction process, conditions that
cannot be used for constructing the range condition are
dropped, conditions that produce overlapping ranges are
combined, and conditions that produce empty ranges are
Consider the following statement, where
key1 is an indexed column and
nonkey is not indexed:
SELECT * FROM t1 WHERE (key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR (key1 < 'bar' AND nonkey = 4) OR (key1 < 'uux' AND key1 > 'z');
The extraction process for key
key1 is as
Start with original
(key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR (key1 < 'bar' AND nonkey = 4) OR (key1 < 'uux' AND key1 > 'z')
nonkey = 4and
key1 LIKE '%b'because they cannot be used for a range scan. The correct way to remove them is to replace them with
TRUE, so that we do not miss any matching rows when doing the range scan. Having replaced them with
TRUE, we get:
(key1 < 'abc' AND (key1 LIKE 'abcde%' OR TRUE)) OR (key1 < 'bar' AND TRUE) OR (key1 < 'uux' AND key1 > 'z')
Collapse conditions that are always true or false:
(key1 LIKE 'abcde%' OR TRUE)is always true
(key1 < 'uux' AND key1 > 'z')is always false
Replacing these conditions with constants, we get:
(key1 < 'abc' AND TRUE) OR (key1 < 'bar' AND TRUE) OR (FALSE)
FALSEconstants, we obtain:
(key1 < 'abc') OR (key1 < 'bar')
Combining overlapping intervals into one yields the final condition to be used for the range scan:
(key1 < 'bar')
In general (and as demonstrated by the preceding example),
the condition used for a range scan is less restrictive than
WHERE clause. MySQL performs an
additional check to filter out rows that satisfy the range
condition but not the full
MySQL does not support merging multiple ranges for the
range access method for
spatial indexes. To work around this limitation, you can use
UNION with identical
SELECT statements, except
that you put each spatial predicate in a different
Range conditions on a multiple-part index are an extension of range conditions for a single-part index. A range condition on a multiple-part index restricts index rows to lie within one or several key tuple intervals. Key tuple intervals are defined over a set of key tuples, using ordering from the index.
For example, consider a multiple-part index defined as
key1(, and the
following set of key tuples listed in key order:
key_part1 key_part2 key_part3 NULL 1 'abc' NULL 1 'xyz' NULL 2 'foo' 1 1 'abc' 1 1 'xyz' 1 2 'abc' 2 1 'aaa'
defines this interval:
(1,-inf,-inf) <= (key_part1,key_part2,key_part3) < (1,+inf,+inf)
The interval covers the 4th, 5th, and 6th tuples in the preceding data set and can be used by the range access method.
By contrast, the condition
does not define a single interval and cannot
be used by the range access method.
The following descriptions indicate how range conditions work for multiple-part indexes in greater detail.
HASHindexes, each interval containing identical values can be used. This means that the interval can be produced only for conditions in the following form:
key_part1 cmp const1 AND key_part2 cmp const2 AND ... AND key_partN cmp constN;
const2, … are constants,
cmpis one of the
IS NULLcomparison operators, and the conditions cover all index parts. (That is, there are
Nconditions, one for each part of an
N-part index.) For example, the following is a range condition for a three-part
key_part1 = 1 AND key_part2 IS NULL AND key_part3 = 'foo'
For the definition of what is considered to be a constant, see The Range Access Method for Single-Part Indexes.
BTREEindex, an interval might be usable for conditions combined with
AND, where each condition compares a key part with a constant value using
'does not start with a wildcard). An interval can be used as long as it is possible to determine a single key tuple containing all rows that match the condition (or two intervals if
The optimizer attempts to use additional key parts to determine the interval as long as the comparison operator is
IS NULL. If the operator is
LIKE, the optimizer uses it but considers no more key parts. For the following expression, the optimizer uses
=from the first comparison. It also uses
>=from the second comparison but considers no further key parts and does not use the third comparison for interval construction:
key_part1 = 'foo' AND key_part2 >= 10 AND key_part3 > 10
The single interval is:
('foo',10,-inf) < (key_part1,key_part2,key_part3) < ('foo',+inf,+inf)
It is possible that the created interval contains more rows than the initial condition. For example, the preceding interval includes the value
('foo', 11, 0), which does not satisfy the original condition.
If conditions that cover sets of rows contained within intervals are combined with
OR, they form a condition that covers a set of rows contained within the union of their intervals. If the conditions are combined with
AND, they form a condition that covers a set of rows contained within the intersection of their intervals. For example, for this condition on a two-part index:
(key_part1 = 1 AND key_part2 < 2) OR (key_part1 > 5)
The intervals are:
(1,-inf) < (key_part1,key_part2) < (1,2) (5,-inf) < (key_part1,key_part2)
In this example, the interval on the first line uses one key part for the left bound and two key parts for the right bound. The interval on the second line uses only one key part. The
key_lencolumn in the
EXPLAINoutput indicates the maximum length of the key prefix used.
In some cases,
key_lenmay indicate that a key part was used, but that might be not what you would expect. Suppose that
NULL. Then the
key_lencolumn displays two key part lengths for the following condition:
key_part1 >= 1 AND key_part2 < 2
But, in fact, the condition is converted to this:
key_part1 >= 1 AND key_part2 IS NOT NULL
The Range Access Method for Single-Part Indexes, describes how optimizations are performed to combine or eliminate intervals for range conditions on a single-part index. Analogous steps are performed for range conditions on multiple-part indexes.
Consider these expressions, where
col_name is an indexed column:
col_name IN(val1, ..., valN) col_name = val1 OR ... OR col_name = valN
Each expression is true if
col_name is equal to any of
several values. These comparisons are equality range
comparisons (where the “range” is a single
value). The optimizer estimates the cost of reading
qualifying rows for equality range comparisons as follows:
If there is a unique index on
col_name, the row estimate for each range is 1 because at most one row can have the given value.
Otherwise, any index on
col_nameis nonunique and the optimizer can estimate the row count for each range using dives into the index or index statistics.
With index dives, the optimizer makes a dive at each end of
a range and uses the number of rows in the range as the
estimate. For example, the expression
has three equality ranges and the optimizer
makes two dives per range to generate a row estimate. Each
pair of dives yields an estimate of the number of rows that
have the given value.
col_name IN (10, 20,
Index dives provide accurate row estimates, but as the number of comparison values in the expression increases, the optimizer takes longer to generate a row estimate. Use of index statistics is less accurate than index dives but permits faster row estimation for large value lists.
system variable enables you to configure the number of
values at which the optimizer switches from one row
estimation strategy to the other. To permit use of index
dives for comparisons of up to
equality ranges, set
N + 1. To disable use of
statistics and always use index dives regardless of
To update table index statistics for best estimates, use
To control the memory available to the range optimizer, use
A value of 0 means “no limit.”
With a value greater than 0, the optimizer tracks the memory consumed when considering the range access method. If the specified limit is about to be exceeded, the range access method is abandoned and other methods, including a full table scan, are considered instead. This could be less optimal. If this happens, the following warning occurs (where
Nis the current
Warning 3170 Memory capacity of N bytes for 'range_optimizer_max_mem_size' exceeded. Range optimization was not done for this query.
For individual queries that exceed the available range
optimization memory and for which the optimizer falls back
to less optimal plans, increasing the
value may improve performance.
To estimate the amount of memory needed to process a range expression, use these guidelines:
For a simple query such as the following, where there is one candidate key for the range access method, each predicate combined with
ORuses approximately 230 bytes:
SELECT COUNT(*) FROM t WHERE a=1 OR a=2 OR a=3 OR .. . a=N;
Similarly for a query such as the following, each predicate combined with
ANDuses approximately 125 bytes:
SELECT COUNT(*) FROM t WHERE a=1 AND b=1 AND c=1 ... N;
For a query with
SELECT COUNT(*) FROM t WHERE a IN (1,2, ..., M) AND b IN (1,2, ..., N);
Each literal value in an
IN()list counts as a predicate combined with
OR. If there are two
IN()lists, the number of predicates combined with
ORis the product of the number of literal values in each list. Thus, the number of predicates combined with
ORin the preceding case is
Before 5.7.11, the number of bytes per predicate combined
OR was higher,
approximately 700 bytes.
The optimizer is able to apply the range scan access method to queries of this form:
SELECT ... FROM t1 WHERE ( col_1, col_2 ) IN (( 'a', 'b' ), ( 'c', 'd' ));
Previously, for range scans to be used, it was necessary to write the query as:
SELECT ... FROM t1 WHERE ( col_1 = 'a' AND col_2 = 'b' ) OR ( col_1 = 'c' AND col_2 = 'd' );
For the optimizer to use a range scan, queries must satisfy these conditions:
On the left side of the
IN()predicate, the row constructor contains only column references.
On the right side of the
IN()predicate, row constructors contain only runtime constants, which are either literals or local column references that are bound to constants during execution.
On the right side of the
IN()predicate, there is more than one row constructor.
For more information about the optimizer and row constructors, see Section 22.214.171.124, “Row Constructor Expression Optimization”