[+/]
The operation of partition pruning is defined as follows:
“Given a query over partitioned table, match the table
DDL against any WHERE
or
ON
clauses, and find the minimal set of
partitions that must be accessed to resolve the query.”
The set of partitions thus obtained (hereafter referred to as “used”) can be smaller then the set of all table partitions. Partitions that did not get into this set (that is, those that were pruned away) will not be accessed at all: this is how query execution is made faster.
NonTransactional Table Engines.
With nontransactional tables such as
MyISAM
, locks are placed on entire
partitioned table. It is theoretically possible to use
partition pruning to improve concurrency by placing locks
only on partitions that are actually used, but this is
currently not implemented.
Partition pruning doesn't depend on what table engine is used. Therefore its implementation is a part of the MySQL Query Optimizer. The next few sections provide a detailed description of partition pruning.
Partition pruning is performed using the following steps:
Analyze the
WHERE
clause and construct an interval graph describing the results of this analysis.Walk the graph, and find sets of partitions (or subpartitions, if necessary) to be used for each interval in the graph.
Construct a set of partitions used for the entire query.
The description represented by the interval graph is structured in a “bottomup” fashion. In the discussion that follows, we first define the term partitioning interval, then describe how partitioning interval are combined to make an interval graph, and then describe the graph “walking” process.
Let's start from simplest cases. Suppose that we have a
partitioned table with N
columns, using partitioning type
p_type
and the partitioning
function p_func
, represented
like this:
CREATE TABLE t (columns)
PARTITION BY p_type(p_func(col1, col2,... colN)...);
Suppose also that we have a WHERE
clause of the form
WHERE t.col1=const1 AND t.col2=const2 AND ... t.colN=constN
We can calculate
and
discover which partition can contain records matching the
p_func
(const1, const2
... constN
)WHERE
clause. Note that this process
works for all partitioning types and all partitioning
functions.
This process works only if the WHERE
clause is of the exact form given above — that is,
each column in the table must be tested for equality
with some arbitrary constant (not necessarily the same
constant for each column). For example, if
col1=const1
were missing from the
example WHERE
clause, then we would
not be able to calculate the partitioning function value
and so would be unable to restrict the set of partitions
to those actually used.
Let a partitioned table t
be defined
with a set of column definitions
columns
, a partitioning type
p_type
using a partitioning
function p_func
taking an
integer column int_col
, as
shown here:
CREATE TABLE t (columns)
PARTITION BY
p_type(p_func(int_col))
...
Now suppose that we have a query whose
WHERE
clause is of the form
WHERE const1 <= int_col <= const2
We can reduce this case to a number of cases of
singlepoint intervals by converting the
WHERE
clause into the following
relation:
int_field=const1 OR
int_field=const1 + 1 OR
int_field=const1 + 2 OR
... OR
int_field=const2
In the source code this conversion is referred to as interval walking. Walking over short intervals is not very expensive, since we can reduce the number of partitions to scan to a small number. However, walking over long intervals may not be very efficient — there will be lots of numbers to examine, and we are very likely to out that all partitions need to be scanned.
The threshold for interval walking is determined by
#define MAX_RANGE_TO_WALK=10
The logic of the previous example also applies for a relation such as this one:
const1 >= int_col >= const2
Let a partitioned table t
be defined as
follows:
CREATE TABLE t (columns)
PARTITION BY RANGELIST(unary_ascending_function(column))
Suppose we have a query on table t
whose WHERE
clause is of one of the
forms shown here:
const1
<= t.column
<=const2
t.
column
<=const2
const1
<= t.column
Since the partitioning function is ascending, the following relationship holds:
const1 <= t.col <= const2
=> p_func(const1) <=
p_func(t.column) <= p_func(const2)
Using A
and
B
to denote the leftmost and
rightmost parts of this relation, we can rewrite it like
this:
A <= p_func(t.column) <= B
In this instance, the interval is closed and has two bounds. However, similar inferences can be performed for other kinds of intervals.
For RANGE
partitioning, each partition
occupies one interval on the partition function value
axis, and the intervals are disjoint, as shown here:
p0 p1 p2
table partitions xxx>
search interval x==============x>
A B
A partition needs to be accessed if and only if its
interval has a nonempty intersection with the search
interval [
.
A
,
B
]
For LIST
partitioning, each partition
covers a set of points on the partition function value
axis. Points produced by various partitions may be
interleaved, as shown here:
p0 p1 p2 p1 p1 p0
table partitions ++++++>
search interval x===================x>
A B
A partition needs to be accessed if it has at least one
point in the interval
[
. The set of
partitions used can be determined by running from
A
,
B
]A
to
B
and collecting partitions
that have their points within this range.
In the previous sections we've described ways to infer the set of used partitions from "elementary" WHERE clauses. Everything said there about partitions also applies to subpartitions (with exception that subpartitioning by RANGE or LIST is currently not possible).
Since each partition is subpartitioned in the same way, we'll find which subpartitions should be accessed within each partition.
Previous sections deal with inferring the set of partitions
used from WHERE
clauses that represent
partitioning or subpartitioning intervals. Now we look at
how MySQL extracts intervals from arbitrary
WHERE
clauses.
The extraction process uses the Range
Analyzer — a part of the MySQL optimizer
that produces plans for the range access method. This is
because the tasks are similar. In both cases we have a
WHERE
clause as input: the range access
method needs index ranges (that is, intervals) to scan;
partition pruning module needs partitioning intervals so
that it can determine which partitions should be used.
For range access, the Range Analyzer is invoked with the
WHERE
clause and descriptions of table
indexes. Each index is described by an ordered list of the
columns which it covers:
(keypart1, keypart2, ..., keypartN)
For partition pruning, Range Analyzer is invoked with the
WHERE
clause and a list of table columns
used by the partitioning and subpartitioning functions:
(part_col1, part_col2, ... part_colN,
subpart_col1, subpart_col2, ... subpart_colM)
The result of the Range Analyzer's work is known as a
SEL_ARG
graph.
This is a complex (and not yet fully documented) structure,
which we will not attempt to describe here. What's important
for the current discussion is that we can walk over it and
collect partitioning and subpartitioning intervals.
The following example illustrates the structure and the
walking process. Suppose a table t
is
partitioned as follows:
CREATE TABLE t (..., pf INT, sp1 CHAR(5), sp2 INT, ... )
PARTITION BY LIST (pf)
SUBPARTITION BY HASH(sp1, sp2) (
PARTITION p0 VALUES IN (1),
PARTITION p1 VALUES IN (2),
PARTITION p2 VALUES IN (3),
PARTITION p3 VALUES IN (4),
PARTITION p4 VALUES IN (5),
);
Now suppose that a query on table t
has a
highly complex WHERE
clause, such as this
one:
pf=1 AND (sp1='foo' AND sp2 IN (40,50))
OR
(pf1=3 OR pf1=4) AND sp1='bar' AND sp2=33
OR
((pf=3 OR pf=4) AND sp1=5)
OR
p=8
The SEL_ARG
graph for this is shown here:
(root)
 :
 Partitioning : Subpartitioning
 :
 :
 :
 ++ : ++ ++
\ pf=1 : sp1='foo'  sp2=40 
++ : ++ ++
 : 
 : ++
 :  sp2=50 
 : ++
 :
 :
++ : ++ ++
 pf=3 :+ sp1='bar'  sp2=33 
++ :  ++ ++
 : 
++ : 
 pf=4 :+
++ :
 :
 :
++ : ++
 pf=8 : sp1='baz' 
++ : ++
In the previous diagram, vertical edges
(
) represent OR
and
the horizontal ones (
) represent
AND
(the line with both horizontal and
vertical segments also represents AND
).
The partitionpruning code walks the graph top to bottom and from left to right, making these inferences:
Start with an empty set of used partitions at the topmost and leftmost interval.
Perform interval analysis for
pf=1
; find a corresponding set of partitionsP0
; move right.Move right again, to
sp2=40
.Analyze the interval
sp1='foo' AND sp2=40
interval; find that it covers rows in some subpartitionSP1
. Make first inference: within each partition making up setP0
, mark subpartitionSP1
as “used”.Move down to
sp2=50
.Analyze the interval
sp1='foo' AND sp2=50
, finding that it covers rows in some subpartitionSP2
. Make another inference: within each partition of setP0
, mark subpartitionSP2
as used.Move back to
pf=1
, and then down topf=3
.
Perform interval analysis for
pf=3
; find a corresponding set of partitionsP1
; move right.Move right again, to
sp2=33
.Analyze the interval
sp1='foo' AND sp2=33
, find that it covers rows in a subpartitionSP3
. Make another inference: within each partition from setP1
, mark subpartitionSP3
as “used”.Move back to
pf=3
, then down topf=4
.
Perform interval analysis for
pf=4
; find a corresponding set of partitionsP2
; move right.Perform moves and inferences analogous to what we did to the right of
pf=3
. There is some potential inefficiency due to the fact that we will analyze the interval forsp1='foo' AND sp2=33
again, but this should not have much impact on overall performance.Move back to
pf=3
, then down topf=8
.
Perform interval analysis for
pf=8
; find a corresponding set of partitionsP3
, move right.Now we've arrived at
sp1='baz'
, and find that we can't move any further to the right and can't construct a subpartitioning interval. We remember this, and move back topf=8
.In the previous step we could not limit the set of subpartitions, so we make this inference: for every partition in set
P3
, assume that all subpartitions are active, and mark them as such.
Try to move down from
pf=8
; find that there is nothing there; this completes the graph analysis.
In certain cases the result of the
RANGE
optimizer will be several
SEL_ARG
graphs that are to be combined
using OR
or AND
operators. This happens for WHERE
clauses which either are very complicated or do not allow
for the construction of a single list of intervals. In
such cases, the partition pruning code takes apprpriate
action, an example being this query:
SELECT * FROM t1 WHERE partition_id=10 OR subpartition_id=20
No single list of intervals can be constructed in this instance, but the partition pruning code correctly infers that the set of partitions used is a union of:

All subpartitions within the partition containing rows with
partition_id=10
; anda subpartition containing rows with
subpartition_id=20
within each partition.
Here is a short walkthrough of what is where in the code:

sql/opt_range.cc
:This file contains the implementation of what is described in Section 7.3.2.1.4, “From WHERE Clauses to Intervals”. The entry point is the function
prune_partitions()
.There are also detailed codelevel comments about partition pruning; search for
PartitionPruningModule
to find the starting point. 
sql/partition_info.h
:class partition_info { ... /* Bitmap of used (i.e. not pruned away) partitions. This is where result of partition pruning is stored. */ MY_BITMAP used_partitions; /* "virtual function" pointers to functions that perform interval analysis on this partitioned table (used by the code in opt_range.cc) */ get_partitions_in_range_iter get_part_iter_for_interval; get_partitions_in_range_iter get_subpart_iter_for_interval; };

sql/sql_partition.cc
:This file contains the functions implementing all types of partitioning interval analysis.