WL#2985: Perform Partition Pruning of Range conditions
Affects: Server-5.1
—
Status: Complete
This is part of the solution to the problem of only using the necessary partitions in a query using partitioned. The problem is given any WHERE clause with a set of tables to come up with a set of bitmaps (one bitmap per table) where the bitmap specifies all partitions to be used for that table. (Naturally only bitmaps are needed for partitioned tables). This is another way of phrasing the problem defined in WL #2537 and WL #2538. Background ---------- From an email from Mikael Ronstrom to Trudy Pelzer, 2005-11-17: > After discussing with Mark Matthews I went ahead and made a slight > reorganisation of the WL tasks for handling optimisations of > partitioning that enables more parallelism in the development. > > The split is to implement WL #2537 and WL #2538 by WL#2985, 2986 > and 2987 instead. > > WL 2985 is the Server part of the optimisation, WL 2986 is the > partition handler part of the tasks and WL 2987 is the NDB handler > part of the optimisation. > Cancelled entries are WL#2537 and 2538; see their descriptions for further details. -- Trudy Pelzer, 2005-11-23
This document contains: - an overview of methods of table partitioning supported by MySQL server - a formal specification of the problem of query partition pruning - a description of a procedure to perform partition pruning Partition Tables ---------------- We say that table T is partitioned if there is function PF that maps rows of table T into [1,...,n]. This mapping effectively splits the set of rows in table T into n disjoint sets. Each of theses sets is of the same structure as table T and we can consider it as a separate table called partition. Thus for a partitioned table T we have a set of partition tables P1,...,Pn such that for any given state of the table R(T) the following is true: R(T)=R(P1) UNION ... R(Pn), where R(Pi) INTERSECT R(Pj) = 0 for any i and j. Usually the following types of partition map functions are considered: - range partitioning - hash partitioning - combined partitioning Range partitioning is defined with the help of partition index I. A sequence of disjoint index ranges R1,...,Rn that covers all index values is specified. A row r belongs to the partition Pi iff I(r) in Ri. (In MySQL a partition index over columns C1,...,Ck of a table T is defined by an expression E over columns C1,...Ck that returns a scalar value). We could have not one-to-one correspondence between ranges and partitions. Ranges could be divided into groups of ranges RG1,...RGm: RGj={Ri1,...Rij}. In this case we have: (r IN Pi) iff RGi includes a range Ril that contains I(r). So-called list partitioning supported in MySQL provides us with an example of such mapping. List partitioning specifies groups of values that select different partitions. For the following considerations it does not matter whether partitions are defined by ranges or by groups of ranges each group containing maybe a number of ranges. Hash partitioning is defined with the help of hash function H with different values H1,...Hn. A row belongs to the partition Pi iff H(r)=Hi. With combined partitioning we have range partitioning on the upper level and hash sub-partitioning for each range partition. The fact that table T is partitioned into partitions P1,...,Pn will be denoted as T[P1,...,Pn]. Let Pi1,...,Pik be a subset of partitions P1,...,Pn. A projection of a set of rows RS from table T into partitions Pi1,...Pik will be denoted as RS[Pi1,...,Pik]. Thus the set of all rows from partitions Pi1,...,Pik can be denoted as T [Pi1,...,Pik]. The Problem of Partition Pruning -------------------------------- For each query Q over a partitioned table T[P1,...Pn] and some other, possibly partitioned tables T1,...,Tm we can consider an operation of partition pruning for partitions of the table T. If Q is a query over a table and R is a set of rows from this table then Q(R) will denote the result set for the query evaluated for the table T populated only with rows from the subset R. In its most general form the operation of partition pruning for a query Q can be defined as a function PP that returns a subset of partitions Pi1,...Pik such that Q(T,T1,...,Tm) = Q(T[Pi1,...,Pik],T1,...Tm). If this subset is minimal then we say that the function performs full partition pruning. It is obvious that if two different subsets of partitions P={Pi1,...,Pik} and P’={Pj1,...,Pjl} can be returned as results of partition pruning then the intersect of this subsets P INTERSECT P’ can be yielded as a result of partition pruning as well. In other words: (Q(T,T1,...,Tm) = Q(T[Pi1,...,Pik],T1,...Tm) AND Q(T,T1,...,Tm) = Q(T[Pj1,...,Pjl],T1,...Tm)) => Q(T,T1,...,Tm) = Q(T[P INTERSECT P’],T1,...Tm). This fact let us to talk about full partition pruning. A procedure of partition pruning -------------------------------- An efficient procedure of full partition pruning can be constructed for a wide class of queries. Let C1,...,Ck be the columns that form the partition index. Define a regular index Idx(C1,...,Ck) on table T and perform the procedure that extracts the a range condition RC for the query Q. This condition can be represented by a sequence of disjoint intervals (ranges) over index Idx. The intervals can be open, closed and semi-closed. Here is an example of such a sequence for a single component index over an integer column C: (2, 4), [10, 17], (21, 32], [100, ). The corresponding RC condition is: (2 < C AND C < 4 ) OR (10 <= C AND C <=17) OR (21 < C AND C <= 32) OR (100 <= C). For any range condition RC extracted by this procedure we have: (r IN Q(R)) => RC(proj[Idx](r))=true. Here proj[Idx](r) denotes the projection of row r on the columns of index Idx. The sequence of intervals corresponding to a range condition RC will be denoted by S(RC). It should be noted that an extraction of the RC sequence of intervals for an index is not a trivial operation in a general case even for single component indexes. Let’s consider the following query: SELECT * FROM t1,t2 WHERE (t1.a < 5 OR t1.b=t2.b OR t1.a > 10) AND (t.a <= 20 OR t1.c=t2.c AND t1.a > 0) with column ‘a’ forming the partition index for partitioned table ‘t1’. First there will be extracted the formula: (t1.a < 5 AND t1.a <= 20 OR t1.a > 10 AND t1.a <= 20 OR t1.a < 5 AND t1.a > 0 OR t1.a > 10 AND t1.a > 0) This formula will be simplified to: (t1.a > 10 AND t1.a <= 20) OR (t1.a > 0 AND t1.a < 5) This formula specifies the following intervals for the partition index: (0, 5) (10, 20]. If the query Q is such that: - it contains a where condition C - it does not contain any grouping or aggregate function - the pushdown condition for table T extracted from C is a AND-OR formula over comparison predicates then the extracted range condition RC is guaranteed to be maximal, i.e. there is no other range condition RC’ such that: RC(proj[Idx](r)) => RC’(proj [Idx](r)) & ((r IN Q(R)) => RC(proj[Idx](r))=true for any instance of table T). Here by comparison predicates we understand the predicates that use the following operators: =, <>, <, <=, >, >=, BETWEEN, IN, IS NULL, IS NOT NULL. Let table T be defined over range partitions as follows: CREATE TABLE T ( ... ) PARTITION BY RANGE (C) ( PARTITION P1 VALUES LESS THAN (L1), PARTITION P2 VALUES LESS THAN (L2), ... PARTITION Pn VALUES LESS THAN (Ln) ) where L1,...Ln are constants. The partition ranges here are (,L1),[L1,L2)...[Ln,). It’s easy to find a subsequence of this sequence of intervals Ri1,...,Rik such that for each Rij from this sequence we have (Rij INTERSECT S(RC) != 0). This subsequence will yield the result of partition pruning. Apparently if RC is maximal range condition the described procedure performs full partition pruning. If range partitions are defined with the help of a growing monotone function F: CREATE TABLE T ( ... ) PARTITION BY RANGE (F(C)) ( PARTITION P1 VALUES LESS THAN (L1), PARTITION P2 VALUES LESS THAN (L2), ... PARTITION Pn VALUES LESS THAN (Ln) ) then to find the partition to be used we have: 1. to get ranges F(S(RC)) 2. to select the subsequence of partition ranges that are intersected with F(S(RC)). The same procedure can be used if F is an arbitrary function but S(RC) contains only single point intervals. If hash partitioning is defined for table T and the extracted sequence range condition S(RC) consists only of single-point intervals we still can perform partition pruning: H(S(RC)) will yield the numbers of the partitions to be used. Implementation of partition pruning ----------------------------------- In the pseudo-code below list partitioning is considered as a special case of range group partitioning (see comments on this in the first section). As a result of the actions presented by this pseudo-code some partitions and sub-partitions of a partition table T are pruned for a given query Q and those that have escaped pruning become specially marked. If after: not elimination, having merge, outer join elimination, equality propagation there are no ON condition left in the join being processed then: for each partition table T { 1.Find components to form the partition index [for combined partitioning an index is created where columns used for hash partitioning follow columns used for range partitioning]; 2.Build internal structures for the partition index to extract a range condition for it RC; 3.Extract the range condition RC; 4.If (RC is empty) Mark all partitions and optional sub-partitions; Else { If (this is a range partitioning and (a monotone function F is used or all RC ranges are singletons) Mark partitions for ranges intersected with F(S(RC)); Else If (this is a hash partitioning and S(RC) is a sequence of single-point ranges) Mark partitions with indexes H(S(RC)); Else If (this is a combined partitioning and a monotone function F is used for range partitions) { Mark all sub-partitions of partitions for ranges intersected with F(S(RC)); If all RC ranges intersected for such a range are singletons S1,...,Sk then unmark sub-partitions that are not covered by H(S1),...,H(Sk); } } } Note. To handle a case where no partitions but some sub-partitions can be pruned we should have additionally applied range analysis to an index consisting only of the columns used for sub-partitioning. The easiest way to mark partitions to be used after pruning is probably to add a new flag member to the st_table structure used by any handler. To speed up iterations over marked partitions it makes sense also to add a field to link this marked partitions into a chain.
WL#2985 Partition Pruning - LLD =============================== CONTENTS 1. Where partition pruning is invoked 2. Passing info about pruned partitions to the table handler 3. The pruning function 3.1 Partition index description construction 3.2 Changes in the code invoked from get_mm_tree() 3.2.1 Don't call field->optimize_range() when doing partition pruning 3.2.2 Allow construction of "index merge" tree for single index 3.3 Analysis of the produced SEL_TREE 3.3.1 A natural property: no redundant "partitions-in-interval analysis" calls 3.3.2 Do have redundant "subpartitions-in-interval analysis" calls though 3.4 [Sub]partitions-in-interval analysis 3.4.1 The generic case 3.4.2 The special case: use Partitioning interval analysis 3.4.2.1 Partitioning Interval analysis - interval mapping 3.4.2.1.1 Interval mapping: RANGE partitioning implementation 3.4.2.1.2 Interval mapping: LIST partitioning implementation 3.4.2.1.3 Interval mapping: Detection of monotonically increasing functions 3.4.2.2 Partitioning Interval analysis - interval walking 4. Other notes 1. Where partition pruning is invoked ------------------------------------- Partition pruning will be invoked from JOIN::optimize(), before the GROUP BY optimization (opt_sum_query) is applied. The reason for doing pruning before the opt_sum_query() call is that opt_sum_query may access the query table(s), and we want to access only non-pruned away partitions in these table accesses.Another possible place to invoke partition pruning would be in make_join_statistics(), right before the get_quick_record_count() call. As opposed to the "before opt_sum_query()" part, this part of the code is executed after the "const" tables have been read, so partition pruning will have a broader applicability when invoked here. Possible todo for the future: We could make two partitioning pruning steps for cases where that would allow us to prune away more partitions. The hypothesis is that if we've got a SEL_TREE of type MAYBE (e.g. "t1.key < t2.key2") or KEY_SMALLER (e.g. "t1.key< 10 AND t1.key< t2.key2") when performing partition pruning before the opt_sum_query() call, and later we got some tables marked as "const" tables, then we would be able to determine (in O(1) time) if it would make sense to run partition pruning again (for some particular remaining non- const table). 2. Passing info about pruned partitions to the table handler ------------------------------------------------------------ Information about used (ie. non-pruned-away) partitions will be stored in partition_info::used_partitions bitmap: bitmap_is_set(&used_partitions, X) <=> "partition with partition id X is used". When partition pruning procedure is invoked, all partitions are assumed to be unused. The prune_partitions() function will modify the bitmap, so after its return partition_info::used_partitions will indicate which partitions are used for the query. 3. The pruning function ----------------------- The prune_partitions() functions will do the following: prune_partitions() { construct partition index description; call get_mm_tree(); analyze the produced SEL_TREE; } 3.1 Construction of partition index description ----------------------------------------------- table->part_info->[sub]part_field_array holds a duplicate-free array of table fields used in [sub]partitioning. From these two arrays we create a description of a single partitioning index: partition_index(partition_fields, [subpartition_fields]). Here KEY_PART::length= field->pack_length_in_rec() KEY_PART::store_length= { calculate it the same way as it is done in open_binary_frm() code } If [sub]partition_fields contains a GEOMETRY field then [sub]partition_fields is not included in the partition index description. The reason is that we can't process geometry intervals (they have different semantics to which our processing logic doesn't apply). We'll do the same for ENUM fields to be on the safe side (which is probably redundant) 3.2 Changes in the code invoked from get_mm_tree() -------------------------------------------------- The following changes are required: 3.2.1 Don't call field->optimize_range() when doing partition pruning ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Currently get_mm_leaf() calls field->optimize_range() to determine if the index supports scans on non-singlepoint intervals (currently all table engines support it except HEAP tables with HASH indexes). Field::optimize_range() calls handler::index_flags(HA_READ_RANGE) so we must not make field->optimize_range() calls when doing partition pruning.*perhaps*, when doing partition pruning, we should instead of field->optimize_range() call make call to the following function: bool optimize_range_for_partitioning_index() { if (this keypart of partitioning index refers to a type of partitioning that doesn't allow partition pruning on non-singlepoint intervals) { /* example when we get here: we try to construct a "t.field < const" interval for "PARTITION BY HASH(somefunc(t.field))" */ return FALSE; } else return TRUE; } However, I'm not sure if this is worth doing: * We'll not be able to do partition pruning for cases like "t.key >= 10 AND t.key <= 10", where several non-singlepoint intervals form one singlepoint interval. * I can't immediately with 100% assurance tell that using the above function provides 100% assurance that no singlepoint ranges will be generated (There is a second filtering step in records_in_range() implementation, ha_heap::records_in_range() does make the check if it is invoked for singlepoint interval. The presense of that check implies that range analysis code can't guarantee that all constructed intervals will be singlepoint? Considering the amount time needed to fully figure this out, we'll do this: when range analysis is invoked from prune_partitions(), let get_mm_leaf() assume that field->optimize_range() has returned TRUE. The "analyze the produced SEL_TREE" step in prune_partitions() will check if the produced intervals are singlepoint when that is required. In the worst case this will create only a relatively minor inefficiency. 3.2.2 Allow construction of "index merge" tree for single index ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We've recently discovered this property of the range analyzer: it can create "index merge" trees for one, single index. This is only done when it is not possible to create a single range tree at all. Here are simple examples: "keypart1=c1 OR keypart2=c2", "keypart1c2" Here the range optimizer will create an index_merge tree with two subtrees. This did not cause invalid index_merge plans to be generated because such single-index index_merge trees were discarded at later stage - one of the trees doesn't cover first keypart and is not suitable for record retrieval. For partition pruning case, this property provides some advantage: it allows to perform partition pruning for a broader set of conditions. Consider an example: let the table be partitioned by PARTITION BY RANGE(t.a) SUBPARTITION BY HASH(t.b), and let the condition be "t1.a = 1 OR t1.b = 2". Here the range analyzer will construct index_merge of two trees, we'll be able to peform partition pruning for both and produce a union of used partitions. Code-wise this translates into the following: We'll introduce a range optimizer parameter that will control the creation of index merge trees where the merged trees do not represent valid index scans. The parameter will be set to FALSE (don't create) when the range analyzer is invoked from range/index_merge optimizer. The parameter will be set to TRUE (do create) when the range analyzer is invoked from prune_partitions(). The code for "analyze the produced SEL_TREE" step in prune_partitions() will be able to handle all 3 cases: 1. The produced SEL_TREE has one SEL_ARG* graph that represents a list of intervals. The analysis of SEL_TREE objects of this type is described in the next section. Example: "part_field1 = c1 OR part_field1 = c2" 2. The produced SEL_TREE is an "index merge" SEL_TREE, i.e. it represents "tree1 OR tree2 OR ... treeN" where tree{i} is a tree like in #1. In this case we'll produce a union of partitions used by each of the trees. Example: "part_field1 = c1 OR subpart_field1 = c2" 3. The produced SEL_TREE is a list of "index merge" SEL_TREEs, i.e. it represents "merge_tree1 AND merge_tree2 ... AND merge_treeN", where each merge_tree{i} is of type described in #2. In this case we'll produce a set of used partitions for each of merge_tree{i} and compute an intersection. Example: "(part_field1 < c1 OR subpart_field1 = c2) AND (part_field1 > c3 OR subpart_field1 = c4)" 3.3 Analysis of the produced SEL_TREE ------------------------------------- See the previous section for the description of analysis of SEL_TREE objects that represents merges of several scans. Now we'll describe analysis of SEL_TREE object that represents a single list of intervals (referred to as case #1 in the previous section). The analysis function will be modeled after the check_quick_keys() function that does a similar job for the range optimizer. A SEL_ARG graph has 2 "dimensions": left/right and next_key_part. The function will traverse the dimensions via self-recursion. The following picture demonstrates a possible SEL_ARG graph (with lots of edges removed for simplicity). The up-down connections are connections via SEL_ARG::left and SEL_ARG::right. A horizontal connection to the right is the SEL_ARG::next_key_part connection. (start) | $ | Partitioning keyparts $ subpartitioning keyparts | $ | ... ... $ | | | $ | +---------+ +---------+ $ +-----------+ +-----------+ \-| par1=c1 |--| par2=c2 |-----| subpar1=c3|--| subpar2=c5| (**) +---------+ +---------+ $ +-----------+ +-----------+ | | $ | | | | $ | +-----------+ | | $ | | subpar2=c6| | | $ | +-----------+ | | $ | | | $ +-----------+ +-----------+ | | $ | subpar1=c4|--| subpar2=c8| | | $ +-----------+ +-----------+ | | $ | | $ | +---------+ $ +-----------+ | | par2=c9 |-----| subpar1=c9| | +---------+ $ +-----------+ | | $ | ... $ | $ | $ +---------+ $ +------------+ | par1>c2 |------------------| subpar1=c10|--... (***) +---------+ $ +------------+ | $ ... $ $ (c{i} are marked arbitrarily and dont convey any meaning) The traversal of left/right pointers will be performed via head/tail recursion (Mikael has pointed out that we don't actually need recursion here, we can use a loop instead and remove the possibility of stack overrun when processing very long conditions like "key=c1 OR key=c2 OR ... key1M= c1M". For now, we'll stick with recursion for the sake of code simplicity. The same kind of recursion is done in check_quick_keys() and there were no complaints this far). The traversal of SEL_ARG::next_key_part axis will be performed via recursion as well. From the above picture it is apparent that this traversal will proceed according to the following scenario: 1. Get the constraints on partitioning fields. Depending on what SEL_ARG graph we've got, we may get constraints for all partitioning fields (e.g. see line marked with (**) on the picture), some of them (see line (***) on the picture), or none at all (not displayed on the pic.) If we get suitable (we'll describe below what exactly is suitable) constraints for all partitioning fields, (this can be checked when we cross the $-signs line on the picture) we perform the "partitions-in-interval analysis" - obtain a set of partitions that may contain records that match the partitioning fields constraints, and proceed. If we don't get suitable constraints for all partitioning fields, we set the set of used partitions to be all-table-partitions, and proceed anyway. 2. Get the constraints on subpartitioning fields. If we get suitable constraints for all subpartitioning fields (this can be fully checked when we reach the right ends of the horizontal chains in the picture), we perform "subpartitions-in-interval analysis" - obtain a set of subpartitions that may contain records that match the subpartitioning fields constraints. Note that this set of subpartitions refers to every partition, i.e. the result of "subpartitions-in-interval analysis" can be expressed like "within each partition, subpartition X must be used". If we don't get suitable constraints for all subpartitioning fields, we set the set of used subpartitions to be all-subpartitions. 3. Having completed steps 1 and 2, we have * a set of used partitions, * a set of used subpartitions. Now we can do this: for each used partition P for each used subpartition SP mark P_SP as used;
The SEL_TREE::left/right and SEL_TREE::next_key_part recursion causes the steps 1-2-3 to be done for every possible (*start)->next_key_part->... ->next_key_part path in the SEL_ARG graph. 3.3.1 A natural property: no redundant "partitions-in-interval analysis" calls ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Note that the "partitions-in-range analysis" will be done the least possible number of times. An example for the above picture: for the two paths path1: +---------+ +---------+ $ +-----------+ +-----------+ -| par1=c1 |--| par2=c2 |-----| subpar1=c3|--| subpar2=c5| +---------+ +---------+ $ +-----------+ +-----------+ and path2: +---------+ +---------+ $ +-----------+ +-----------+ -| par1=c1 |--| par2=c2 |-----| subpar1=c4|--| subpar2=c8| +---------+ +---------+ $ +-----------+ +-----------+ we cross the $-line only once, and therefore "partitions-in-range analysis" will be performed only once. This will be natural optimization as we're using recursion. 3.3.2 Do have redundant "subpartitions-in-interval analysis" calls though ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The "subpartitions in range analysis" will be performed more times then strictly necessary. A SEL_ARG graph may be constructed in a way that the restrictions on subpartitions are shared. For example, in the picture the "par2=c9" could have next_key_part pointing to "subpar1=c3". In that case we'll perform subpartitioning analysis for "subpar1=c3 AND subpar2=c5" two times. We could use some tricks to avoid that but in my opinion that is not worth doing at the current level. 3.4 [Sub]partitions-in-interval analysis ---------------------------------------- This section describes the [sub]partitioning interval analysis. Task setting: We've got constraints on all fields used in [sub]partitioning, i.e. we've got an interval. Goal: Find the partitions that may contain records that satisfy this set of constraints. 3.4.1 The generic case ~~~~~~~~~~~~~~~~~~~~~~ For any type of [sub]partitioning, the following analysis may be performed: if (the interval is a singlepoint interval) { /* Ok, the interval has form "field1=const1 AND ... AND fieldN=constN" */ Save the field constants into the record buffer; /* Now find into which partition the rows with "(field1, ... fieldN) = (const1, ..., constN)" will go. */ partition_id= get_part_partition_id(); // or get_subpartition_id() } else { /* Can't infer anything */ } Note that the above code uses only parts of partitioning interface that already intentionally exposed as public (partition_info::get_part_partition_id, partition_info::get_subpartition_id). 3.4.2 The special case: use Partitioning interval analysis ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For some types of partitioning it is possible to analyse a broader set of restrictions then specified in section 3.4.1. To perform such analysis, we introduce a concept of "Partitioning interval analysis": "Partitition interval analysis" is applicable for any type of [sub]partitioning done on the value of single field fieldX. Task setting: << Given an interval "const1 <=? fieldX <=? const2" (I) find a set of partitions that may contain records that have value of fieldX contained within the given interval. >> The implementation of Partitioning Interval Analysis varies depending on partitioning type and partitioning function, and so it will be incapsulated within sql_partition.cc. The interface for Partitioning Interval Analysis will be as follows: /* A type of function that does Partitioning Interval Analysis SYNOPSIS get_partitions_in_range_iterator() interval IN Description of interval over partitioning field. part_iter OUT Initialized iterator that allows to enumerate partitions that cover the given interval. */ ... In partition_info class, we'll add two function pointers: class partition_info { public: ... /* Pointer to function that performs Partitioning Interval Analysis for partitioning, or NULL if analysis is not possible. */ get_partitions_in_range_iterator get_part_iter_for_interval; /* Same as above but for subpartitioning */ get_partitions_in_range_iterator get_subpart_iter_for_interval; ... }; . Following sections describe the implementations. 3.4.2.1 Partitioning Interval analysis - interval mapping ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Interval mapping can be used when partitioning is done using PARTITION BY
unary_increasing_func(fieldX) Here for the interval (I) "const1 CMP1 t.fieldX CMP2 const2" (I) we can obtain corresponding interval (II) part_func(const1) CMP1X part_func(t.fieldX) CMP2X part_func(const1) (II) by "mappping" the interval edges. In both intervals, CMPxx are '<' or '<='. The conversion rules for comparision operators are as follows: For strictly increasing functions, CMP1X == CMP1, CMP2X == CMP2. For non-strictly increasing functions, CMP1X == CMP2X '<='. (example: t.fieldX < '2005-12-29' maps to YEAR(t.fieldX) <= YEAR('2005-12-29' ) Having obtained the edges of interval (II), we can get the corresponding partition ids: 3.4.2.1.1 Interval mapping: RANGE partitioning implementation ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We'll use the partition_info::range_int_array. This is an ordered array where range_int_array[N] describes the interval of partition nuber N: (range_int_array[i-1] <= part_func(rowX) < range_int_array[i]) => rowX goes to partition #i. Having performed the interval mapping, we can obtain (with two binary searches) first and last array elements that have non-empty intersection with interval (II). If these two elements have indexes idx1 and idx2, then that means that partitions with partition id within [idx1, idx2] range are the partitions we'll need to use. 3.4.2.1.2 Interval mapping: LIST partitioning implementation ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ partition_info::list_array is an array of pairs, ordered by the value field. Each array element describes a singlepoint interval: part_func(rowX) == list_array[i].value => rowX goes to partition #i. Like in RANGE partitioning, we can find two edge indexes idx1, idx2. Then the set of used partitions can be obtained by walking from idx1 to idx2 and collecting partition ids from list_array[i].partition_id 3.4.2.1.3 Interval mapping: Detection of monotonically increasing functions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To check if Interval Mapping is applicable, we'll need to be able to check if given Item* tree represents an unary monotonically increasing function. This will be achieved by doing the following: Add an enum: typedef enum monotonicity_info { NON_MONOTONIC, /* none of the below holds */ MONOTONIC_INCREASING, /* F() is unary and (x < y) => (F(x) <= F(y)) */ MONOTONIC_STRICT_INCREASING /* F() is unary and (x < y) => (F(x) < F(y)) */ } enum_monotonicity_info; Add new virtual method in class Item: virtual enum_monotonicity_info Item::get_monotonicity_info() const { return NON_MONOTONIC; } Add non-default implementations for * Item_func_year (that function is MONOTONIC_INCREASING). * Item_func_to_days (that function is MONOTONIC_INCREASING for DATETIME values and is MONOTONIC_STRICT_INCREASING for DATE values) * Item_field (assume it to be always MONOTONIC_STRICT_INCREASING, see concerns in section #4) 3.4.2.2 Partitioning Interval analysis - interval walking ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This implementation is applicable for any type of partitioning when the partitioning field has integer type. The idea is that interval (I): "const1 <=? t.fieldX <=? const2" can be replaced with a sequence of singlepoint intervals: t.fieldX = const1 OR t.fieldX = const1 + 1 OR t.fieldX = const1 + 2 OR ... OR t.fieldX = const2 For each of the singlepoint intervals, we can find the corresponding partition (see section 3.4.1). So, the interval analysis is performed by "walking" from const1 to const2 and collecting partition ids. It is apparent that this method will pay off if the "walk" is short enough. For now, we've adopted the fillowing definition of "short enough": * number of values to walk must be less then number of [sub]partitions, and * it must be less then some predefined constant MAX_WALK_INTERVAL. There can be cases when both interval walking and interval mapping are applicable. It is obvious that interval walking is less CPU-intensive, so we'll use interval walking only if interval mapping is not applicable. 4. Other notes -------------- * We assume that BUG#15447 has been fixed and RANGE partitioning code assumes that "NULL < -inf" * We also ignore the problems with BIGINT UNSIGNED fields, as they are not handled correctly by partitioning code itself (see BUG#16002)
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.