# WL#4065: Support extended partition pruning for partitioning by COLUMNS

Affects: Server-6.1
—
Status: Assigned
—
Priority: Medium

Partitioning in MySQL 5.1 is limited to functions returning integers. To perform partition pruning it's only possible to define partition pruning on ranges if the function is monotonically increasing which is true for TO_DAYS, YEAR and YEAR_TO_DATE. There are no useful partitioning variants based on character fields other than KEY partitioning and possibly some hash partitioning variant. This worklog makes it possible to make useful partitioning on character fields. It also enables partition pruning to be useful on much more generic ranges. Thus we can with this new partitioning scheme partition over a,b and c and prune partitions away even with a WHERE clause that states a = 1 AND b < 1. This was impossible with the partition pruning introduced in MySQL 5.1. This new variant partitions on a list of columns instead of on a function. To distinguish this from the current variant and ensure that there is no confusion to users what the partitioning introduced is, we add the keyword COLUMN_LIST to the partition definition. The new partitioning scheme is only defined for RANGE and LIST partitions. The worklog also adds a small extension to the current function-based partitioning by adding the function TO_SECONDS since the current allowed partition pruning functions can only be on per day level, so it's not currently possible to partition based on parts of a day. The major improvements of this worklog is thus the possibility to define partitioning on character fields and a much more advanced partition pruning possibility.

<contents> 1. Work overview 2. On copying the approach used in the range optimizer 3. Partition pruning for multi-column non-ranges 3.1 Demonstration of the idea on an example 3.2 General description of handling non-ranges 3.3 Connection to subpartition pruning </contents> 1. Work overview ---------------- COLUMN_LIST comparisons have the same meaning as index tuple comparisons, so our task is to make partition pruning module handle multiple-keypart SEL_ARG graphs, without the limitation that we handle only graphs equivalent to "col1 = const1 AND col2=const2 AND ...". 2. On copying the approach used in the range optimizer ------------------------------------------------------ (for the sake of simplicity this section will ignore subpartition pruning) The most straightforward way of doing partition pruning for COLUMN_LIST is to just do what the range optimizer does for multi-part keys. However, that would be not the best solution as the range optimizer has a property (or rather limitiation) that we don't need in partition pruning. We'll show the property using an example: consider a WHERE clause (and its corresponding SEL_ARG tree): WHERE t.keypart1 < 'foo' AND t.keypart2=1 It specifies a possibly infinite set of intervals S = { (keypart1, keypart2) = (x, 1) | x < 'foo' } The range optimizer is not able produce E(#matching records) for this possibly infinite set of intervals, as that would require doing many index lookups into the index to see index to see which intervals we will actually need to scan (which is too expensive for optimization). Therefore range optimizer computes an "envelope" that encloses all of the intervals: S_envelope = " (keypart1, keypart2) <= ('foo', 1) " S_envelope includes regions that are not the original set S, therefore the move from S to S_envelope is bad as we start examining records that would not match the original WHERE condition. In partition pruning, instead of a [big] table index we have a [relatively small] list of partition definitions, so we can process the set S directly, without moving to S_envelope. 3. Partition pruning for multi-column non-ranges ------------------------------------------------ 3.1 Demonstration of the idea on an example ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Suppose partitioning is PARTITION BY RANGE(COLUMN_LIST(kp1, kp2, ...)) ( PARTITION p0 VALUES LESS THAN (COLUMN_LIST(1,'abc')), PARTITION p1 VALUES LESS THAN (COLUMN_LIST(1,'cde')), PARTITION p2 VALUES LESS THAN (COLUMN_LIST(2,'zzz')), PARTITION p3 VALUES LESS THAN (COLUMN_LIST(3,'aaa')), ... ) and suppose we have a non-range (will use this term because I can't come up with anything better) kp1<5 AND kp2='foo' It is apparent that partitions p1 and p3 cannot contain any matching records. Another interesting observation that we're only able to make this observation about partition p3 because we know that there are no possible values between 2 and 3. This technique is also applicable when there is no SEL_ARG for kp1 at all, or when we have a tree (or, more precisely a next_key_part path) which lacks SEL_ARGs for some key part(s). 3.2 General description of handling non-ranges ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We will now give a general description of the approach shown in the previous section. Suppose we have partitioning on COLUMN_LIST(kp1, ... kpN) and a SEL_ARG tree: ... ... | | SEL_ARG(kp1) -- X -- SEL_ARG(kp2) | | ... ... Then the tree can be processed as follows: 1. Start from the first keypart (keypart=0) and proceed right until the walked chain of SEL_ARGs represents a proper equality interval keypart0=c1 AND ... AND keypart_k=c_k (PrefixInterval) let's denote the length of the prefix as const_prefix_len, and the length of the "hole" in keypart numbers as hole_len 2. Use partitioning range analyzer and obtain a set of partitions PS1 that may contain records from PrefixInterval. 3. For each partition in PS1, check if partition can contain records that match the remaining SEL_ARG chain, and if yes, mark it as used. The check is performed as follows: if (at least one of the missed keyparts is of a non-enumerable type) { if (prefix(prefix_len + hole_len, left_partition_endpoint) != prefix(prefix_len + hole_len, right_partition_endpoint)) { // The partition may contain rows ((tuple)PrefixInterval, *) mark the partition as used; } else { // the prefix is the same, check if the partition matches the column if (interval_intersection( suffix(left_partition_endpoint, part_cols - (prefix_len + hole_len)), suffix(left_partition_endpoint, part_cols - (prefix_len + hole_len)) ) != EMPTY_INTERVAL) { mark the partition as used; } } } else { /* The same as above, but we'll need to also handle the special case of partition having edges with adjacent prefix values: PARTITION ... VALUES LESS THAN COLUMN_LIST(prefixval, suffix1), PARTITION pX VALUES LESS THAN COLUMN_LIST(prefixval+1, suffix2) (if prefix has multiple columns, then "+1" applies to the last column) where we can conclude that partition pX may not contain any records with suffix columns with some "suf1 < suffix_cols < suf2" range. */ } 3.3 Connection to subpartition pruning -------------------------------------- The scheme described in the previous section shows how to obtain the set of used partitions. The scheme does not compute interval envelopes (like range optimizer does), so there is no problem with different ranges in the envelope having different intervals on subpartitioning columns). We also do not support partitioning by COLUMN_LIST for subpartitioning so no changes are needed for subpartition pruning.

1. Check if interval analyzer function for COLUMN_LIST can handle prefixes (it should, even for the "naive" solution). 2. Let partitions store the information whether their endpoints have identical prefixes (for every prefix length, i.e. we need an array/bitmap). 3. Modify find_used_partitions(): - invoke interval analyzer even if we have a COLUMN_LIST prefix tuple interval - get and remember set of used partitions for the prefix interval (Q: do we need a MAX_KEY_PARTS stack of those?) - then for each partition in the obtained partition set (**) { save left and right endpoint prefixes somewhere; call find_used_partitions() (*) /* so it walks over next-key-part SEL_ARGs and runs analysis */ } - when find_used_partitions() is invoked in situation like (*) * get the parent's left/right endpoints, attach the suffix in the passed SEL_ARG, obtain the "extended range" and check if this extended range has non-empty intersection with the partition (TODO: could we arrange so only the last components of the range endpoints are compared?) The problem is that (**) loop runs number-of-partitions-after-pruning times, and the call (*) will do a loop (using head/tail recursion) over the list of SEL_ARGs that can be quite large as well. 3.1 An alternative approach to one described 3.0: (see the wl4065-r2.odg picture attached to WL entry) The idea is that we don't need to do loop (**). Instead we need to loop through groups of partitions that have rows that belong to some particular value groups of partitions (which is just as bad in the worst case but can be much better). Besides that, we can take advantage of the fact SEL_ARGs in SEL_ARG::next/prev chain form an ordered disjoint list. When analyzing the first SEL_ARG in the chain, we should remember which partition the right edge was in. This partition is the leftmost partition that can have non-empty intersection with the second (and subsequent) SEL_ARGs in the chain. We can use this to restrict the set of partition we'll need to search for 2nd and subsequent SEL_ARGs. This won't prevent the O(#partitions_from_prevous_keyparts * #SEL_ARGs_for_this_keypart) blowup in the general case, but probably could reduce the number of practical cases where this blowup occurs. On possible implementations of "move to next value group" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The most straightforward way is to have each partition to have an array of indexes of the first partition within the next value group. * 0-th array element will have the number of partition with different value of (part_col1) * 1-th array element will have the number of partition with different value of (part_col1, part_col2) * .. and so forth. Another, more space-efficient solution: Store a bitmap which indicates which partitions have values that are in a different value from the last value of the previous partition. This way we can walk ahead in big jumps (1024/64=16 step max.) and quickly find the range of partitions within this value group as well find the start/end of the next value group. Additional things to do ----------------------- * Add printout of what is passed to range analyzer interface. * Add printout of used partitions (i.e. of the result of partition pruning).

Copyright (c) 2000, 2015, Oracle Corporation and/or its affiliates. All rights reserved.