WL#3352: Extending Partitioning Types and new partition function
Affects: Server-5.5 — Status: Complete — Priority: Medium
Partitioning in MySQL 5.1 is limited to functions returning integers. To perform partition pruning it's only possible to define partition pruning on ranges if the function is monotonically increasing which is true for TO_DAYS, YEAR and YEAR_TO_DATE. There are no useful partitioning variants based on character fields other than KEY partitioning and possibly some hash partitioning variant. This worklog makes it possible to make useful partitioning on character fields. It also enables partition pruning to be useful on much more generic ranges. Thus we can with this new partitioning scheme partition over a,b and c and prune partitions away even with a WHERE clause that states a = 1 AND b < 1. This was impossible with the partition pruning introduced in MySQL 5.1. This new variant partitions on a list of columns instead of on a function. To distinguish this from the current variant and ensure that there is no confusion to users what the partitioning introduced is, we reuse the keyword COLUMNS to the partition definition. The new partitioning scheme is only defined for RANGE and LIST partitions. The worklog also adds a small extension to the current function-based partitioning by adding the function TO_SECONDS since the current allowed partition pruning functions can only be on per day level, so it's not currently possible to partition based on parts of a day. The major improvements of this worklog is thus the possibility to define partitioning on character fields and a much more advanced partition pruning possibility.
The following syntax is to be implemented. CREATE TABLE t1 (a int, b int, c int) PARTITION BY RANGE COLUMNS (a,b,c) (PARTITION p0 VALUES LESS THAN ("a","b","c")); and CREATE TABLE t1 (a int, b int, c int) PARTITION BY LIST COLUMNS (a,b,c) (PARTITION p0 VALUES IN (("a1","b1,"c1"),("a2","b2","c2"))); Also the same syntax will be available for ALTER TABLE in the same manner. Important notes here are that: 1) PARTITION BY RANGE COLUMNS (a,b,c) is really a list of columns, no functions on columns are allowed in the column list. 2) It is possible to have a column list of up to 16 fields. 3) VALUES LESS THAN will list the right endpoint of multi-field range. This endpoint can contain real values, and MAXVALUE. 4) VALUES IN will list one or several multi-field points. These points can contain real values and NULL values. 5) Each multi-field point consists of one value for each field in the column list defined in the PARTITION BY RANGE/LIST definition. For RANGE partitions in the description of each partition where a value for the boundary is provided one can use a value and MAXVALUE which is a value bigger than all values of the data type of the partition field. For LIST partitions it is not possible to provide the value MAXVALUE but instead NULL is an allowed value. The patch does also in addition add a simple additional MySQL function TO_SECONDS which converts a date to a number of seconds since some day. This is very useful in the old type of partitioning to handle cases where one needs to create new partitions more often than once a day. Partition pruning for the new COLUMNS type is also extended to handle also multiple fields of the partitioning. This means that a condition like WHERE a = "a2" AND b < "b2" will take into account both the a and the b parts. In previous versions of the partitioning it was necessary to provide all partition field valus to get a proper partition pruning since only functions were possible to partition on. A function with an undefined field value won't give a proper result and thus all fields previously had to have a value.
A new struct part_column_list_val will be used to represent the value of the column constant which can be either: MAXVALUE or an expression for RANGE partitioning and NULL or an expression for LIST partitioning. The prune_partitions function have been extensively rewritten. It could previously only handle ranges for single-field partition pruning and single points. The new partition pruning can handle partition pruning for ranges in the same manner as for indexes. Thus if partitioning is done on field a,b and c. Then all of the below WHERE clauses will be helpful in partition pruning. 1) WHERE a < 1 2) WHERE a = 1 AND b < 10 3) WHERE a = 1 AND b = 10 AND c < 10 4) WHERE a = 1 AND b <= 10 AND c <= 10 The partition pruning will still also have to take subpartitioning into account. MAXVALUE needs to be removed as a possible name in the key_sp list in the MySQL parser. A number of new partitioning tests are developed to handle this new feature and also QA will develop a new set of tests that will also improve testing of previous partitioning versions. A new function to_seconds is introduced in the same manner as the to_days function. This function will be possible to do partition pruning on as well. store_min_key and store_max_key functions in opt_range.cc was reused. However since the partitioning index contains a partition part and a subpartition part it was necessary to extend these methods with a last_part variable to ensure we can stop at the partition part if needed. A new variable ignore_part_fields is used to skip partitioning fields when a partition range have already been found and when there can still be subpartition ranges to check out. Since the find_used_partitions is a recursive call, it is necessary to check that we don't overrun the stack. If this happens we'll interrupt and say that no partition pruning was possible. A new function in the get_part_iter_for_interval was introduced to handle the new column list partitioning. check_range_constants and check_list_constants was essentially split in two parts, one part that handles column list partitioning and one part that handles the old partitioning variant. A new function fix_column_value_functions was introduced to fix the column list fields. A new function compare_column_values was introduced to compare lists of column values to each other to ensure ranges are increasing and that points in list partitioning isn't the same. The check_partition_info is called from ALTER TABLE when we're adding or reorganizing partitions and at CREATE TABLE. The interface was previously not so clear here, so we changed the name of the variable specifying where we called from and reversing the TRUE and FALSE thus. We also used this variable to check on range and list constants since the variable was not really the right variable to use here. We improved the parser by often defining part_info variables instead of just a lex variable. Also by introducing a new function add_column_value that does what a lot of parser statements did by replicating code previously. Also similarly a function set_part_expr was introduced to avoid code duplication. A number of variables for total number of charset fields was removed since they weren't really used. this meant some code could be removed as well. A number of new error messages was introduced to describe errors in relation to the new column list partitioning. The new version was shortly described in the comments. The partitioning functions was slightly reorganised. By introducing a new function get_partition_with_sub we could avoid having the number of functions explode by getting all combinations of partitioning and subpartitioning. Thus although introducing a number of new partitioning functions we still managed to decrease the total number of functions to suppport calculations of partition id's for points. Two new functions cmp_rec_and_tuple and cmp_rec_and_tuple_prune was introduced to support the new get_part_iter_for_interval function and the new get_partition_id functions. Through the removal of some unnecessary variables on charset partitioning functions it was also possible to simplify charset partition id handling. An unnecessary function call member was removed from fix_fields_part_func. A new function to also write the new partitioning syntax was introduced as part of the generate_partition_syntax framework. Slight reorganisation of this function was also performed in light of the new function. Slight reorganisation of the ALTER TABLE code was needed to handle the new partitioning variants. A new function store_tuple_to_record was introduced to write a record from key values in opt_range format to the handler interface record format. The get_part_iter_for_interval interface introduced three new variables to be able to keep track of length of the fields in the representation of the opt_range format. PARTITION_METHOD was extended in size since RANGE COLUMN_LIST doesn't fit in 12 bytes. The parser was updated to handle the new syntax and actually managed to decrease the number of shift/reduce conflicts from 169 to 168. Detailed description of find_used_partitions -------------------------------------------- The heart of the prune_partitions code is the find_used_partitions code. This code have been extensively modified as part of WL#3352 since we now can prune on much more elaborate ranges. First a basic explanation of the workings of find_used_partitions which was there also before WL#3352 was implemented. First a "mock" index is created consisting of the union of all partition fields and all subpartition fields. This means that if we partition on the fields a,b,c and subpartition on c,d we'll create a "mock" index on a,b,c,c,d. Thus subpartitioning fields comes after partitioning fields and if we have the same field in partitioning and in subpartitioning we will define it in both in the correct place. Using this "mock" the prune_partitions code calls get_mm_tree. This function creates a key tree. The key_tree is characterised by 1) The first key tree object is always on the first field of the index. 2) A next_key_part always points to an object describing the next field in the index. 3) All objects connected by next_key_part pointers are logically connected using AND. Thus together they represent a range. 4) Each object can also point to a left object and a right object. These both points to objects describing the same field that are logically connected using OR. Thus they represent another range. 5) Each object describes an upper bound and a lower bound. Thus in each object there is a field value for min and max, there is a min flag and a max flag. Some examples: The condition a < 3 is represented as: The min value is NULL, the min flag is NEAR_MIN which means that NULL isn't part of the range (NULL values by definition isn't part of any range in a WHERE clause). The max value is 3 and the flag is NEAR_MAX which again means that the 3 isn't included in the range. The condition a >= 4 is represented as: The min value is 4 and the flag is 0 to indicate that the 4 is included in the range. The max value is NULL and the flag is NO_MAX_RANGE indicating that there is no upper bound. Assume we have the condition: (a = 1 AND b < 2 AND c <= 2) In this case the range is built up as: Min range: a = 1 where both min value is 1 and min flag isn't set on a and set to NO_MIN_RANGE on b. Max range: a = 1, b = 2 where where max value is 1 on a and no flag set on a, b max value is set to 2 and flag is set to NEAR_MAX. The flag NO_MIN_RANGE indicates that no more fields are needed for the lower end of the range and NEAR_MAX on the b field indicates that no more fields after b are interesting for the upper end of the range. Thus the condition on c can be ignored here. In a normal index it is possible to jump out as soon as we find something that makes it impossible to extend the range. However in our case we need to proceed to the subpartitioning fields. So what we do then is that we set a special ignore_part_fields variable when we reached the end of a condition, this variable is reset when we return from evaluating the fields after following the next_key_part on the object where we found the end. In the example the end will be found already at the a object, the b max limit will be set in executing the a object by calling store_max_key which will recursively set the range in cases like this. The key tree for the condition: (a = 1 AND (b = 2 OR b = 4)) AND (c = 3 OR c = 5) AND (d = 4 OR d = 6) could look like this: ------------ |d (5) SR2 | |min=max=6 | ------------ | ------------ ------------ ------------ |c (3) PR1 | |c (4) | |d (6) SR1 | |min=max=5 |->|min=max=5 | |min=max=4 | ------------ ------------ ----------- | | | | | ------------ | |d (9) SR3 | | |min=max=6 | | ------------ | | ------------ ------------ ------------ ------------ |b (2) | |c (7) PR2 | |c (8) | |d (10) SR4| |min=max=4 |->|min=max=3 |->|min=max=3 | |min=max=4 | ------------ ------------ ------------ ------------ | ------------ ------------ -----------| ------------ ------------ |a (1) | |b (11) | |c (12) PR3| |c (13) | |d (14) SR5| |min=max=1 |->|min=max=2 |->|min=max=3 |->|min_max=3 |->|min=max=4 | ------------ ------------ ------------ ------------ ------------ | | | ------------ | |d (15) SR6| | |min=max=6 | | ------------ | ------------ ------------ ------------ |c (16) PR4| |c (17) | |d (18) SR7| |min=max=5 |->|min=max=5 | |min=max=4 | ------------ ------------ ----------- | ------------ |d (19) SR8| |min=max=6 | ------------ We have put in the order the key tree objects are executed in find_used_partitions. find_used_partitions is a recursive function. This means that we call it once for each key tree object. We always follow the left pointer immediately before doing anything else, at the end of the function we'll follow the right pointer and after checking a number of things and recording range values we call it for the next key tree object we get by following the next_key_part pointer. Left pointers are represented above as up links, right pointers as down links and horisontal pointers to the right represents next_key_part pointers. We evaluate the partition condition for each range we find in the key tree. In the tree above we find 4 different partition ranges which are: 1) a = 1 AND b = 4 AND c = 5 2) a = 1 AND b = 4 AND c = 3 3) a = 1 AND b = 2 AND c = 3 4) a = 1 AND b = 2 AND c = 5 For each of these ranges we'll call get_part_iter_for_interval which provides an iterator over all partitions used by this range, the list of partitions used by the range is calculated by first getting the start partition for the range and then retrieving the end partition for the range. We will also similarly evaluate the subpartitioning ranges. In the example above we get 8 different subpartitioning ranges out of which a number of them are duplicates: 1) c = 5 AND d = 6 PR1 2) c = 5 AND d = 4 PR1 3) c = 3 AND d = 6 PR2 4) c = 3 AND d = 4 PR2 5) c = 3 AND d = 4 PR3 6) c = 3 AND d = 6 PR3 7) c = 5 AND d = 4 PR4 8) c = 5 AND d = 6 PR4 That they are duplicates doesn't matter since the subpartitions are evaluated in the context of the partition range, thus the subpartitions are only set for the partitions which the current partition range covers. Given that find_used_partitions is a recursive function we need to have mechanisms to save values and restore them later. When calling find_used_partitions using the next_key_part pointer we save the context using local variables. When entering find_used_partitions after returning from executing the left part of the tree, we will use stack to save some values which will be restored immediately before executing the right part of the tree.
Copyright (c) 2000, 2015, Oracle Corporation and/or its affiliates. All rights reserved.