WL#3352: Extending Partitioning Types and new partition function

Affects: Server-5.5   —   Status: Complete   —   Priority: Medium

Partitioning in MySQL 5.1 is limited to functions returning integers.
To perform partition pruning it's only possible to define partition
pruning on ranges if the function is monotonically increasing which
is true for TO_DAYS, YEAR and YEAR_TO_DATE. There are no useful
partitioning variants based on character fields other than KEY
partitioning and possibly some hash partitioning variant.

This worklog makes it possible to make useful partitioning on character
fields. It also enables partition pruning to be useful on much more
generic ranges. Thus we can with this new partitioning scheme partition
over a,b and c and prune partitions away even with a WHERE clause that
states a = 1 AND b < 1. This was impossible with the partition pruning
introduced in MySQL 5.1.

This new variant partitions on a list of columns instead of on a
function. To distinguish this from the current variant and ensure
that there is no confusion to users what the partitioning introduced
is, we reuse the keyword COLUMNS to the partition definition.

The new partitioning scheme is only defined for RANGE and LIST partitions.

The worklog also adds a small extension to the current function-based
partitioning by adding the function TO_SECONDS since the current allowed
partition pruning functions can only be on per day level, so it's not
currently possible to partition based on parts of a day.

The major improvements of this worklog is thus the possibility to define
partitioning on character fields and a much more advanced partition
pruning possibility.
The following syntax is to be implemented.
CREATE TABLE t1 (a int, b int, c int)
(PARTITION p0 VALUES LESS THAN ("a","b","c"));


CREATE TABLE t1 (a int, b int, c int)
(PARTITION p0 VALUES IN (("a1","b1,"c1"),("a2","b2","c2")));

Also the same syntax will be available for ALTER TABLE
in the same manner.

Important notes here are that:
1) PARTITION BY RANGE COLUMNS (a,b,c) is really a list of
columns, no functions on columns are allowed in the column list.
2) It is possible to have a column list of up to 16 fields.
3) VALUES LESS THAN will list the right endpoint of multi-field
range. This endpoint can contain real values, and MAXVALUE.
4) VALUES IN will list one or several multi-field points.
These points can contain real values and NULL values.
5) Each multi-field point consists of one value for each field
in the column list defined in the PARTITION BY RANGE/LIST definition.

For RANGE partitions in the description of each partition where a value
for the boundary is provided one can use a value and MAXVALUE which is
a value bigger than all values of the data type of the partition field.

For LIST partitions it is not possible to provide the value MAXVALUE but
instead NULL is an allowed value.

The patch does also in addition add a simple additional MySQL
function TO_SECONDS which converts a date to a number of seconds
since some day. This is very useful in the old type of
partitioning to handle cases where one needs to create new
partitions more often than once a day.

Partition pruning for the new COLUMNS type is also extended to handle
also multiple fields of the partitioning. This means that a condition like
WHERE a = "a2" AND b < "b2" will take into account both the a and the b
parts. In previous versions of the partitioning it was necessary to provide
all partition field valus to get a proper partition pruning since only
functions were possible to partition on. A function with an undefined field
value won't give a proper result and thus all fields previously had to have
a value.
A new struct part_column_list_val will be used to represent
the value of the column constant which can be either:
MAXVALUE or an expression for RANGE partitioning and
NULL or an expression for LIST partitioning.

The prune_partitions function have been extensively rewritten.
It could previously only handle ranges for single-field partition
pruning and single points. The new partition pruning can handle
partition pruning for ranges in the same manner as for indexes.
Thus if partitioning is done on field a,b and c. Then all of the
below WHERE clauses will be helpful in partition pruning.

1) WHERE a < 1
2) WHERE a = 1 AND b < 10
3) WHERE a = 1 AND b = 10 AND c < 10
4) WHERE a = 1 AND b <= 10 AND c <= 10

The partition pruning will still also have to take subpartitioning
into account.

MAXVALUE needs to be removed as a possible name in the key_sp list in
the MySQL parser.

A number of new partitioning tests are developed to handle this new
feature and also QA will develop a new set of tests that will also
improve testing of previous partitioning versions.

A new function to_seconds is introduced in the same manner as the
to_days function. This function will be possible to do partition
pruning on as well.

store_min_key and store_max_key functions in opt_range.cc was reused.
However since the partitioning index contains a partition part and a
subpartition part it was necessary to extend these methods with a last_part
variable to ensure we can stop at the partition part if needed.

A new variable ignore_part_fields is used to skip partitioning fields when
a partition range have already been found and when there can still be
subpartition ranges to check out.

Since the find_used_partitions is a recursive call, it is necessary to check
that we don't overrun the stack. If this happens we'll interrupt and say
that no partition pruning was possible.

A new function in the get_part_iter_for_interval was introduced to handle
the new column list partitioning.

check_range_constants and check_list_constants was essentially split
in two parts, one part that handles column list partitioning and one
part that handles the old partitioning variant.

A new function fix_column_value_functions was introduced to fix the
column list fields.

A new function compare_column_values was introduced to compare lists of
column values to each other to ensure ranges are increasing and that
points in list partitioning isn't the same.

The check_partition_info is called from ALTER TABLE when we're adding
or reorganizing partitions and at CREATE TABLE. The interface was
previously not so clear here, so we changed the name of the variable
specifying where we called from and reversing the TRUE and FALSE thus.
We also used this variable to check on range and list constants since
the variable was not really the right variable to use here.

We improved the parser by often defining part_info variables instead of
just a lex variable. Also by introducing a new function add_column_value
that does what a lot of parser statements did by replicating code
previously. Also similarly a function set_part_expr was introduced to
avoid code duplication.

A number of variables for total number of charset fields was removed since
they weren't really used. this meant some code could be removed as well.

A number of new error messages was introduced to describe errors in relation
to the new column list partitioning.

The new version was shortly described in the comments.

The partitioning functions was slightly reorganised. By introducing a new
function get_partition_with_sub we could avoid having the number of
functions explode by getting all combinations of partitioning and
subpartitioning. Thus although introducing a number of new partitioning
functions we still managed to decrease the total number of functions to
suppport calculations of partition id's for points.

Two new functions cmp_rec_and_tuple and cmp_rec_and_tuple_prune was
introduced to support the new get_part_iter_for_interval function
and the new get_partition_id functions.

Through the removal of some unnecessary variables on charset partitioning
functions it was also possible to simplify charset partition id handling.

An unnecessary function call member was removed from fix_fields_part_func.

A new function to also write the new partitioning syntax was introduced
as part of the generate_partition_syntax framework. Slight reorganisation
of this function was also performed in light of the new function.

Slight reorganisation of the ALTER TABLE code was needed to handle the
new partitioning variants.

A new function store_tuple_to_record was introduced to write a record from
key values in opt_range format to the handler interface record format.

The get_part_iter_for_interval interface introduced three new variables to
be able to keep track of length of the fields in the representation of the
opt_range format.

PARTITION_METHOD was extended in size since RANGE COLUMN_LIST doesn't fit
in 12 bytes.

The parser was updated to handle the new syntax and actually managed to
decrease the number of shift/reduce conflicts from 169 to 168.

Detailed description of find_used_partitions

The heart of the prune_partitions code is the find_used_partitions code.
This code have been extensively modified as part of WL#3352 since we now
can prune on much more elaborate ranges.

First a basic explanation of the workings of find_used_partitions which
was there also before WL#3352 was implemented.

First a "mock" index is created consisting of the union of all partition
fields and all subpartition fields. This means that if we partition on the
fields a,b,c and subpartition on c,d we'll create a "mock" index on
a,b,c,c,d. Thus subpartitioning fields comes after partitioning fields and
if we have the same field in partitioning and in subpartitioning we will
define it in both in the correct place.

Using this "mock" the prune_partitions code calls get_mm_tree. This function
creates a key tree.

The key_tree is characterised by
1) The first key tree object is always on the first field of the index.
2) A next_key_part always points to an object describing the next field
in the index.
3) All objects connected by next_key_part pointers are logically connected
using AND. Thus together they represent a range.
4) Each object can also point to a left object and a right object. These
both points to objects describing the same field that are logically connected
using OR. Thus they represent another range.
5) Each object describes an upper bound and a lower bound. Thus in each object
there is a field value for min and max, there is a min flag and a max flag.

Some examples:
The condition a < 3 is represented as:
The min value is NULL, the min flag is NEAR_MIN which means that NULL isn't
part of the range (NULL values by definition isn't part of any range in a
WHERE clause). The max value is 3 and the flag is NEAR_MAX which again means
that the 3 isn't included in the range.

The condition a >= 4 is represented as:
The min value is 4 and the flag is 0 to indicate that the 4 is included in the
range. The max value is NULL and the flag is NO_MAX_RANGE indicating that there
is no upper bound.

Assume we have the condition:
(a = 1 AND b < 2 AND c <= 2)
In this case the range is built up as:
Min range:
a = 1 where both min value is 1 and min flag isn't set on a and set to
Max range:
a = 1, b = 2 where where max value is 1 on a and no flag set on a, b max value
is set to 2 and flag is set to NEAR_MAX.

The flag NO_MIN_RANGE indicates that no more fields are needed for the
lower end of the range and NEAR_MAX on the b field indicates that no more
fields after b are interesting for the upper end of the range. Thus the
condition on c can be ignored here.

In a normal index it is possible to jump out as soon as we find something
that makes it impossible to extend the range. However in our case we need
to proceed to the subpartitioning fields. So what we do then is that we
set a special ignore_part_fields variable when we reached the end of a
condition, this variable is reset when we return from evaluating the fields
after following the next_key_part on the object where we found the end.

In the example the end will be found already at the a object, the b max
limit will be set in executing the a object by calling store_max_key
which will recursively set the range in cases like this.

The key tree for the condition:
(a = 1 AND (b = 2 OR b = 4)) AND (c = 3 OR c = 5) AND (d = 4 OR d = 6)
could look like this:

                                                        |d (5) SR2 |
                                                        |min=max=6 |
                            ------------  ------------  ------------
                            |c (3) PR1 |  |c  (4)    |  |d (6) SR1 |
                            |min=max=5 |->|min=max=5 |  |min=max=4 |
                             ------------  ------------  -----------
                                 |                      ------------
                                 |                      |d (9) SR3 |
                                 |                      |min=max=6 |
                                 |                      ------------
                                 |                           |
              ------------  ------------  ------------  ------------
              |b  (2)    |  |c (7) PR2 |  |c  (8)    |  |d (10) SR4|
              |min=max=4 |->|min=max=3 |->|min=max=3 |  |min=max=4 |
              ------------  ------------  ------------  ------------
------------  ------------  -----------|  ------------  ------------
|a  (1)    |  |b  (11)   |  |c (12) PR3|  |c  (13)   |  |d (14) SR5|
|min=max=1 |->|min=max=2 |->|min=max=3 |->|min_max=3 |->|min=max=4 |
------------  ------------  ------------  ------------  ------------
                                 |                           |
                                 |                      ------------
                                 |                      |d (15) SR6|
                                 |                      |min=max=6 |
                                 |                      ------------
                            ------------  ------------  ------------
                            |c (16) PR4|  |c  (17)   |  |d (18) SR7|
                            |min=max=5 |->|min=max=5 |  |min=max=4 |
                             ------------  ------------  -----------
                                                        |d (19) SR8|
                                                        |min=max=6 |

We have put in the order the key tree objects are executed in
find_used_partitions. find_used_partitions is a recursive function. This
means that we call it once for each key tree object. We always follow the
left pointer immediately before doing anything else, at the end of the
function we'll follow the right pointer and after checking a number of
things and recording range values we call it for the next key tree object
we get by following the next_key_part pointer.

Left pointers are represented above as up links, right pointers as down
links and horisontal pointers to the right represents next_key_part

We evaluate the partition condition for each range we find in the key
tree. In the tree above we find 4 different partition ranges which are:
1) a = 1 AND b = 4 AND c = 5
2) a = 1 AND b = 4 AND c = 3
3) a = 1 AND b = 2 AND c = 3
4) a = 1 AND b = 2 AND c = 5

For each of these ranges we'll call get_part_iter_for_interval which
provides an iterator over all partitions used by this range, the list
of partitions used by the range is calculated by first getting the start
partition for the range and then retrieving the end partition for the

We will also similarly evaluate the subpartitioning ranges. In the
example above we get 8 different subpartitioning ranges out of which
a number of them are duplicates:
1) c = 5 AND d = 6 PR1
2) c = 5 AND d = 4 PR1
3) c = 3 AND d = 6 PR2
4) c = 3 AND d = 4 PR2
5) c = 3 AND d = 4 PR3
6) c = 3 AND d = 6 PR3
7) c = 5 AND d = 4 PR4
8) c = 5 AND d = 6 PR4

That they are duplicates doesn't matter since the subpartitions are
evaluated in the context of the partition range, thus the subpartitions
are only set for the partitions which the current partition range covers.

Given that find_used_partitions is a recursive function we need to have
mechanisms to save values and restore them later. When calling
find_used_partitions using the next_key_part pointer we save the context
using local variables. When entering find_used_partitions after returning
from executing the left part of the tree, we will use stack to save some
values which will be restored immediately before executing the right part
of the tree.