WL#3929: PSE API Index Condition and Aggregate Pushdown

Affects: Server-5.2   —   Status: Un-Assigned

This replaces WL#3925 and WL#3924 

The specific goal of this enhancement is to add support for Nitro indexes and
Nitro index-based aggregation acceleration functions. The general goal is to
extend the MySQL PSE API to support any index type by:
1) pushing down the cost calculations and 
2) pushing down the execution of accesses to such indexes to the storage engine. 

While the primary goal is to push-down the index conditions in support of the
push-down of aggregation functions, the index condition push-down should also
work when no aggregates are present. 

The general overview of the proposed solution is:
1) If the HA_PUSH_DOWN_INDEXES flag is set, then replace the current one or
more calls to records_in_range per index with a single call to a new (or
overloaded records_in_range) function that returns both a record count and an
I/O count.
The select list (including aggregates) will also be pushed down and the
function will also "mark" which of the items in the select list it can return
during its scan.
2) The MySQL optimizer will choose the “best” index based on the returned values.
3) The access to the index will be completely pushed down to the storage
engine via a new function ("index_read_push"?) which will simply push down the
conditions and read index records via index_next. The index will be scanned
just like a range scan does now. In the case where there are pushed-down
select items (i.e. aggregates) it will behave like a covering index containing
ALL OF THE VALUES, returning one row per GROUP BY value set.
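
As a rough sketch of the shape these two entry points might take (all names
here, including index_pushdown_cost and index_read_push, are tentative
placeholders, and the types are simplified stand-ins for the real server
classes, so this is illustrative only):

/* Sketch only: simplified stand-ins, not the real sql/handler.h types. */
#include <cstdint>

typedef unsigned int uint;
typedef uint64_t ha_rows;
class Item;  /* the parsed condition tree (COND) */

class handler_sketch {
 public:
  /* Proposed: one call per index returning both estimates at once,
     replacing N calls to records_in_range(). The engine would also
     mark which pushed select-list items (e.g. aggregates) it can
     produce during its scan. */
  virtual ha_rows index_pushdown_cost(uint index, Item *cond,
                                      double *io_count,
                                      bool *is_ror_scan) = 0;

  /* Proposed: execute the pushed-down access. Behaves like a range
     scan, or, when aggregates were pushed, like a covering index
     returning one row per GROUP BY value set. */
  virtual int index_read_push(uint index, Item *cond) = 0;

  virtual ~handler_sketch() {}
};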

Push Down Aggregation:

Don’t ask me how they do it, it’s their big secret, but Nitro can determine
the value of an aggregate function such as SUM, AVG, or STDDEV (not just MIN,
MAX, and COUNT) in 2 I/Os or less, even on tables with billions of rows. This
is a big
feature of their storage engine and its utilization is critical to the success
of their storage engine. 
In order to use this functionality an index must be defined with the aggregate
(using our new index comments.) These accelerated aggregates are then maintained
along with the index. The accelerated aggregates need to be accessed via the
index. Obviously, since the calculation of the aggregate is pushed down to the
storage engine it is a prerequisite that all of the conditions on that index
also be passed down. There are other prerequisites for queries where the
aggregate(s) can be pushed down (see the sketch after this list), including:
	a) All conditions must be on fields in the index
	b) All of the GROUP BY columns must be columns in the index. 
	c) No OR conditions are left after the optimizer rewrite phase.
	d) No joins (there are cases where a join would work, but Nitro is not
interested in solving that problem at this time) NOTE also that the "no joins"
restriction only applies to the push down aggregates, not to the push down indexes!
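
For illustration only, a minimal sketch of such an eligibility check; the
QueryBlock structure, aggregates_pushable function, and field sets below are
hypothetical toy stand-ins, not server types:

/* Hypothetical eligibility check mirroring rules (a)-(d) above. */
#include <set>
#include <string>

struct QueryBlock {
  std::set<std::string> cond_fields;      /* fields used in WHERE      */
  std::set<std::string> group_by_fields;  /* fields used in GROUP BY   */
  bool has_or_after_rewrite;              /* OR left after the rewrite */
  int table_count;                        /* more than 1 means a join  */
};

static bool is_subset(const std::set<std::string> &s,
                      const std::set<std::string> &index_fields) {
  for (const std::string &f : s)
    if (index_fields.count(f) == 0) return false;
  return true;
}

static bool aggregates_pushable(const QueryBlock &q,
                                const std::set<std::string> &index_fields) {
  return is_subset(q.cond_fields, index_fields)     /* (a) */
      && is_subset(q.group_by_fields, index_fields) /* (b) */
      && !q.has_or_after_rewrite                    /* (c) */
      && q.table_count == 1;                        /* (d) no joins */
}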

The pushed-down aggregation will be included in the calculation of the cost
(number of records and number of I/Os to read those records), as will the
access of “hitch-hiker” columns that may be part of the non-indexed columns of
an index.

Support for Skip Scanning indexes:

As a side effect the general push-down of index conditions will also allow MySQL
to support Nitro’s skip-scanning indexes for both general query use and with the
push down aggregates. Unlike the aggregates, index push-down should work with
join conditions and in any other case where a range scan works. Anywhere an
index range scan or an exact read would work, the index push-down should work.
This includes index merge, etc.

Functionality needed to support Nitro indexes and Nitro Aggregates in indexes:

1) Add a new engine flag HA_PUSH_DOWN_INDEXES  

2) Add optimizer support for pushing down the calculation of the number of
records and the number of I/Os to the storage engine via a generic pushdown of
index conditions (overloaded records_in_range?)

3) Create a new execution function to access such indexes that is similar to a
range scan. (Do we need two, one for a single-record read and another for
scans? If so, we need a way to differentiate between the two, since a returned
record count of 1 might just be an estimate.)

Issues:
A) Do we want to support the index push-down by engine or on an index-by-index
basis? Nitro will be happy with an engine flag, but will that be easier to
code?

B) It makes sense to have some way of letting the storage engine determine
which conditions it will execute and which it will not. Nitro will always take
all or none, but we should design with the intention of supporting selective
index push-downs, where the engine returns which conditions it “accepted” and
which it did not.
C) The above aggregate push-down can be thought of as all or nothing from a
single index for Nitro. While this is the primary condition we are targeting,
there is another case for aggregate queries with no group bys where aggregates
might be pushed down to different indexes for the same query. This will be
discussed in detail later.   

Added 20070629 - Brian M

We do not need to push down fully qualified (all columns are specified)
accesses to unique indexes. Since only one record is returned, there is little
to gain here. There is no reason to do this unless it is easier to include
than to exclude.

Push Down Index Conditions

Why do we need push down conditions for indexes and why can’t we use the
interface we already have?

How does this differ from what we do now? Don't we already push down
conditions to indexes? Well, yes and no. For a simple query:

Create table T (
A int,
B int,
C int,
D int,
E varchar(32),
Primary key(a,b,c,d) ) ;

Select a, b, c, d, e from T where A > 1000;

We push down the value 1000 as a parameter to records_in_range and to
read_index. But consider the query:

Select a, b, c, d, e from T where a in (23, 67, 56, … , 9123) ;

To get the cost to access the results using the index we:
1) Call records_in_range once for each value in the “IN” list.
2) Call read_index once for each value in the IN list. 

In this case we do not push down the full condition to the storage engine, but
rather pass it down piecemeal, one value at a time.
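
A minimal sketch of the contrast, with stub stand-ins for both the existing
call and the proposed one (nothing here is the real handler API; the stubs
exist only so the sketch compiles):

/* Today's piecemeal IN-list costing versus a single pushed-down call. */
#include <cstdint>
#include <vector>

typedef uint64_t ha_rows;
struct key_range { int value; };  /* the real struct holds key bytes   */
class Item;                       /* the full condition tree (COND)    */

/* stand-in for the existing API: one (min,max) pair per call */
static ha_rows records_in_range(unsigned, const key_range *,
                                const key_range *) { return 1; }

/* stand-in for the proposed API: whole condition in one call */
static ha_rows index_pushdown_cost(unsigned, Item *, double *io_count,
                                   bool *is_ror_scan) {
  *io_count = 0; *is_ror_scan = false;
  return 1;
}

/* Today: one records_in_range call per value in the IN list. */
ha_rows cost_in_list_today(unsigned index, const std::vector<int> &in_list) {
  ha_rows total = 0;
  for (int v : in_list) {
    key_range k = {v};
    total += records_in_range(index, &k, &k);
  }
  return total;
}

/* Proposed: the full "a IN (...)" condition is pushed down once. */
ha_rows cost_in_list_pushed(unsigned index, Item *full_cond) {
  double io = 0;
  bool is_ror = false;
  return index_pushdown_cost(index, full_cond, &io, &is_ror);
}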

While the above example illustrates a case of simply pushing down the execution
for some, possibly very minimal, performance gain, there are other cases where
we would push down the index conditions to allow the storage engine to execute
an index access behavior that MySQL DOES NOT SUPPORT. An example of this more
interesting case is a skip-scanning index, such as is supported by Nitro. A skip
scanning index will scan part of an index, skip to another part, scan some more,
possibly scan many more segments of the index before it is complete. MySQL is
working on this in the special case of aggregation; see WL#3220, “Loose index
scan for aggregate functions”. Nitro uses it generally whenever it can and keeps
detailed index statistics so that its optimizer can accurately calculate the
cost of a skip-scan index range access to a table. 

For example:

Using the same table as in the previous examples:

Select a, b, c, d, e from T where B = 5;

For this query MySQL has no option other than a full table scan unless someone
adds another index. If the primary key were a skip-scanning index, then the
engine could use the index to skip-scan through the data. Obviously, if the
engine is to be able to efficiently determine the cost of the query during
optimization, then the engine must have good statistics on the index. (Not
really our problem; the storage engine needs to solve this issue. Nitro, for
instance, keeps detailed column-level statistics on its indexes.)
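
A toy sketch of the skip-scan idea itself, assuming an in-memory map as a
stand-in for the B-tree (how Nitro actually does it is, as noted, their
secret): within each distinct value of the leading column a, seek (a, 5)
directly, then skip ahead to the next distinct a.

#include <climits>
#include <map>
#include <utility>
#include <vector>

typedef std::pair<int, int> Key;          /* (a, b) prefix of the index */

std::vector<Key> skip_scan_b_equals(const std::map<Key, int> &index, int b) {
  std::vector<Key> hits;
  auto it = index.begin();
  while (it != index.end()) {
    int a = it->first.first;
    auto hit = index.find(Key(a, b));     /* seek within this a-group    */
    if (hit != index.end()) hits.push_back(hit->first);
    it = index.upper_bound(Key(a, INT_MAX)); /* skip to next distinct a  */
  }
  return hits;
}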

The current interface could be easily made to work for the last example. But
what about:

Select  a, b, c, d, e from T where B = 5 and D = 2

Select a, b, c, d, e from T where B = 5 and C IN (3,78,102)

Select a, b, c, d, e from T where B = 5 and C < 10 and D > 98

All of these queries have conditions that are too complex to be passed using
the pair of key_range values in records_in_range, or the key buffer and
mode_flag in read_index. In addition, if we make any assumptions about what an
index might accept, we may not support all index types.

Exceptions:
1) Do not use index push-down if the index is unique and its values are fully
specified. 

Detailed design

I)	Add the HA_INDEX_PUSH_DOWN index flag.
II)	Modify the COND (Item-derived) object so that it has the ability to store
a flag for each condition that indicates whether the storage engine has
selected (accepted) it for push down and used it as part of the cost and
record calculations.
III)	Add a COND pointer called idx_cond to the KEY structure.
IV)	Create a new function, similar to make_cond_for_table, called
make_cond_for_index, that builds a COND from the table COND and the index,
based on the columns in the index. If there is a condition on a column in an
index, then that index's COND will contain that condition (see the first
sketch after this list).
V)	In the loop in SQL_SELECT::test_quick_select (~line 2195) where the
key_parts are added to each KEY object for the key to be tested, check for the
HA_INDEX_PUSH_DOWN flag. If the current index has the flag set, then call
make_cond_for_index and set the value in idx_cond instead of filling out all
of the key_parts.
VI)	Create a new handler method index_pushdown_cost that takes as parameters:
a.	The index number.
b.	A COND object pointer.
c.	A pointer to a place to store the number of I/Os the storage engine expects
to do to access the records. This is critical for push-down indexes since it
is impossible for MySQL to make valid assumptions about how the index might be
scanned to get the records it returns.
d.	A pointer to a Boolean is_ror_scan, which is set by the storage engine if
the values are returned in rowid order (ROR).
The method returns the number of records that will be returned, or
HA_POS_ERROR if the index cannot (or should not, as determined by the engine)
be used to access the indexed table given the passed-in conditions.
VII)	Make sure that an index with HA_INDEX_PUSH_DOWN set is always potentially
SARGable as long as there is a condition on at least one of its columns, no
matter where that column is in the index.
VIII)	In get_key_scans_params, in the loop that iterates through the potential
keys (~line 4751), replace the call to check_quick_select with a call to
index_pushdown_cost for each key with HA_INDEX_PUSH_DOWN set. Set
found_read_time and found_records as check_quick_select does now (see the
second sketch after this list).
IX)	TBD - How does this impact ROR selectivity?
X)	TBD - Execution
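
First, a toy sketch of make_cond_for_index (item IV); the Cond structure below
is a hypothetical stand-in for the real Item tree, and its accepted_by_engine
flag models the per-condition marker from item II:

/* Toy sketch of make_cond_for_index, modeled on make_cond_for_table. */
#include <memory>
#include <set>
#include <string>
#include <vector>

struct Cond {                       /* stand-in for the COND/Item tree  */
  bool is_and = false;              /* true: AND node, false: leaf      */
  std::string field;                /* leaf: column the condition uses  */
  std::vector<std::shared_ptr<Cond>> children;
  bool accepted_by_engine = false;  /* new per-condition flag (item II) */
};

/* Keep only the conjuncts whose column belongs to the index. */
std::shared_ptr<Cond> make_cond_for_index(
    const std::shared_ptr<Cond> &cond,
    const std::set<std::string> &index_fields) {
  if (!cond) return nullptr;
  if (!cond->is_and)                /* leaf: keep only if indexed       */
    return index_fields.count(cond->field) ? cond : nullptr;
  auto out = std::make_shared<Cond>();
  out->is_and = true;
  for (const auto &child : cond->children)
    if (auto kept = make_cond_for_index(child, index_fields))
      out->children.push_back(kept);
  return out->children.empty() ? nullptr : out;
}

Second, a sketch of the modified key-selection loop (item VIII); KEY_sketch,
pick_best_pushdown_key, and the stub cost function are illustrative stand-ins,
and the cost comparison is deliberately simplistic:

/* Sketch of the proposed branch in the key-selection loop. */
#include <cstdint>
#include <limits>
#include <vector>

typedef uint64_t ha_rows;
static const ha_rows HA_POS_ERROR = std::numeric_limits<uint64_t>::max();
static const unsigned HA_INDEX_PUSH_DOWN = 1u;

class Item;
struct KEY_sketch { unsigned flags; Item *idx_cond; };

/* stand-in stub for the proposed handler::index_pushdown_cost (VI) */
static ha_rows index_pushdown_cost(unsigned, Item *, double *io_count,
                                   bool *is_ror_scan) {
  *io_count = 0; *is_ror_scan = false;
  return HA_POS_ERROR;              /* engine-specific in reality       */
}

void pick_best_pushdown_key(const std::vector<KEY_sketch> &keys,
                            double *found_read_time, ha_rows *found_records) {
  *found_read_time = std::numeric_limits<double>::max();
  *found_records = HA_POS_ERROR;
  for (size_t i = 0; i < keys.size(); ++i) {
    if (!(keys[i].flags & HA_INDEX_PUSH_DOWN))
      continue;                     /* else: check_quick_select() path  */
    double io = 0;
    bool is_ror = false;
    ha_rows rows = index_pushdown_cost((unsigned)i, keys[i].idx_cond,
                                       &io, &is_ror);
    if (rows == HA_POS_ERROR) continue; /* engine declined this index   */
    if (io < *found_read_time) {        /* cheapest estimated I/O wins  */
      *found_read_time = io;
      *found_records = rows;
    }
  }
}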
 
Much more to come ....