WL#3929: PSE API Index Condition and Aggregate Pushdown
Affects: Server-5.2 — Status: Un-Assigned — Priority: Medium
This replaces WL#3925 and WL#3924 The specific goal of this enhancement is to add support for Nitro indexes and Nitro index based aggregation accelration functions. The general goal is to extend the MySQL PSE API to support any index type by: 1) pushing down the cost calculations and 2) pushing down the execution of accesses to such indexes to the storage engine. While the primary goal is to push-down the index conditions in support of the push-down of aggregation functions, the index condition push-down should also work when no aggregates are present. General overview of proposed solution is to: 1) If the HA_PUSH_DOWN_INDEXS flag is set then replace the current 1 or more calls to records_in_range per index with a single call to a new (or overloaded records_in_range) function that returns both a record count and and I/O count. The select list (including aggregates) will also be pushed down and the function will also "mark" which of the items in the select list it can return during its scan. 2) The MySQL optimizer will choose the “best” index based on the returned values. 3) The access to the index will be completely pushed down to the storage engine via a new function (?index_read_push?) which will simply push down the conditions and read index records via index_next. The index will be scaned just like a range scan does now. In the case where there are pushed down select items (i.e. aggregates) it will behave like a covering index containing ALL OF THE VALUES which returns one row per group by value set. Push Down Aggregation: Don’t ask me how they do it, it’s their big secret, but Nitro can determine the value of an aggregate function such as SUM, AVG, STDDEV, (not just MIN, Max, and COUNT) in 2 I/Os or less even on tables with billions of rows. This is a big feature of their storage engine and its utilization is critical to the success of their storage engine. In order to use this functionality an index must be defined with the aggregate (using our new index comments.) These accelerated aggregates are then maintained along with the index. The accelerated aggregates need to be accessed via the index. Obviously, since the calculation of the aggregate is pushed down to the storage engine it is a prerequisite that all of the conditions on that index also be passed down. There are other prerequisites of the queries where the aggregate(s) can be pushed down including: a) All conditions must be on fields in the index b) All of the group by columns must be on columns in the index. c) No OR conditions are left after the optimizer rewrite phase. d) No joins (there are cases where a join would work, but Nitro is not interested in solving that problem at this time) NOTE also that the "no joins" restriction only applies to the push down aggregates, not to the push down indexes! The pushed down aggregation will be including in the calculation of the cost (number of records and number of I/Os to read those records) as will the access of “hitch-hiker” columns that may be part of the non-indexed columns of an index. Support for Skip Scanning indexes: As a side effect the general push-down of index conditions will also allow MySQL to support Nitro’s skip-scanning indexes for both general query use and with the push down aggregates. Unlike the aggregates, index push down should work with join conditions and in any other case a range scan should work. Any place an index range scan or read exact will work the index push down should work. This includes index merge, etc. Functionality needed to support Nitro indexes and Nitro Aggregates in indexes: 1) Add a new engine flag HA_PUSH_DOWN_INDEXES 2) Add optimizer support for pushing down the calculation of the number of records and the number of I/Os to the storage engine via a generic pushdown of index conditions (overloaded records_in_range?) 3) create a new execution function to access such indexes that is similar to a range scan (do we need two, one for a single record read and another for scans? If so we need a way to differentiate between the two since a returned record count of 1 might just be an estimate.) Issues: A) Do we want to support the index push-down by engine or on an index by index basis? For Nitro they will be happy with a engine flag, but will that be easier to code? B) It makes sense to have some way of letting the storage engine determine which conditions it will execute and which it will not. Nitro will always take all or none, but we should design with the intention of supporting selective index push-downs where the returns which conditions it “accepted” and which it did not. C) The above aggregate push-down can be thought of as all or nothing from a single index for Nitro. While this is the primary condition we are targeting, there is another case for aggregate queries with no group bys where aggregates might be pushed down to different indexes for the same query. This will be discussed in detail later. Added 20070629 - Brian M We do not need to push down fully qualified (all columns are specified) accesses to unique indexes. Since only one record is returned there is little to gain here. There is no reason to do this unless it is easier to include that to exclude.
Push Down Index Conditions Why do we need push down conditions for indexes and why can’t we use the interface we already have? How does this differ from what we do now, we already push down conditions to indexes? Well yes and no. For a simple query: Create table T ( A int, B int, C, int, D int, E varchar(32), Primary key(a,b,c,d) ) ; Select a, b, c, d , e from T where A > 1000 ; We push down the value 1000 as a parameter to records_in_range and in read_index. But consider the query: Select a, b, c, d, e from T where a in (23, 67, 56, … , 9123) ; To get the cost to access the results using the index we: 1) Call records_in_range 1 time for each value in the “IN” list. 2) Call read_index for each value in the IN list. In this case we do not push down the full condition the storage engine, but rather pass it down piecemeal one value at a time. While the above example illustrates a case of simply pushing down the execution for some, possibly very minimal, performance gain, there are other cases where we would push down the index conditions to allow the storage engine to execute n index access behavior that MySQL DOES NOT SUPPORT. An example of this more interesting case is a skip-scanning index, such as is supported by Nitro. A skip scanning index will scan part of an index, skip to another part, scan some more, possibly scan many more segments of the index before it is complete. MySQL is working on this in the special case of aggregation; see WL#3220, “Loose index scan for aggregate functions”. Nitro uses it generally whenever it can and keeps detailed index statistics so that its optimizer can accurately calculate the cost of a skip-scan index range access to a table. For example: Using the same table as in the previous examples: Select a, b, c, d, e from T where B = 5; For this query MySQL has no option other than a full table scan unless someone adds another index. If the primary key was a skip-scanning index then the engine could use the index to skip scan through the data. Obviously if the engine is to be able to efficiently determine the cost of the query during the optimization then the engine must have good statistics on the index. (Not really our problem, the storage engine needs to solve this issue. Nitro for instance, keeps detailed column level statistics on their indexes.) The current interface could be easily made to work for the last example. But what about: Select a, b, c, d, e from T where B = 5 and D = 2 Select a, b, c, d, e from T where B = 5 and C IN (3,78,102) Select a, b, c, d, e from T where B = 5 and C < 10 and D > 98 All of these queries have conditions that are too complex to be passed using the pair of key_range values from records_in_range or key buffer and mode_flag in read_index. In addition, if we make any assumptions on what an index might accept we may not all index types. Exceptions: 1) Do not use index push-down if the index is unique and its values are fully specified.
Detailed design I) Add the HA_INDEX_PUSH_DOWN index flag. II) Modify the COND (Item derived) object so that it has the ability to store a flag for each condition that indicates if the storage engine has selected (accepted) it for push down and used it as part of the cost and record calculations. III) Add an COND pointer called cond to the KEY structure. IV) Create a new function that builds an COND similar to make_cond_for_table called make_cond_for_index. The function will build an COND from the table COND and the index based on the columns in the index. If there is a condition with a column in an index that that index’s COND will contain that condition. V) In the loop in SQL_SELECT::test_quick_select (~line 2195) where the key_parts are added to each KEY object for the key to be tested check for the HA_INDEX_PUSH_DOWN flag. If the current index has the flag set, then call make_cond_for_index and set the value in idx_cond instead of filling out all of the key_parts. VI) Create a new handler method index_pushdown_cost method that takes as parameters: a. The index number. b. An COND object pointer. c. A pointer to a place to store the number of I/Os the storage engine expects to do to access the records. This is critical for push down indexes since it is impossible for MySQL to be able to make valid assumptions on how the index might be scanning to get the records it is returning. d. A pointer to a Boolean is_ror_scan which is set by the storage engine if the returned values are returned in ror order. e. And returns the number of records that will be returned or HA_POS_ERROR if the index cannot (or should not as determined by the engine) be used to access the indexed table given the passed in conditions. f. VII) Make sure that an index with HA_INDEX_PUSH_DOWN set is always potentially SARGable as long as there is at least one column set on it, no matter where the column is in the index. VIII) In get_key_scans_params in the loop that loops through the potential keys (~line 4751) replace the call to check_quick_select with a call to index_push_down_cost for each key with HA_INDEX_PUSH_DOWN set. Set found_read_time and found_records in check_quick_select. IX) TDB – How does this impact ROR Selectivity? X) TDB _ Execution Much more to come ....
Copyright (c) 2000, 2015, Oracle Corporation and/or its affiliates. All rights reserved.