WL#8084: Condition pushdown for materialized derived tables
Affects: Server-8.0
—
Status: Complete
For a query of the form SELECT * FROM (SELECT i, j FROM t1) as dt WHERE i > 10; , if the derived table is materialized and not merged, then MySQL currently materializes the entire table "t1" and then qualifies the rows with the WHERE condition. However in this case, the outer where condition can be pushed down to derived table "dt" . The transformation of the above query to SELECT * FROM (SELECT i, j FROM t1 WHERE i > 10) as dt could be done to reduce the number of rows that need to be processed. The goal of this worklog is to be able to pushdown such conditions from outer select to a materialized derived table if possible. In simple queries like the above, MySQL tries to merge the derived table into the outer select. However for queries where the derived table cannot be merged; For ex if the derive table has aggregation; In such cases, the above optimization should improve the query performance if any of the outer where conditions are pushed into the derived table. A where condition can be pushed fully or partially to the derived table For ex: SELECT i,j FROM (SELECT i FROM t1 GROUP BY i) as dt, t2 WHERE i > 2 and j < 3; This should be transformed to SELECT i,j FROM (SELECT i FROM t1 WHERE i > 2 GROUP BY i) as dt WHERE j < 3; A where condition from the outer select is attached to the where clause or the having clause of derived table depending on certain conditions. If the derived table does not have any aggregations or window functions, it is safe to attach the pushed condition to the where clause of the derived table. Ex: SELECT * FROM (SELECT i,j FROM t1) as dt WHERE i > 10; After condition pushdown - SELECT * FROM (SELECT i,j FROM t1 WHERE i > 10) as dt; If the derived table has group by, but not window functions, then it is safe to attach to the having condition. Ex: SELECT * FROM (SELECT i,j, SUM(k) as SUM FROM t1 GROUP BY i,j ) as dt WHERE SUM > 10; After condition pushdown - SELECT * FROM (SELECT i,j, SUM(k) as SUM FROM t1 GROUP BY i,j HAVING SUM > 10) as dt; However if the fields in the condition are actually group by columns, it is safe to push the condition past group by into the where clause. Ex: SELECT * FROM (SELECT i,j, SUM(k) as SUM FROM t1 GROUP BY i,j ) as dt WHERE i > 10; After condition pushdown - SELECT * FROM (SELECT i,j, SUM(k) as SUM FROM t1 WHERE i > 10 GROUP BY i,j ) as dt; SELECT * FROM (SELECT i,j, SUM(k) as SUM FROM t1 GROUP BY i,j ) as dt WHERE i > 10 AND SUM > 10; After condition pushdown - SELECT * FROM (SELECT i,j, SUM(k) as SUM FROM t1 WHERE i > 10 GROUP BY i,j HAVING SUM > 10) as dt; Similarly, if the derived table has window functions, then it is safe to push past window functions and to the having clause of derived table iff the condition has columns that are used in partition clause of the window functions present. Ex: SELECT * FROM (SELECT i,j, SUM(k) OVER (PARTITION BY j) as SUM FROM t1 ) as dt WHERE j > 10 ; After condition pushdown - SELECT * FROM (SELECT i,j, SUM(k) OVER (PARTITION BY j) as SUM FROM t1 WHERE j > 10) as dt; SELECT * FROM (SELECT i,j, MIN(k) as MIN , SUM(k) OVER (PARTITION BY i) as SUM FROM t1 GROUP BY i,j ) as dt WHERE i > 10 AND MIN < 3; After condition pushdown - SELECT * FROM (SELECT i,j, MIN(k) as MIN , SUM(k) OVER (PARTITION BY i) as SUM FROM t1 WHERE i > 10 GROUP BY i,j HAVING MIN < 3) as dt; The worklog does not aim to push conditions down to derived table if the derived table has UNIONS. This will be addressed as part of WL#13730.
Functional Requirements: F1: The SQL query resulting from condition pushdown should be semantically correct when compared with the original query. F2: If a condition can be pushed down to a derived table whether partially or fully, MySQL should be able to do it. F3: Conditions should be pushed past window functions if possible to the HAVING condition of the derived table. F4: Conditions should be pushed past GROUP BY if possible to the WHERE clause of the derived table. F5: Conditions should be pushed as far down as possible if there are nested derived tables. F6: If a materialized derived table has UNION, condition pushdown will not be done. F7: If the materialized derived tables has LIMIT, condition pushdown will not be done. F8: If the materialized derived table is a inner table of an outer join, condition pushdown is not done. F9: Conditions having subqueries are not pushed down to materialized derived tables. F10: If a materialized derived table is a CTE, condition is not pushed down if the derived table is referenced multiple times. F11: Conditions having parameters will be pushed down if the condition is of the form derived_column > '?'. However if a derived column in a condition is an expression having a '?' in the underlying derived table, such conditions are not pushed down. Non-functional requirements: If optimizer_switch "derived_condition_pushdown" is turned off, then there should not be any change in plans for existing queries. Optimizer table level hint "DERIVED_CONDITION_PUSHDOWN" will enable pushing conditions down for a derived table resulting in plan changes. "NO_DERIVED_CONDITION_PUSHDOWN" will not result in any plan changes as condition is not pushed down for that table. Condition pushdown to derived table is expected to improve performance of a query where MySQL is able to push the condition to derived table.
In SELECT_LEX::prepare, after all the transformations are performed, a WHERE condition from the query block is checked to see if it can be pushed to any of the materialized derived tables in that query block. We do the following to push conditions to derived tables. - SELECT_LEX::prepare - For every materialized derived table - if there is a where condition in this query block - make condition that can be pushed down to the derived table - extract a part of the condition that has fields belonging to only this derived table. - Check if this condition can be pushed past window functions if any to the having clause of the derived table - make a condition that could not be pushed past. This will remain in the outer query block. - Check if this condition can be pushed past group by if present to the where clause of the derived table. - make a condition that could not be pushed past. This will be part of the having clause of the derived table query. - get the remainder condition which could not be pushed to the derived table. REPEAT ABOVE for the rest of the derived tables. - Push the condition down to derived table's query expression - For every query expression inside the current query block - REPEAT THE ABOVE to keep pushing as far down as possible. In addition to the above changes, some more changes are introduced in both resolver and parser. This is to create cloned items for the fields in a materialized derived table. We achieve that by re-parsing fields of the materialized derived table. The need to do this is explained below: Consider a query like this one: SELECT * FROM (SELECT i+10 AS n FROM (SELECT a+7 AS i FROM t1) AS dt1 ) AS dt2 WHERE n > 100; Call to push_conditions_to_derived for the top most query would result in the following query. "n" in the where clause is replace with (i+10). SELECT * FROM (SELECT i+10 AS n FROM (SELECT a+7 AS i FROM t1) AS dt1 WHERE (dt1.i+10) > 100) as dt2; The next call to push_conditions_to_derived should result in the following query. "i" is replaced with "a+7". SELECT * FROM (SELECT i+10 AS n FROM (SELECT a+7 AS i FROM t1 WHERE ((t1.a+7)+10) > 100) AS dt1) as dt2; However without cloned fields, it would be SELECT * FROM (SELECT ((t1.a+7)+10) AS n FROM (SELECT a+7 AS i FROM t1 WHERE ((t1.a+7)+10) > 100) AS dt1) as dt2; Notice that the field "i" in derived table dt2 is getting replaced with (a+7) because we replaced the argument of the function Item_func_plus in (i+10) with (a+7). The arguments to the function (i+10) need to be different so as to be able to replace them with some other expressions later. Since the derived table fields become part of the where clause of different derived table in case of nested derived tables, we need to keep the sanity of select clause of the derived table from where the condition is pushed down. To do the same, we clone the select expressions of every materialized derived table. As part of this worklog we have the following changes: Parser: A synthetic token is introduced to let the parser know that it will be parsing derived table fields. This will parse the expression and return an Item to be used later to replace fields in the pushed down condition. Resolver: At the end of prepare, after all transformations are done, as explained above for every query expression, we first check if there is condition that could be pushed down to the having or where clause. If there is one, we attach to the existing having or where clause of the derived table query block.This is repeated while traversing the query blocks top down. While replacing the pushed conditions with fields of derived table, the expressions present in the condition are re-parsed to make copies. These copies are used in the pushed conditions. Note that it is not needed to clone when a condition is pushed to having clause of the derived table as it does not get pushed further down.It is only when a condition is pushed down to the where clause of the derived table that the expressions are cloned and used because then those conditions might get pushed further down in case of nested derived tables. Optimizer: None Executor: None Exceptions: Conditions having subqueries are not pushed down. This should be lifted later. Conditions will not be pushed to derived tables having unions because then we need to clone the where clause based on the number of selects in the union. This will be addressed in the follow up task in WL#13730. Other changes: Introduced a optimizer_switch "derived_condition_pushdown" to enable/disable this feature. It is "on" by default. The switch is similar to other switchable optimizations in its use. Introduced optimizer table level hints "DERIVED_CONDITION_PUSHDOWN" and "NO_DERIVED_CONDITION_PUSHDOWN". Similar to other optimizer hints, hint value overrides the corresponding optimizer switch value. "DERIVED_CONDITION_PUSHDOWN" will enable pushing conditions to the derived table specified in the hint and "NO_DERIVED_CONDITION_PUSHDOWN" will disable pushing conditions to the derived table specified in the hint. Added changes to print some details when optimizer_trace is enabled.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.