WL#8084: Condition pushdown for materialized derived tables

Affects: Server-8.0   —   Status: Complete

For a query of the form SELECT * FROM (SELECT i, j FROM t1) as dt WHERE i > 10; 
, if the derived table is materialized and not merged, then MySQL currently 
materializes the entire  table "t1" and then qualifies the rows with the WHERE 
condition. However in this case, the outer where condition can be pushed down to 
derived table "dt" .

The transformation of the above query to SELECT * FROM (SELECT i, j FROM t1 
WHERE i > 10) as dt  could be done to reduce the number of rows that need to be 
processed. The goal of this worklog is to be able to pushdown such conditions 
from outer select to a materialized derived table if possible.

In simple queries like the above, MySQL tries to merge the derived table into 
the outer select. However for queries where the derived table cannot be merged; 
For ex if the derive table has aggregation; In such cases, the above 
optimization should improve the query performance if any of the outer where 
conditions are pushed into the derived table.

A where condition can be pushed fully or partially to the derived table

For ex:
SELECT i,j FROM (SELECT i FROM t1 GROUP BY i) as dt, t2 WHERE i > 2 and j < 3;

This should be transformed to
SELECT i,j FROM (SELECT i FROM t1 WHERE i > 2 GROUP BY i) as dt WHERE j < 3;

A where condition from the outer select is attached to the where clause or the 
having clause of derived table depending on certain conditions.

If the derived table does not have any aggregations or window functions, it is 
safe to attach the pushed condition to the where clause of the derived table.

Ex: SELECT * FROM (SELECT i,j FROM t1) as dt  WHERE i > 10;

After condition pushdown -
    SELECT * FROM (SELECT i,j FROM t1 WHERE i > 10) as dt;

If the derived table has group by, but not window functions, then it is safe to 
attach to the having condition.

Ex:
SELECT * FROM (SELECT i,j, SUM(k) as SUM FROM t1 GROUP BY i,j ) as dt  WHERE SUM 
> 10;

After condition pushdown -
SELECT * FROM (SELECT i,j, SUM(k) as SUM FROM t1 GROUP BY i,j HAVING SUM > 10) 
as dt;

However if the fields in the condition are actually group by columns, it is safe 
to push the condition past group by into the where clause.

Ex:
SELECT * FROM (SELECT i,j, SUM(k) as SUM FROM t1 GROUP BY i,j ) as dt  WHERE i > 
10;

After condition pushdown -
SELECT * FROM (SELECT i,j, SUM(k) as SUM FROM t1 WHERE i > 10 GROUP BY i,j ) as 
dt;

SELECT * FROM (SELECT i,j, SUM(k) as SUM FROM t1 GROUP BY i,j ) as dt  WHERE i > 
10 AND SUM > 10;  

After condition pushdown -
SELECT * FROM (SELECT i,j, SUM(k) as SUM FROM t1 WHERE i > 10 GROUP BY i,j 
HAVING SUM > 10) as dt;

Similarly, if the derived table has window functions, then it is safe to push 
past window functions and to the having clause of derived table iff the 
condition has columns that are used in partition clause of the window functions 
present.

Ex:
SELECT * FROM (SELECT i,j, SUM(k) OVER (PARTITION BY j) as SUM FROM t1 ) as dt  
WHERE j > 10 ;  

After condition pushdown -
SELECT * FROM (SELECT i,j, SUM(k) OVER (PARTITION BY j) as SUM FROM t1 WHERE j > 
10) as dt;

SELECT * FROM (SELECT i,j, MIN(k) as MIN , SUM(k) OVER (PARTITION BY i) as SUM 
FROM t1 GROUP BY i,j ) as dt  WHERE i > 10 AND MIN < 3;  

After condition pushdown -
SELECT * FROM (SELECT i,j, MIN(k) as MIN , SUM(k) OVER (PARTITION BY i) as SUM 
FROM t1 WHERE i > 10 GROUP BY i,j HAVING MIN < 3) as dt;

The worklog does not aim to push conditions down to derived table if the derived 
table has UNIONS. This will be addressed as part of WL#13730.


Functional Requirements:

F1: The SQL query resulting from condition pushdown should be semantically 
correct when compared with the original query.

F2: If a condition can be pushed down to a derived table whether partially or 
fully, MySQL should be able to do it.

F3: Conditions should be pushed past window functions if possible to the HAVING 
condition of the derived table.

F4: Conditions should be pushed past GROUP BY if possible to the WHERE clause of 
the derived table.

F5: Conditions should be pushed as far down as possible if there are nested 
derived tables.

F6: If a materialized derived table has UNION, condition pushdown will not be 
done.

F7: If the materialized derived tables has LIMIT, condition pushdown will not be 
done.

F8: If the materialized derived table is a inner table of an outer join, 
condition 
pushdown is not done.

F9: Conditions having subqueries are not pushed down to materialized derived 
tables.

F10: If a materialized derived table is a CTE, condition is not pushed down if 
the derived table is referenced multiple times.

F11: Conditions having parameters will be pushed down if the condition is of the 
form derived_column > '?'. However if a derived column in a condition is an 
expression having a '?' in the underlying derived table, such conditions are not 
pushed down.

Non-functional requirements:

If optimizer_switch "derived_condition_pushdown" is turned off, then there 
should not be any change in plans for existing queries.

Optimizer table level hint "DERIVED_CONDITION_PUSHDOWN" will enable pushing 
conditions 
down for a derived table resulting in plan changes.
"NO_DERIVED_CONDITION_PUSHDOWN" will not result in any plan changes as condition
is not pushed down for that table.

Condition pushdown to derived table is expected to improve performance of a 
query where MySQL is able to push the condition to derived table.

In SELECT_LEX::prepare, after all the transformations are performed, a WHERE 
condition from the query block is checked to see if it can be pushed to any of 
the materialized derived tables in that query block.

We do the following to push conditions to derived tables.
  - SELECT_LEX::prepare
    - For every materialized derived table
    - if there is a where condition in this query block
        - make condition that can be pushed down to the derived table
           - extract a part of the condition that has fields
             belonging to only this derived table.
           - Check if this condition can be pushed past window functions
             if any to the having clause of the derived table
              - make a condition that could not be pushed past. This
                will remain in the outer query block.
           - Check if this condition can be pushed past group by if
             present to the where clause of the derived table.
               - make a condition that could not be pushed past. This
                 will be part of the having clause of the derived table
                 query.
        - get the remainder condition which could not be pushed to
          the derived table.
          REPEAT ABOVE for the rest of the derived tables.
   - Push the condition down to derived table's query expression
   - For every query expression inside the current query block
     - REPEAT THE ABOVE to keep pushing as far down as possible.

In addition to the above changes, some more changes are introduced in both 
resolver and parser. This is to create cloned items for the fields in a 
materialized derived table. We achieve that by re-parsing fields of the 
materialized derived table. The need to do this is explained below:

 Consider a query like this one:
 SELECT * FROM (SELECT i+10 AS n FROM
 (SELECT a+7 AS i FROM t1) AS dt1 ) AS dt2 WHERE n > 100;

 Call to push_conditions_to_derived for the top most query would result in the 
 following query. "n" in the where clause is replace with (i+10).
 SELECT * FROM (SELECT i+10 AS n FROM
 (SELECT a+7 AS i FROM t1) AS dt1 WHERE (dt1.i+10) > 100) as dt2;

 The next call to push_conditions_to_derived should result in the following
 query. "i" is replaced with "a+7".
 SELECT * FROM (SELECT i+10 AS n FROM
 (SELECT a+7 AS i FROM t1 WHERE ((t1.a+7)+10) > 100) AS dt1) as dt2;

 However without cloned fields, it would be

 SELECT * FROM (SELECT ((t1.a+7)+10) AS n FROM
 (SELECT a+7 AS i FROM t1 WHERE ((t1.a+7)+10) > 100) AS dt1) as dt2;

 Notice that the field "i" in derived table dt2 is getting replaced
 with (a+7) because we replaced the argument of the function
 Item_func_plus in (i+10) with (a+7). The arguments to the function
 (i+10) need to be different so as to be able to replace them with
 some other expressions later.

Since the derived table fields become part of the where clause of different 
derived table in case of nested derived tables, we need to keep the sanity of 
select clause of the derived table from where the condition is pushed down. To 
do the same, we clone the select expressions of every materialized derived 
table.

As part of this worklog we have the following changes:

Parser:

A synthetic token is introduced to let the parser know that it will be parsing 
derived table fields. This will parse the expression and return an Item to be 
used later to replace fields in the pushed down condition.

Resolver:

At the end of prepare, after all transformations are done, as explained above 
for every query expression, we first check if there is condition that could be 
pushed down to the having or where clause. If there is one, we attach to the 
existing having or where clause of the derived table query block.This is 
repeated while traversing the query blocks top down.

While replacing the pushed conditions with fields of derived table, the 
expressions present in the condition are re-parsed to make copies. These copies
are used in the pushed conditions. 
Note that it is not needed to clone when a condition is pushed to having clause 
of the derived table as it does not get pushed further down.It is only when a 
condition is pushed down to the where clause of the derived table that the 
expressions are cloned and used because then those conditions might get pushed 
further down in case of nested derived tables.

Optimizer:

None

Executor:

None

Exceptions:

Conditions having subqueries are not pushed down. This should be lifted later.

Conditions will not be pushed to derived tables having unions because then we 
need to clone the where clause based on the number of selects in the union. This 
will be addressed in the follow up task in WL#13730.

Other changes:

Introduced a optimizer_switch "derived_condition_pushdown" to enable/disable 
this feature. It is "on" by default. The switch is similar to other switchable 
optimizations in its use.

Introduced optimizer table level hints "DERIVED_CONDITION_PUSHDOWN" and 
"NO_DERIVED_CONDITION_PUSHDOWN". Similar to other optimizer hints, hint value 
overrides the corresponding optimizer switch value. "DERIVED_CONDITION_PUSHDOWN" 
will enable pushing conditions to the derived table specified in the hint and 
"NO_DERIVED_CONDITION_PUSHDOWN" will disable pushing conditions to the derived 
table specified in the hint.

Added changes to print some details when optimizer_trace is enabled.