WL#8244: Hints for subquery strategies
Status: Complete — Priority: Medium
Using the new syntax and infrastructure for hints (see WL#8016 and WL#8017), add hints to control which subquery execution strategies should be used. This includes whether or not to use semi-join, which semi-join strategy to use, and, in case semi-join is not used, whether to use subquery materialization or in-to-exists transformation. Note that this work also should make it possible to prevent any of the semi-join strategy, also Duplicate Weed-out, which is not possible to turn off with optimizer_switch. User Documentation ================== * http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-8.html * http://dev.mysql.com/doc/refman/5.7/en/optimizer-hints.html
F-1: The hints specified in this work log shall conform with the syntax for hints as specified in the HLS. F-2: If SEMIJOIN hint is not listing any strategies, semi-join shall be used if possible. A strategy for which the corresponding optimizer_switch is on will be used if possible. If none of these strategies are applicable, the Duplicate Weed-out strategy shall be used. F-3: If SEMIJOIN hint is given with a specification of possible strategies, one of the listed strategies shall be used if possible. F-4: If the listed strategies of a SEMIJOIN hint are not applicable for the given query, the Duplicate Weed-out strategy shall be used. F-5: If NO_SEMIJOIN hint is given without listing any strategies, semi-join shall not be used. F-6: If NO_SEMIJOIN hint is given with a specification of strategies not to use, none of the listed strategies shall be used unless the list include all strategies that are applicable for the query. F-7: If the listed strategies of a NO_SEMIJOIN hint include all applicable strategies for the given query, the Duplicate Weed-out strategy shall be used. F-9: If a subquery is nested within another subquery and both subqueries are merged into a semi-join of an outer query, any specification of semi-join strategies for the inner-most query will be ignored. SEMIJOIN/NO_SEMIJOIN hints can still be used force/prevent semi-join transformation for such nested subqueries. F-10: If a SUBQUERY hint is given, the given strategy shall be used for subquery execution if possible. F-11: A SUBQUERY(MATERIALIZATION) hint may be ignored when the query cannot be executed with subquery materialization. F-12: If two or more of the SEMIJOIN, NO_SEMIJON, and SUBQUERY hints are specified (of same or different type), the first hint will have effect while subsequent hints will be ignored. If this happens a warning will be given. F-13: If new optimizer_switch, duplicateweedout, is off, Duplicate Weed-out strategy shall not be used unless all other strategies that are applicable for the given query, is also turned off.
The hints to be added this work log are all query block level hints as specified in WL#8017. The hints have the following syntax: HINT([@QB_NAME] [arguments]) If QB_NAME is specified, this hint affects the query block with give the given query block name, otherwise it belongs to current query block. For subquery hints this means that the hint should normally either be placed in the relevant subquery block, or in the top query block with a specification of which query block it should apply to. The following optimizer hints will be implemented as part of this worklog: 1. Specify which semi-join strategies can be used Syntax: /*+ SEMIJOIN( [strategy [,strategy] ... ] ) */ strategy ::= MATERIALIZATION|FIRSTMATCH|LOOSESCAN|DUPSWEEDOUT If no strategy is specified, all strategies, for which the corresponding optimizer_switch is on, are allowed. If semi-join transformation is possible, but none of the specified strategies are applicable for the query, the list of strategies will be ignored. That is, another strategy may be used instead. 2. Specify which semi-join strategies should NOT be used Syntax: /*+ NO_SEMIJOIN( [strategy [,strategy] ... ] ) */ strategy ::= MATERIALIZATION|FIRSTMATCH|LOOSESCAN|DUPSWEEDOUT If no strategy is specified, semi-join transformation will not be used. If both Duplicate Weed-out and FirstMatch is turned off by a NO_SEMIJOIN hint, one these strategies may still be selected if the two other possible strategies are not applicable. A warning that the hint was ignored will then be given. 3. Specify whether Subquery Materialiation or In-to-exists should be used. Syntax: /*+ SUBQUERY(strategy) */ strategy ::= MATERIALIZATION|INTOEXISTS Specifying this hint will also turn off semijoin transformation for the given query block. If MATERIALIZATION is specified when subquery materialization is not possible, the hint will be ignored. Conflicting hints: Only one of the above hints may be specified for each sub-query block. If two or more hints are specified (of same or different type), the first hint will have effect while subsequent hints will be ignored. Warnings: A warning will be given if multiple of above hints are given for the same query block. In all other cases, a warning will NOT be given when a hint is ignored. Duplicate Weed-out strategy: A new optimizer_switch, duplicateweedout, will be added to make it possible to turn off Duplicate Weed-out. Existing tests will be updated to make use of this optimizer_switch when testing specific strategies. As for hints, if both firstmatch and duplicateweedout is set to off, duplicateweedout may still be selected if remaining strategies are not applicable to the query. If Duplicate Weed-out is turned off (either by optimizer_switch or hints), it has been observed that one sometimes will get a plan that is far from optimal. This is due to the heuristic pruning during greedy search, and one can avoid this behavior by setting optimizer_prune_level=0.
Basic Approach --------------- When checking whether to transform a subquery to semi-join, SEMIJOIN/NO_SEMIJOIN hints will be checked and decision based on rules described above. Likewise, SUBQUERY hint will be checked when deciding which subquery execution strategy to choose. Which semi-join strategies that are available for use will be stored in a bit mask in the nested_join object of the corresponding semi-join nest when subquery is converted to semi-join. This bit mask will take into account both hints and optimizer_switch settings, and will be checked where optimizer_switch settings are currently checked. For an IN-subquery that is nested within another IN- subquery where both are merged into the same semi-join nest, hints will be ignored. Handling Duplicate Weed-out --------------------------- Currently, it is not possible to turn-off Duplicate Weed-out strategy with optimizer_switch. The main reason for this is that greedy search assumes that for any join order, at least one semi-join strategy can be applied. In order to still satisfy this assumption, it will during greedy search still be possible to select Duplicate Weed-out strategy even when it is turned off. However, if one later in greedy search find another join order where a different strategy applies, one will switch to this plan even if it is more costly than the earlier found plan with Duplicate Weed-out. This approach also requires changes to how greedy search does plan pruning: If Duplicate Weed-out is turned off, plans should not be pruned as long as the only plan found, use Duplicate Weed-out. Otherwise, we risk that we never evaluate join orders that permit alternative strategies. In more detail, this approach requires the following changes: 1. A new member in Optimize_table_order, found_plan_without_dupsweedout, that indicates whether plans without Duplicate Weed-out have been found so far during greedy search. (Note that only semi-join nest for which Duplicate Weed-out is turned off will matter. Duplicate Weed-out may still be used for other semi-join nests.) 2. Change Optimize_table_order::consider_plan() to switch to a more costly plan if the best plan so far use Duplicate Weed-out for semi-join nests where it is turned off, while the current plan does not. 3. Do not prune plans if found_plan_without_dupsweedout is false. There are two types of pruning that will be changed. The pruning of partial plans that are more costly than the best complete plan found so far, and the heuristic pruning that prunes less promising partial plans. 4. In Optimize_table_order::advance_sj_state(), if Duplicate Weed-out strategy is disabled, do not select it when there are other applicable strategies for this join order. Alternatives that have been considered -------------------------------------- The original idea was to give plans with Duplicate Weed-out a very high cost so that other strategies are selected if possible. However, this has some issues: 1. We would no longer be able to distinguish cost of different plans with Duplicate Weed-out. Hence, we may end up with a bad plan with Duplicate Weed-out instead of a good plan with Duplicate Weed-out. 2. The cost of a Duplicate Weed-out plans as presented by EXPLAIN FORMAT=JSON would be misleading
Copyright (c) 2000, 2019, Oracle Corporation and/or its affiliates. All rights reserved.