WL#2980: Subquery optimization: Semijoin
Affects: Server-6.0
—
Status: Complete
The optimizer should handle certain subqueries so that they run as quickly as joins. The optimizer may choose the driving table, may use semijoins, may skip index keys for some index structures. The EXPLAIN statement will explain. Example: SELECT * FROM t1 WHERE t1.key in (SELECT t2.key FROM t2) -> SELECT * FROM t1, t2(distinct t2.key) WHERE t1.key=t2.key Note: As agreed during the Athens Dev-MTX meeting, Igor will write the specification for this item (it may then be split into two separate tasks). The current estimate is that it will take one man-month to write the specification. Note (added by Trudy Pelzer, 2006-05-15): Per Brian Aker, the subquery optimization must handle statements of the type: UPDATE foo SET a=10 WHERE a in (SELECT a FROM foo,bar WHERE foo.b=20 AND foo.a=17 AND bar.c=foo.d) Brian and Igor have discussed this requirement.
WL#3740: Subquery optimization: Semijoin: Pull-out of inner tables
WL#3741: Subquery optimization: Semijoin: Duplicate elimination strategy
WL#3750: Subquery optimization: Semijoin: First-match strategy
WL#3751: Subquery optimization: Semijoin: Inside-out strategy
WL#3952: Add @@optimizer_switch variable
WL#3741: Subquery optimization: Semijoin: Duplicate elimination strategy
WL#3750: Subquery optimization: Semijoin: First-match strategy
WL#3751: Subquery optimization: Semijoin: Inside-out strategy
WL#3952: Add @@optimizer_switch variable
WL#2980: Subquery optimization: Semijoin: HLS *********************************************1. Task scope 2. NL-SJ applicability criteria 2.1 No issues with NULL comparisons 3. Subquery to semijoin conversion 4. Automatic satisfaction 5. Duplicate outer rows elimination strategy 6. First match strategy 7. Inner-to-outer strategy 8. Nested-Loops Semi-Join implementation 8.1 Possible join orders 8.1.1 Interleaving between semijoins 8.1.2 Relationship with outer joins 8.1.3 The check to be used to determine if we can add table 8.2 Executioner adjustments 8.3 Join optimizer adjustments 9. EXPLAIN changes 10. To be covered in the LLD 11. Other notes 1. Task scope ============= Under certain conditions, a SELECT with IN/=ANY subquery at top level in the WHERE/ON clause can be executed by running Nested-Loops SemiJoin (further NL-SJ). That is, a select with subquery SELECT ... FROM ot1 ... otN WHERE (oe1, ..., oeN) IN (SELECT ie1, ... ieN FROM it1, ... itN WHERE ...) is converted into select with a semi-join SELECT ... FROM (ot1, ..., otN) SJ (it1, ..., itK) ON sj_cond(ot*, it*) which can be evaluated using NL-join code with the exception that we'll need to satisfy this requirement: for any row combination of outer tables we may not generate more than one row combination (INNER-UNIQ-REQ) of inner tables. To satisfy the requirement we will use the following strategies: 1. AutomaticSatisfaction: prove that the query's equalities and unique/primary keys on the queried tables will guarantee that the requirement is met. 2. DuplicateOuterRowsElimination: use a temporary table to weed out the duplicate combinations of outer table rows. 3. FirstMatch: in certain join orders we can avoid producing duplicate row combinations by short-cutting the join execution. 4. DistinctStreamProduction: Scan the inner (subquery's) tables in a way that produces at most one match per each combination of outer rows. The implementation will be structured as follows: - Subquery predicate to semi-join nest (i.e. TABLE_LIST) converter. This is used by all strategies - Procedure to pull the tables out of semi-join nests (AutomaticSatisfaction implementation) - Adjustments in the join optimizer to account for use of strategies #2-#4 on the execution phase. - Adjustments in the join executioner for strategies #2-#4. 2. NL-SJ applicability criteria =============================== (WL#3740 has a copy of this section contents) A subquery predicate can be replaced with semi-join if the following conditions are met: SUBQ-TO-SJ-CRITERIA: 1. Subquery predicate has form (...) IN (SELECT ...) (...) =ANY (SELECT ...) 2. Subquery is a single SELECT (no UNIONs) 3. Subquery does not have GROUP BY or ORDER BY 4. Subquery does not use aggregate functions 5. #tables-in-parent-query + #tables-in-subquery <= MAX_TABLES 6. Subquery predicate is at the AND-top-level of ON/WHERE clause Note that the subquery may have DISTINCT or LIMIT clauses. 2.1 No issues with NULL comparisons ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A note about NULL values: The subuqery predicate is required to be at top-level of WHERE/ON (where we don't distinguish between FALSE and NULL). The subquery predicate is not a NOT IN, so we will not need any special handling for NULL values. 3. Subquery to semijoin conversion ================================== This is covered in WL#3740. 4. Automatic satisfaction ========================= This is covered in WL#3740. 5. Duplicate outer rows elimination strategy ============================================ This is covered in WL#3741. 6. First match strategy ======================= This is covered in WL#3750. 7. Inner-to-outer strategy ========================== This is covered in WL#3751. 8. Nested-Loops Semi-Join implementation ======================================== 8.1 Possible join orders ------------------------ 8.1.1 Interleaving between semijoins ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Interleaving between semijoins is ok. Tables of one semi-join are considered "unrelated" tables for the other. TODO: need to say more here? it's "intuitively clear".. 8.1.2 Relationship with outer joins ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Tables within an outer join nest must remain within that nest. this rule has priority over all semijoin rules. 8.1.3 The check to be used to determine if we can add table ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Definitions of acceptable join orders were already given. For JoinShortcutting, we can just use table dependencies. For DistinctStream, we'll have to keep track in which part of the pattern we are. Since interleaving is allowed, we'll need to do it for each semi-join. A candidate implementation: Have an array of ints to show in which pattern we are. We'll need to - notice when we start atX tables (and set some flag so best_access_path only chooses appropriate access methods) - Notice when we end atX tables - prevent appearance of outer tables before the inner. For each semijoin we have a kind of 4-state DFA. For each of the states it will have - table_map of tables that may be added to partial join_order - next_state(table#) function . 8.2 Executioner adjustments --------------------------- We will need to add the "jump-outs" to certain JOIN_TABs. The above sections have a description of what kind of "jump outs" and where to. Nothing more to cover here for the HLS. 8.3 Join optimizer adjustments ------------------------------ { A check if given table can be used to extend given partial join order } { Modifications to join cost calculations} { - the fanout changes } 9. EXPLAIN changes ================== After all * Semi-joined tables show up in the upper SELECT. * EXPLAIN EXTENDED + SHOW WARNINGS shows the semijoin structure (this will give idea about which tables were pulled out of the semi-join). * Temporary table use is indicated by "Start temporary" and "End temporary" in the "Extra" column. Tables that were not pulled out (see previous entry) and are in the [Start temporary; End temporary] range will have their rowid in the temp table. * Join shortcutting is indicated by "FirstMatch(tableX)" in the "Extra" column. TODO: Add the WL#3750 part here. 10. To be covered in the LLD ============================ * TODO: what do we do if the parent and/or subquery's SELECT has a STRAIGHT_JOIN attribute(s)? * TODO: How does the conversion relate to prepared statements? In order for conversions to be valid we need to rely on table DDLs not to be changed (e.g. "automatic satisfaction" needs UNIQUE key to remain UNIQUE) 11. Other notes =============== We can use a temporary table to run anti-semijoin to handle NOT IN.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.