WL#3750: Subquery optimization: Semijoin: First-match strategy
Affects: Server-6.0
—
Status: Complete
This is a subtask of WL#2980 "Subquery optimization: Semijoin". This WL entry covers implementation of First-match (formerly known as "Join ShortCutting) semi-join execution strategy.
1. The idea of FirstMatch strategy 2. What join orders can be handled 2.1 Why inner tables cannot be before the outer 2.2 Interleaving with "unrelated" tables 2.3 Acceptable join orders for FirstMatch strategy 3. Use of FirstMatch together with DuplicateWeedout 3. FirstMatch and block NL-join 4. Implementation of FirstMatch 4.1 Accounting for FirstMatch in join optimizer 4.2 Executioner 4.3 EXPLAIN output 1. The idea of FirstMatch strategy ================================== Suppose we have semi-join, and the join order is ot1, ot2, ... otN, it1, it2, ... itK (the naming convention is same as in from WL#3741: ot{i} is a table from the parent select, it{i} is a table from within a semi-join, i.e. from the subquery) Let us start executing NL-Join algorithm for this join order. As soon as we've got a First Match, i.e. full row combination (ot1.row, .... itK.row), we return (or short-cut) to the last outer table otN, and continue scanning from there. It is apparent that we will still enumerate all (ot1.row, ..., onN.row) row combinations. It is also apparent that the short-cutting will prevent generation of any INNER-UNIQ-REQ (see WL#2980 for what it is) violations. 2. What join orders can be handled ================================== As shown in previous section, FirstMatch strategy can be used when all outer tables are followed by all inner tables. 2.1 Why inner tables cannot be before the outer ----------------------------------------------- If the tables are intermixed, we may run into a problem as demonstrated in the following example: Consider a semijoin (ot1, ot2) SJ it1 ON sj_cond Let the join order be ot1, it1, ot2. Let's start running the NL-join. Soon we'll have the state as demonstrated on this swimlane diagram: ot1 | it1 | ot2 | | ot1.row1 | it1.row1 | ot2.row1 | | ot2.row2 | |(*)<--- we are here We've found two matching (i.e. result) tuples: 1. (ot1.row1, it1.row1, ot2.row1) 2. (ot1.row1, it1.row1, ot2.row2) Now there are no more matching records in ot2 and we must return. If we return to it1, then we may find a row it1.row2 and then proceed to generate (ot1.row1, it1.row2, ot2.row[1|2]), which will violate the INNER-UNIQ-REQ. If we return to ot1, then we will miss row combinations with ot1.row, like row combination: (ot1.row1, it1.somerow, ot2.row3). In general case, we'll need to keep track of which rows from ot2 we've returned, which requires O(#examined_rows(ot2)) and is out of scope of this WL entry. 2.2 Interleaving with "unrelated" tables ---------------------------------------- Suppose there are outer tables wrt which the subquery is not correlated, and also those tables are not referred to from sj_cond. We'll call those tables "unrelated" and denote them as ntX. We can have unrelated tables interleaved with subquery tables if we use a modified jump-out strategy: * When we get a matching (it1, ... itN) row combination, we jump back to last table ntX * When we get EOF on some table ntX, we jump to the previous outer table (either related or unrelated) Access/selection conditions of unrelated tables don't depend on what are the current rows of the inner tables and our jumps don't make us skip any outer table accesses, so we will not miss any row combinations of outer tables. Now let's see what we have for INNER-UNIQ-REQ. Suppose we've reached the last inner table and got a matching row combination. The jump-out rules require that we jump back until we get a new row for some outer table (related or unrelated). When we get that new row we'll have a new row combination of outer tables and will have lost any chance to generate INNER-UNIQ-REQ violation for row combination we've previously had. 2.3 Acceptable join orders for FirstMatch strategy -------------------------------------------------- Putting the above considerations together, we get this criterion: Denoting the - outer tables that are not referenced from subquery predicate as ntX - outer tables that are referenced as otX - inner tables as itX The acceptable join orders are those that match this BNF pattern: (otX | ntX)* , (itX | ntX)* This pattern will be referred to as FirstMatchPattern. A duplicate-producing range can be handled by FirstMatch strategy if it matches the FirstMatchPattern. 3. Use of FirstMatch together with DuplicateWeedout =================================================== For a fixed join order and a duplicate-producing range within it, FirstMatch strategy is cheaper than DuplicateWeedout because * It "takes shortcuts" and so eliminates fewer rows than DuplicateWeedout. * It doesn't have the costs of using temptable. For this reason, FirstMatch strategy will preferred over DuplicateWeedout when it is applicable. There are cases where FirstMatch strategy cannot eliminate duplicates itself, hence we use DuplicateWeedout, and use the approach from the FirstMatch strategy for additional speedup. Illustrating it with an example: Suppose we have a join [sub]order that can't be served by FirstMatch strategy: it1 ot1 it2 However, once we got a matching row combination of (it1.rowX, ot1.rowX, it2.rowX) we don't have to enumerate row combinations like (it1.rowX, ot1.rowX, it2.*) and can return to enumerating the last outer table, in our example ot1. When we generalize this, we get the following rule: If a duplicate-producing range has a suffix of (itX | ntX)*, then we can set up FirstMatch-type shortcutting within that suffix. The validity of this approach is proved in the same way as validity of FirstMatch and DuplicateWeedout are proved. Addition of short-cutting does not allow to make any improvements to the way DuplicateWeedout strategy works. That is, the temptable tuple remains the same, temptable flush and weedout-check remain at the same locations. The shortcuts will only reduce the number of times the weedout-check will be performed. 4. FirstMatch and block NL-join =============================== For the same reasons as in WL#3741, FirstMatch strategy cannot be used when one of the tables in the duplicate-generating range (concept defined in WL#3741) uses Block NL-join. FirstMatch can be used without any adjustments if Block NL-join is used by some table that is outside of the duplicate-generating range. 5. Implementation of FirstMatch =============================== 5.1 Accounting for FirstMatch in join optimizer ----------------------------------------------- In scope of this WL use of FirstMatch will not be taken into account in join optimizer. We'll follow the approach of WL#3741: first choose the join order and then determine how semi-join duplicates should be eliminated. This will be done according to the following rule: for each duplicate-generating range R { if (a table in R uses Block NL-join) { Remove all DuplicateWeedout strategies we've set up for \ the preceding ranges; setup_combined(); } else if (R matches FirstMatchPattern) { Setup R to be handled with FirstMatch pattern by adding \ appropriate shortcutting calls; setup_combined(R, first_table(R)); } } setup_combined(R, first_tbl) { Setup R to be handled with DuplicateWeedout strategy; S = range_suffix(R, first_tbl, last_table(R)); if (S has a suffix that matches FirstMatchPattern) { Add appropriate shortcutting calls in the suffix; } } 5.2 Executioner --------------- FirstMatch shortcutting code has already been added in WL#3741. 5.3 EXPLAIN output ------------------ Use of FirstMatch will be indicated in the same way as it is done in WL#3741, the appropriate tables will have FirstMatch(table_name) in the Extra column.
1. Implementation 1.1 Setup, plan refinement stage 1.2 NL-Join algorithm fixes 2. EXPLAIN output 3. Schedule 1. Implementation ================= 1.1 Setup, plan refinement stage ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The setup will be done in setup_semijoin_dups_elimination(), a WL#3741 function that sets up use of temporary tables. We'll be able to set up all of the duplication elimination in one forward pass over the join order (as it is done in WL#3741). (Prepared Statements are not a concern for this WL because all the used structures are per-execution, and are pointed from class JOIN which is per-execution too) 1.2 NL-Join algorithm fixes ~~~~~~~~~~~~~~~~~~~~~~~~~~~ WL#3741 already included NL-join fixes neded for FirstMatch strategy. For this WL we'll need to make sure that a "jump out" and a "temptable weedout" on the same table are performed correctly. 2. EXPLAIN output ================= A small change is needed to take into account that "End temporary" and "FirstMatch(tableX)" may occur at the same row. 3. Schedule =========== There will be only one milestone, the entire WL entry
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.