WL#3751: Subquery optimization: Semijoin: Inside-out strategy
Affects: Server-6.0
—
Status: Complete
This is a subtask of WL#2980 "Subquery optimization: Semijoin". This WL entry covers implementation of Inside-out semi-join execution strategy.
1. General idea 1.1 Limits on handling correlated subqueries 1.2 Definition of InsideOut strategy 2. Acceptable join orders 2.1 Prefix of outer tables is ok 2.2 No correlated tables in the suffix 2.3 Table ordering within the sj-nest range 2.3.1 SJ-nest range, ordered enumeration part 2.3.2 SJ-nest range, FirstMatch part 3. Ordered streams and duplicate skipping 3.1 Required orderings 3.1.1 Constant parts removal effects 3.1.2 Equality propagation effects 3.2 Ordered scans 3.3 Access methods that produce ordered output 3.4 Ordered stream production summary 4. Executioner adjustments 4.1 Join shortcutting 4.2 Ordered scan adjustments 4.2.1 Removing duplicates from ordered stream 4.3 EXPLAIN output changes 5. Join optimizer adjustments 6. Things that will not be implemented 6.1 best_access_path() adjustments 6.1.1 Idea about how to handle best_access_path() 6.2 No accounting for functional deps when ordering 6.3 Not taking advantage of cross-join sj-nest ranges 6.4 Not passing info about distinct stream into MRR interface 1. General idea =============== Here is one more way to satisfy INNER-UNIQ-REQ: Consider a case of one inner table and uncorrelated subquery: ot1 SJ it1 ON (it1.a = ot1.a) We'll be able to use join order (it1, ot1) if we can make access to table it1 to produce a distinct stream of {it1.a}. The execution will proceed according to this swimlane diagram: it1 | ot1 | ===== ot1.a=it1.a=c1 group ===== it1.rowX1 | it1.rowX2 | it1.rowX3 -|------------------ pick one row and get the value of c1 | | ot1.rowX1 | ot1.rowX2 | ... (other matching rows, no repetitions | because this is a single scan through table ot1) ===== ot1.a=it1.a=c2 group ===== it1.rowY1 | it1.rowY2 | it1.rowY3 -|----------------- pick one row and get the value of c2 | | ot1.rowX1 | ot1.rowX2 | ... (other rows, no intersection | with the ot1.a=c1 group because | this is the ot1.a=c2 group, and no repetitions) ... | It is apparent that this execution will enumerate all rows from ot1 and will not produce any INNER-UNIQ-REQ violations. The approach relies on that it doesn't matter which it1's row in the group we pick (e.g. it1.rowX1 or it1.rowX2). This means it works only if the subquery is uncorrelated (or correlated only wrt ot1.a). 1.1 Limits on handling correlated subqueries -------------------------------------------- It seems that it is not possible to use this approach to handle correlated subqueries when the outer correlated tables follow the subquery's inner tables. Consider this diagram: it1 | ot1 | --- ot1.a=it1.a=c1 group --- it1.rowX1 | || ot1.rowP1 (corr. check fails) \ | ot1.rowP2 (corr check ok) |- (*) | ot1.rowP3 (corr check ok) / | | it1.rowX2 | | | ot1.rowP1 (corr. check ok) \ | ot1.rowP2 (corr check fails) |- (**) | ot1.rowP3 (corr check ok) / | it1.rowX3 | ... | ... The first outer table scan (marked (*)) does not cause problems. The subquery is correlated, so we will have to scan table ot1 for every other it1.rowX{i} as well. This is necessary as otherwise we might miss matching row combinations like {it1.rowX2, t1.rowP1} But if we continue and do scan (**), we will not be able to tell which ot1's rows have already been passed to output by scan (*). From this we conclude that InsideOut strategy cannot handle a correlated subquery if there is an outer correlated table that is located after the inner tables in the join order. A practically-important special case is when the only source of the correlation is one or several equalities which are AND-parts of the subquery's WHERE clause. In this case it is possible to decorrelate the subquery by converting the equalities to IN-equalities. This case will not be handled within the scope of this WL entry. 1.2 Definition of InsideOut strategy ------------------------------------ Suppose we have a semi-join (ot1 ... otN) SJ (it1 ... itN) ON (oe1=ie1 AND oe2=ie2 AND ...) Executing the semi-join with NL-join algorithm will produce INNER-UNIQ-REQ violations. Each violation is a pair (ot1.rowX, ..., otN.rowX, it_rows1) (ot1.rowX, ..., otN.rowX, it_rows2) such that ie1(it_rows1) = ie1(it_rows2) AND ... ieK(it_rows2) = ieK(it_rows2) AND That is, the set of all row combinations of inner tables is partitioned into subsets. Each partition has its unique value of (ie1, ... ieN) tuple. Any two members of the same partition will form an INNER-UNIQ-REQ violation. Two members of different partitions won't form a violation. We'll avoid producing INNER-UNIQ-REQ violations if we pick one row combination from each partition. This can be achieved if we enumerate rows in a way that row combinations get grouped by their partition, i.e. by the value of (ie1, ..., ieN). One way to do such grouping is to enumerate row combinations in a way that the output is ordered by some-permutation-of(ie1, ..., ieN). Row combination orderings that allow for removal of INNER-UNIQ-REQ violations will be called "InsideOut orderings". 2. Acceptable join orders ========================= Throughout this section we'll assume that the join order consists of three parts: the_prefix the_sj_nest the_suffix (ntX|otX)* (itX|ntX|otX)* (ntX|otX)* In subsequent sections we'll establish the allowed/required properties of the parts of the join order. NOTE: For this WL entry we will follow the approach of WL#3741/WL#3750 and assume that every "semi-join range" in the join order is handled by only one strategy. (A "semi-join range" is a range in join order that contains all sj-inner tables of one or several semi-joins, and no sj-inner tables from any other semi-joins. Semi-join ranges cannot be nested within one another). It is possible to modify semi-join range definition so ranges contain one another and are served by different strategies but this will not be implemented at this phase. 2.1 Prefix of outer tables is ok -------------------------------- Suppose access to inner tables produces an InsideOut ordering of some-permutation-of(ie1, ie2, ..., ieN). We make no assumptions about the_prefix, so there is no warranty that the entire result of join execution will be in InsideOut order. However, for each row combination of the_prefix, row combinations of (inner_tbls, the_suffix) will be enumerated in InsideOut order: the_prefix.row1 inner_tbls.row_x1 \ inner_tbls.row_x2 >-- those go in InsideOut order inner_tbls.row_x3 / the_prefix.row2 inner_tbls.row_y1 \ inner_tbls.row_y2 >-- those go in InsideOut order inner_tbls.row_y3 / the_prefix.row3 ... This will prevent INNER-UNIQ-REQ violations from being formed within one the_prefix.rowX group. the_prefix.rowX != the_prefix.rowY, so a pair of row combinations from different value groups cannot form an INNER-UNIQ-REQ violation either. Also, the subquery can be correlated wrt tables located in the_prefix: when all the outer references in the correlated condition are bound, we can pick any matching inner_tbls row combination within a (ie1, ..., ieN) value group. If some of oe_i refer to tables in the_prefix, then those oe_i, and their matching ie_i remain constants throughout the InsideOut "pass". Because of that they can be removed from InsideOut ordering list: any-permutation-of(ie1, ie2, ie3, ..., ieN) becomes any-permutation-of(ie_i1, ..., ie_ik). When all outer tables are in the prefix, the requirement of InsideOut ordering becomes a confluent "ORDER BY (empty-list)", i.e. ordering is no longer required. The execution algorithm gets reduced to use of FirstMatch strategy. 2.2 No correlated tables in the suffix -------------------------------------- As shown above in section "Limits on handling correlated subqueries", InsideOut strategy cannot be used when the_suffix contains correlated tables. Outer tables that are only referred from IN-equality can be present in the suffix. 2.3 Table ordering within the sj-nest range ------------------------------------------- To recall, we're looking at the join order as the concatenation of three parts: the_prefix the_sj_nest the_suffix (ntX|otX)* (itX|ntX|otX)* (ntX|otX)* the_sj_nest part is divided into two parts: 1. Ordered enumeration part 2. FirstMatch part The first part "is responsible" for enumerating row combinations in a way that IN-expression value groups are not interleaved with each other. The second part is "responsible" for picking one row combination from the IN-expression value group. We will assume that the first part is minimal, i.e. removal of the last table from the first part will cause the part to stop producing the required grouping. EXAMPLE: SELECT * FROM ot1, ... WHERE ot1.col IN (SELECT ie1.col FROM ie1, ie2 WHERE pred(ie1,ie2)) Possible join order: it1, it2, ot1, ... ^ ^ ^-------^-- suffix | +-------------- the_sj_nest, FirstMatch part +------------------- the_sj_nest, Ordered enumeration part 2.3.1 SJ-nest range, ordered enumeration part ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This will be covered further in the "Ordered streams and duplicate skipping" section. 2.3.2 SJ-nest range, FirstMatch part ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We can reuse the results of WL#3741: * none of the tables must use join buffering * interleaving of inner and unrelated outer tables is ok. In addition, for outer tables we can note that * the subquery is uncorrelated, * for every "invocation" of the FirstMatch the values of all IN-expressions are constant. which means that access to the outer tables does not depend on the inner tables that are in the FirstMatch part. This means that outer tables have the same properties as unrelated tables, and we can treat them in the same way. To sum up: Within the FirstMatch part, any interleaving of outer, unrelated, and inner tables is ok. The executioner code should work like WL#3741, treating outer tables as unrelated. 3. Ordered streams and duplicate skipping ========================================= 3.1 Required orderings ---------------------- 3.1.1 Constant parts removal effects ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ As has been mentioned earlier, the required ordering is any-permutation-of(ie1, ..., ieN) If there are outer tables that are located before the sj-nest, some of these expressions can be treated as constants and removed from the list. The remaining expressions give us the "nonconfluent" required ordering: any-permutation-of(iex1, ... iexN) Within the scope of this WL we use only one method to produce orderings - do index scans, so each iex{i} needs to be it_i.keyXpartY. NOTE: It is not possible to handle the case when ie_x are arbitrary functions. For example, for ie1=x+y, producing a distinct stream of pairs (x,y) will not allow to produce a distinct stream of (x+y) It might be possible to handle injective functions as they have this property: (x1,y1,...) != (x1,x2,...) => func(x1, y2) != func(x2,y2) However this doesn't seem to be practically useful so we will not implement this. 3.1.2 Equality propagation effects ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The set of allowed orderings becomes broader when we take into account equality propagation. The set becomes: non_constant_IN_components = { equ1 = (tbl1.keyX1partY1 , ...,tblK.keyXKpartYK), ... equN = (tbl1.keyX1partY1 , ...,tblK.keyXKpartYK), } and we must provide ordering on any-permutation-of(any-element(equ1), any-element(equ2), ... any-element(equN) ); 3.2 Ordered scans ----------------- Ordered enumeration is performed by using index-based access methods: * range * index * ref (but not ref_or_null) At first, it seems that NL-join execution can do ordered enumeration on orderings that span multiple tables. However, we're only interested in orderings that cannot be reduced to single-table orderings by means of - equality propagation (handled in previous section) - use of "constant table" property - all such constant sj-inner tables will be pulled out of the sj-nest (handled in the section before the previous) - use of "locally-constant table" property, when the table is accessed using eq_ref(prefix-tables) and therefore is constant within the given join order suffix. Tables like those will not be considered sj-inner. Taking those considerations into account, lets try to see if it is possible to perform ordered enumeration for multi-table ordering: suppose we need an ordering on order= ( t1.key, t2.key ) and the join order is (t1,t2). As stated above, t1.key is not a unique index, and there are different values of t1.key. Then the enumeration will proceed in a way like this: t1.key t2.key 'bar' 1 2 3 4 'bar' 2 3 4 'bar' 1 3 'foo' 1 2 3 which shows that the rows will not be enumerated in (t1.key, t2.key) order. The only scenario where we can have multi-table ordering is when table t2 is accessed using eq_ref(prefix-tables, t1). This case will not be handled in scope of this WL entry. The limitation will be denoted as "No accounting for functional deps between the inner tables when resolving the ordering". (Note: I have no proof that it is not possible to produce a multi-table ordering with some combination of inner and outer tables, like "it1 ot1 it2". Thorough investigation will not be done in scope of this WL due to lack of time) 3.3 Access methods that produce ordered output ---------------------------------------------- By definition, a range scan over index(kp1, ..., kpN) produces (or, in case of MRR, can be set to produce) a stream that is ordered by (kp1, kp2, ..., kpN). In addition, if several of the first components are fixed (this can be checked by either analyzing the ranges or analyzing the KEYUSE array), then the output is ordered by the remaining suffix: (kp_n, kp_n+1, ... kpN) The situation is completely analogous for "ref" and "index" scans. Following the action taken by ORDER BY code (in test_if_skip_sort_order), we will ignore the fact that some special cases of index_merge access can produce ordered output. Due to lack of time, we will not implement no-duplicates range scan. We chose to omit non-duplicates range access because - use of InsideOut strategy together with range access for the InsideOut table is a relatively uncommon case. - "range" will use join buffering by default, so we'll have to make a choice between range+join buffering+duplicate elimination and InsideOut strategy + no-duplicates range + no join buffering. This is not immediately trivial. We will handle a case where the InsideOut scan is a duplicate-removal scan of the entire index. This case will be evaluated as a confluent "ref" scan on 0 keyparts. 3.4 Ordered stream production summary ------------------------------------- Summing up the results of the previous sections: - ordering (i.e. grouping) is produced by exactly one inner table IGT - IGT must not use join buffering - IGT's access method must be "ref", which uses all non-constant components of the ordering as its prefix, or its infix, where the prefix of the key is fixed by "keypartX=const" constraints imposed by IGT's access method. Definition Table that produces the ordering (IGT) will be referred to as "InsideOut table". An appropriate scan of that table will be referred to as "InsideOut scan". 4. Executioner adjustments ========================== 4.1 Join shortcutting --------------------- As mentioned in the previous sections, InsideOut strategy will be used together with FirstMatch in certain cases. In that cases it will be neccessary to set up appropriate join shortcutting. It will work in the same way as WL#3741's shortcutting. 4.2 Ordered scan adjustments ---------------------------- As has been shown in earlier sections, grouping/ordering will be provided by one table. There are two possible cases: 1. The first table row in the value group is guaranteed to be the match. This is true when the sj-nest range has no FirstMatch part (see preceding sections for explanation), and the entire table condition is guaranteed to be true by the table's access method. In this case the engine can be instructed to produce a distinct stream by passing a corresponding MRR interface flag. This will not be done within the scope of this WL entry. 2. We need to enumerate rows (or row combinations) in the value group until we find a matching row (combination). When we've found it, we'll need to jump over the remainder of the group. 4.2.1 Removing duplicates from ordered stream ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The "jump" will be performed at SQL-layer level. Jumping will be done by skipping the unneded rows. Another possible option would be to use true jump calls handler->index_read(..., HA_READ_AFTER_KEY) but we will not do it as this approach has proved to be slow (TODO: link!). 4.3 EXPLAIN output changes -------------------------- Use of FirstMatch strategy will be reflected in the same way as it was reflected in WL#3741. Use of "jump over to next value group" will be indicated by displaying LooseScan(m..n) in the Extra column, where m and n are the appropriate keypart numbers. 5. Join optimizer adjustments ============================= (See the note in "Things that will not be implemented" section below) For now, we'll settle for this simplistic apporach: when best_access_path() considers access methods for the first inner table it1 for which it could use InsideOut strategy: +------------ The "InsideOut table" | +=======+---- this will be the "FirstMatch tail" .... it1 it2 ... itN We will assume that #rows fanout and access cost of table it will be 1/(SUBQ_INSIDEOUT_SELECTIVITY=0.5) = 2 times lower than the values obtained by current best_access_path() code. The special case of uncorrelated one-inner-table semi-join will be treated in a special way. In this case we will perform a loose index scan, so the cost and #rows will be calculated accordingly (details to be covered in the LLD) 6. Things that will not be implemented ====================================== 6.1 best_access_path() adjustments ---------------------------------- At the moment, best_access_path() picks the cheapest available access method. For join optimization this is ok, because choice of access method has no effect on the optimization of the rest of the join order. This property is what prevents from taking ORDER BY into account by the join optimizer. Introduction of InsideOut strategy poses a similar challenge. Applicability of InsideOut depends on scanning certain tables in certain orders, so the cheapest table access method is not necessarily the best one now. [according to discussion with Igor:] This problem should be solved using this approach: 1. Find the cheapest join order without taking any orderings into account (that's what current code does) 2. Find the cheapest join order that allows to - for ORDER BY case, skip sorting - for InsideOut case, use the InsideOut strategy then compare the two and pick the best one. In order to do the comparison we'll need to calculate the cost of InsideOut and the alternate strategy. Since we don't have any of those we will not implement this in full scope in this WL. 6.1.1 Idea about how to handle best_access_path() ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Make best_access_path() (and other functions as needed) return some flag to inform the join optimizer that it needs to "fork" the search at this point. The first best_access_path() call will ignore the presense of sj-nest. The second call (the one after fork) will pick the loose index scan access method. NOTE: Perhaps this could be used immediately for 1-table semi-joins. Here we'll have immediate cost difference due to use of loose index scan. 6.2 No accounting for functional deps when ordering --------------------------------------------------- We will not give any special treatment to cases where one inner table is functionally dependent on the other and that allows to simplify the required ordering (but not in a way that equality propagation would allow). (search above for "No accounting for functional deps when resolving the ordering" to see this in the context) 6.3 Not taking advantage of cross-join sj-nest ranges ----------------------------------------------------- In certain cases, a semi-join range (as "range in the join order") can contain a cross join, i.e. it will have tables ita_1, ... ita_K, itb_1, .. itb_n where the two collections are either from two different subqueries or they are from the same subquery that was a cross join (that is, it calculated a full cartesian product). It is possible to take advantage of the information about the cross join, but we will not do that within the scope of this WL entry. 6.4 Not passing info about distinct stream into MRR interface ------------------------------------------------------------- In certain cases, it is possible to use MRR interface flag to inform the table engine that the SQL layer is interested in only one record (or index tuple) from the (t.keypart1, ..., t.keypartN) value group. Table engines that batch or re-order index reads could benefit from this information. This will not be done within the scope of this WL entry.
1. Code-level work overview 2. InsideOut table scan detection in best_access_path() 2.1 Maintaning cur_emb_sj_nests 3. Plan refinement phase fixes 4. Join executioner adjustments 5. Milestones 1. Code-level work overview =========================== Implementation of InsideOut strategy will consist of: Join preparation phase - No work is required, as InsideOut will reuse the semi-join structures created by other semi-join WLs. We might need to do some (simple) preprocessing for work done in the other phases, like saving the separate bitmap of outer non-trivially correlated tables. Join optimizer - best_access_path() needs to be adjusted to use modified cost and #rows fanout values for "jumping" index scans. Plan refinement - setup_semijoin_dups_elimination() needs to be modified to include the code that will detect when InsideOut or a combination of InsideOut+FirstMatch can be used and make according query plan setups. Join executioner - Add code to skip over records that have duplicate value of (ie1, ... ieN). Other - EXPLAIN output should show "LooseScan(m..n)" when necessary. 2. InsideOut table scan detection in best_access_path() ======================================================= The main design consideration is to make the check in best_access_path() as lightweight as possible as join optimizer may invoke it lots of times. When a subquery is converted to semi-join its IN-equalities are injected into parent join's WHERE/ON clause. We will mark those equalities by setting their Item::name to a pre-defined const char* pointer. The 'ref' optimizer will detect the marked equalities and mark their corresponding KEYUSE elements. Then best_access_path() code will detect those and construct the InsideOut scan. Some definitions: for a subquery (oe1, ... oeN) IN (SELECT ie1, ... ieN ...) and given join order prefix JP: * an equality "oe_i=ie_i" is called "bound" if JP contains all tables needed to evaluate ie_i. * an equality "oe_i=ie_i" is called "handled" if the InsideOut scan will produce a distinct stream of tuples (ie_... , ie_i ,...). best_access_path() { ... in_equality_map bound_equalities= 0; in_equality_map handled_equalities= 0; key_part_map distinct_scan_on_this= 0; /* Discover the bound equalites. We need to do this, if 1. The next table is an SJ-inner table, and 2. It is the first table from that semijoin, and 3. We're not within a semi-join range (i.e. all semi-joins either have all or none of their tables in join_table_map). (See the subsequent section on how cur_emb_sj_nests is maintained) 3. All correlation references from this sj-nest are bound */ if (table->emb_sj_nest && // (1) !(table->emb_sj_nest->map & cur_table_map) && // (2) !cur_emb_sj_nests && // (3) bitmap_covers(cur_table_map, table->emb_sj_nest->corr_map)) // (4) { /* This table is an InsideOut scan candidate */ /* Walk over IN-equalites and see which are bound. This step has an additional merit over KEYUSE-based bind detection because it handles ie_i's that are not keyparts. */ for (each element e in emb_sj_nest->IN_equalities) { /* Q: should this take into account equality propagation and how? A: If e->outer_side is an Item_field, walk over the equality class and see if there is an element that is bound? (this is an optional feature) */ if (e->outer_side->used_tables()) { bound_equalities.set_bit(e's number); } } } ... while (...) /* best_access_path()'s KEYUSE scan loop */ { /* Start of handling the next index */ handled_equalities= 0; { ... if (keyuse->in_equality_num) { /* This KEYUSE represents an equality it.keyXpartY = outer_expr which was created from an IN-equality. */ if (bitmap_covers(cur_table_map, keyuse->val->used_tables()) bound_equalities.set_bit(keyuse->in_equality_num); else handled_equalities.set_bit(keyuse->in_equality_num); } ... } if (((handled_equalities | bound_equalities) cover all IN-equalities) && index access follows pattern of ^(constant_part|subq_used_part)*, subq_used_part+, anything-else) { InsideOut strategy is applicable, adujust index scan costs; } } ... } 2.1 Maintaning cur_emb_sj_nests ------------------------------- We will maintain a bitmap of semi-join nests that have only some of their tables in the join order prefix. on_table_add(tbl) { if (tbl->emb_sj_nest) { cur_emb_sj_nests |= tbl->emb_sj_nest->sj_inner_tables; // remove the sj_nest if all of its SJ-inner tables are in cur_table_map if (bitmap_covers(cur_table_map, tbl->emb_sj_nest->sj_inner_tables)) cur_emb_sj_nests &= ~tbl->emb_sj_nest->sj_inner_tables; } } on_table_remove(tbl) { if (tbl->emb_sj_nest) { // If we're removing the last SJ-inner table, remove the sj-nest if ((cur_table_map & tbl->emb_sj_nest->sj_inner_tables) == tbl->map) { cur_emb_sj_nests &=~tbl->emb_sj_nest->sj_inner_tables; } } } 3. Plan refinement phase fixes ============================== setup_semijoin_dups_elimination() should make choices according to the following rules: - InsideOut|FirstMatch should be preferred over DuplicateElimination (note that this is not always the optimal choice - in some cases using temptable + cheap scan may be cheaper than an expensive InsideOut scan (TODO: prove or refute this)) - There is no need to ever make a choice between InsideOut and FirstMatch strategies: in every case there is an optional "head" handled by InsideOut followed by the FirstMatch "tail" 4. Join executioner adjustments =============================== Consider a join execution swimlane diagram (partial copy from the HLS): it1 | it2 | itN | | ==== ot1.a=it1.a=c1 group ===== | | it1.rowX1 | it2.rowX | | it2.rowY | (1) | | it1.rowX2 | | | it2.rowY | | it2.rowZ | it2.rowX (2) (3) | | | | (1) here we return to table it1 without having found a matching row combination of SJ-inner tables, so we must continue scanning this value group. (2) we've found a match. When we return to location (3) we must skip over to the next value group. Implementing this functionality calls for use of a flag. The current proposal is that when we start scanning it1's group, we set itN->status= STATUS_GARBAGE when we receive a request to continue scanning it1, we check itN->status. If there is anything other than STATUS_GARBAGE, we continue. (TODO: need to check if this really works or we have to introduce a new, separate flag) 5. Milestones ============= TODO
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.