WL#3741: Subquery optimization: Semijoin: Duplicate elimination strategy
Affects: Server-6.0
—
Status: Complete
This is a subtask of WL#2980 "Subquery optimization: Semijoin". It includes "Duplicate Elimination" semi-join execution strategy, where we use NL-join execution algorithm and a temporary table with unique key/constraint to weed out duplicate outer row combinations. This WL covers both adjustments needed for join optimizer and for join executioner. This WL is to be done after WL#3740. When this WL is complete we will have the minimal working semi-join subquery implementation (i.e. always correct results, no errors for any valid subqueries).
1. Description 2. Temptable use minimization 2.1 Functionally dependent tables 2.2 Prefix of the outer tables 2.3 Suffix of the uncorrelated outer tables 2.4 Combining prefix and suffix 2.5 Case of several semi-joins 2.5.1 On semi-joins "boundaries" 2.5.2 Temptable use in case of several semi-joins 3. Accounting for use of temptables in the join optimization 3.1 The details 4. Changes in the join executioner 5. EXPLAIN output 1. Description ============== The idea of the Duplicate Elimination strategy is as follows: Execute semi-join as a regular nested-loops join. We will get duplicate row combinations of outer tables. They can be identified by comparing rowids. We can use temporary table with unique constraint (or key) to weed out the duplicate combination of outer rows. Codewise, NL-join algorithm will need this adjustment: when we get a matching row combination: WEED-DUPS-OUT: rowid_tuple= {ot1.rowid, ... otN.rowid}; if (!temp_table.insert(rowid_tuple)) { pass row combination to output; } else { /* This is a duplicate */ return; } This strategy allows to handle several semi-joins at once, and it works for any join order. A note about outer joins: NULL-complemented rows do not naturally have rowids, so we'll need to assign them some $invalid_rowid value. Two rows with rowid=$invalid_rowid will be considered indentical. This will allow us to handle arbitrary nested outer joins. Also the subquery predicates could be located anywhere, either in the WHERE or in some nested join's ON clause (this is "intuitively clear" when one looks at outer joins algorithm but atm I have difficulty writing some kind of a proof). NOTE: Temporary table lookups become expensive when the table overflows to disk. In certain cases we could use a better approach: collect row combinations in a Unique object, and then scan the Unique and join it with the rest of the tables. This will not be implemented as a part of this WL entry (at the moment we don't plan this in other WL entries either as we beleive other subquery optimizations have more priority) 2. Temptable use minimization ============================= We can reduce the cost of temptable use. We can - make the unique key shorter - make the table contain fewer records (by emptying it in the course of join execution) - make fewer temptable lookups by moving WEED-DUPS-OUT call into the "middle" of the join. if we use one of the approaches described in the subsequent sections. 2.1 Functionally dependent tables --------------------------------- If an outer table is not within an outer join nest, and - is a constant/system table, or - is accessed via eq_ref(some-other-table) or, if we have a semi-join nest with not-exists optimization (i.e. only the NULL-complemented row combination will pass the WHERE clause) then we do not need to store its rowid in the temptable. This approach shows how this WL interplays with WL#3740. WL#3740 pulls the tables out semi-join nests. That action does not "make things worse" for duplicate elimination strategy - we will not need to add pulled-out tables' rowids into the temptable tuple. NOTE: if there are several tables that are in a circular functional dependency we can make a choice which of the tables will have rowid in the temptable. The choice will affect the cost of temp table use. We'll want to pick - table with shortest rowid (so the temptable tuple is smaller) - the table for which the E(partial join result cardinality) is the lowest (so we'll have fewer WEED-DUPS-OUT calls, see subsequent 2.x sections) We will not make any such optimizations and will always ue the first table in the join order. 2.2 Prefix of the outer tables ------------------------------ If the join order has a prefix of outer tables, then we can exclude the prefix tables from duplicate rows elimination. - Prefix table rowids don't need to be in the temptable. - We'll need to empty the temptable when transitioning from one prefix row combination to another. This is apparent from this diagram: ot1 | ... | otN | it1 | | | rowX | rowX | rowX | row1 | | | ... | | | rowN eof, go left (*) rowY | rowY | rowY | The point (*) partitions the time axis into two parts. The part above has (ot1,...otN) = (ot1.rowX,...,otN.rowX). NL-execution guarantee that this row combination won't be encountered anywhere below the point (*). At location (*) we will need to place this code: for every new prefix row combination: RESET-TMP-TABLE: tmp_table_handler->delete_all_rows(); 2.3 Suffix of the uncorrelated outer tables ------------------------------------------- Suppose we have a join order of oit1 ... oit1N nt1 ... ntK ^(*) where nt_i are the outer tables that are not referred to from within the subquery predicate. Suppose we're executing NL-join with this join order. At location (*) we will be getting duplicate row combinations. All of the tables that subquery predicate referred to are before the (*), so each duplicate row combination is a match, and consequently it doesn't matter which of the duplicates we pick. The "tail" after the (*) has only outer tables and therefore will not produce duplicates. Therefore we can place WEED-DUPS-OUT after the last subquery table or the last table that was referred from the subquery predicate. 2.4 Combining prefix and suffix ------------------------------- The two previous rules can be combined together. This gives us: SJ-TEMP-TABLE-USE: Given a join order and a semi-join nest S, we can find the first and last tables: first_tbl(S) := first inner table last_tbl(S) := the latest of the - last inner table - last correlated outer table The range between those two tables will need to be "covered" by a temptable: we'll need to - put RESET-TMP-TABLE before the first_tbl(S), - put WEED-DUPS-OUT after the last_tbl(S) Note: we can get a special case where there are no inner tables in the [first_tbl(S_i), last_tbl(S_i)) range. In that case, temporary table has zero-length tuple, which corresponds to FirstMatch strategy from WL#3750. 2.5 Case of several semi-joins ------------------------------ Our first observation is that the bounds between semi-joins are not really set in concrete: 2.5.1 On semi-joins "boundaries" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Subqueries that can be converted to semi-joins can be merged together into one subquery that is a cross-join of the involved subqueries. For example, the query SELECT ... FROM ot1 ... otN WHERE (oeA1, ...) IN (SELECT ieA1,... FROM itA1, ... itAk) AND (oeB1, ...) IN (SELECT ieB1,... FROM ieB1, ... ieBk) AND subq_where can be converted to SELECT ... FROM ot1 ... otN WHERE (oeA1, ...oeB1, ...) IN (SELECT ieA1,... ieB1, ... FROM (itA1, ... itAk) inner join (ieB1, ... ieBk) ) AND subq_where; The reverse is also possible: if a subquery is (or can be converted to) a cross-join then it can be split into two. 2.5.2 Temptable use in case of several semi-joins ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The case of several semi-join nests is handled as follows: - For each semi-join nest S_i, find the interval [first_tbl(S_i), last_tbl(S_i)]. - A semi-join whose interval doesn't have interesection with any other is covered as described in SJ-TEMP-TABLE-USE. - If several semi-joins have intervals with non-zero intersection then we can think of them as one "interleaved" semi-join. For that semi-join, first_tbl(S_sum) = min(first_tbl(S_i)) last_tbl(S_sum) = max(last_tbl(S_i) ) and then it is covered as described in SJ-TEMP-TABLE-USE. (todo: saner wording) It is apparent that once every interval is "covered" (either on itself or together with others), then the duplicate outer row combinations will not be produced. 3. Accounting for use of temptables in the join optimization ============================================================ WEED-DUPS-OUT code "removes" the fanout produced by the subquery tables, so we'll need to adjust the size of the partial join output. We do not have a formula for the cost of using the temporary table. We could use the following hack: * Assume we'll have to write all temporary table contents to disk, using random disk seeks. We'll have to do this write the number of times we call RESET-TMP-TABLE. with the hack, the cost of temptable will be - directly proportional to its key length. - directly proportional to E(#rows in temp table) ( this is not true. there is O(#rows * log(#rows)) part. the cost will be taken into account "immediately", e.g. if we have a semi-join ot1, ot2, ot3 SJ (it1, it2) on sj_cond(ot1, ot2, ot3, it1, it2) and a partial join order of ot1 sj1 ot2 ^ then the cost of it will include cost of use of temptable with {ot2.rowid} key. 3.1 The details --------------- when we position a TABLE in the join order we can tell whether we've just put all of the semi-join tables into it. 4. Changes in the join executioner ================================== We'll need those changes in the join executioner: * Add code to create the temptables * Insert WEED-DUPS-OUT check * Insert RESET-TMP-TABLE check * Add code to handle the special case of zero-length temptable * Add code to clean up the temptables after parent join execution is finished (if the parent join is inside a subselect itself we will probably not want to create/delete temp. tables on each execution? or do we?) The details will be covered in the LLD. 5. EXPLAIN output ================= Temptable use shall be visible in EXPLAIN output. The "Extra" column will show * "Start temporary" for the table before which we have RESET-TMP-TABLE code. * "End temporary" for the table after which we have WEED-DUPS-OUT code. If the temporary table has zero-length row, i.e. we're actually using join shortcutting the "Extra" column will show: * "Shortcut to $table", where $table is what is in the "table" column for the table we'll shortcut to. EXPLAIN EXTENDED ...; SHOW WARNINGS will show the semi-join operation. The operation will be shown as ... outer_tables semi join (inner_tables) ... where inner_tables shows the tables that were not pulled out. Note that semi-join operation here lacks the ON condition. This is because semi-join's ON condition is merged into parent's WHERE. Q: is that OK? Or we should save and display the semi-join's ON? . TODO: check how this will work with BKA.
1. Data structures 2. Join optimizer adjustments 3. Setting temptable use 3.1 Rules for inclusion of table rowid into temptable 3.2 Finding out covered ranges 3.2.1 Data structure 3.2.2 Temporary table assignment 4. Temptable operations 3.1 Place to create/drop temporary table 3.2 Temptable properties 3.5 Getting table ROWIDs 3.5.1 prepare_for_position() for smart access methods 3.5.2 ROWIDs and join buffering 3.5.3 ROWIDs and filesort() 4. Join executioner changes 5. Schedule 6. Not scheduled functionality 6.1 Fanout adjustments 6.2 Temptable use cost 6.3 Join optimizer fixes 1. Data structures ================== WL#3740 code provides this WL with the following data structures: - The TABLE_LIST join nest tree contains SJ-nests. There may be several SJ-nests that contain or are contained within OJ-nests, but SJ-nests do not contain one another. - Some tables in SJ-nests are actually pulled out of them. WL#3740 code does calculate which tables are pulled out but does not save this information. We'll need to use this structure to - make adjustments to join cost/cardinality - setup temptable use, position the WEED-DUPS-OUT and RESET-TMP-TABLE calls. 2. Join optimizer adjustments ============================= In the scheduled functionality, join optimizer will execute unaffected by presense of SJ-nests. 3. Setting temptable use ======================== This is done at plan refinement stage, when the order of the tables is already known. Inclusion of table's rowid into temptable is governed by the following rules: 3.1 Rules for inclusion of table rowid into temptable ----------------------------------------------------- The rules do not depend on join order (*) and are as follows: 1. SJ-inner tables that were pulled are not included. 2. SJ-inner tables are not included. 3. SJ-outer OJ-outer const tables are not included, as there is one possible rowid value anyway. (the check is: table->map & const_table_map & !table->embedding) 4. Tables functionally dependent on other included tables are not included. This is harder to check because of circular dependencies. For now we will use this simple but incomplete criterion: If the table is accessed with eq_ref access dependent on sj-outer, constant, or include tables, then the table is not included. (i.e. (*) will actually be not true in this implementation. But that is temporary) It is possible to have a more comprehensive check but we put this off to the unscheduled part of this WL. 3.2 Finding out covered ranges ------------------------------ 3.2.1 Data structure ~~~~~~~~~~~~~~~~~~~~ class JOIN will have a list of SJ_TMP_TABLE stuctures. /* A temporary table that is used to weed out semi-join duplicates */ struct { uint end_idx; /* Array of pointers to tables that should be "used" */ JOIN_TAB **tables; } SJ_TMP_TABLE; 3.2.2 Temporary table assignment ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We determine temporary table contents in one pass over the join order. The code for the pass is as follows: setup_oj_tables() { join->sj_tmp_tables.empty(); SJ_TMP_TABLE *cur_tmp_table= NULL; /* A bitmap of sj-nests we're currently in */ table_map emb_sj_map= 0; for (each join tab join_tab) { table= join_tab->table; if (emb_sj_map) { /* Check if the partial join order now contains all tables of all leaving coverage of all semi-joins. */ if (bitmap_covers(cur_map | table->bit, emb_sj_map)) { /* Ok, current table is the last table that was missing in the set of tables that should be in this temptable. - Save this temp table. */ if (included(join_tab) { cur_tmp_table.tables.push_back(table); cur_tmp_table.end_idx= join_tab; } join->push_back(cur_tmp_table); emb_sj_map= 0; cur_table= new SJ_TMP_TABLE(); /* The current table is either - an sj-inner table. A table is contained within only one sj-nest, that sj-nest is "served" by the temptable that we've just created, or - a correlated outer table. This table cannot be the first table of sj-nest's range. For both cases, we're certain that this table is not the 1st table of some sj-nest, and further steps in this loop are not needed. */ continue; } } /* If we're entering the coverage of another SJ-nest, record that we need all of its tables */ emb_sj_map |= get_emb_sj(table)->tables_map; if (included(join_tab)) { /* Record that this table is included in the temptable */ cur_tmp_table.end_idx= join_tab->idx; cur_tmp_table.tables.push_back(join_tab); } } } After the pass we have an array of SJ_TMP_TABLE structures that describe the needed temptables. 4. Temptable operations ======================= 3.1 Place to create/drop temporary table ---------------------------------------- Temp table is to be created before the join execution. If the join is in a (non-flattened) subselect, we should avoid re-creating the temp. table for every subselect execution (Q: does this matter? A: yes, since similar arrangements are already made for other optimizations AA: no, just save and reuse the info.) TODO: see what is done to other tmp tables. 3.2 Temptable properties ------------------------ There is no point in creating multi-part key. In fact, that is not always possible since max # of keyparts is 16 while max # tables in a join is 64. Therefore we will use the The table will have the following structure: CREATE TABLE t (`rowids` BINARY(N) NOT NULL, UNIQUE(`rowids`)); N is the length of the rowid tuple: N= null_bytes + SUM(rowid_length) null_bytes = ceil(#(tables inner wrt some outer join) / 8) The tuple format is [null-bytes] [rowid1] [rowid2] ... for $invalid_rowid rowid, the null bit is set and the rowid bytes a bzero'ed. 3.5 Getting table ROWIDs ------------------------ We'll need to be able to get rowids of tables that are "served" by a temptable. The general rule is that we'll need to call file->prepare_for_position(); for all "served" tables. This doesn't work in all cases, the exceptions are covered in the subsequent sections. 3.5.1 prepare_for_position() for smart access methods ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We'll need some some other solution for index_merge/DS-MRR and similar methods which clone/reinitialize the table handlers during the scan. This will be done as follows: First, introduce bool st_table::save_rowid; and have post- set it accordingly. Then call file->prepare_for_position(), no matter which access method we're using. This will serve the "simple" access methods. "Smart" access methods (the two known are DS-MRR and index_merge) will check st_table::need_rowid themselves. Note: checking "if this is a null-complemented row" is easy: just look at st_table::maybe_null. 3.5.2 ROWIDs and join buffering ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Join buffering accumulates table rows in a buffer and then iterates over the buffer. When we iterate over the buffer, calling handler->position() will not return the rowid of current row in the buffer. If we use join buffering within the [RESET-TMP-TABLE; WEED-DUPS-OUT] range, we will have to store {table_row, rowid} sequence in the buffer. If several tables use join buffers, we will need to arrange so that rowids are copied from one buffer to another. (psergey-todo: will this happen automatically if we set some 'length' member? check it.) 3.5.3 ROWIDs and filesort() ~~~~~~~~~~~~~~~~~~~~~~~~~~~ filesort() makes rowids unavailable just like join buffering does. Current code may perform filesort: 1. After the 1st non-const table 2. After the last table (we intend to perform filesort at arbitrary location in the join order but that is not done yet) In its current state, filesort will not make any rowids unavailable. In case #1, the 1st table is the outer table, because parent select can't have "ORDER BY subquery_table.col" (psergey-todo: check if this can be "arranged" by equality propagation?) In case #2 there is no problem: WEED-DUPS-OUT should be placed before the filesort, not after. Note when/if we need handle rowid loss in filesort, the solution is same as with join buffering: save the rowids as "addon" fields. 4. Join executioner changes =========================== 5. Schedule =========== START: 19.04 midday T1: Code the temp table detection 8 hrs MS1: EXPLAINs now work as specified, SELECTs produce 20.04 midday incorrect results with extra row combinations. T2: Implement temp. table creation/deletion 16 hrs MS2: SELECTs that use "simple" access methods (ALL, index, and ref, w/o use of join buffering) now produce correct results 25.04 midday T3: Handle rowid saving for range/index_merge access methods 8 hrs MS3: Queries with DS-MRR-range and index_merge produce correct 26.04 midday results now. T4: Handle join buffering. 16 hrs MS4: Now all queries produce correct results. 28.04 evening T5: Fix some unexpected problems 16 hrs MS5: Now all queries produce correct results. 30.04 evening 6. Not scheduled functionality ============================== This will not be implemented. It will either be implemented later or superceded by code in other WLs. 6.1 Fanout adjustments ---------------------- WEED-DUPS-OUT code removes all of the fanout produced by the tables within the semi-join nest. This should be taken into account in prev_record_reads(). 6.2 Temptable use cost ---------------------- We assume we read and write this much: bytes_to_write = fanout(start_tbl) * // Cycles of temp table use SUM(rowid_length) * // temp. table row length fanount(end_tbl) / fanout (start_tbl) // #rows in temp table i.e. use_cost = cost_of_read_and_write( SUM(rowid_length) * fanout(end_tbl) ); where cost_of_read_and_write() assumes random disk seeks (4K = 1 seek) Q: is this adequate or we need to come up with a better model that accounts for index lookups, etc? 6.3 Join optimizer fixes ------------------------ Before the optimization: Introduce this variable: /* Pointer to embedding sj-nest, or NULL if the table is not in a sj-nest */ TABLE_LIST *JOIN_TAB::emb_sj_nest and set it for every table in the join. prev_record_reads(): * Let us accumulate a bitmap of tables that are in the partial join order and are within a nested join nest. * Let prev_record_reads() ignore these tables when calculating the partial join result size. best_access_path(): In order to account for cost of temptable use, we'll need to track the use of temp table while we're in the join optimizer.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.