MySQL 8.4.2
Source Code Documentation
|
Namespaces | |
namespace | anonymous_namespace{sql_planner.cc} |
Functions | |
static double | prev_record_reads (JOIN *join, uint idx, table_map found_ref) |
static void | trace_plan_prefix (JOIN *join, uint idx, table_map excluded_tables) |
Helper function to write the current plan's prefix to the optimizer trace. More... | |
static uint | max_part_bit (key_part_map bits) |
static uint | cache_record_length (JOIN *join, uint idx) |
double | find_cost_for_ref (const THD *thd, TABLE *table, unsigned keyno, double num_rows, double worst_seeks) |
Find the cost for a ref lookup on the given index, assumed to return “num_rows” rows. More... | |
float | calculate_condition_filter (const JOIN_TAB *const tab, const Key_use *const keyuse, table_map used_tables, double fanout, bool is_join_buffering, bool write_to_trace, Opt_trace_object &parent_trace) |
Calculate 'Post read filtering' effect of JOIN::conds for table 'tab'. More... | |
static ulonglong | get_bound_sj_equalities (const JOIN_TAB *tab, table_map not_available_tables) |
Returns a bitmap of bound semi-join equalities. More... | |
static int | semijoin_order_allows_materialization (const JOIN *join, table_map remaining_tables, const JOIN_TAB *tab, uint idx) |
Check whether a semijoin materialization strategy is allowed for the current (semi)join table order. More... | |
void | get_partial_join_cost (JOIN *join, uint n_tables, double *cost_arg, double *rowcount_arg) |
Calculate a cost of given partial join order. More... | |
static bool | almost_equal (double left, double right) |
Helper function that compares two doubles and accept these as "almost equal" if they are within 10 percent of each other. More... | |
Optimize_table_order::Optimize_table_order (THD *thd_arg, JOIN *join_arg, Table_ref *sjm_nest_arg) | |
Key_use * | Optimize_table_order::find_best_ref (const JOIN_TAB *tab, const table_map remaining_tables, const uint idx, const double prefix_rowcount, bool *found_condition, table_map *ref_depends_map, uint *used_key_parts) |
Find the best index to do 'ref' access on for a table. More... | |
double | Optimize_table_order::calculate_scan_cost (const JOIN_TAB *tab, const uint idx, const Key_use *best_ref, const double prefix_rowcount, const bool found_condition, const bool disable_jbuf, double *rows_after_filtering, Opt_trace_object *trace_access_scan) |
Calculate the cost of range/table/index scanning table 'tab'. More... | |
double | Optimize_table_order::lateral_derived_cost (const JOIN_TAB *tab, const uint idx, const double prefix_rowcount, const Cost_model_server *cost_model) |
If table is a lateral derived table, calculates the "cost of
materialization", which is the cost of a single materialization (available in the DT's underlying JOIN final plan) multiplied by the number of rows output by the last-in-plan table which DT references (available in a POSITION structure). More... | |
void | Optimize_table_order::best_access_path (JOIN_TAB *tab, const table_map remaining_tables, const uint idx, bool disable_jbuf, const double prefix_rowcount, POSITION *pos) |
Find the best access path for an extension of a partial execution plan and add this path to the plan. More... | |
bool | Optimize_table_order::semijoin_loosescan_fill_driving_table_position (const JOIN_TAB *s, table_map remaining_tables, uint idx, double prefix_rowcount, POSITION *loose_scan_pos) |
Fills a POSITION object of the driving table of a semi-join LooseScan range, with the cheapest access path. More... | |
bool | Join_tab_compare_default::operator() (const JOIN_TAB *jt1, const JOIN_TAB *jt2) const |
"Less than" comparison function object used to compare two JOIN_TAB objects based on a number of factors in this order: More... | |
bool | Optimize_table_order::choose_table_order () |
Entry point to table join order optimization. More... | |
static uint | Optimize_table_order::determine_search_depth (uint search_depth, uint table_count) |
Heuristic procedure to automatically guess a reasonable degree of exhaustiveness for the greedy search procedure. More... | |
void | Optimize_table_order::optimize_straight_join (table_map join_tables) |
Select the best ways to access the tables in a query without reordering them. More... | |
bool | Optimize_table_order::greedy_search (table_map remaining_tables) |
Find a good, possibly optimal, query execution plan (QEP) by a greedy search. More... | |
bool | Optimize_table_order::consider_plan (uint idx, Opt_trace_object *trace_obj) |
Cost calculation of another (partial-)QEP has been completed. More... | |
bool | Optimize_table_order::best_extension_by_limited_search (table_map remaining_tables, uint idx, uint current_search_depth) |
Find a good, possibly optimal, query execution plan (QEP) by a possibly exhaustive search. More... | |
table_map | Optimize_table_order::eq_ref_extension_by_limited_search (table_map remaining_tables, uint idx, uint current_search_depth) |
Heuristic utility used by best_extension_by_limited_search(). More... | |
bool | Optimize_table_order::fix_semijoin_strategies () |
Fix semi-join strategies for the picked join order. More... | |
bool | Optimize_table_order::check_interleaving_with_nj (JOIN_TAB *next_tab) |
Check interleaving with an inner tables of an outer join for extension table. More... | |
bool | Optimize_table_order::semijoin_firstmatch_loosescan_access_paths (uint first_tab, uint last_tab, table_map remaining_tables, bool loosescan, double *newcount, double *newcost) |
Find best access paths for semi-join FirstMatch or LooseScan strategy and calculate rowcount and cost based on these. More... | |
void | Optimize_table_order::semijoin_mat_scan_access_paths (uint last_inner_tab, uint last_outer_tab, table_map remaining_tables, Table_ref *sjm_nest, double *newcount, double *newcost) |
Find best access paths for semi-join MaterializeScan strategy and calculate rowcount and cost based on these. More... | |
void | Optimize_table_order::semijoin_mat_lookup_access_paths (uint last_inner, Table_ref *sjm_nest, double *newcount, double *newcost) |
Find best access paths for semi-join MaterializeLookup strategy. More... | |
void | Optimize_table_order::semijoin_dupsweedout_access_paths (uint first_tab, uint last_tab, double *newcount, double *newcost) |
Find best access paths for semi-join DuplicateWeedout strategy and calculate rowcount and cost based on these. More... | |
void | Optimize_table_order::advance_sj_state (table_map remaining_tables, const JOIN_TAB *tab, uint idx) |
Do semi-join optimization step after we've added a new tab to join prefix. More... | |
void | Optimize_table_order::backout_nj_state (const table_map remaining_tables, const JOIN_TAB *tab) |
Nested joins perspective: Remove the last table from the join order. More... | |
table_map | Optimize_table_order::calculate_lateral_deps_of_final_plan (uint tab_no) const |
Calculate the lateral dependencies of the suffix of JOIN_TABs from tab_no to join->tables-1 in the final join plan, i.e. More... | |
void | Optimize_table_order::recalculate_lateral_deps (uint first_tab_no) |
Set join->deps_of_remaining_lateral_derived_tables to the set of lateral dependencies of the tables in the suffix of the join plan from 'tab_no' and on. More... | |
void | Optimize_table_order::recalculate_lateral_deps_incrementally (uint first_tab_no) |
Update join->deps_of_remaining_lateral_derived_tables after adding JOIN_TAB first_tab_no-1 to the plan. More... | |
bool | Optimize_table_order::plan_has_duplicate_tabs () const |
Check if any Table_ref appears twice in the plan (which is an error). More... | |
Variables | |
static constexpr const ha_rows | MATCHING_ROWS_IN_OTHER_TABLE {10} |
Number of rows in a reference table when refereed through a not unique key. More... | |
Optimize_table_order::Optimize_table_order | ( | THD * | thd_arg, |
JOIN * | join_arg, | ||
Table_ref * | sjm_nest_arg | ||
) |
|
private |
Do semi-join optimization step after we've added a new tab to join prefix.
This function cannot work with nested SJ nests, for two reasons: (a) QEP_TAB::emb_sj_nest points to the most inner SJ nest, and this function looks only at it, so misses to do any SJ strategy choice for outer nests (b) POSITION has only one set of SJ-info (e.g. first_firstmatch_table): so planning for two nested nests would require more info than we have. And indeed, SJ nests cannot be nested, because: (c) a SJ nest is not nested in another SJ or anti SJ nest (it would have been dissolved into the outer nest by simplify_joins()). (d) an anti SJ nest is not nested inside another SJ or anti SJ nest (this case is blocked during resolving.
remaining_tables | Tables not in the join prefix |
new_join_tab | Join tab that we are adding to the join prefix |
idx | Index in join->position storing this join tab (i.e. number of tables in the prefix) |
Update semi-join optimization state after we've added another tab (table and access method) to the join prefix.
The state is maintained in join->positions[prefix_size]. Each of the available strategies has its own state variables.
for each semi-join strategy { update strategy's state variables;
if (join prefix has all the tables that are needed to consider using this strategy for the semi-join(s)) { calculate cost of using the strategy if ((this is the first strategy to handle the semi-join nest(s) || the cost is less than other strategies)) { Pick this strategy pos->sj_strategy= .. .. } } }
Most of the new state is saved in join->positions[idx] (and hence no undo is necessary).
See setup_semijoin_dups_elimination() for a description of what kinds of join prefixes each strategy can handle.
A note on access path, rowcount and cost estimates:
|
inlinestatic |
Helper function that compares two doubles and accept these as "almost equal" if they are within 10 percent of each other.
Handling of exact 0.0 values: if one of the values are exactly 0.0, the other value must also be exactly 0.0 to be considered to be equal.
left | First double number to compare |
right | Second double number to compare |
|
private |
Nested joins perspective: Remove the last table from the join order.
Remove the last table from the partial join order and update the nested joins counters and cur_embedding_map. It is ok to call this function for the first table in join order (for which check_interleaving_with_nj has not been called)
This function rolls back changes done by:
The algorithm is the reciprocal of check_interleaving_with_nj(), hence parent join nest nodes are updated only when the last table in its child node is removed. The ASCII graphic below will clarify.
A table nesting such as t1 x [ ( t2 x t3 ) x ( t4 x t5 ) ]
is represented by the below join nest tree.
NJ1 _/ / \ _/ / NJ2 _/ / / \ / / / \ t1 x [ (t2 x t3) x (t4 x t5) ]
At the point in time when check_interleaving_with_nj() adds the table t5 to the query execution plan, QEP, it also directs the node named NJ2 to mark the table as covered. NJ2 does so by incrementing its counter
member. Since all of NJ2's tables are now covered by the QEP, the algorithm proceeds up the tree to NJ1, incrementing its counter as well. All join nests are now completely covered by the QEP.
backout_nj_state() does the above in reverse. As seen above, the node NJ1 contains the nodes t2, t3, and NJ2. Its counter being equal to 3 means that the plan covers t2, t3, and NJ2, and that the sub-plan (t4 x t5) completely covers NJ2. The removal of t5 from the partial plan will first decrement NJ2's counter to 1. It will then detect that NJ2 went from being completely to partially covered, and hence the algorithm must continue upwards to NJ1 and decrement its counter to 2. A subsequent removal of t4 will however not influence NJ1 since it did not un-cover the last table in NJ2.
remaining_tables | remaining tables to optimize, must contain 'tab' |
tab | join table to remove, assumed to be the last in current partial join order. |
|
private |
Find the best access path for an extension of a partial execution plan and add this path to the plan.
The function finds the best access path to table 'tab' from the passed partial plan where an access path is the general term for any means to access the data in 'tab'. An access path may use either an index scan, a table scan, a range scan or ref access, whichever is cheaper. The input partial plan is passed via the array 'join->positions' of length 'idx'. The chosen access method for 'tab' and its cost is stored in 'join->positions[idx]'.
tab | the table to be joined by the function | |
remaining_tables | set of tables not included in the partial plan yet. | |
idx | the index in join->position[] where 'tab' is added to the partial plan. | |
disable_jbuf | true<=> Don't use join buffering | |
prefix_rowcount | estimate for the number of records returned by the partial plan | |
[out] | pos | Table access plan |
|
private |
Find a good, possibly optimal, query execution plan (QEP) by a possibly exhaustive search.
The procedure searches for the optimal ordering of the query tables in set 'remaining_tables' of size N, and the corresponding optimal access paths to each table. The choice of a table order and an access path for each table constitutes a query execution plan (QEP) that fully specifies how to execute the query.
The maximal size of the found plan is controlled by the parameter 'search_depth'. When search_depth == N, the resulting plan is complete and can be used directly as a QEP. If search_depth < N, the found plan consists of only some of the query tables. Such "partial" optimal plans are useful only as input to query optimization procedures, and cannot be used directly to execute a query.
The algorithm begins with an empty partial plan stored in 'join->positions' and a set of N tables - 'remaining_tables'. Each step of the algorithm evaluates the cost of the partial plan extended by all access plans for each of the relations in 'remaining_tables', expands the current partial plan with the access plan that results in lowest cost of the expanded partial plan, and removes the corresponding relation from 'remaining_tables'. The algorithm continues until it either constructs a complete optimal plan, or constructs an optimal plartial plan with size = search_depth.
The final optimal plan is stored in 'join->best_positions'. The corresponding cost of the optimal plan is in 'join->best_read'.
best_extension_by_limited_search()
and eq_ref_extension_by_limited_search()
are closely related to each other and intentionally implemented using the same pattern wherever possible. If a change/bug fix is done to either of these also consider if it is relevant for the other.remaining_tables | set of tables not included into the partial plan yet |
idx | length of the partial QEP in 'join->positions'; since a depth-first search is used, also corresponds to the current depth of the search tree; also an index in the array 'join->best_ref'; |
current_search_depth | maximum depth of recursion and thus size of the found optimal plan (0 < current_search_depth <= join->tables+1). |
|
static |
float calculate_condition_filter | ( | const JOIN_TAB *const | tab, |
const Key_use *const | keyuse, | ||
table_map | used_tables, | ||
double | fanout, | ||
bool | is_join_buffering, | ||
bool | write_to_trace, | ||
Opt_trace_object & | parent_trace | ||
) |
Calculate 'Post read filtering' effect of JOIN::conds for table 'tab'.
Only conditions that are not directly involved in the chosen access method shall be included in the calculation of this 'Post read filtering' effect.
The function first identifies fields that are directly used by the access method. This includes columns used by range and ref access types, and predicates on the identified columns (if any) will not be taken into account when the filtering effect is calculated.
The function will then calculate the filtering effect of any predicate that applies to 'tab' and is not depending on the columns used by the access method. The source of information with highest accuracy is always preferred and is as follows: 1) Row estimates from the range optimizer 2) Row estimates from index statistics (records per key) 3) Guesstimates
Thus, after identifying columns that are used by the access method, the function will look for rows estimates made by the range optimizer. If found, the estimates from the range optimizer are calculated into the filtering effect.
The function then goes through JOIN::conds to get estimates from any remaining predicate that applies to 'tab' and does not depend on any tables that come later in the join sequence. Predicates that depend on columns that are either used by the access method or used in the row estimate from the range optimizer will not be considered in this phase.
tab | The table condition filtering effect is calculated for |
keyuse | Describes the 'ref' access method (if any) that is chosen |
used_tables | Tables earlier in the join sequence than 'tab' |
fanout | The number of rows read by the chosen access method for each row combination of previous tables |
is_join_buffering | Whether or not condition filtering is about to be calculated for an access method using join buffering. |
write_to_trace | Whether we should print the filtering effect calculated by histogram statistics and the final aggregated filtering effect to optimizer trace. |
parent_trace | The parent trace object where the final aggregated filtering effect will be printed if "write_to_trace" is set to true. |
|
private |
Calculate the lateral dependencies of the suffix of JOIN_TABs from tab_no to join->tables-1 in the final join plan, i.e.
the plan contained in join->best_positions. This function is only called from asserts that verify the values cached in join->best_positions[table_no].m_suffix_lateral_deps.
tab_no | index of the first table in the suffix for which we calculate the dependencies. |
|
private |
Calculate the cost of range/table/index scanning table 'tab'.
Returns a hybrid scan cost number: the cost of fetching rows from the storage engine plus CPU cost during execution for evaluating the rows (estimate) that will be filtered out by predicates relevant to the table. The cost does not include the CPU cost during execution for rows that are not filtered out.
This hybrid cost is needed because if join buffering is used to reduce the number of scans, then the final cost depends on how many times the join buffer had to be filled.
tab | the table to be joined by the function |
idx | the index in join->position[] where 'tab' is added to the partial plan. |
best_ref | description of the best ref access method for 'tab' |
prefix_rowcount | estimate for the number of records returned by the partial plan |
found_condition | whether or not there exists a condition that filters away rows for this table. |
disable_jbuf | don't use join buffering if true | |
[out] | rows_after_filtering | fanout of the access method after taking condition filtering into account |
trace_access_scan | The optimizer trace object info is appended to |
|
private |
Check interleaving with an inner tables of an outer join for extension table.
Check if table tab can be added to current partial join order, and if yes, record that it has been added. This recording can be rolled back with backout_nj_state().
The function assumes that both current partial join order and its extension with tab are valid wrt table dependencies.
IMPLEMENTATION LIMITATIONS ON JOIN ORDER The nested [outer] joins executioner algorithm imposes these limitations on join order: 1. "Outer tables first" - any "outer" table must be before any corresponding "inner" table. 2. "No interleaving" - tables inside a nested join must form a continuous sequence in join order (i.e. the sequence must not be interrupted by tables that are outside of this nested join). #1 is checked elsewhere, this function checks #2 provided that #1 has been already checked. WHY NEED NON-INTERLEAVING Consider an example: select * from t0 join t1 left join (t2 join t3) on cond1 The join order "t1 t2 t0 t3" is invalid: table t0 is outside of the nested join, so WHERE condition for t0 is attached directly to t0 (without triggers, and it may be used to access t0). Applying WHERE(t0) to (t2,t0,t3) record is invalid as we may miss combinations of (t1, t2, t3) that satisfy condition cond1, and produce a null-complemented (t1, t2.NULLs, t3.NULLs) row, which should not have been produced. If table t0 is not between t2 and t3, the problem doesn't exist: If t0 is located after (t2,t3), WHERE(t0) is applied after nested join processing has finished. If t0 is located before (t2,t3), predicates like WHERE_cond(t0, t2) are wrapped into condition triggers, which takes care of correct nested join processing. HOW IT IS IMPLEMENTED The limitations on join order can be rephrased as follows: for valid join order one must be able to: 1. write down the used tables in the join order on one line. 2. for each nested join, put one '(' and one ')' on the said line 3. write "LEFT JOIN" and "ON (...)" where appropriate 4. get a query equivalent to the query we're trying to execute. Calls to check_interleaving_with_nj() are equivalent to writing the above described line from left to right. A single check_interleaving_with_nj(A,B) call is equivalent to writing table B and appropriate brackets on condition that table A and appropriate brackets is the last what was written. Graphically the transition is as follows: +---- current position | ... last_tab ))) | ( tab ) )..) | ... X Y Z | +- need to move to this position. Notes about the position: The caller guarantees that there is no more then one X-bracket by checking "!(remaining_tables & s->dependent)" before calling this function. X-bracket may have a pair in Y-bracket. When "writing" we store/update this auxiliary info about the current position: 1. cur_embedding_map - bitmap of pairs of brackets (aka nested joins) we've opened but didn't close. 2. {each NESTED_JOIN structure not simplified away}->counter - number of this nested join's children that have already been added to to the partial join order.
tab | Table we're going to extend the current partial join with |
false | Join order extended, nested joins info about current join order (see NOTE section) updated. |
true | Requested join order extension not allowed. |
bool Optimize_table_order::choose_table_order | ( | ) |
Entry point to table join order optimization.
Selects and invokes a search strategy for an optimal query join order.
For further description, see class header and private function headers.
The function checks user-configurable parameters that control the search strategy for an optimal plan, selects the search method and then invokes it. Each specific optimization procedure stores the final optimal plan in the array 'join->best_positions', and the cost of the plan in 'join->best_read'. The function can be invoked to produce a plan for all tables in the query (in this case, the const tables are usually filtered out), or it can be invoked to produce a plan for a materialization of a semijoin nest. Set a non-NULL emb_sjm_nest pointer when producing a plan for a semijoin nest to be materialized and a NULL pointer when producing a full query plan.
< The tables involved in order selection
|
private |
Cost calculation of another (partial-)QEP has been completed.
If this is our 'best' plan explored so far, we record this query plan and its cost.
idx | length of the partial QEP in 'join->positions'; also corresponds to the current depth of the search tree; also an index in the array 'join->best_ref'; |
trace_obj | trace object where information is to be added |
|
staticprivate |
Heuristic procedure to automatically guess a reasonable degree of exhaustiveness for the greedy search procedure.
The procedure estimates the optimization time and selects a search depth big enough to result in a near-optimal QEP, that doesn't take too long to find. If the number of tables in the query exceeds some constant, then search_depth is set to this constant.
search_depth | Search depth value specified. If zero, calculate a default value. |
table_count | Number of tables to be optimized (excludes const tables) |
|
private |
Heuristic utility used by best_extension_by_limited_search().
Adds EQ_REF-joined tables to the partial plan without extensive 'greedy' cost calculation.
When a table is joined by an unique key there is a 1::1 relation between the rows being joined. Assuming we have multiple such 1::1 (star-)joined relations in a sequence, without other join types in between. Then all of these 'eq_ref-joins' will be estimated to return the exact same number of rows and having identical 'cost' (or 'read_time').
This leads to that we can append such a contiguous sequence of eq_ref-joins to a partial plan in any order without affecting the total cost of the query plan. Exploring the different permutations of these eq_refs in the 'greedy' optimizations will simply be a waste of precious CPU cycles.
Once we have appended a single eq_ref-join to a partial plan, we may use eq_ref_extension_by_limited_search() to search 'remaining_tables' for more eq_refs which will form a contiguous set of eq_refs in the QEP.
Effectively, this chain of eq_refs will be handled as a single entity wrt. the full 'greedy' exploration of the possible join plans. This will reduce the 'N' in the O(N!) complexity of the full greedy search.
The algorithm start by already having a eq_ref joined table in position[idx-1] when called. It then search for more eq_ref-joinable 'remaining_tables' which are added directly to the partial QEP without further cost analysis. The algorithm continues until it either has constructed a complete plan, constructed a partial plan with size = search_depth, or could not find more eq_refs to append.
In the later case the algorithm continues into 'best_extension_by_limited_search' which does a 'greedy' search for the next table to add - Possibly with later eq_ref_extensions.
The final optimal plan is stored in 'join->best_positions'. The corresponding cost of the optimal plan is in 'join->best_read'.
best_extension_by_limited_search()
and eq_ref_extension_by_limited_search()
are closely related to each other and intentionally implemented using the same pattern wherever possible. If a change/bug fix is done to either of these also consider if it is relevant for the other.remaining_tables | set of tables not included into the partial plan yet |
idx | length of the partial QEP in 'join->positions'; since a depth-first search is used, also corresponds to the current depth of the search tree; also an index in the array 'join->best_ref'; |
current_search_depth | maximum depth of recursion and thus size of the found optimal plan (0 < current_search_depth <= join->tables+1). |
'table_map' | Map of those tables appended to the EQ_REF-joined sequence |
~(table_map)0 | Fatal error |
|
inlineprivate |
Find the best index to do 'ref' access on for a table.
The best index chosen using the following priority list 1) A clustered primary key with equality predicates on all keyparts is always chosen. 2) A non nullable unique index with equality predicates on all keyparts is preferred over a non-unique index, nullable unique index or unique index where there are some keyparts without equality predicates. 3) Otherwise, the index with best cost estimate is chosen.
As a side-effect, bound_keyparts/read_cost/fanout is set for the first Key_use of every considered key.
tab | the table to be joined by the function | |
remaining_tables | set of tables not included in the partial plan yet. | |
idx | the index in join->position[] where 'tab' is added to the partial plan. | |
prefix_rowcount | estimate for the number of records returned by the partial plan | |
[out] | found_condition | whether or not there exists a condition that filters away rows for this table. Always true when the function finds a usable 'ref' access, but also if it finds a condition that is not usable by 'ref' access, e.g. is there is an index covering (a,b) and there is a condition only on 'b'. Note that all dependent tables for the condition in question must be in the plan prefix for this to be 'true'. Unmodified if no relevant condition is found. |
[out] | ref_depend_map | tables the best ref access depends on. Unmodified if no 'ref' access is found. |
[out] | used_key_parts | Number of keyparts 'ref' access uses. Unmodified if no 'ref' access is found. |
double find_cost_for_ref | ( | const THD * | thd, |
TABLE * | table, | ||
unsigned | keyno, | ||
double | num_rows, | ||
double | worst_seeks | ||
) |
Find the cost for a ref lookup on the given index, assumed to return “num_rows” rows.
The cost will be capped by “worst_seeks” (see find_worst_seeks()).
|
private |
Fix semi-join strategies for the picked join order.
Fix semi-join strategies for the picked join order. This is a step that needs to be done right after we have fixed the join order. What we do here is switch join's semi-join strategy description from backward-based to forwards based.
When join optimization is in progress, we re-consider semi-join strategies after we've added another table. Here's an illustration. Suppose the join optimization is underway:
1) ot1 it1 it2 sjX – looking at (ot1, it1, it2) join prefix, we decide to use semi-join strategy sjX.
2) ot1 it1 it2 ot2 sjX sjY – Having added table ot2, we now may consider another semi-join strategy and decide to use a different strategy sjY. Note that the record of sjX has remained under it2. That is necessary because we need to be able to get back to (ot1, it1, it2) join prefix. what makes things even worse is that there are cases where the choice of sjY changes the way we should access it2.
3) [ot1 it1 it2 ot2 ot3] sjX sjY – This means that after join optimization is finished, semi-join info should be read right-to-left (while nearly all plan refinement functions, EXPLAIN, etc proceed from left to right)
This function does the needed reversal, making it possible to read the join and semi-join order from left to right.
|
static |
Returns a bitmap of bound semi-join equalities.
If we consider (oe1, .. oeN) IN (SELECT ie1, .. ieN) then ieK=oeK is called sj-equality. If ieK or oeK depends only on tables available before 'tab' in this plan, then such equality is called "bound".
tab | table |
not_available_tables | bitmap of not-available tables. |
void get_partial_join_cost | ( | JOIN * | join, |
uint | n_tables, | ||
double * | cost_arg, | ||
double * | rowcount_arg | ||
) |
Calculate a cost of given partial join order.
join | Join to use. positions holds the partial join order | |
n_tables | Number of tables in the partial join order | |
[out] | cost_arg | Store read time here |
[out] | rowcount_arg | Store record count here |
This is needed for semi-join materialization code. The idea is that we detect sj-materialization after we've put all sj-inner tables into the join prefix
prefix-tables semi-join-inner-tables tN ^–we're here
and we'll need to get the cost of prefix-tables prefix again.
|
private |
Find a good, possibly optimal, query execution plan (QEP) by a greedy search.
The search procedure uses a hybrid greedy/exhaustive search with controlled exhaustiveness. The search is performed in N = card(remaining_tables) steps. Each step evaluates how promising is each of the unoptimized tables, selects the most promising table, and extends the current partial QEP with that table. Currently the most 'promising' table is the one with least expensive extension.\
There are two extreme cases:
All other cases are in-between these two extremes. Thus the parameter 'search_depth' controls the exhaustiveness of the search. The higher the value, the longer the optimizaton time and possibly the better the resulting plan. The lower the value, the fewer alternative plans are estimated, but the more likely to get a bad QEP.
All intermediate and final results of the procedure are stored in 'join':
The final optimal plan is stored in 'join->best_positions', and its corresponding cost in 'join->best_read'.
where 'best_extension' is a placeholder for a procedure that selects the most "promising" of all tables in 'remaining_tables'. Currently this estimate is performed by calling 'best_extension_by_limited_search' to evaluate all extensions of the current QEP of size 'search_depth', thus the complexity of 'greedy_search' mainly depends on that of 'best_extension_by_limited_search'.
search_depth
from Optimize_table_order controls the exhaustiveness of the search, and prune_level
controls the pruning heuristics that should be applied during search.remaining_tables | set of tables not included into the partial plan yet |
|
private |
If table is a lateral derived table, calculates the "cost of materialization", which is the cost of a single materialization (available in the DT's underlying JOIN final plan) multiplied by the number of rows output by the last-in-plan table which DT references (available in a POSITION structure).
For example if plan is t1 (outputs 2 rows) - t2 (outputs 20 rows) - dt and dt's definition references only t1, we multiply by 2, not by 20. This cost is divided by the number of times the DT will be read (20, here), to provide a number which best_access_path() can add to best_read_cost.
|
static |
"Less than" comparison function object used to compare two JOIN_TAB objects based on a number of factors in this order:
jt1 | first JOIN_TAB object |
jt2 | second JOIN_TAB object |
a: dependent = <none> found_records = 3 b: dependent = <none> found_records = 4 c: dependent = b found_records = 2
a < b: because a has fewer records b < c: because c depends on b (e.g outer join dependency) c < a: because c has fewer records
This implies that the result of a sort using the relation implemented by Join_tab_compare_default () depends on the order in which elements are compared, i.e. the result is implementation-specific.
|
private |
Select the best ways to access the tables in a query without reordering them.
Find the best access paths for each query table and compute their costs according to their order in the array 'join->best_ref' (thus without reordering the join tables). The function calls sequentially 'best_access_path' for each table in the query to select the best table access method. The final optimal plan is stored in the array 'join->best_positions', and the corresponding cost in 'join->best_read'.
join_tables | set of the tables in the query |
If many plans have identical cost, which one will be used depends on how compiler optimizes floating-point calculations. this fix adds repeatability to the optimizer. (Similar code in best_extension_by_li...)
|
private |
Check if any Table_ref appears twice in the plan (which is an error).
void Optimize_table_order::recalculate_lateral_deps | ( | uint | first_tab_no | ) |
Set join->deps_of_remaining_lateral_derived_tables to the set of lateral dependencies of the tables in the suffix of the join plan from 'tab_no' and on.
first_tab_no | index (in the join order) of the first JOIN_TAB in the suffix. |
void Optimize_table_order::recalculate_lateral_deps_incrementally | ( | uint | first_tab_no | ) |
Update join->deps_of_remaining_lateral_derived_tables after adding JOIN_TAB first_tab_no-1 to the plan.
Precondition: deps_of_remaining_lateral_derived_tables must contain the dependencies of the plan suffix from first_tab_no-1 and on.
first_tab_no | index (in the join order) of the first JOIN_TAB in the suffix. |
This method intends to be faster than recalculate_lateral_deps(), as it only calculates the increment change of adding on more table. But it requires the precondition above to be fulfilled.
|
private |
Find best access paths for semi-join DuplicateWeedout strategy and calculate rowcount and cost based on these.
first_tab | The first tab to calculate access paths for | |
last_tab | The last tab to calculate access paths for | |
[out] | newcount | New output row count |
[out] | newcost | New join prefix cost |
Notice that new best access paths need not be calculated. The proper access path information is already in join->positions, because DuplicateWeedout can handle any join buffering strategy. The only action performed by this function is to calculate output rowcount, and an updated cost estimate.
The cost estimate is based on performing a join over the involved tables, but we must also add the cost of creating and populating the temporary table used for duplicate removal, and the cost of doing lookups against this table.
Some times, some outer fanout is "absorbed" into the inner fanout. In this case, we should make a better estimate for outer_fanout that is used to calculate the output rowcount. If we have inner table(s) before an outer table, there are dependencies between these tables. The fanout for the outer table is not a good estimate for the final number of rows from the weedout execution, therefore we convert some of the inner fanout into an outer fanout, limited to the number of possible rows in the outer table.
|
private |
Find best access paths for semi-join FirstMatch or LooseScan strategy and calculate rowcount and cost based on these.
first_tab | The first tab to calculate access paths for, this is always a semi-join inner table. | |
last_tab | The last tab to calculate access paths for, always a semi-join inner table for FirstMatch, may be inner or outer for LooseScan. | |
remaining_tables | Bitmap of tables that are not in the [0...last_tab] join prefix | |
loosescan | If true, use LooseScan strategy, otherwise FirstMatch | |
[out] | newcount | New output row count |
[out] | newcost | New join prefix cost |
Calculate best access paths for the tables of a semi-join FirstMatch or LooseScan strategy, given the order of tables provided in join->positions (or join->best_positions when calculating the cost of a final plan). Calculate estimated cost and rowcount for this plan. Given a join prefix [0; ... first_tab-1], change the access to the tables in the range [first_tab; last_tab] according to the constraints set by the relevant semi-join strategy. Those constraints are:
For FirstMatch, the handled range of tables may be a mix of inner tables and non-dependent outer tables. The first and last table in the handled range are always inner tables. For LooseScan, the handled range can be a mix of inner tables and dependent and non-dependent outer tables. The first table is always an inner table.
Depending on member 'got_final_plan', the function uses and updates access path data in join->best_positions, otherwise uses join->positions and updates a local buffer.
|
private |
Fills a POSITION object of the driving table of a semi-join LooseScan range, with the cheapest access path.
This function was created by copying the code from best_access_path, and then eliminating everything which isn't related to semi-join LooseScan.
Preconditions:
tab | the driving table | |
remaining_tables | set of tables not included in the partial plan yet. | |
idx | the index in join->position[] where 'tab' is added to the partial plan. | |
prefix_rowcount | estimate for the number of records returned by the partial plan | |
[out] | pos | If return code is 'true': table access path that uses loosescan |
|
private |
Find best access paths for semi-join MaterializeLookup strategy.
and calculate rowcount and cost based on these.
last_inner | Index of the last inner table | |
sjm_nest | Pointer to semi-join nest for inner tables | |
[out] | newcount | New output row count |
[out] | newcost | New join prefix cost |
All outer tables may use join buffering, so there is no need to recalculate access paths nor costs for these. Add cost of materialization and scanning the materialized table to the costs of accessing the outer tables.
|
private |
Find best access paths for semi-join MaterializeScan strategy and calculate rowcount and cost based on these.
last_inner_tab | The last tab in the set of inner tables | |
last_outer_tab | The last tab in the set of outer tables | |
remaining_tables | Bitmap of tables that are not in the join prefix including the inner and outer tables processed here. | |
sjm_nest | Pointer to semi-join nest for inner tables | |
[out] | newcount | New output row count |
[out] | newcost | New join prefix cost |
Calculate best access paths for the outer tables of the MaterializeScan semi-join strategy. All outer tables may use join buffering. The prefix row count is adjusted with the estimated number of rows in the materialized tables, before taking into consideration the rows contributed by the outer tables.
|
static |
Check whether a semijoin materialization strategy is allowed for the current (semi)join table order.
join | Join object |
remaining_tables | Tables that have not yet been added to the join plan |
tab | Join_tab of the table being considered |
idx | Index in join->position[] with Join_tab "tab" |
SJ_OPT_NONE | - Materialization not applicable |
SJ_OPT_MATERIALIZE_LOOKUP | - Materialization with lookup applicable |
SJ_OPT_MATERIALIZE_SCAN | - Materialization with scan applicable |
The function checks applicability of both MaterializeLookup and MaterializeScan strategies. No checking is made until "tab" is pointing to the last inner table of a semijoin nest that can be executed using materialization - for all other cases SJ_OPT_NONE is returned.
MaterializeLookup and MaterializeScan are both applicable in the following two cases:
In this case, MaterializeLookup is returned based on a heuristic decision.
Helper function to write the current plan's prefix to the optimizer trace.
|
staticconstexpr |
Number of rows in a reference table when refereed through a not unique key.
This value is only used when we don't know anything about the key distribution.