A position of table within a join order.
This structure is primarily used as a part of join->positions
and join->best_positions
arrays.
One POSITION element contains information about:
- Which table is accessed
- Which access method was chosen = Its cost and #of output records
- Semi-join strategy choice. Note that there are two different representation formats:
- The one used during join optimization
- The one used at plan refinement/code generation stage. We call fix_semijoin_strategies_for_picked_join_order() to switch between #1 and #2. See that function's comment for more details.
- Semi-join optimization state. When we're running join optimization, we main a state for every semi-join strategy which are various variables that tell us if/at which point we could consider applying the strategy. The variables are really a function of join prefix but they are too expensive to re-caclulate for every join prefix we consider, so we maintain current state in join->positions[#tables_in_prefix]. See advance_sj_state() for details.
This class has to stay a POD, because it is memcpy'd in many places.
float POSITION::filter_effect |
The fraction of the 'rows_fetched' rows that will pass the table conditions that were NOT used by the access method.
If, e.g.,
"SELECT ... WHERE t1.colx = 4 and t1.coly @> 5"
is resolved by ref access on t1.colx, filter_effect will be the fraction of rows that will pass the "t1.coly @> 5" predicate. The valid range is 0..1, where 0.0 means that no rows will pass the table conditions and 1.0 means that all rows will pass.
It is used to calculate how many row combinations will be joined with the next table,
- See also
- prefix_rowcount below.
- Note
- With condition filtering enabled, it is possible to get a fanout = rows_fetched * filter_effect that is less than 1.0. Consider, e.g., a join between t1 and t2:
"SELECT ... WHERE t1.col1=t2.colx and t2.coly OP @<something@>"
where t1 is a prefix table and the optimizer currently calculates the cost of adding t2 to the join. Assume that the chosen access method on t2 is a 'ref' access on 'colx' that is estimated to produce 2 rows per row from t1 (i.e., rows_fetched = 2). It will in this case be perfectly fine to calculate a filtering effect <0.5 (resulting in "rows_fetched * filter_effect @< 1.0") from the predicate "t2.coly OP @<something@>". If so, the number of row combinations from (t1,t2) is lower than the prefix_rowcount of t1.
The above is just an example of how the fanout of a table can become less than one. It can happen for any access method.
double POSITION::prefix_rowcount |
prefix_rowcount and prefix_cost form a stack of partial join order costs and output sizes
prefix_rowcount: The number of row combinations that will be joined to the next table in the join sequence.
For a joined table it is calculated as prefix_rowcount = last_table.prefix_rowcount * rows_fetched * filter_effect
- See also
- filter_effect
For a semijoined table it may be less than this formula due to duplicate elimination.
double POSITION::read_cost |
Cost of accessing the table in course of the entire complete join execution, i.e.
cost of one access method use (e.g. 'range' or 'ref' scan ) multiplied by estimated number of rows from tables earlier in the join sequence.
read_cost does NOT include cost of processing rows within the executor (row_evaluate_cost).
double POSITION::rows_fetched |
The number of rows that will be fetched by the chosen access method per each row combination of previous tables.
That is:
rows_fetched = selectivity(access_condition) * cardinality(table)
where 'access_condition' is whatever condition was chosen for index access, depending on the access method ('ref', 'range', etc.)
- Note
- For index/table scans, rows_fetched may be less than the number of rows in the table because the cost of evaluating constant conditions is included in the scan cost, and the number of rows produced by these scans is the estimated number of rows that pass the constant conditions.
- See also
- Optimize_table_order::calculate_scan_cost() . But this is only during planning; make_join_readinfo() simplifies it for EXPLAIN.
uint POSITION::sj_strategy |
Current optimization state: Semi-join strategy to be used for this and preceding join tables.
Join optimizer sets this for the last join_tab in the duplicate-generating range. That is, in order to interpret this field, one needs to traverse join->[best_]positions array from right to left. When you see a join table with sj_strategy!= SJ_OPT_NONE, some other field (depending on the strategy) tells how many preceding positions this applies to. The values of covered_preceding_positions->sj_strategy must be ignored.