WL#7052: Make best_access_path() more readable
Status: Complete
The responsibility of Optimize_table_order::best_access_path() is to pick the cheapest table access method. It is a large function (700 lines long) that basically does two things: fine-grained cost calculation for possible 'ref' accesses and coarse-grained cost based selection of access method. The function consists of very long and deeply nested if() blocks in combination with variable names that have little or no meaning (like "double best", "double tmp" etc). All this makes the function almost impossible to read. This WL is for refactoring best_access_path() to improve readability. User Documentation ================== Code refactoring. No user docs required.
Functional requirements:
------------------------
F1: best_access_path() readability shall improve by moving ref access cost
calculations to a separate member function that will be called from
best_access_path(). This will both reduce the length of best_access_path()
and make the abstraction level of it more consistent.
F2: best_access_path() readability shall improve by giving variables
meaningful names.
Non-Functional requirements:
----------------------------
NF1: The functionality of best_access_path() shall remain unchanged. Bug
reports will be created for any bug found during the work on this WL.
NF2: Performance degradations are not accepted.
The overall responsibility of Optimize_table_order::best_access_path() is to
pick the cheapest possible table access method. It does this by calculating the
cost of all possible 'ref' access methods and compare this to the (mostly)
precalculated costs for tables scan/range scan/index scan.
While the overall responsibility is easy to understand ("find the cheapest
access method"), it is very hard to fully grasp how
Optimize_table_order::best_access_path() comes to its conclusion. We have seen
a multitude of bugs in this function earlier.
The main pain-points are:
-------------------------
P1) There is a mismatch between the fine-grained cost calculations done for 'ref'
access and the coarse-grained comparison done between different access
methods. The table scan and range scan access method data fetch costs
are e.g. already calculated when best_access_path() is called.
P2) The function is very long (700 lines)
P3) The function consists of huge nested if() blocks, especially the part of
it that calculates 'ref' access cost. This makes the function hard to
read since you can't simply see whether the code you currently read is
inside the 'ref' code block or after it.
P4) A great number of variables either have names that carry no meaning or names
that can easily be confused with something else.
P5) Variables are reused in different settings, making it hard to understand what
the variable is used for at a particular place in the code.
P6) There are hidden bugs because variables are used slightly different than the
implementer thought.
P7) There are four if-blocks with heuristics that skip calculation of scan
costs if "scan cannot be better". But the logic behind these heuristics
are not well understood.
This WL will fix best_access_path by:
-------------------------------------
F1) Moving the 'ref' access cost calculations to a separate member function.
This will reduce the size of best_access_path() to half it's size. In
addition, the remaining cost calculations and comparisons in
best_access_path() will be at the same level of detail. Fixes: P1, P2
F2) Variables will be renamed and new variables introduced so that it's easier
to understand the code. Variables will no longer be reused for different
purposes. Fixes: P4, P5, P6
F3) New variable names with well-defined values will make it easier to grasp
the intention behind the four if-blocks with heuristics for skipping
scan cost calculations. Together with improved code comments, this will
fix: P7. If they are found to contain bugs, bug reports or separate WLs
will be created to fix them.
Will not be handled by this WL:
-------------------------------
Pain-point P3 (huge if() blocks in the 'ref' access part of the function)
will not be fixed by this WL. The 'ref' access part will be factored out to
it's own function but a further breakdown into sub-functions will not be done.
However, this part of the code will also be easier to read because F2
(variable renaming/new variables) will also be done for this part of the code.
New function created by WL:
---------------------------
/**
Find the best index to do 'ref' access on for a table.
@param tab the table to be joined by the function
@param remaining_tables set of tables not included in the
partial plan yet.
@param idx the index in join->position[] where 'tab'
is added to the partial plan.
@param found_condition [out] whether or not there exists a condition
that filters away rows for this table
@param prefix_rowcount estimate for the number of records returned
by the partial plan
@param ref_depend_map [out] tables the best ref access depends on
Unmodified if no 'ref' access is found.
@param used_key_parts [out] Number of key parts 'ref' access uses.
Unmodified if no 'ref' access is found.
@return pointer to Key_use for the index with best 'ref' access, NULL if
no 'ref' access method is found.
*/
Key_use*
Optimize_table_order::find_best_ref(JOIN_TAB *tab,
table_map remaining_tables,
uint idx,
bool *found_condition,
double prefix_rowcount,
table_map *ref_depends_map,
uint *used_key_parts)
This member function will replace the whole 'ref' analysis part of
best_access_path(), i.e., the code inside the following if block:
/*
This isn't unlikely at all, but unlikely() cuts 6% CPU time on a 20-table
search when s->keyuse==0, and has no cost when s->keyuse!=0.
*/
if(unlikely(s->keyuse != NULL))
{
// THIS
}
Changes to Optimize_table_order::best_access_path()
---------------------------------------------------
* Parameter JOIN_TAB *s is renamed: JOIN_TAB *tab
* The big if() block for ref access analysis is moved to
Optimize_table_order::find_best_ref(). See above
* Parameter renaming: double record_count => prefix_rowcount
Rationale: make it clearer that this is the estimated number of
rows produced by tables earlier in the join sequence. Also make
consistent use of row instead of record.
* A bunch of variables with no meaningful names are renamed, and
multiple usages of generic variables (e.g., "tmp" was used for many
different things) have been removed by introducing new variables with
meaningful names.
Changes to class Key_use:
-------------------------
* Renaming: double rowcount => double fanout
Rationale: As evident from the documentation of this variable, this
is the estimated fanout by using 'ref' access on the given index
* Renaming: double cost => double read_cost
Rationale: This is the cost of fetching rows from storage engines and
does not include server side processing. I.e., ROW_EVALUATION_COST is
not reflected in this variable.
Changes to struct st_position:
------------------------------
* Renaming: double records_read => double fanout
Rationale: same as for Key_use::rowcount
* Renaming: double read_time => double read_cost
Rationale: Consistency of variable names (Key_use::read_cost)
Found bugs:
-----------
* BUG#17303649
Copyright (c) 2000, 2025, Oracle Corporation and/or its affiliates. All rights reserved.