MySQL 8.4.0
Source Code Documentation
anonymous_namespace{graph_simplification.cc} Namespace Reference

Classes

struct  JoinStatus
 

Functions

bool IsSubjoin (Hyperedge a, Hyperedge b)
 Returns whether A is already a part of B, ie., whether it is impossible to execute B before A. More...
 
bool CombiningWouldViolateConflictRules (const Mem_root_array< ConflictRule > &conflict_rules, const int *in_component, int left_component, int right_component)
 
int GetComponent (const NodeMap *components, const int *in_component, NodeMap tables)
 
template<class Func >
void ConnectComponentsThroughJoins (const JoinHypergraph &graph, const OnlineCycleFinder &cycles, Func &&callback_on_join, NodeMap *components, int *in_component, NodeMap *lateral_dependencies)
 Helper algorithm for GetCardinality() and GraphIsJoinable(); given a set of components (each typically connecting a single table at the start), connects them incrementally up through joins and calls a given callback every time we do it. More...
 
double GetCardinality (THD *thd, NodeMap tables_to_join, const JoinHypergraph &graph, const OnlineCycleFinder &cycles)
 For a given set of tables, try to estimate the cardinality of joining them together. More...
 
double GetCardinalitySingleJoin (THD *thd, NodeMap left, NodeMap right, double left_rows, double right_rows, const JoinHypergraph &graph, const JoinPredicate &pred)
 A special, much faster version of GetCardinality() that can be used when joining two partitions along a known edge. More...
 
OnlineCycleFinder FindJoinDependencies (const Hypergraph &graph, MEM_ROOT *mem_root)
 Initialize a DAG containing all inferred join dependencies from the hypergraph. More...
 
bool IsQueryGraphSimpleEnough (THD *thd, const JoinHypergraph &graph, int subgraph_pair_limit, MEM_ROOT *mem_root, int *seen_subgraph_pairs)
 
JoinStatus SimulateJoin (THD *thd, JoinStatus left, JoinStatus right, const JoinPredicate &pred)
 Simulate the (total) costs and cardinalities of joining two sets of tables, without actually having an AccessPath for each (which is a bit heavyweight for just cost and cardinality). More...
 
JoinStatus SimulateJoin (THD *thd, double left_rows, JoinStatus right, const JoinPredicate &pred)
 
JoinStatus SimulateJoin (THD *thd, JoinStatus left, double right_rows, const JoinPredicate &pred)
 
JoinStatus SimulateJoin (THD *thd, double left_rows, double right_rows, const JoinPredicate &pred)
 
bool GraphIsJoinable (const JoinHypergraph &graph, const OnlineCycleFinder &cycles)
 See if a given hypergraph is impossible to join, in any way. More...
 

Function Documentation

◆ CombiningWouldViolateConflictRules()

bool anonymous_namespace{graph_simplification.cc}::CombiningWouldViolateConflictRules ( const Mem_root_array< ConflictRule > &  conflict_rules,
const int *  in_component,
int  left_component,
int  right_component 
)

◆ ConnectComponentsThroughJoins()

template<class Func >
void anonymous_namespace{graph_simplification.cc}::ConnectComponentsThroughJoins ( const JoinHypergraph graph,
const OnlineCycleFinder cycles,
Func &&  callback_on_join,
NodeMap *  components,
int *  in_component,
NodeMap *  lateral_dependencies 
)

Helper algorithm for GetCardinality() and GraphIsJoinable(); given a set of components (each typically connecting a single table at the start), connects them incrementally up through joins and calls a given callback every time we do it.

The callback must be of type

bool callback(int left_component, int right_component, const JoinPredicate &pred, int num_changed);

where num_changed is the number of tables that was in right_component but has now been combined with the ones in left_component and were moved there (we always move into the component with the lowest index). The algorithm ends when callback() returns true, or if no more joins are possible.

In theory, it would be possible to accelerate this mechanism by means of the standard union-find algorithm (see e.g. https://en.wikipedia.org/wiki/Disjoint-set_data_structure), but since MAX_TABLES is so small, just using bitsets seems to work just as well. And instead of spending time on that, it would probably be better to find a complete join inference algorithm that would make GraphIsJoinable() obsolete and thus reduce the number of calls to this function.

◆ FindJoinDependencies()

OnlineCycleFinder anonymous_namespace{graph_simplification.cc}::FindJoinDependencies ( const Hypergraph graph,
MEM_ROOT mem_root 
)

Initialize a DAG containing all inferred join dependencies from the hypergraph.

These are join dependencies that we cannot violate no matter what we do, so we need to make sure we do not try to force join reorderings that would be in conflict with them (whether directly or transitively) – and the returned OnlineCycleFinder allows us to check out exactly that, and also keep maintaining the DAG as we impose more orderings on the graph.

This graph doesn't necessarily contain all dependencies inherent in the hypergraph, but it usually contains most of them. For instance, {t2,t3}-t4 is not a subjoin of t1-{t2,t4}, but must often be ordered before it anyway, since t2 and t4 are on opposite sides of the former join. See GraphSimplificationTest.IndirectHierarcicalJoins for a concrete test.

Also, in the case of cyclic hypergraphs, the constraints in this DAG may be too strict, since it doesn't take into account that in cyclic hypergraphs we don't end up using all the edges (since the cycles are caused by redundant edges). So even if a constraint cannot be added because it would cause a cycle in the DAG, it doesn't mean that the hypergraph is unjoinable, because one of the edges involved in the cycle might be redundant and can be bypassed. See GraphSimplificationTest.CycleNeighboringHyperedges for a concrete test.

We really ought to fix this, but it's not obvious how to implement it; it seems very difficult to create a test that catches all cases and does not have any false positives in the presence of cycles (which often enable surprising orderings). Because it doesn't, we need additional and fairly expensive checks later on; see comments on GraphIsJoinable().

◆ GetCardinality()

double anonymous_namespace{graph_simplification.cc}::GetCardinality ( THD thd,
NodeMap  tables_to_join,
const JoinHypergraph graph,
const OnlineCycleFinder cycles 
)

For a given set of tables, try to estimate the cardinality of joining them together.

(This essentially simulates the cardinality we'd get out of CostingReceiver, but without computing any costs or actual AccessPaths.)

This is a fairly expensive operation since we need to iterate over all hyperedges several times, so we cache the cardinalities for each hyperedge in GraphSimplifier's constructor and then reuse them until the hyperedge is changed. We could probably go even further by having a cache based on tables_to_join, as many of the hyperedges will share endpoints, but it does not seem to be worth it (based on the microbenchmark profiles).

◆ GetCardinalitySingleJoin()

double anonymous_namespace{graph_simplification.cc}::GetCardinalitySingleJoin ( THD thd,
NodeMap  left,
NodeMap  right,
double  left_rows,
double  right_rows,
const JoinHypergraph graph,
const JoinPredicate pred 
)

A special, much faster version of GetCardinality() that can be used when joining two partitions along a known edge.

It reuses the existing cardinalities, and just applies the single edge and any missing WHERE predicates; this allows it to just make a single pass over those predicates and do no other work.

◆ GetComponent()

int anonymous_namespace{graph_simplification.cc}::GetComponent ( const NodeMap *  components,
const int *  in_component,
NodeMap  tables 
)

◆ GraphIsJoinable()

bool anonymous_namespace{graph_simplification.cc}::GraphIsJoinable ( const JoinHypergraph graph,
const OnlineCycleFinder cycles 
)

See if a given hypergraph is impossible to join, in any way.

This is a hack to work around the fact that our inference of implicit join ordering from the hypergraph is imperfect, so that we can end up creating an impossible situation (try to force join A before join B, but B must be done before A due to graph constraints). The paper mentions that joins must be inferred, but does not provide a complete procedure, and the authors were unaware that their assumed procedure did not cover all cases (Neumann, personal communication). Thus, we run this after each join simplification we apply, to see whether we created such a contradiction (if so, we know the opposite ordering is true).

The algorithm is bare-bones: We put each node (table) into its own component, and then run through all join edges to see if we can connect those components into larger components. If we can apply enough edges (by repeated application of the entire list) that everything is connected into the same component, then there is at least one valid join order, and the graph is joinable. If not, it is impossible and we return true.

◆ IsQueryGraphSimpleEnough()

bool anonymous_namespace{graph_simplification.cc}::IsQueryGraphSimpleEnough ( THD thd,
const JoinHypergraph graph,
int  subgraph_pair_limit,
MEM_ROOT mem_root,
int *  seen_subgraph_pairs 
)

◆ IsSubjoin()

bool anonymous_namespace{graph_simplification.cc}::IsSubjoin ( Hyperedge  a,
Hyperedge  b 
)

Returns whether A is already a part of B, ie., whether it is impossible to execute B before A.

E.g., for t1 LEFT JOIN (t2 JOIN t3), the t2-t3 join will be part of the t1-{t2,t3} hyperedge, and this will return true.

Note that this definition is much more lenient than the one in the paper (Figure 4), which appears to be wrong.

◆ SimulateJoin() [1/4]

JoinStatus anonymous_namespace{graph_simplification.cc}::SimulateJoin ( THD thd,
double  left_rows,
double  right_rows,
const JoinPredicate pred 
)

◆ SimulateJoin() [2/4]

JoinStatus anonymous_namespace{graph_simplification.cc}::SimulateJoin ( THD thd,
double  left_rows,
JoinStatus  right,
const JoinPredicate pred 
)

◆ SimulateJoin() [3/4]

JoinStatus anonymous_namespace{graph_simplification.cc}::SimulateJoin ( THD thd,
JoinStatus  left,
double  right_rows,
const JoinPredicate pred 
)

◆ SimulateJoin() [4/4]

JoinStatus anonymous_namespace{graph_simplification.cc}::SimulateJoin ( THD thd,
JoinStatus  left,
JoinStatus  right,
const JoinPredicate pred 
)

Simulate the (total) costs and cardinalities of joining two sets of tables, without actually having an AccessPath for each (which is a bit heavyweight for just cost and cardinality).

Returns the same type, so that we can succinctly simulate joining this to yet more tables.

The paper generally uses merge join as the cost function heuristic, but since we don't have merge join, and nested-loop joins are heavily dependent on context such as available indexes, we use instead our standard hash join estimation here. When we get merge joins, we should probably have a look to see whether switching to its cost function here makes sense. (Of course, we don't know what join type we will actually be using until we're done with the entire planning!)

NOTE: Keep this in sync with the cost estimation in ProposeHashJoin().