MySQL 8.3.0
Source Code Documentation
graph_simplification.h File Reference

Heuristic simplification of query graphs to make them execute faster, largely a direct implementation of [Neu09] (any references to just “the paper” will generally be to that). More...

#include <stddef.h>
#include <string>
#include <vector>
#include "my_compiler.h"
#include "priority_queue.h"
#include "sql/join_optimizer/hypergraph.h"
#include "sql/join_optimizer/online_cycle_finder.h"
#include "sql/mem_root_array.h"
#include "sql/sql_array.h"

Go to the source code of this file.

Classes

class  GraphSimplifier
 
struct  GraphSimplifier::ProposedSimplificationStep
 
struct  GraphSimplifier::SimplificationStep
 
struct  GraphSimplifier::EdgeCardinalities
 
struct  GraphSimplifier::NeighborCache
 
struct  GraphSimplifier::CompareByBenefit
 
struct  GraphSimplifier::MarkNeighborCache
 

Functions

void SetNumberOfSimplifications (int num_simplifications, GraphSimplifier *simplifier)
 
void SimplifyQueryGraph (THD *thd, int subgraph_pair_limit, JoinHypergraph *graph, GraphSimplifier *simplifier, std::string *trace)
 

Detailed Description

Heuristic simplification of query graphs to make them execute faster, largely a direct implementation of [Neu09] (any references to just “the paper” will generally be to that).

This is needed for when query hypergraphs have too many possible (connected) subgraphs to evaluate all of them, and we need to resort to heuristics.

The algorithm works by evaluating pairs of neighboring joins (largely, those that touch some of the same tables), finding obviously bad pairwise orderings and then disallowing them. I.e., if join A must very likely happen before join B (as measured by cost heuristics), we disallow the B-before-A join by extending the hyperedge of B to include A's nodes. This makes the graph more visually complicated (thus making “simplification” a bit of a misnomer), but reduces the search space, so that the query generally is faster to plan.

Obviously, as the algorithm is greedy, it will sometimes make mistakes and make for a more expensive (or at least higher-cost) query. This isn't necessarily an optimal or even particularly good algorithm; e.g. LinDP++ [Rad19] claims significantly better results, especially on joins that are 40 tables or more. However, using graph simplification allows us to handle large queries reasonably well, while still reusing nearly all of our query planning machinery (i.e., we don't have to implement a separate query planner and cost model for large queries).

Also note that graph simplification only addresses the problem of subgraph pair explosion. If each subgraph pair generates large amounts of candidate access paths (e.g. through parameterized paths), each subgraph pair will in itself be expensive, and graph simplification does not concern itself with this at all. Thus, to get a complete solution, we must also have heuristic pruning of access paths within a subgraph, which we're currently missing.

[Neu09] Neumann: “Query Simplification: Graceful Degradation for Join-Order Optimization”. [Rad19] Radke and Neumann: “LinDP++: Generalizing Linearized DP to Crossproducts and Non-Inner Joins”.

Function Documentation

◆ SetNumberOfSimplifications()

void SetNumberOfSimplifications ( int  num_simplifications,
GraphSimplifier simplifier 
)

◆ SimplifyQueryGraph()

void SimplifyQueryGraph ( THD thd,
int  subgraph_pair_limit,
JoinHypergraph graph,
GraphSimplifier simplifier,
std::string *  trace 
)