Heuristic simplification of query graphs to make them execute faster, largely a direct implementation of [Neu09] (any references to just “the paper” will generally be to that). More...

#include <stddef.h>
#include <string>
#include <vector>
#include "my_compiler.h"
#include "priority_queue.h"
#include "sql/join_optimizer/hypergraph.h"
#include "sql/join_optimizer/online_cycle_finder.h"
#include "sql/mem_root_array.h"
#include "sql/sql_array.h"

Classes
class	GraphSimplifier

struct	GraphSimplifier::ProposedSimplificationStep

struct	GraphSimplifier::SimplificationStep

struct	GraphSimplifier::EdgeCardinalities

struct	GraphSimplifier::NeighborCache

struct	GraphSimplifier::CompareByBenefit

struct	GraphSimplifier::MarkNeighborCache

Functions
void	SimplifyQueryGraph (THD thd, int subgraph_pair_limit, JoinHypergraph graph, std::string *trace)

Detailed Description

Heuristic simplification of query graphs to make them execute faster, largely a direct implementation of [Neu09] (any references to just “the paper” will generally be to that).

This is needed for when query hypergraphs have too many possible (connected) subgraphs to evaluate all of them, and we need to resort to heuristics.

The algorithm works by evaluating pairs of neighboring joins (largely, those that touch some of the same tables), finding obviously bad pairwise orderings and then disallowing them. I.e., if join A must very likely happen before join B (as measured by cost heuristics), we disallow the B-before-A join by extending the hyperedge of B to include A's nodes. This makes the graph more visually complicated (thus making “simplification” a bit of a misnomer), but reduces the search space, so that the query generally is faster to plan.

Obviously, as the algorithm is greedy, it will sometimes make mistakes and make for a more expensive (or at least higher-cost) query. This isn't necessarily an optimal or even particularly good algorithm; e.g. LinDP++ [Rad19] claims significantly better results, especially on joins that are 40 tables or more. However, using graph simplification allows us to handle large queries reasonably well, while still reusing nearly all of our query planning machinery (i.e., we don't have to implement a separate query planner and cost model for large queries).

Also note that graph simplification only addresses the problem of subgraph pair explosion. If each subgraph pair generates large amounts of candidate access paths (e.g. through parameterized paths), each subgraph pair will in itself be expensive, and graph simplification does not concern itself with this at all. Thus, to get a complete solution, we must also have heuristic pruning of access paths within a subgraph, which we're currently missing.

[Neu09] Neumann: “Query Simplification: Graceful Degradation for Join-Order Optimization”. [Rad19] Radke and Neumann: “LinDP++: Generalizing Linearized DP to Crossproducts and Non-Inner Joins”.

Function Documentation

◆ SimplifyQueryGraph()

void SimplifyQueryGraph	(	THD *	thd,
		int	subgraph_pair_limit,
		JoinHypergraph *	graph,
		std::string *	trace
	)

Classes

Functions

Detailed Description

Function Documentation

◆ SimplifyQueryGraph()