MySQL 8.0.33
Source Code Documentation

This file implements the DPhyp algorithm for enumerating connected subgraphs of hypergraphs (see hypergraph.h for a hypergraph definition). More...
#include <assert.h>
#include <string>
#include "sql/join_optimizer/bit_utils.h"
#include "sql/join_optimizer/hypergraph.h"
Go to the source code of this file.
Classes  
class  hypergraph::NeighborhoodCache 
Namespaces  
namespace  hypergraph 
Macros  
#define  DEBUGGING_DPHYP 0 
#define  HYPERGRAPH_PRINTF(...) 
Functions  
template<class Receiver >  
bool  hypergraph::EnumerateAllConnectedPartitions (Receiver *receiver) 
std::string  hypergraph::PrintSet (NodeMap x) 
NodeMap  hypergraph::FindNeighborhood (const Hypergraph &g, NodeMap subgraph, NodeMap forbidden, NodeMap just_grown_by, NeighborhoodCache *cache, NodeMap *full_neighborhood_arg) 
Find the neighborhood of the given subgraph (S); informally, the set of nodes immediately reachable from that subgraph. More...  
template<class Receiver >  
bool  hypergraph::EnumerateComplementsTo (const Hypergraph &g, size_t lowest_node_idx, NodeMap subgraph, NodeMap full_neighborhood, NodeMap neighborhood, Receiver *receiver) 
template<class Receiver >  
bool  hypergraph::ExpandSubgraph (const Hypergraph &g, size_t lowest_node_idx, NodeMap subgraph, NodeMap full_neighborhood, NodeMap neighborhood, NodeMap forbidden, Receiver *receiver) 
template<class Receiver >  
bool  hypergraph::TryConnecting (const Hypergraph &g, NodeMap subgraph, NodeMap subgraph_full_neighborhood, NodeMap complement, Receiver *receiver) 
template<class Receiver >  
bool  hypergraph::ExpandComplement (const Hypergraph &g, size_t lowest_node_idx, NodeMap subgraph, NodeMap subgraph_full_neighborhood, NodeMap complement, NodeMap neighborhood, NodeMap forbidden, Receiver *receiver) 
template<class Receiver >  
bool  hypergraph::EnumerateAllConnectedPartitions (const Hypergraph &g, Receiver *receiver) 
This file implements the DPhyp algorithm for enumerating connected subgraphs of hypergraphs (see hypergraph.h for a hypergraph definition).
The core idea of the algorithm is that if the join structure of a query is expressed as a hypergraph, where the relations are nodes and the join predicates are hyperedges, one can efficiently find all legal join orders without Cartesian products by finding all possible subpartitions of the hypergraph. (Simple inner joins will have regular edges, but outer joins, antijoins etc., can be encoded as hyperedges to constrain the allowed join orderings, so that we do not join e.g. an inner and outer table together before said inner table has been joined to the entire set. Also, hyperpredicates such as t1.a + t2.b = t3.c will naturally give rise to hyperedges.)
The algorithm is described in the paper “Dynamic Programming Strikes Back” by Neumann and Moerkotte. There is a somewhat extended version of the paper (that also contains a few corrections) in Moerkotte's treatise “Building Query Compilers”. Some critical details are still missing, which we've had to fill in ourselves. We don't currently implement the extension to generalized hypergraphs, but it should be fairly straightforward to do later. The algorithm is simple in concept but hard to grasp; we will only give a very rough outline here:
The entry point for doing this is EnumerateAllConnectedPartitions().
For complex joins, we may have to run DPhyp multiple times in a mode where we just count the number of partitions over various constrained graphs, and this will be a critical part of query planning time. Thus, it is coded as a template over a receiver type that gets callbacks for each partition. If the receiver just interested in counting, this saves a significant amount of call overhead. The templatization also allows the microbenchmarks to more accurately measure changes in the algorithm itself without having to benchmark the receiver.
#define DEBUGGING_DPHYP 0 
#define HYPERGRAPH_PRINTF  (  ...  ) 