MySQL
8.0.28
Source Code Documentation

Classes  
struct  Node 
struct  Hyperedge 
struct  Hypergraph 
class  NeighborhoodCache 
Typedefs  
using  NodeMap = uint64_t 
Since our graphs can never have more than 61 tables, node sets and edge lists are implemented using 64bit bit sets. More...  
Functions  
template<class Receiver >  
bool  EnumerateAllConnectedPartitions (Receiver *receiver) 
std::string  PrintSet (NodeMap x) 
NodeMap  FindNeighborhood (const Hypergraph &g, NodeMap subgraph, NodeMap forbidden, NodeMap just_grown_by, NeighborhoodCache *cache, NodeMap *full_neighborhood_arg) 
Find the neighborhood of the given subgraph (S); informally, the set of nodes immediately reachable from that subgraph. More...  
template<class Receiver >  
bool  EnumerateComplementsTo (const Hypergraph &g, size_t lowest_node_idx, NodeMap subgraph, NodeMap full_neighborhood, NodeMap neighborhood, Receiver *receiver) 
template<class Receiver >  
bool  ExpandSubgraph (const Hypergraph &g, size_t lowest_node_idx, NodeMap subgraph, NodeMap full_neighborhood, NodeMap neighborhood, NodeMap forbidden, Receiver *receiver) 
template<class Receiver >  
bool  TryConnecting (const Hypergraph &g, NodeMap subgraph, NodeMap subgraph_full_neighborhood, NodeMap complement, Receiver *receiver) 
template<class Receiver >  
bool  ExpandComplement (const Hypergraph &g, size_t lowest_node_idx, NodeMap subgraph, NodeMap subgraph_full_neighborhood, NodeMap complement, NodeMap neighborhood, NodeMap forbidden, Receiver *receiver) 
template<class Receiver >  
bool  EnumerateAllConnectedPartitions (const Hypergraph &g, Receiver *receiver) 
using hypergraph::NodeMap = typedef uint64_t 
Since our graphs can never have more than 61 tables, node sets and edge lists are implemented using 64bit bit sets.
This allows for a compact representation and very fast set manipulation; the algorithm does a fair amount of intersections and unions. If we should need extensions to larger graphs later (this will require additional heuristics for reducing the search space), we can use dynamic bit sets, although at a performance cost (we'd probably templatize off the NodeMap type).
bool hypergraph::EnumerateAllConnectedPartitions  (  const Hypergraph &  g, 
Receiver *  receiver  
) 
bool hypergraph::EnumerateAllConnectedPartitions  (  Receiver *  receiver  ) 
bool hypergraph::EnumerateComplementsTo  (  const Hypergraph &  g, 
size_t  lowest_node_idx,  
NodeMap  subgraph,  
NodeMap  full_neighborhood,  
NodeMap  neighborhood,  
Receiver *  receiver  
) 
bool hypergraph::ExpandComplement  (  const Hypergraph &  g, 
size_t  lowest_node_idx,  
NodeMap  subgraph,  
NodeMap  subgraph_full_neighborhood,  
NodeMap  complement,  
NodeMap  neighborhood,  
NodeMap  forbidden,  
Receiver *  receiver  
) 
bool hypergraph::ExpandSubgraph  (  const Hypergraph &  g, 
size_t  lowest_node_idx,  
NodeMap  subgraph,  
NodeMap  full_neighborhood,  
NodeMap  neighborhood,  
NodeMap  forbidden,  
Receiver *  receiver  
) 

inline 
Find the neighborhood of the given subgraph (S); informally, the set of nodes immediately reachable from that subgraph.
There's an additional constraint in that the edges used to do so must not touch the forbidden set of nodes (X). The DPhyp paper calls this function N(S, X) (with a calligraphic N).
How to calculate the neighborhood efficiently is one of the least explicitly described parts of the paper. The definition goes about as follows:
There are a couple of points to note here:
First, adding more nodes than needed to the neighborhood does not affect correctness of the algorithm, only speed. We try all combinations of included/excluded for the neighborhood (2^N in the number of nodes), so this covers all potential subgraphs; in theory, we could even just choose all nonforbidden nodes and reduce to the algorithm known as DPhyp, it just wouldn't be very efficient.
Second, step 3 means that we may very well end up with a nonconnected subgraph. This is harmless; we may eventually grow it to a connected one or we may not, we just won't start looking for any complements until we have a connected one (and we know whether it's connected or not based on whether we saw it as a csgcmp pair in the algorithm earlier).
Third, due to the way we grow our subgraph, only the nodes that we have just grown by can contribute to the E↓'(S,X). The reason is simple; every node from the previous neighborhood will have been added either to S or to X, and both exclude them from the new neighborhood. (Step 2 doesn't affect this, as any hypernode that was subsumed would also have to touch S or X. But there's an exception in that occasionally, we can remove nodes from X; see ExpandSubgraph().)
Fourth, perfect minimization seems to be impossible to actually implement efficiently. This is known as the minimum set problem, and the best known algorithms to do this are in O(n² / log n) time (see e.g. Pritchard: ”An Old SubQuadratic Algorithm for Finding Extremal Sets”), which can be quite a lot when there are lots of edges. (The trivial O(n²) algorithm is to just test every set against every smaller set, and throw it out if it's a superset.) Since loops in our hypergraphs are presumed to be fairly rare, it would not seem worth it to do full minimization.
Instead, we pick the lowhanging fruit only: Every simple edge is trivial to test against. We just collect the simple edges into a mask, and any (complex) hyperedge that overlaps with that bitmap can immediately be discarded. Even more, since we don't have to pick min(S) but can pick something arbitrary, we can let {R2,R3} (which gets R2 as its representative node) subsume {R1,R2}, even though it's not an actual subset, by pretending we picked R2 as the representative node for the latter! This is similar to what Moerkotte describes in his “Building Query Compilers” document, which seems to contain a slightly extended version of the DPhyp paper (under a different name). We could have collected all the simple edges in a separate pass first, but the microbenchmarks show that the added loop overhead isn't worth it.
Note that we also keep E↓'(S,X), the set of interesting hypernodes; we bitwiseor it into “full_neighborhood”. This is useful later when searching for edges to connect the connected subgraph and its complement; we know that only edges into “full_neighborhood” can connect the two.
This function accounts for roughly 20–70% of the total DPhyp running time, depending on the shape of the graph (~40% average across the microbenchmarks). It is fairly big to inline, but it helps speed significantly, probably due to the large amount of parameters to be passed back and forth.

inline 
bool hypergraph::TryConnecting  (  const Hypergraph &  g, 
NodeMap  subgraph,  
NodeMap  subgraph_full_neighborhood,  
NodeMap  complement,  
Receiver *  receiver  
) 