MySQL 8.4.0
Source Code Documentation
hypergraph Namespace Reference

Classes

struct  Hyperedge
 
struct  Hypergraph
 
class  NeighborhoodCache
 
struct  Node
 

Typedefs

using NodeMap = uint64_t
 Since our graphs can never have more than 61 tables, node sets and edge lists are implemented using 64-bit bit sets. More...
 

Functions

template<class T >
static void RemoveElement (const T &element, std::vector< T > *vec)
 
bool IsSimpleEdge (NodeMap left, NodeMap right)
 Is an edge between "left" and "right" a simple edge? More...
 
template<class Receiver >
bool EnumerateAllConnectedPartitions (Receiver *receiver)
 
std::string PrintSet (NodeMap x)
 
NodeMap FindNeighborhood (const Hypergraph &g, NodeMap subgraph, NodeMap forbidden, NodeMap just_grown_by, NeighborhoodCache *cache, NodeMap *full_neighborhood_arg)
 Find the neighborhood of the given subgraph (S); informally, the set of nodes immediately reachable from that subgraph. More...
 
template<class Receiver >
bool EnumerateComplementsTo (const Hypergraph &g, size_t lowest_node_idx, NodeMap subgraph, NodeMap full_neighborhood, NodeMap neighborhood, Receiver *receiver)
 
template<class Receiver >
bool ExpandSubgraph (const Hypergraph &g, size_t lowest_node_idx, NodeMap subgraph, NodeMap full_neighborhood, NodeMap neighborhood, NodeMap forbidden, Receiver *receiver)
 
template<class Receiver >
bool TryConnecting (const Hypergraph &g, NodeMap subgraph, NodeMap subgraph_full_neighborhood, NodeMap complement, Receiver *receiver)
 
template<class Receiver >
bool ExpandComplement (const Hypergraph &g, size_t lowest_node_idx, NodeMap subgraph, NodeMap subgraph_full_neighborhood, NodeMap complement, NodeMap neighborhood, NodeMap forbidden, Receiver *receiver)
 
template<class Receiver >
bool EnumerateAllConnectedPartitions (const Hypergraph &g, Receiver *receiver)
 

Typedef Documentation

◆ NodeMap

using hypergraph::NodeMap = typedef uint64_t

Since our graphs can never have more than 61 tables, node sets and edge lists are implemented using 64-bit bit sets.

This allows for a compact representation and very fast set manipulation; the algorithm does a fair amount of intersections and unions. If we should need extensions to larger graphs later (this will require additional heuristics for reducing the search space), we can use dynamic bit sets, although at a performance cost (we'd probably templatize off the NodeMap type).

Function Documentation

◆ EnumerateAllConnectedPartitions() [1/2]

template<class Receiver >
bool hypergraph::EnumerateAllConnectedPartitions ( const Hypergraph g,
Receiver *  receiver 
)

◆ EnumerateAllConnectedPartitions() [2/2]

template<class Receiver >
bool hypergraph::EnumerateAllConnectedPartitions ( Receiver *  receiver)

◆ EnumerateComplementsTo()

template<class Receiver >
bool hypergraph::EnumerateComplementsTo ( const Hypergraph g,
size_t  lowest_node_idx,
NodeMap  subgraph,
NodeMap  full_neighborhood,
NodeMap  neighborhood,
Receiver *  receiver 
)

◆ ExpandComplement()

template<class Receiver >
bool hypergraph::ExpandComplement ( const Hypergraph g,
size_t  lowest_node_idx,
NodeMap  subgraph,
NodeMap  subgraph_full_neighborhood,
NodeMap  complement,
NodeMap  neighborhood,
NodeMap  forbidden,
Receiver *  receiver 
)

◆ ExpandSubgraph()

template<class Receiver >
bool hypergraph::ExpandSubgraph ( const Hypergraph g,
size_t  lowest_node_idx,
NodeMap  subgraph,
NodeMap  full_neighborhood,
NodeMap  neighborhood,
NodeMap  forbidden,
Receiver *  receiver 
)

◆ FindNeighborhood()

NodeMap hypergraph::FindNeighborhood ( const Hypergraph g,
NodeMap  subgraph,
NodeMap  forbidden,
NodeMap  just_grown_by,
NeighborhoodCache cache,
NodeMap full_neighborhood_arg 
)
inline

Find the neighborhood of the given subgraph (S); informally, the set of nodes immediately reachable from that subgraph.

There's an additional constraint in that the edges used to do so must not touch the forbidden set of nodes (X). The DPhyp paper calls this function N(S, X) (with a calligraphic N).

How to calculate the neighborhood efficiently is one of the least explicitly described parts of the paper. The definition goes about as follows:

  1. Find E↓'(S,X), the set of “interesting hypernodes” (outgoing edge destinations from S). These are the (endpoints of) edges that have one side entirely within S, that have the other side entirely outside S, and none of the sides touch the forbidden set X.
  2. Minimize E↓'(S,X) by removing all “subsumed hypernodes”, giving E↓(S,X). u subsumes v if it is a proper subset; if so, we can never go to where v points to before we've been at u, so it's pointless to keep v.
  3. For each hypernode in E↓(S,X), pick node with lowest index as a representative, because our subset enumeration algorithms cannot enumerate subsets of hypernodes, only subsets of normal nodes. (Actually, any node that's part of the hypernode would do; it does not even need to be consistent.) These nodes together constitute the neighborhood.

There are a couple of points to note here:

First, adding more nodes than needed to the neighborhood does not affect correctness of the algorithm, only speed. We try all combinations of included/excluded for the neighborhood (2^N in the number of nodes), so this covers all potential subgraphs; in theory, we could even just choose all non-forbidden nodes and reduce to the algorithm known as DPhyp, it just wouldn't be very efficient.

Second, step 3 means that we may very well end up with a non-connected subgraph. This is harmless; we may eventually grow it to a connected one or we may not, we just won't start looking for any complements until we have a connected one (and we know whether it's connected or not based on whether we saw it as a csg-cmp pair in the algorithm earlier).

Third, due to the way we grow our subgraph, only the nodes that we have just grown by can contribute to the E↓'(S,X). The reason is simple; every node from the previous neighborhood will have been added either to S or to X, and both exclude them from the new neighborhood. (Step 2 doesn't affect this, as any hypernode that was subsumed would also have to touch S or X. But there's an exception in that occasionally, we can remove nodes from X; see ExpandSubgraph().)

Fourth, perfect minimization seems to be impossible to actually implement efficiently. This is known as the minimum set problem, and the best known algorithms to do this are in O(n² / log n) time (see e.g. Pritchard: ”An Old Sub-Quadratic Algorithm for Finding Extremal Sets”), which can be quite a lot when there are lots of edges. (The trivial O(n²) algorithm is to just test every set against every smaller set, and throw it out if it's a superset.) Since loops in our hypergraphs are presumed to be fairly rare, it would not seem worth it to do full minimization.

Instead, we pick the low-hanging fruit only: Every simple edge is trivial to test against. We just collect the simple edges into a mask, and any (complex) hyperedge that overlaps with that bitmap can immediately be discarded. Even more, since we don't have to pick min(S) but can pick something arbitrary, we can let {R2,R3} (which gets R2 as its representative node) subsume {R1,R2}, even though it's not an actual subset, by pretending we picked R2 as the representative node for the latter! This is similar to what Moerkotte describes in his “Building Query Compilers” document, which seems to contain a slightly extended version of the DPhyp paper (under a different name). We could have collected all the simple edges in a separate pass first, but the microbenchmarks show that the added loop overhead isn't worth it.

Note that we also keep E↓'(S,X), the set of interesting hypernodes; we bitwise-or it into “full_neighborhood”. This is useful later when searching for edges to connect the connected subgraph and its complement; we know that only edges into “full_neighborhood” can connect the two.

This function accounts for roughly 20–70% of the total DPhyp running time, depending on the shape of the graph (~40% average across the microbenchmarks). It is fairly big to inline, but it helps speed significantly, probably due to the large amount of parameters to be passed back and forth.

◆ IsSimpleEdge()

bool hypergraph::IsSimpleEdge ( NodeMap  left,
NodeMap  right 
)
inline

Is an edge between "left" and "right" a simple edge?

◆ PrintSet()

std::string hypergraph::PrintSet ( NodeMap  x)
inline

◆ RemoveElement()

template<class T >
static void hypergraph::RemoveElement ( const T &  element,
std::vector< T > *  vec 
)
static

◆ TryConnecting()

template<class Receiver >
bool hypergraph::TryConnecting ( const Hypergraph g,
NodeMap  subgraph,
NodeMap  subgraph_full_neighborhood,
NodeMap  complement,
Receiver *  receiver 
)