MySQL 9.1.0
Source Code Documentation
subgraph_enumeration.h
Go to the documentation of this file.
1/* Copyright (c) 2020, 2024, Oracle and/or its affiliates.
2
3 This program is free software; you can redistribute it and/or modify
4 it under the terms of the GNU General Public License, version 2.0,
5 as published by the Free Software Foundation.
6
7 This program is designed to work with certain software (including
8 but not limited to OpenSSL) that is licensed under separate terms,
9 as designated in a particular file or component or in included license
10 documentation. The authors of MySQL hereby grant you an additional
11 permission to link the program and your derivative works with the
12 separately licensed software that they have either included with
13 the program or referenced in the documentation.
14
15 This program is distributed in the hope that it will be useful,
16 but WITHOUT ANY WARRANTY; without even the implied warranty of
17 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
18 GNU General Public License, version 2.0, for more details.
19
20 You should have received a copy of the GNU General Public License
21 along with this program; if not, write to the Free Software
22 Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */
23
24#ifndef SUBGRAPH_ENUMERATION_H
25#define SUBGRAPH_ENUMERATION_H 1
26
27/**
28 @file
29
30 This file implements the DPhyp algorithm for enumerating connected
31 subgraphs of hypergraphs (see hypergraph.h for a hypergraph definition).
32
33 The core idea of the algorithm is that if the join structure of a
34 query is expressed as a hypergraph, where the relations are nodes
35 and the join predicates are hyperedges, one can efficiently find
36 all legal join orders without Cartesian products by finding all
37 possible subpartitions of the hypergraph. (Simple inner joins will
38 have regular edges, but outer joins, antijoins etc., can be encoded
39 as hyperedges to constrain the allowed join orderings, so that we
40 do not join e.g. an inner and outer table together before said inner
41 table has been joined to the entire set. Also, hyper-predicates such
42 as t1.a + t2.b = t3.c will naturally give rise to hyperedges.)
43
44 The algorithm is described in the paper “Dynamic Programming Strikes
45 Back” by Neumann and Moerkotte. There is a somewhat extended version
46 of the paper (that also contains a few corrections) in Moerkotte's
47 treatise “Building Query Compilers”. Some critical details are still
48 missing, which we've had to fill in ourselves. We don't currently
49 implement the extension to generalized hypergraphs, but it should be
50 fairly straightforward to do later. The algorithm is simple in concept
51 but hard to grasp; we will only give a very rough outline here:
52
53 1. Pick a seed node of the graph.
54 2. Grow that seed along hyperedges, taking care never to make an
55 unconnected graph or seeing the same subgraph twice.
56 3. For each connected subgraph (csg): Repeat steps 1–2 independently
57 to create a separate connected subgraph (the so-called complement,
58 cmp), and try to connect the subgraph and its complement to create
59 a larger graph (a so-called csg-cmp-pair).
60 4. When such a csg-cmp-pair is found, call the receiver back with the
61 csg and cmp. This is a valid subjoin that can be costed.
62
63 The entry point for doing this is EnumerateAllConnectedPartitions().
64
65 For complex joins, we may have to run DPhyp multiple times in a mode
66 where we just count the number of partitions over various constrained
67 graphs, and this will be a critical part of query planning time.
68 Thus, it is coded as a template over a receiver type that gets callbacks
69 for each partition. If the receiver just interested in counting,
70 this saves a significant amount of call overhead. The templatization
71 also allows the microbenchmarks to more accurately measure changes in
72 the algorithm itself without having to benchmark the receiver.
73 */
74
75#include <assert.h>
76#include <string>
79
80// You can change 0 to 1 here to get debug traces of the algorithm
81// as it is working.
82#define DEBUGGING_DPHYP 0
83
84#if DEBUGGING_DPHYP
85#include <stdio.h>
86#define HYPERGRAPH_PRINTF printf
87#else
88#define HYPERGRAPH_PRINTF(...)
89#endif
90
91namespace hypergraph {
92
93template <class Receiver>
94bool EnumerateAllConnectedPartitions(Receiver *receiver);
95
96inline std::string PrintSet(NodeMap x) {
97 std::string ret = "{";
98 bool first = true;
99 for (size_t node_idx : BitsSetIn(x)) {
100 if (!first) {
101 ret += ",";
102 }
103 first = false;
104
105 char buf[256];
106 snprintf(buf, sizeof(buf), "R%zu", node_idx + 1);
107 ret += buf;
108 }
109 return ret + "}";
110}
111
112/*
113 FindNeighborhood() (see below) is crucial for speed. We can speed it up
114 somewhat by observing that it is often being called many times with the
115 same forbidden set and subgraphs that keep increasing; e.g., when we
116 have the neighborhood {R1,R2}, we need to calculate the neighborhood of
117 {R1}, {R2} and {R1,R2} -- the latter will start with calculating the
118 neighborhood of {R1} and then add {R2} from there. We cannot just union
119 the two neighborhoods due to hyperedges, but we can reuse the start.
120
121 To this end, NeighborhoodCache implements a simple one-element cache.
122 If we start a neighborhood computation that is a superset of the element
123 in the cache, we can just continue with the neighborhood it calculated
124 and add the missing elements. The overhead of managing the cache is a
125 ~15-20% loss for simple graphs with low degrees (e.g. chains), but a
126 huge speedup (60% or more) for graphs with high degrees, such as stars.
127 Given that the simple graphs are already so fast that their time is
128 hardly noticeable, this seems like a good overall tradeoff.
129
130 The default enumeration of power sets given by NonzeroSubsetsOf
131 (e.g. 000, 001, 010, 011, 100, etc. for three nodes in the neighborhood)
132 is not optimal for caching. E.g., for four bits, we can brute-force the
133 optimal order to be
134
135 0001 *0010 0011 *0100 0101 *0110 0111 1110 *1000 1010 1100 *1001
136 1011 *1110 1111
137
138 where we overwrite the element in the cache every time we process a subset
139 marked by *. This yields an optimal 17 loop iterations saved, leaving only 15.
140 However, it is not clear how to efficiently enumerate these orders and choice
141 of elements to cache realtime without huge precalculated tables (e.g., what is
142 the optimal order for 14 potential neighbors?), so instead, we keep the normal
143 order and add a simple heuristic: Keep every other item. The lowest bit will
144 change between 0 and 1 every iteration, so one that ends in 1 cannot possibly
145 be a subset of the the next enumerated subset. This yields:
146
147 0001 *0010 0011 *0100 0101 *0110 0111 *1000 1001 *1010 1011 *1100
148 1101 *1110 1111
149
150 which saves 16 loop iterations, nearly as good. (This pattern does not seem
151 to change markedly for larger subsets; the best pattern for five bits is
152 1 *2 3 6 *10 11 14 *4 *5 7 21 *13 15 *8 9 12 *24 25 26 *28 29 30 *16 17 20
153 *18 22 *19 23 *27 31, saving 49 bits where the heuristic saves 44. Optimal
154 patterns for more than five bits are not known.)
155
156 The only thing that really matters is keeping track of what the lowest bit
157 is; we call it the “taboo bit”, as when we process such a subset, it
158 signals that the result shouldn't replace whatever is in the cache.
159
160 Note that you cannot reuse the cache across calls with different forbidden
161 subsets; that would yield wrong results.
162 */
164 public:
165 explicit NeighborhoodCache(NodeMap neighborhood)
166 : m_taboo_bit(IsolateLowestBit(neighborhood)) {}
167
168 // Tell the cache we intend to start a neighborhood search.
169 // If the cache can reduce our workload, it will update the
170 // two neighborhoods. Returns the actual set of bits we need
171 // to compute the neighborhood for (whether it could save
172 // anything or not).
173 inline NodeMap InitSearch(NodeMap just_grown_by, NodeMap *neighborhood,
174 NodeMap *full_neighborhood) {
175 if (IsSubset(m_last_just_grown_by, just_grown_by)) {
176 // We can use our cache from the last node and continue the search from
177 // there.
178 *full_neighborhood |= m_last_full_neighborhood;
179 *neighborhood = m_last_neighborhood;
180 return just_grown_by & ~m_last_just_grown_by;
181 }
182
183 // We need to do the entire search as usual.
184 return just_grown_by;
185 }
186
187 // Tell the cache we just computed a neighborhood. It can choose to
188 // store it to accelerate future InitSearch() calls.
189 inline void Store(NodeMap just_grown_by, NodeMap neighborhood,
190 NodeMap full_neighborhood) {
191 assert(IsSubset(neighborhood, full_neighborhood));
192 if (Overlaps(just_grown_by, m_taboo_bit)) return;
193
194 m_last_just_grown_by = just_grown_by;
195 m_last_full_neighborhood = full_neighborhood;
196 m_last_neighborhood = neighborhood;
197 }
198
199 private:
202 ~0; // Don't try to use the cache the first iteration.
205};
206
207/**
208 Find the neighborhood of the given subgraph (S); informally, the set of nodes
209 immediately reachable from that subgraph. There's an additional constraint
210 in that the edges used to do so must not touch the forbidden set of nodes (X).
211 The DPhyp paper calls this function N(S, X) (with a calligraphic N).
212
213 How to calculate the neighborhood efficiently is one of the least explicitly
214 described parts of the paper. The definition goes about as follows:
215
216 1. Find E↓'(S,X), the set of “interesting hypernodes” (outgoing edge
217 destinations from S). These are the (endpoints of) edges that have one
218 side entirely within S, that have the other side entirely _outside_ S,
219 and none of the sides touch the forbidden set X.
220 2. Minimize E↓'(S,X) by removing all “subsumed hypernodes”, giving E↓(S,X).
221 u subsumes v if it is a proper subset; if so, we can never go to where v
222 points to before we've been at u, so it's pointless to keep v.
223 3. For each hypernode in E↓(S,X), pick node with lowest index as a
224 representative, because our subset enumeration algorithms cannot
225 enumerate subsets of hypernodes, only subsets of normal nodes.
226 (Actually, any node that's part of the hypernode would do; it does not
227 even need to be consistent.) These nodes together constitute the
228 neighborhood.
229
230 There are a couple of points to note here:
231
232 First, adding more nodes than needed to the neighborhood does not affect
233 correctness of the algorithm, only speed. We try all combinations of
234 included/excluded for the neighborhood (2^N in the number of nodes),
235 so this covers all potential subgraphs; in theory, we could even just choose
236 all non-forbidden nodes and reduce to the algorithm known as DPhyp, it just
237 wouldn't be very efficient.
238
239 Second, step 3 means that we may very well end up with a non-connected
240 subgraph. This is harmless; we may eventually grow it to a connected one or
241 we may not, we just won't start looking for any complements until we have a
242 connected one (and we know whether it's connected or not based on whether we
243 saw it as a csg-cmp pair in the algorithm earlier).
244
245 Third, due to the way we grow our subgraph, only the nodes that we have just
246 grown by can contribute to the E↓'(S,X). The reason is simple; every node
247 from the previous neighborhood will have been added either to S or to X,
248 and both exclude them from the new neighborhood. (Step 2 doesn't affect this,
249 as any hypernode that was subsumed would also have to touch S or X.
250 But there's an exception in that occasionally, we can remove nodes from X;
251 see ExpandSubgraph().)
252
253 Fourth, perfect minimization seems to be impossible to actually implement
254 efficiently. This is known as the minimum set problem, and the best known
255 algorithms to do this are in O(n² / log n) time (see e.g. Pritchard: ”An Old
256 Sub-Quadratic Algorithm for Finding Extremal Sets”), which can be quite a
257 lot when there are lots of edges. (The trivial O(n²) algorithm is to just test
258 every set against every smaller set, and throw it out if it's a superset.)
259 Since loops in our hypergraphs are presumed to be fairly rare, it would not
260 seem worth it to do full minimization.
261
262 Instead, we pick the low-hanging fruit only: Every _simple_ edge is trivial
263 to test against. We just collect the simple edges into a mask, and any
264 (complex) hyperedge that overlaps with that bitmap can immediately be
265 discarded. Even more, since we don't have to pick min(S) but can pick
266 something arbitrary, we can let {R2,R3} (which gets R2 as its representative
267 node) subsume {R1,R2}, even though it's not an actual subset, by pretending
268 we picked R2 as the representative node for the latter! This is similar to
269 what Moerkotte describes in his “Building Query Compilers” document,
270 which seems to contain a slightly extended version of the DPhyp paper
271 (under a different name). We could have collected all the simple edges in a
272 separate pass first, but the microbenchmarks show that the added loop overhead
273 isn't worth it.
274
275 Note that we also keep E↓'(S,X), the set of interesting hypernodes; we
276 bitwise-or it into “full_neighborhood”. This is useful later when searching
277 for edges to connect the connected subgraph and its complement; we know that
278 only edges into “full_neighborhood” can connect the two.
279
280 This function accounts for roughly 20–70% of the total DPhyp running time,
281 depending on the shape of the graph (~40% average across the microbenchmarks).
282 It is fairly big to inline, but it helps speed significantly, probably due
283 to the large amount of parameters to be passed back and forth.
284 */
285inline NodeMap FindNeighborhood(const Hypergraph &g, NodeMap subgraph,
286 NodeMap forbidden, NodeMap just_grown_by,
287 NeighborhoodCache *cache,
288 NodeMap *full_neighborhood_arg) {
289 assert(IsSubset(just_grown_by, subgraph));
290
291 NodeMap full_neighborhood =
292 *full_neighborhood_arg; // Keep us out of aliasing trouble.
293 NodeMap neighborhood = 0;
294
295 NodeMap to_search =
296 cache->InitSearch(just_grown_by, &neighborhood, &full_neighborhood);
297 assert(IsSubset(neighborhood, full_neighborhood));
298
299 for (size_t node_idx : BitsSetIn(to_search)) {
300 // Simple edges.
301 // NOTE: This node's simple neighborhood will be added lazily to
302 // full_neighborhood below. Forbidden nodes will also be removed below.
303 neighborhood |= g.nodes[node_idx].simple_neighborhood;
304
305 // Now go through the complex edges and see which ones point out of the
306 // subgraph.
307 for (size_t edge_idx : g.nodes[node_idx].complex_edges) {
308 const Hyperedge e = g.edges[edge_idx];
309
310 if (IsSubset(e.left, subgraph) &&
311 !Overlaps(e.right, subgraph | forbidden)) {
312 // e.right is an interesting hypernode (part of E↓'(S,X)).
313 full_neighborhood |= e.right;
314 if (!Overlaps(e.right, neighborhood)) {
315 // e.right is also not subsumed by another edge (ie., it is part of
316 // E↓(S,X)), so add a “representative node” for it to the
317 // neighborhood.
318 //
319 // Is is possible to do the Overlaps() test above branch-free by using
320 // -int64_t(e.right & neighborhood) >> 63 as a mask (assuming we do
321 // not have more than 63 tables) but it seems to do better on some
322 // tests and worse on others, so it's not worth it.
323 neighborhood |= IsolateLowestBit(e.right);
324 }
325 }
326 }
327 }
328
329 neighborhood &= ~(subgraph | forbidden);
330 full_neighborhood |= neighborhood;
331
332 cache->Store(just_grown_by, neighborhood, full_neighborhood);
333
335 "Neighborhood of %s (calculated on %s) with forbidden %s = %s\n",
336 PrintSet(subgraph).c_str(), PrintSet(just_grown_by).c_str(),
337 PrintSet(forbidden).c_str(), PrintSet(neighborhood).c_str());
338
339 *full_neighborhood_arg = full_neighborhood;
340 return neighborhood;
341}
342
343// Given a subgraph of g, enumerate all possible complements that do
344// not include anything from the exclusion subset. Works by looking
345// at every possible node of the _neighborhood_ of the given subgraph
346// (see FindNeighborhood()); these are then used as seeds for growing
347// the complement graph.
348//
349// Called EmitCsg() in the DPhyp paper.
350template <class Receiver>
351[[nodiscard]] bool EnumerateComplementsTo(
352 const Hypergraph &g, size_t lowest_node_idx, NodeMap subgraph,
353 NodeMap full_neighborhood, NodeMap neighborhood, Receiver *receiver) {
354 NodeMap forbidden = TablesBetween(0, lowest_node_idx);
355
356 HYPERGRAPH_PRINTF("Enumerating complements to %s, neighborhood=%s\n",
357 PrintSet(subgraph).c_str(), PrintSet(neighborhood).c_str());
358
359 neighborhood &= ~subgraph;
360
361 // Similar to EnumerateAllConnectedPartitions(), we start at seed nodes
362 // counting _backwards_, so that we consider larger and larger potential
363 // graphs. This is critical for the property that we want to enumerate smaller
364 // subsets before larger ones.
365 NeighborhoodCache cache(neighborhood);
366 for (size_t seed_idx : BitsSetInDescending(neighborhood)) {
367 // First consider a complement consisting solely of the seed node;
368 // see if we can find an edge (or multiple ones) connecting it
369 // to the given subgraph.
370 NodeMap seed = TableBitmap(seed_idx);
371 if (Overlaps(g.nodes[seed_idx].simple_neighborhood, subgraph)) {
372 for (size_t edge_idx : g.nodes[seed_idx].simple_edges) {
373 const Hyperedge e = g.edges[edge_idx];
374 assert(e.left == seed);
375 if (Overlaps(e.right, subgraph)) {
376 if (receiver->FoundSubgraphPair(subgraph, seed, edge_idx / 2)) {
377 return true;
378 }
379 }
380 }
381 }
382 for (size_t edge_idx : g.nodes[seed_idx].complex_edges) {
383 const Hyperedge e = g.edges[edge_idx];
384 if (e.left == seed && IsSubset(e.right, subgraph)) {
385 if (receiver->FoundSubgraphPair(subgraph, seed, edge_idx / 2)) {
386 return true;
387 }
388 }
389 }
390
391 // Grow the complement candidate along the neighborhoods to create
392 // a larger, connected complement. Note that we do this even if the
393 // the seed complement wasn't connected to our subgraph, since it
394 // might be connected as we add more nodes.
395 //
396 // Note that the extension of the forbidden set is required to avoid
397 // enumerating the same set twice; consider e.g. if you have a clique
398 // R1-R2-R3 and want to find complements to {R1} (ie., {R2,R3} is the
399 // neighborhood). When considering the seed {R3}, you don't want to be able
400 // to grow it into R2, since the {R2,R3} combination will be seen later when
401 // using {R2} as the seed. This is analogous to what we do in
402 // EnumerateAllConnectedPartitions(), and the whole reason for iterating
403 // backwards, but the DPhyp paper misses this. The “Building Query
404 // Compilers” document, however, seems to have corrected it.
405 NodeMap new_forbidden =
406 forbidden | subgraph | (neighborhood & TablesBetween(0, seed_idx));
407 NodeMap new_full_neighborhood = 0; // Unused; see comment on TryConnecting.
408 NodeMap new_neighborhood = FindNeighborhood(g, seed, new_forbidden, seed,
409 &cache, &new_full_neighborhood);
410 if (ExpandComplement(g, lowest_node_idx, subgraph, full_neighborhood, seed,
411 new_neighborhood, new_forbidden, receiver)) {
412 return true;
413 }
414 }
415 return false;
416}
417
418// Given a subgraph of g, grow it recursively along the neighborhood.
419// (The subgraph is not necessarily connected, but we hope it eventually
420// will be, or it won't be of much use to us.) If the subgraph is connected,
421// use it as base for enumerating a complement graph before growing it.
422//
423// Called EnumerateCsgRec() in the paper.
424template <class Receiver>
425[[nodiscard]] bool ExpandSubgraph(const Hypergraph &g, size_t lowest_node_idx,
426 NodeMap subgraph, NodeMap full_neighborhood,
427 NodeMap neighborhood, NodeMap forbidden,
428 Receiver *receiver) {
430 "Expanding connected subgraph, subgraph=%s neighborhood=%s "
431 "forbidden=%s\n",
432 PrintSet(subgraph).c_str(), PrintSet(neighborhood).c_str(),
433 PrintSet(forbidden).c_str());
434
435 // Given a neighborhood, try growing our subgraph by all possible
436 // combinations of included/excluded (except the one where all are
437 // excluded).
438 NeighborhoodCache cache(neighborhood);
439 for (NodeMap grow_by : NonzeroSubsetsOf(neighborhood)) {
441 "Trying to grow-and-complement %s by %s (out of %s) [connected=%d]\n",
442 PrintSet(subgraph).c_str(), PrintSet(grow_by).c_str(),
443 PrintSet(neighborhood).c_str(), receiver->HasSeen(subgraph | grow_by));
444
445 // See if the new subgraph is connected. The candidate subgraphs that are
446 // connected will previously have been seen as csg-cmp-pairs, and thus, we
447 // can ask the receiver!
448 NodeMap grown_subgraph = subgraph | grow_by;
449 if (receiver->HasSeen(grown_subgraph)) {
450 // Find the neighborhood of the new subgraph.
451 NodeMap new_full_neighborhood = full_neighborhood;
452 NodeMap new_neighborhood =
453 FindNeighborhood(g, subgraph | grow_by, forbidden, grow_by, &cache,
454 &new_full_neighborhood);
455
456 // EnumerateComplementsTo() resets the forbidden set, since nodes that
457 // were forbidden under this subgraph may very well be part of the
458 // complement. However, this also means that the neighborhood we just
459 // computed may be incomplete; it just looks at recently-added nodes,
460 // but there are older nodes that may have neighbors that we added to
461 // the forbidden set (X) instead of the subgraph itself (S). However,
462 // this is also the only time we add to the forbidden set, so we know
463 // exactly which nodes they are! Thus, simply add our forbidden set
464 // to the neighborhood for purposes of computing the complement.
465 //
466 // This behavior is tested in the SmallStar unit test.
467 new_neighborhood |= forbidden & ~TablesBetween(0, lowest_node_idx);
468
469 // This node's neighborhood is also part of the new neighborhood
470 // it's just not added to the forbidden set yet, so we missed it in
471 // the previous calculation).
472 new_neighborhood |= neighborhood;
473
474 if (EnumerateComplementsTo(g, lowest_node_idx, grown_subgraph,
475 new_full_neighborhood, new_neighborhood,
476 receiver)) {
477 return true;
478 }
479 }
480 }
481
482 // Now try to grow all the grown subgraphs into larger, connected subgraphs.
483 // Note that we do this even if the grown subgraph isn't connected, since it
484 // might be connected as we add more nodes.
485 //
486 // We need to do this after EnumerateComplementsTo() has run on all of them
487 // (in turn, generating csg-cmp-pairs and calling FoundSubgraphPair()),
488 // to guarantee that we will see any smaller subgraphs before larger ones.
489 for (NodeMap grow_by : NonzeroSubsetsOf(neighborhood)) {
490 HYPERGRAPH_PRINTF("Trying to grow-and-keep-growing %s by %s (out of %s)\n",
491 PrintSet(subgraph).c_str(), PrintSet(grow_by).c_str(),
492 PrintSet(neighborhood).c_str());
493 NodeMap grown_subgraph = subgraph | grow_by;
494
495 // Recursive calls are not allowed to add any of the nodes from
496 // our current neighborhood, since we're already trying all
497 // combinations of those ourselves.
498 NodeMap new_forbidden = (forbidden | neighborhood) & ~grown_subgraph;
499 assert(!IsSubset(grown_subgraph, new_forbidden));
500
501 // Find the neighborhood of the new subgraph.
502 NodeMap new_full_neighborhood = full_neighborhood;
503 NodeMap new_neighborhood =
504 FindNeighborhood(g, subgraph | grow_by, new_forbidden, grow_by, &cache,
505 &new_full_neighborhood);
506
507 if (ExpandSubgraph(g, lowest_node_idx, grown_subgraph,
508 new_full_neighborhood, new_neighborhood, new_forbidden,
509 receiver)) {
510 return true;
511 }
512 }
513 return false;
514}
515
516// Given a connected subgraph and a connected complement, see if they are
517// connected through some edge, and if so, which edge. (They may be connected
518// through multiple edges if there are loops in the graph.)
519//
520// In order to reduce the amount of searching for a connecting edge, we can use
521// the information about the subgraph's full neighborhood that we've been
522// connecting earlier. (This helps ~20% on the chain benchmark, and more on the
523// hypercycle benchmark.) The edge must touch something that's immediately
524// reachable from the subgraph (pretty much by definition), so we don't need to
525// look in all the nodes in the complement; those not in the subgraph's full
526// neighborhood cannot contain such edges.
527//
528// We could probably have kept full neighborhoods for both the subgraph and the
529// complement, and picked the one with fewest nodes to study, but it doesn't
530// seem to be worth it.
531template <class Receiver>
532[[nodiscard]] bool TryConnecting(const Hypergraph &g, NodeMap subgraph,
533 NodeMap subgraph_full_neighborhood,
534 NodeMap complement, Receiver *receiver) {
535 for (size_t node_idx : BitsSetIn(complement & subgraph_full_neighborhood)) {
536 // Simple edges.
537 if (Overlaps(g.nodes[node_idx].simple_neighborhood, subgraph)) {
538 for (size_t edge_idx : g.nodes[node_idx].simple_edges) {
539 // The tests are really IsSubset(), but Overlaps() is equivalent
540 // here, and slightly faster.
541 const Hyperedge e = g.edges[edge_idx];
542 if (Overlaps(e.right, subgraph) && Overlaps(e.left, complement)) {
543 if (receiver->FoundSubgraphPair(subgraph, complement, edge_idx / 2)) {
544 return true;
545 }
546 }
547 }
548 }
549
550 // Complex edges.
551 NodeMap node = TableBitmap(node_idx);
552 for (size_t edge_idx : g.nodes[node_idx].complex_edges) {
553 const Hyperedge e = g.edges[edge_idx];
554
555 // NOTE: We call IsolateLowestBit() so that we only see the edge once.
556 if (IsolateLowestBit(e.left) == node && IsSubset(e.left, complement) &&
557 IsSubset(e.right, subgraph)) {
558 if (receiver->FoundSubgraphPair(subgraph, complement, edge_idx / 2)) {
559 return true;
560 }
561 }
562 }
563 }
564 return false;
565}
566
567// Very similar to ExpandSubgraph: Given a connected subgraph of g and
568// another subgraph of g (its complement; not necessarily connected),
569// grow the complement recursively along the neighborhood. The former
570// subgraph stays unchanged through the recursion, while the second
571// is grown. If the complement at any point gets connected, see if we
572// can find a connection between the connected subgraph and complement;
573// if so, they form a so-called csg-cmp-pair. We tell the receiver about the
574// csg-cmp-pair, not only because it is the entire goal of the algorithm,
575// but because it will allow us to remember for later that the csg-cmp-pair
576// is connected. (This is used for connectivity testing, both in
577// ExpandSubgraph() and ExpandComplement().)
578//
579// Called EnumerateCmpRec() in the paper.
580template <class Receiver>
581[[nodiscard]] bool ExpandComplement(const Hypergraph &g, size_t lowest_node_idx,
582 NodeMap subgraph,
583 NodeMap subgraph_full_neighborhood,
584 NodeMap complement, NodeMap neighborhood,
585 NodeMap forbidden, Receiver *receiver) {
586 assert(IsSubset(subgraph, forbidden));
587 assert(!IsSubset(complement, forbidden));
588
590 "Trying to expand complement %s (subgraph is %s, forbidden is %s)\n",
591 PrintSet(complement).c_str(), PrintSet(subgraph).c_str(),
592 PrintSet(forbidden).c_str());
593
594 // Given a neighborhood, try growing our subgraph by all possible
595 // combinations of included/excluded (except the one where all are
596 // excluded).
597 //
598 // The only difference from ExpandSubgraph() here is that when we
599 // find a connected complement (and thus have two disjoint, connected
600 // subgraphs), we don't need to recurse to find a third subgraph;
601 // we can just check whether they are connected, and if so, tell
602 // the receiver.
603 for (NodeMap grow_by : NonzeroSubsetsOf(neighborhood)) {
604 NodeMap grown_complement = complement | grow_by;
605 if (receiver->HasSeen(grown_complement)) {
606 if (TryConnecting(g, subgraph, subgraph_full_neighborhood,
607 grown_complement, receiver)) {
608 return true;
609 }
610 }
611 }
612
613 // Same logic as in ExpandSubgraph:
614 //
615 // Try to grow all the grown complements into larger, connected complements.
616 // Note that we do this even if the grown complement isn't connected, since it
617 // might be connected as we add more nodes.
618 //
619 // We need to do this after FoundSubgraphPair() has run on all of them,
620 // to guarantee that we will see any smaller subgraphs before larger ones.
621 NeighborhoodCache cache(neighborhood);
622 for (NodeMap grow_by : NonzeroSubsetsOf(neighborhood)) {
623 HYPERGRAPH_PRINTF("Trying to grow complement %s by %s (out of %s)\n",
624 PrintSet(complement).c_str(), PrintSet(grow_by).c_str(),
625 PrintSet(neighborhood).c_str());
626 NodeMap grown_complement = complement | grow_by;
627
628 // Recursive calls are not allowed to add any of the nodes from
629 // our current neighborhood, since we're already trying all
630 // combinations of those ourselves.
631 NodeMap new_forbidden = (forbidden | neighborhood) & ~grown_complement;
632 assert(!IsSubset(grown_complement, new_forbidden));
633
634 // Find the neighborhood of the new complement.
635 NodeMap new_full_neighborhood = 0; // Unused; see comment on TryConnecting.
636 NodeMap new_neighborhood =
637 FindNeighborhood(g, complement | grow_by, new_forbidden, grow_by,
638 &cache, &new_full_neighborhood);
639
640 if (ExpandComplement(g, lowest_node_idx, subgraph,
641 subgraph_full_neighborhood, grown_complement,
642 new_neighborhood, new_forbidden, receiver)) {
643 return true;
644 }
645 }
646 return false;
647}
648
649// Consider increasing subsets of the graph, backwards; first only the
650// last node (say, R6), then {R5,R6} with R5 as the seed, then
651// {R4,R5,R6} with R4 as the seed, and so on. From the single-node
652// seed, we grow the connected subgraph recursively into new connected
653// subgraphs; when we such a new subgraph (the paper calls it a csg),
654// we do two things with it:
655//
656// 1. Keep growing it into new and even larger subgraphs.
657// 2. Look for _another_, separate subgraph (the paper calls it a
658// complement, or cmp) that can be connected to our subgraph.
659// If we find one such pair (a csg-cmp-pair), that's what the
660// algorithm fundamentally is looking for.
661//
662// Called Solve() in the DPhyp paper.
663//
664// If at any point receiver->FoundSingleNode() or receiver->FoundSubgraphPair()
665// returns true, the algorithm will abort, and this function also return true.
666template <class Receiver>
667bool EnumerateAllConnectedPartitions(const Hypergraph &g, Receiver *receiver) {
668 for (int seed_idx = g.nodes.size() - 1; seed_idx >= 0; --seed_idx) {
669 if (receiver->FoundSingleNode(seed_idx)) {
670 return true;
671 }
672
673 NodeMap seed = TableBitmap(seed_idx);
674 HYPERGRAPH_PRINTF("\n\nStarting main iteration at node %s\n",
675 PrintSet(seed).c_str());
676 NodeMap forbidden = TablesBetween(0, seed_idx);
677 NodeMap full_neighborhood = 0;
678 NeighborhoodCache cache(0);
679 NodeMap neighborhood =
680 FindNeighborhood(g, seed, forbidden, seed, &cache, &full_neighborhood);
681 if (EnumerateComplementsTo(g, seed_idx, seed, full_neighborhood,
682 neighborhood, receiver)) {
683 return true;
684 }
685 if (ExpandSubgraph(g, seed_idx, seed, full_neighborhood, neighborhood,
686 forbidden | seed, receiver)) {
687 return true;
688 }
689 }
690 return false;
691}
692
693} // namespace hypergraph
694
695#endif // SUBGRAPH_ENUMERATION_H
uint64_t TablesBetween(unsigned start, unsigned end)
Definition: bit_utils.h:207
BitIteratorAdaptor< CountBitsDescending > BitsSetInDescending(uint64_t state)
Definition: bit_utils.h:126
bool IsSubset(uint64_t x, uint64_t y)
Definition: bit_utils.h:221
constexpr uint64_t TableBitmap(unsigned x)
Definition: bit_utils.h:178
uint64_t IsolateLowestBit(uint64_t x)
Definition: bit_utils.h:217
bool Overlaps(uint64_t x, uint64_t y)
Definition: bit_utils.h:229
BitIteratorAdaptor< CountBitsAscending > BitsSetIn(uint64_t state)
Definition: bit_utils.h:123
Definition: bit_utils.h:139
Definition: subgraph_enumeration.h:163
NodeMap m_last_just_grown_by
Definition: subgraph_enumeration.h:201
NodeMap m_last_neighborhood
Definition: subgraph_enumeration.h:204
const NodeMap m_taboo_bit
Definition: subgraph_enumeration.h:200
NeighborhoodCache(NodeMap neighborhood)
Definition: subgraph_enumeration.h:165
NodeMap InitSearch(NodeMap just_grown_by, NodeMap *neighborhood, NodeMap *full_neighborhood)
Definition: subgraph_enumeration.h:173
NodeMap m_last_full_neighborhood
Definition: subgraph_enumeration.h:203
void Store(NodeMap just_grown_by, NodeMap neighborhood, NodeMap full_neighborhood)
Definition: subgraph_enumeration.h:189
static char buf[MAX_BUF]
Definition: conf_to_src.cc:73
Definition of an undirected (join) hypergraph.
Definition: buf0block_hint.cc:30
Definition: hypergraph.cc:30
NodeMap FindNeighborhood(const Hypergraph &g, NodeMap subgraph, NodeMap forbidden, NodeMap just_grown_by, NeighborhoodCache *cache, NodeMap *full_neighborhood_arg)
Find the neighborhood of the given subgraph (S); informally, the set of nodes immediately reachable f...
Definition: subgraph_enumeration.h:285
std::string PrintSet(NodeMap x)
Definition: subgraph_enumeration.h:96
bool EnumerateComplementsTo(const Hypergraph &g, size_t lowest_node_idx, NodeMap subgraph, NodeMap full_neighborhood, NodeMap neighborhood, Receiver *receiver)
Definition: subgraph_enumeration.h:351
bool EnumerateAllConnectedPartitions(Receiver *receiver)
bool TryConnecting(const Hypergraph &g, NodeMap subgraph, NodeMap subgraph_full_neighborhood, NodeMap complement, Receiver *receiver)
Definition: subgraph_enumeration.h:532
bool ExpandComplement(const Hypergraph &g, size_t lowest_node_idx, NodeMap subgraph, NodeMap subgraph_full_neighborhood, NodeMap complement, NodeMap neighborhood, NodeMap forbidden, Receiver *receiver)
Definition: subgraph_enumeration.h:581
uint64_t NodeMap
Since our graphs can never have more than 61 tables, node sets and edge lists are implemented using 6...
Definition: node_map.h:40
bool ExpandSubgraph(const Hypergraph &g, size_t lowest_node_idx, NodeMap subgraph, NodeMap full_neighborhood, NodeMap neighborhood, NodeMap forbidden, Receiver *receiver)
Definition: subgraph_enumeration.h:425
Definition: hypergraph.h:81
NodeMap left
Definition: hypergraph.h:86
NodeMap right
Definition: hypergraph.h:87
Definition: hypergraph.h:90
Mem_root_array< Node > nodes
Definition: hypergraph.h:93
Mem_root_array< Hyperedge > edges
Definition: hypergraph.h:94
#define HYPERGRAPH_PRINTF(...)
Definition: subgraph_enumeration.h:88