MySQL 8.0.29
Source Code Documentation
subgraph_enumeration.h
Go to the documentation of this file.
1/* Copyright (c) 2020, 2021, Oracle and/or its affiliates.
2
3 This program is free software; you can redistribute it and/or modify
4 it under the terms of the GNU General Public License, version 2.0,
5 as published by the Free Software Foundation.
6
7 This program is also distributed with certain software (including
8 but not limited to OpenSSL) that is licensed under separate terms,
9 as designated in a particular file or component or in included license
10 documentation. The authors of MySQL hereby grant you an additional
11 permission to link the program and your derivative works with the
12 separately licensed software that they have included with MySQL.
13
14 This program is distributed in the hope that it will be useful,
15 but WITHOUT ANY WARRANTY; without even the implied warranty of
16 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
17 GNU General Public License, version 2.0, for more details.
18
19 You should have received a copy of the GNU General Public License
20 along with this program; if not, write to the Free Software
21 Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */
22
23#ifndef SUBGRAPH_ENUMERATION_H
24#define SUBGRAPH_ENUMERATION_H 1
25
26/**
27 @file
28
29 This file implements the DPhyp algorithm for enumerating connected
30 subgraphs of hypergraphs (see hypergraph.h for a hypergraph definition).
31
32 The core idea of the algorithm is that if the join structure of a
33 query is expressed as a hypergraph, where the relations are nodes
34 and the join predicates are hyperedges, one can efficiently find
35 all legal join orders without Cartesian products by finding all
36 possible subpartitions of the hypergraph. (Simple inner joins will
37 have regular edges, but outer joins, antijoins etc., can be encoded
38 as hyperedges to constrain the allowed join orderings, so that we
39 do not join e.g. an inner and outer table together before said inner
40 table has been joined to the entire set. Also, hyper-predicates such
41 as t1.a + t2.b = t3.c will naturally give rise to hyperedges.)
42
43 The algorithm is described in the paper “Dynamic Programming Strikes
44 Back” by Neumann and Moerkotte. There is a somewhat extended version
45 of the paper (that also contains a few corrections) in Moerkotte's
46 treatise “Building Query Compilers”. Some critical details are still
47 missing, which we've had to fill in ourselves. We don't currently
48 implement the extension to generalized hypergraphs, but it should be
49 fairly straightforward to do later. The algorithm is simple in concept
50 but hard to grasp; we will only give a very rough outline here:
51
52 1. Pick a seed node of the graph.
53 2. Grow that seed along hyperedges, taking care never to make an
54 unconnected graph or seeing the same subgraph twice.
55 3. For each connected subgraph (csg): Repeat steps 1–2 independently
56 to create a separate connected subgraph (the so-called complement,
57 cmp), and try to connect the subgraph and its complement to create
58 a larger graph (a so-called csg-cmp-pair).
59 4. When such a csg-cmp-pair is found, call the receiver back with the
60 csg and cmp. This is a valid subjoin that can be costed.
61
62 The entry point for doing this is EnumerateAllConnectedPartitions().
63
64 For complex joins, we may have to run DPhyp multiple times in a mode
65 where we just count the number of partitions over various constrained
66 graphs, and this will be a critical part of query planning time.
67 Thus, it is coded as a template over a receiver type that gets callbacks
68 for each partition. If the receiver just interested in counting,
69 this saves a significant amount of call overhead. The templatization
70 also allows the microbenchmarks to more accurately measure changes in
71 the algorithm itself without having to benchmark the receiver.
72 */
73
74#include <assert.h>
75#include <string>
78
79// You can change 0 to 1 here to get debug traces of the algorithm
80// as it is working.
81#define DEBUGGING_DPHYP 0
82
83#if DEBUGGING_DPHYP
84#include <stdio.h>
85#define HYPERGRAPH_PRINTF printf
86#else
87#define HYPERGRAPH_PRINTF(...)
88#endif
89
90namespace hypergraph {
91
92template <class Receiver>
93bool EnumerateAllConnectedPartitions(Receiver *receiver);
94
95inline std::string PrintSet(NodeMap x) {
96 std::string ret = "{";
97 bool first = true;
98 for (size_t node_idx : BitsSetIn(x)) {
99 if (!first) {
100 ret += ",";
101 }
102 first = false;
103
104 char buf[256];
105 snprintf(buf, sizeof(buf), "R%zu", node_idx + 1);
106 ret += buf;
107 }
108 return ret + "}";
109}
110
111/*
112 FindNeighborhood() (see below) is crucial for speed. We can speed it up
113 somewhat by observing that it is often being called many times with the
114 same forbidden set and subgraphs that keep increasing; e.g., when we
115 have the neighborhood {R1,R2}, we need to calculate the neighborhood of
116 {R1}, {R2} and {R1,R2} -- the latter will start with calculating the
117 neighborhood of {R1} and then add {R2} from there. We cannot just union
118 the two neighborhoods due to hyperedges, but we can reuse the start.
119
120 To this end, NeighborhoodCache implements a simple one-element cache.
121 If we start a neighborhood computation that is a superset of the element
122 in the cache, we can just continue with the neighborhood it calculated
123 and add the missing elements. The overhead of managing the cache is a
124 ~15-20% loss for simple graphs with low degrees (e.g. chains), but a
125 huge speedup (60% or more) for graphs with high degrees, such as stars.
126 Given that the simple graphs are already so fast that their time is
127 hardly noticeable, this seems like a good overall tradeoff.
128
129 The default enumeration of power sets given by NonzeroSubsetsOf
130 (e.g. 000, 001, 010, 011, 100, etc. for three nodes in the neighborhood)
131 is not optimal for caching. E.g., for four bits, we can brute-force the
132 optimal order to be
133
134 0001 *0010 0011 *0100 0101 *0110 0111 1110 *1000 1010 1100 *1001
135 1011 *1110 1111
136
137 where we overwrite the element in the cache every time we process a subset
138 marked by *. This yields an optimal 17 loop iterations saved, leaving only 15.
139 However, it is not clear how to efficiently enumerate these orders and choice
140 of elements to cache realtime without huge precalculated tables (e.g., what is
141 the optimal order for 14 potential neighbors?), so instead, we keep the normal
142 order and add a simple heuristic: Keep every other item. The lowest bit will
143 change between 0 and 1 every iteration, so one that ends in 1 cannot possibly
144 be a subset of the the next enumerated subset. This yields:
145
146 0001 *0010 0011 *0100 0101 *0110 0111 *1000 1001 *1010 1011 *1100
147 1101 *1110 1111
148
149 which saves 16 loop iterations, nearly as good. (This pattern does not seem
150 to change markedly for larger subsets; the best pattern for five bits is
151 1 *2 3 6 *10 11 14 *4 *5 7 21 *13 15 *8 9 12 *24 25 26 *28 29 30 *16 17 20
152 *18 22 *19 23 *27 31, saving 49 bits where the heuristic saves 44. Optimal
153 patterns for more than five bits are not known.)
154
155 The only thing that really matters is keeping track of what the lowest bit
156 is; we call it the “taboo bit”, as when we process such a subset, it
157 signals that the result shouldn't replace whatever is in the cache.
158
159 Note that you cannot reuse the cache across calls with different forbidden
160 subsets; that would yield wrong results.
161 */
163 public:
164 explicit NeighborhoodCache(NodeMap neighborhood)
165 : m_taboo_bit(IsolateLowestBit(neighborhood)) {}
166
167 // Tell the cache we intend to start a neighborhood search.
168 // If the cache can reduce our workload, it will update the
169 // two neighborhoods. Returns the actual set of bits we need
170 // to compute the neighborhood for (whether it could save
171 // anything or not).
172 inline NodeMap InitSearch(NodeMap just_grown_by, NodeMap *neighborhood,
173 NodeMap *full_neighborhood) {
174 if (IsSubset(m_last_just_grown_by, just_grown_by)) {
175 // We can use our cache from the last node and continue the search from
176 // there.
177 *full_neighborhood |= m_last_full_neighborhood;
178 *neighborhood = m_last_neighborhood;
179 return just_grown_by & ~m_last_just_grown_by;
180 }
181
182 // We need to do the entire search as usual.
183 return just_grown_by;
184 }
185
186 // Tell the cache we just computed a neighborhood. It can choose to
187 // store it to accelerate future InitSearch() calls.
188 inline void Store(NodeMap just_grown_by, NodeMap neighborhood,
189 NodeMap full_neighborhood) {
190 assert(IsSubset(neighborhood, full_neighborhood));
191 if (Overlaps(just_grown_by, m_taboo_bit)) return;
192
193 m_last_just_grown_by = just_grown_by;
194 m_last_full_neighborhood = full_neighborhood;
195 m_last_neighborhood = neighborhood;
196 }
197
198 private:
201 ~0; // Don't try to use the cache the first iteration.
204};
205
206/**
207 Find the neighborhood of the given subgraph (S); informally, the set of nodes
208 immediately reachable from that subgraph. There's an additional constraint
209 in that the edges used to do so must not touch the forbidden set of nodes (X).
210 The DPhyp paper calls this function N(S, X) (with a calligraphic N).
211
212 How to calculate the neighborhood efficiently is one of the least explicitly
213 described parts of the paper. The definition goes about as follows:
214
215 1. Find E↓'(S,X), the set of “interesting hypernodes” (outgoing edge
216 destinations from S). These are the (endpoints of) edges that have one
217 side entirely within S, that have the other side entirely _outside_ S,
218 and none of the sides touch the forbidden set X.
219 2. Minimize E↓'(S,X) by removing all “subsumed hypernodes”, giving E↓(S,X).
220 u subsumes v if it is a proper subset; if so, we can never go to where v
221 points to before we've been at u, so it's pointless to keep v.
222 3. For each hypernode in E↓(S,X), pick node with lowest index as a
223 representative, because our subset enumeration algorithms cannot
224 enumerate subsets of hypernodes, only subsets of normal nodes.
225 (Actually, any node that's part of the hypernode would do; it does not
226 even need to be consistent.) These nodes together constitute the
227 neighborhood.
228
229 There are a couple of points to note here:
230
231 First, adding more nodes than needed to the neighborhood does not affect
232 correctness of the algorithm, only speed. We try all combinations of
233 included/excluded for the neighborhood (2^N in the number of nodes),
234 so this covers all potential subgraphs; in theory, we could even just choose
235 all non-forbidden nodes and reduce to the algorithm known as DPhyp, it just
236 wouldn't be very efficient.
237
238 Second, step 3 means that we may very well end up with a non-connected
239 subgraph. This is harmless; we may eventually grow it to a connected one or
240 we may not, we just won't start looking for any complements until we have a
241 connected one (and we know whether it's connected or not based on whether we
242 saw it as a csg-cmp pair in the algorithm earlier).
243
244 Third, due to the way we grow our subgraph, only the nodes that we have just
245 grown by can contribute to the E↓'(S,X). The reason is simple; every node
246 from the previous neighborhood will have been added either to S or to X,
247 and both exclude them from the new neighborhood. (Step 2 doesn't affect this,
248 as any hypernode that was subsumed would also have to touch S or X.
249 But there's an exception in that occasionally, we can remove nodes from X;
250 see ExpandSubgraph().)
251
252 Fourth, perfect minimization seems to be impossible to actually implement
253 efficiently. This is known as the minimum set problem, and the best known
254 algorithms to do this are in O(n² / log n) time (see e.g. Pritchard: ”An Old
255 Sub-Quadratic Algorithm for Finding Extremal Sets”), which can be quite a
256 lot when there are lots of edges. (The trivial O(n²) algorithm is to just test
257 every set against every smaller set, and throw it out if it's a superset.)
258 Since loops in our hypergraphs are presumed to be fairly rare, it would not
259 seem worth it to do full minimization.
260
261 Instead, we pick the low-hanging fruit only: Every _simple_ edge is trivial
262 to test against. We just collect the simple edges into a mask, and any
263 (complex) hyperedge that overlaps with that bitmap can immediately be
264 discarded. Even more, since we don't have to pick min(S) but can pick
265 something arbitrary, we can let {R2,R3} (which gets R2 as its representative
266 node) subsume {R1,R2}, even though it's not an actual subset, by pretending
267 we picked R2 as the representative node for the latter! This is similar to
268 what Moerkotte describes in his “Building Query Compilers” document,
269 which seems to contain a slightly extended version of the DPhyp paper
270 (under a different name). We could have collected all the simple edges in a
271 separate pass first, but the microbenchmarks show that the added loop overhead
272 isn't worth it.
273
274 Note that we also keep E↓'(S,X), the set of interesting hypernodes; we
275 bitwise-or it into “full_neighborhood”. This is useful later when searching
276 for edges to connect the connected subgraph and its complement; we know that
277 only edges into “full_neighborhood” can connect the two.
278
279 This function accounts for roughly 20–70% of the total DPhyp running time,
280 depending on the shape of the graph (~40% average across the microbenchmarks).
281 It is fairly big to inline, but it helps speed significantly, probably due
282 to the large amount of parameters to be passed back and forth.
283 */
284inline NodeMap FindNeighborhood(const Hypergraph &g, NodeMap subgraph,
285 NodeMap forbidden, NodeMap just_grown_by,
286 NeighborhoodCache *cache,
287 NodeMap *full_neighborhood_arg) {
288 assert(IsSubset(just_grown_by, subgraph));
289
290 NodeMap full_neighborhood =
291 *full_neighborhood_arg; // Keep us out of aliasing trouble.
292 NodeMap neighborhood = 0;
293
294 NodeMap to_search =
295 cache->InitSearch(just_grown_by, &neighborhood, &full_neighborhood);
296 assert(IsSubset(neighborhood, full_neighborhood));
297
298 for (size_t node_idx : BitsSetIn(to_search)) {
299 // Simple edges.
300 // NOTE: This node's simple neighborhood will be added lazily to
301 // full_neighborhood below. Forbidden nodes will also be removed below.
302 neighborhood |= g.nodes[node_idx].simple_neighborhood;
303
304 // Now go through the complex edges and see which ones point out of the
305 // subgraph.
306 for (size_t edge_idx : g.nodes[node_idx].complex_edges) {
307 const Hyperedge e = g.edges[edge_idx];
308
309 if (IsSubset(e.left, subgraph) &&
310 !Overlaps(e.right, subgraph | forbidden)) {
311 // e.right is an interesting hypernode (part of E↓'(S,X)).
312 full_neighborhood |= e.right;
313 if (!Overlaps(e.right, neighborhood)) {
314 // e.right is also not subsumed by another edge (ie., it is part of
315 // E↓(S,X)), so add a “representative node” for it to the
316 // neighborhood.
317 //
318 // Is is possible to do the Overlaps() test above branch-free by using
319 // -int64_t(e.right & neighborhood) >> 63 as a mask (assuming we do
320 // not have more than 63 tables) but it seems to do better on some
321 // tests and worse on others, so it's not worth it.
322 neighborhood |= IsolateLowestBit(e.right);
323 }
324 }
325 }
326 }
327
328 neighborhood &= ~(subgraph | forbidden);
329 full_neighborhood |= neighborhood;
330
331 cache->Store(just_grown_by, neighborhood, full_neighborhood);
332
334 "Neighborhood of %s (calculated on %s) with forbidden %s = %s\n",
335 PrintSet(subgraph).c_str(), PrintSet(just_grown_by).c_str(),
336 PrintSet(forbidden).c_str(), PrintSet(neighborhood).c_str());
337
338 *full_neighborhood_arg = full_neighborhood;
339 return neighborhood;
340}
341
342// Given a subgraph of g, enumerate all possible complements that do
343// not include anything from the exclusion subset. Works by looking
344// at every possible node of the _neighborhood_ of the given subgraph
345// (see FindNeighborhood()); these are then used as seeds for growing
346// the complement graph.
347//
348// Called EmitCsg() in the DPhyp paper.
349template <class Receiver>
350[[nodiscard]] bool EnumerateComplementsTo(
351 const Hypergraph &g, size_t lowest_node_idx, NodeMap subgraph,
352 NodeMap full_neighborhood, NodeMap neighborhood, Receiver *receiver) {
353 NodeMap forbidden = TablesBetween(0, lowest_node_idx);
354
355 HYPERGRAPH_PRINTF("Enumerating complements to %s, neighborhood=%s\n",
356 PrintSet(subgraph).c_str(), PrintSet(neighborhood).c_str());
357
358 neighborhood &= ~subgraph;
359
360 // Similar to EnumerateAllConnectedPartitions(), we start at seed nodes
361 // counting _backwards_, so that we consider larger and larger potential
362 // graphs. This is critical for the property that we want to enumerate smaller
363 // subsets before larger ones.
364 NeighborhoodCache cache(neighborhood);
365 for (size_t seed_idx : BitsSetInDescending(neighborhood)) {
366 // First consider a complement consisting solely of the seed node;
367 // see if we can find an edge (or multiple ones) connecting it
368 // to the given subgraph.
369 NodeMap seed = TableBitmap(seed_idx);
370 if (Overlaps(g.nodes[seed_idx].simple_neighborhood, subgraph)) {
371 for (size_t edge_idx : g.nodes[seed_idx].simple_edges) {
372 const Hyperedge e = g.edges[edge_idx];
373 assert(e.left == seed);
374 if (Overlaps(e.right, subgraph)) {
375 if (receiver->FoundSubgraphPair(subgraph, seed, edge_idx / 2)) {
376 return true;
377 }
378 }
379 }
380 }
381 for (size_t edge_idx : g.nodes[seed_idx].complex_edges) {
382 const Hyperedge e = g.edges[edge_idx];
383 if (e.left == seed && IsSubset(e.right, subgraph)) {
384 if (receiver->FoundSubgraphPair(subgraph, seed, edge_idx / 2)) {
385 return true;
386 }
387 }
388 }
389
390 // Grow the complement candidate along the neighborhoods to create
391 // a larger, connected complement. Note that we do this even if the
392 // the seed complement wasn't connected to our subgraph, since it
393 // might be connected as we add more nodes.
394 //
395 // Note that the extension of the forbidden set is required to avoid
396 // enumerating the same set twice; consider e.g. if you have a clique
397 // R1-R2-R3 and want to find complements to {R1} (ie., {R2,R3} is the
398 // neighborhood). When considering the seed {R3}, you don't want to be able
399 // to grow it into R2, since the {R2,R3} combination will be seen later when
400 // using {R2} as the seed. This is analogous to what we do in
401 // EnumerateAllConnectedPartitions(), and the whole reason for iterating
402 // backwards, but the DPhyp paper misses this. The “Building Query
403 // Compilers” document, however, seems to have corrected it.
404 NodeMap new_forbidden =
405 forbidden | subgraph | (neighborhood & TablesBetween(0, seed_idx));
406 NodeMap new_full_neighborhood = 0; // Unused; see comment on TryConnecting.
407 NodeMap new_neighborhood = FindNeighborhood(g, seed, new_forbidden, seed,
408 &cache, &new_full_neighborhood);
409 if (ExpandComplement(g, lowest_node_idx, subgraph, full_neighborhood, seed,
410 new_neighborhood, new_forbidden, receiver)) {
411 return true;
412 }
413 }
414 return false;
415}
416
417// Given a subgraph of g, grow it recursively along the neighborhood.
418// (The subgraph is not necessarily connected, but we hope it eventually
419// will be, or it won't be of much use to us.) If the subgraph is connected,
420// use it as base for enumerating a complement graph before growing it.
421//
422// Called EnumerateCsgRec() in the paper.
423template <class Receiver>
424[[nodiscard]] bool ExpandSubgraph(const Hypergraph &g, size_t lowest_node_idx,
425 NodeMap subgraph, NodeMap full_neighborhood,
426 NodeMap neighborhood, NodeMap forbidden,
427 Receiver *receiver) {
429 "Expanding connected subgraph, subgraph=%s neighborhood=%s "
430 "forbidden=%s\n",
431 PrintSet(subgraph).c_str(), PrintSet(neighborhood).c_str(),
432 PrintSet(forbidden).c_str());
433
434 // Given a neighborhood, try growing our subgraph by all possible
435 // combinations of included/excluded (except the one where all are
436 // excluded).
437 NeighborhoodCache cache(neighborhood);
438 for (NodeMap grow_by : NonzeroSubsetsOf(neighborhood)) {
440 "Trying to grow-and-complement %s by %s (out of %s) [connected=%d]\n",
441 PrintSet(subgraph).c_str(), PrintSet(grow_by).c_str(),
442 PrintSet(neighborhood).c_str(), receiver->HasSeen(subgraph | grow_by));
443
444 // See if the new subgraph is connected. The candidate subgraphs that are
445 // connected will previously have been seen as csg-cmp-pairs, and thus, we
446 // can ask the receiver!
447 NodeMap grown_subgraph = subgraph | grow_by;
448 if (receiver->HasSeen(grown_subgraph)) {
449 // Find the neighborhood of the new subgraph.
450 NodeMap new_full_neighborhood = full_neighborhood;
451 NodeMap new_neighborhood =
452 FindNeighborhood(g, subgraph | grow_by, forbidden, grow_by, &cache,
453 &new_full_neighborhood);
454
455 // EnumerateComplementsTo() resets the forbidden set, since nodes that
456 // were forbidden under this subgraph may very well be part of the
457 // complement. However, this also means that the neighborhood we just
458 // computed may be incomplete; it just looks at recently-added nodes,
459 // but there are older nodes that may have neighbors that we added to
460 // the forbidden set (X) instead of the subgraph itself (S). However,
461 // this is also the only time we add to the forbidden set, so we know
462 // exactly which nodes they are! Thus, simply add our forbidden set
463 // to the neighborhood for purposes of computing the complement.
464 //
465 // This behavior is tested in the SmallStar unit test.
466 new_neighborhood |= forbidden & ~TablesBetween(0, lowest_node_idx);
467
468 // This node's neighborhood is also part of the new neighborhood
469 // it's just not added to the forbidden set yet, so we missed it in
470 // the previous calculation).
471 new_neighborhood |= neighborhood;
472
473 if (EnumerateComplementsTo(g, lowest_node_idx, grown_subgraph,
474 new_full_neighborhood, new_neighborhood,
475 receiver)) {
476 return true;
477 }
478 }
479 }
480
481 // Now try to grow all the grown subgraphs into larger, connected subgraphs.
482 // Note that we do this even if the grown subgraph isn't connected, since it
483 // might be connected as we add more nodes.
484 //
485 // We need to do this after EnumerateComplementsTo() has run on all of them
486 // (in turn, generating csg-cmp-pairs and calling FoundSubgraphPair()),
487 // to guarantee that we will see any smaller subgraphs before larger ones.
488 for (NodeMap grow_by : NonzeroSubsetsOf(neighborhood)) {
489 HYPERGRAPH_PRINTF("Trying to grow-and-keep-growing %s by %s (out of %s)\n",
490 PrintSet(subgraph).c_str(), PrintSet(grow_by).c_str(),
491 PrintSet(neighborhood).c_str());
492 NodeMap grown_subgraph = subgraph | grow_by;
493
494 // Recursive calls are not allowed to add any of the nodes from
495 // our current neighborhood, since we're already trying all
496 // combinations of those ourselves.
497 NodeMap new_forbidden = (forbidden | neighborhood) & ~grown_subgraph;
498 assert(!IsSubset(grown_subgraph, new_forbidden));
499
500 // Find the neighborhood of the new subgraph.
501 NodeMap new_full_neighborhood = full_neighborhood;
502 NodeMap new_neighborhood =
503 FindNeighborhood(g, subgraph | grow_by, new_forbidden, grow_by, &cache,
504 &new_full_neighborhood);
505
506 if (ExpandSubgraph(g, lowest_node_idx, grown_subgraph,
507 new_full_neighborhood, new_neighborhood, new_forbidden,
508 receiver)) {
509 return true;
510 }
511 }
512 return false;
513}
514
515// Given a connected subgraph and a connected complement, see if they are
516// connected through some edge, and if so, which edge. (They may be connected
517// through multiple edges if there are loops in the graph.)
518//
519// In order to reduce the amount of searching for a connecting edge, we can use
520// the information about the subgraph's full neighborhood that we've been
521// connecting earlier. (This helps ~20% on the chain benchmark, and more on the
522// hypercycle benchmark.) The edge must touch something that's immediately
523// reachable from the subgraph (pretty much by definition), so we don't need to
524// look in all the nodes in the complement; those not in the subgraph's full
525// neighborhood cannot contain such edges.
526//
527// We could probably have kept full neighborhoods for both the subgraph and the
528// complement, and picked the one with fewest nodes to study, but it doesn't
529// seem to be worth it.
530template <class Receiver>
531[[nodiscard]] bool TryConnecting(const Hypergraph &g, NodeMap subgraph,
532 NodeMap subgraph_full_neighborhood,
533 NodeMap complement, Receiver *receiver) {
534 for (size_t node_idx : BitsSetIn(complement & subgraph_full_neighborhood)) {
535 // Simple edges.
536 if (Overlaps(g.nodes[node_idx].simple_neighborhood, subgraph)) {
537 for (size_t edge_idx : g.nodes[node_idx].simple_edges) {
538 // The tests are really IsSubset(), but Overlaps() is equivalent
539 // here, and slightly faster.
540 const Hyperedge e = g.edges[edge_idx];
541 if (Overlaps(e.right, subgraph) && Overlaps(e.left, complement)) {
542 if (receiver->FoundSubgraphPair(subgraph, complement, edge_idx / 2)) {
543 return true;
544 }
545 }
546 }
547 }
548
549 // Complex edges.
550 NodeMap node = TableBitmap(node_idx);
551 for (size_t edge_idx : g.nodes[node_idx].complex_edges) {
552 const Hyperedge e = g.edges[edge_idx];
553
554 // NOTE: We call IsolateLowestBit() so that we only see the edge once.
555 if (IsolateLowestBit(e.left) == node && IsSubset(e.left, complement) &&
556 IsSubset(e.right, subgraph)) {
557 if (receiver->FoundSubgraphPair(subgraph, complement, edge_idx / 2)) {
558 return true;
559 }
560 }
561 }
562 }
563 return false;
564}
565
566// Very similar to ExpandSubgraph: Given a connected subgraph of g and
567// another subgraph of g (its complement; not necessarily connected),
568// grow the complement recursively along the neighborhood. The former
569// subgraph stays unchanged through the recursion, while the second
570// is grown. If the complement at any point gets connected, see if we
571// can find a connection between the connected subgraph and complement;
572// if so, they form a so-called csg-cmp-pair. We tell the receiver about the
573// csg-cmp-pair, not only because it is the entire goal of the algorithm,
574// but because it will allow us to remember for later that the csg-cmp-pair
575// is connected. (This is used for connectivity testing, both in
576// ExpandSubgraph() and ExpandComplement().)
577//
578// Called EnumerateCmpRec() in the paper.
579template <class Receiver>
580[[nodiscard]] bool ExpandComplement(const Hypergraph &g, size_t lowest_node_idx,
581 NodeMap subgraph,
582 NodeMap subgraph_full_neighborhood,
583 NodeMap complement, NodeMap neighborhood,
584 NodeMap forbidden, Receiver *receiver) {
585 assert(IsSubset(subgraph, forbidden));
586 assert(!IsSubset(complement, forbidden));
587
589 "Trying to expand complement %s (subgraph is %s, forbidden is %s)\n",
590 PrintSet(complement).c_str(), PrintSet(subgraph).c_str(),
591 PrintSet(forbidden).c_str());
592
593 // Given a neighborhood, try growing our subgraph by all possible
594 // combinations of included/excluded (except the one where all are
595 // excluded).
596 //
597 // The only difference from ExpandSubgraph() here is that when we
598 // find a connected complement (and thus have two disjoint, connected
599 // subgraphs), we don't need to recurse to find a third subgraph;
600 // we can just check whether they are connected, and if so, tell
601 // the receiver.
602 for (NodeMap grow_by : NonzeroSubsetsOf(neighborhood)) {
603 NodeMap grown_complement = complement | grow_by;
604 if (receiver->HasSeen(grown_complement)) {
605 if (TryConnecting(g, subgraph, subgraph_full_neighborhood,
606 grown_complement, receiver)) {
607 return true;
608 }
609 }
610 }
611
612 // Same logic as in ExpandSubgraph:
613 //
614 // Try to grow all the grown complements into larger, connected complements.
615 // Note that we do this even if the grown complement isn't connected, since it
616 // might be connected as we add more nodes.
617 //
618 // We need to do this after FoundSubgraphPair() has run on all of them,
619 // to guarantee that we will see any smaller subgraphs before larger ones.
620 NeighborhoodCache cache(neighborhood);
621 for (NodeMap grow_by : NonzeroSubsetsOf(neighborhood)) {
622 HYPERGRAPH_PRINTF("Trying to grow complement %s by %s (out of %s)\n",
623 PrintSet(complement).c_str(), PrintSet(grow_by).c_str(),
624 PrintSet(neighborhood).c_str());
625 NodeMap grown_complement = complement | grow_by;
626
627 // Recursive calls are not allowed to add any of the nodes from
628 // our current neighborhood, since we're already trying all
629 // combinations of those ourselves.
630 NodeMap new_forbidden = (forbidden | neighborhood) & ~grown_complement;
631 assert(!IsSubset(grown_complement, new_forbidden));
632
633 // Find the neighborhood of the new complement.
634 NodeMap new_full_neighborhood = 0; // Unused; see comment on TryConnecting.
635 NodeMap new_neighborhood =
636 FindNeighborhood(g, complement | grow_by, new_forbidden, grow_by,
637 &cache, &new_full_neighborhood);
638
639 if (ExpandComplement(g, lowest_node_idx, subgraph,
640 subgraph_full_neighborhood, grown_complement,
641 new_neighborhood, new_forbidden, receiver)) {
642 return true;
643 }
644 }
645 return false;
646}
647
648// Consider increasing subsets of the graph, backwards; first only the
649// last node (say, R6), then {R5,R6} with R5 as the seed, then
650// {R4,R5,R6} with R4 as the seed, and so on. From the single-node
651// seed, we grow the connected subgraph recursively into new connected
652// subgraphs; when we such a new subgraph (the paper calls it a csg),
653// we do two things with it:
654//
655// 1. Keep growing it into new and even larger subgraphs.
656// 2. Look for _another_, separate subgraph (the paper calls it a
657// complement, or cmp) that can be connected to our subgraph.
658// If we find one such pair (a csg-cmp-pair), that's what the
659// algorithm fundamentally is looking for.
660//
661// Called Solve() in the DPhyp paper.
662//
663// If at any point receiver->FoundSingleNode() or receiver->FoundSubgraphPair()
664// returns true, the algorithm will abort, and this function also return true.
665template <class Receiver>
666bool EnumerateAllConnectedPartitions(const Hypergraph &g, Receiver *receiver) {
667 for (int seed_idx = g.nodes.size() - 1; seed_idx >= 0; --seed_idx) {
668 if (receiver->FoundSingleNode(seed_idx)) {
669 return true;
670 }
671
672 NodeMap seed = TableBitmap(seed_idx);
673 HYPERGRAPH_PRINTF("\n\nStarting main iteration at node %s\n",
674 PrintSet(seed).c_str());
675 NodeMap forbidden = TablesBetween(0, seed_idx);
676 NodeMap full_neighborhood = 0;
677 NeighborhoodCache cache(0);
678 NodeMap neighborhood =
679 FindNeighborhood(g, seed, forbidden, seed, &cache, &full_neighborhood);
680 if (EnumerateComplementsTo(g, seed_idx, seed, full_neighborhood,
681 neighborhood, receiver)) {
682 return true;
683 }
684 if (ExpandSubgraph(g, seed_idx, seed, full_neighborhood, neighborhood,
685 forbidden | seed, receiver)) {
686 return true;
687 }
688 }
689 return false;
690}
691
692} // namespace hypergraph
693
694#endif // SUBGRAPH_ENUMERATION_H
uint64_t TablesBetween(unsigned start, unsigned end)
Definition: bit_utils.h:208
BitIteratorAdaptor< CountBitsDescending > BitsSetInDescending(uint64_t state)
Definition: bit_utils.h:133
bool IsSubset(uint64_t x, uint64_t y)
Definition: bit_utils.h:222
uint64_t IsolateLowestBit(uint64_t x)
Definition: bit_utils.h:218
bool Overlaps(uint64_t x, uint64_t y)
Definition: bit_utils.h:225
uint64_t TableBitmap(unsigned x)
Definition: bit_utils.h:185
BitIteratorAdaptor< CountBitsAscending > BitsSetIn(uint64_t state)
Definition: bit_utils.h:130
Definition: bit_utils.h:146
Definition: subgraph_enumeration.h:162
NodeMap m_last_just_grown_by
Definition: subgraph_enumeration.h:200
NodeMap m_last_neighborhood
Definition: subgraph_enumeration.h:203
const NodeMap m_taboo_bit
Definition: subgraph_enumeration.h:199
NeighborhoodCache(NodeMap neighborhood)
Definition: subgraph_enumeration.h:164
NodeMap InitSearch(NodeMap just_grown_by, NodeMap *neighborhood, NodeMap *full_neighborhood)
Definition: subgraph_enumeration.h:172
NodeMap m_last_full_neighborhood
Definition: subgraph_enumeration.h:202
void Store(NodeMap just_grown_by, NodeMap neighborhood, NodeMap full_neighborhood)
Definition: subgraph_enumeration.h:188
Definition of an undirected (join) hypergraph.
Definition: buf0block_hint.cc:29
Definition: hypergraph.cc:30
NodeMap FindNeighborhood(const Hypergraph &g, NodeMap subgraph, NodeMap forbidden, NodeMap just_grown_by, NeighborhoodCache *cache, NodeMap *full_neighborhood_arg)
Find the neighborhood of the given subgraph (S); informally, the set of nodes immediately reachable f...
Definition: subgraph_enumeration.h:284
std::string PrintSet(NodeMap x)
Definition: subgraph_enumeration.h:95
bool EnumerateComplementsTo(const Hypergraph &g, size_t lowest_node_idx, NodeMap subgraph, NodeMap full_neighborhood, NodeMap neighborhood, Receiver *receiver)
Definition: subgraph_enumeration.h:350
bool EnumerateAllConnectedPartitions(Receiver *receiver)
bool TryConnecting(const Hypergraph &g, NodeMap subgraph, NodeMap subgraph_full_neighborhood, NodeMap complement, Receiver *receiver)
Definition: subgraph_enumeration.h:531
bool ExpandComplement(const Hypergraph &g, size_t lowest_node_idx, NodeMap subgraph, NodeMap subgraph_full_neighborhood, NodeMap complement, NodeMap neighborhood, NodeMap forbidden, Receiver *receiver)
Definition: subgraph_enumeration.h:580
uint64_t NodeMap
Since our graphs can never have more than 61 tables, node sets and edge lists are implemented using 6...
Definition: node_map.h:39
bool ExpandSubgraph(const Hypergraph &g, size_t lowest_node_idx, NodeMap subgraph, NodeMap full_neighborhood, NodeMap neighborhood, NodeMap forbidden, Receiver *receiver)
Definition: subgraph_enumeration.h:424
Definition: hypergraph.h:76
NodeMap left
Definition: hypergraph.h:81
NodeMap right
Definition: hypergraph.h:82
Definition: hypergraph.h:85
std::vector< Hyperedge > edges
Definition: hypergraph.h:88
std::vector< Node > nodes
Definition: hypergraph.h:87
#define HYPERGRAPH_PRINTF(...)
Definition: subgraph_enumeration.h:87