MySQL 8.3.0
Source Code Documentation
interesting_orders.h
Go to the documentation of this file.
1/* Copyright (c) 2021, 2023, Oracle and/or its affiliates.
2
3 This program is free software; you can redistribute it and/or modify
4 it under the terms of the GNU General Public License, version 2.0,
5 as published by the Free Software Foundation.
6
7 This program is also distributed with certain software (including
8 but not limited to OpenSSL) that is licensed under separate terms,
9 as designated in a particular file or component or in included license
10 documentation. The authors of MySQL hereby grant you an additional
11 permission to link the program and your derivative works with the
12 separately licensed software that they have included with MySQL.
13
14 This program is distributed in the hope that it will be useful,
15 but WITHOUT ANY WARRANTY; without even the implied warranty of
16 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
17 GNU General Public License, version 2.0, for more details.
18
19 You should have received a copy of the GNU General Public License
20 along with this program; if not, write to the Free Software
21 Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */
22
23#ifndef SQL_JOIN_OPTIMIZER_INTERESTING_ORDERS_H
24#define SQL_JOIN_OPTIMIZER_INTERESTING_ORDERS_H
25
26/**
27 @file
28
29 Tracks which tuple streams follow which orders, and in particular whether
30 they follow interesting orders.
31
32 An interesting order (and/or grouping) is one that we might need to sort by
33 at some point during query execution (e.g. to satisfy an ORDER BY predicate);
34 if the rows already are produced in that order, for instance because we
35 scanned along the right index, we can skip the sort and get a lower cost.
36
37 We generally follow these papers:
38
39 [Neu04] Neumann and Moerkotte: “An efficient framework for order
40 optimization”
41 [Neu04b] Neumann and Moerkotte: “A Combined Framework for
42 Grouping and Order Optimization”
43
44 [Neu04b] is an updated version of [Neu04] that also deals with interesting
45 groupings but omits some details to make more space, so both are needed.
46 A combined and updated version of the same material is available in
47 Moerkotte's “Query compilers” PDF.
48
49 Some further details, like order homogenization, come from
50
51 [Sim96] Simmen et al: “Fundamental Techniques for Order Optimization”
52
53 All three papers deal with the issue of _logical_ orderings, where any
54 row stream may follow more than one order simultaneously, as inferred
55 through functional dependencies (FDs). For instance, if we have an ordering
56 (ab) but also an active FD {a} → c (c is uniquely determined by a,
57 for instance because a is a primary key in the same table as c), this means
58 we also implicitly follow the orders (acb) and (abc). In addition,
59 we trivially follow the orders (a), (ac) and (ab). However, note that we
60 do _not_ necessarily follow the order (cab).
61
62 Similarly, equivalences, such as WHERE conditions and joins, give rise
63 to a stronger form of FDs. If we have an ordering (ab) and the FD b = c,
64 we can be said to follow (ac), (acb) or (abc). The former would not be
65 inferable from {b} → c and {c} → b alone. Equivalences with constants
66 are perhaps even stronger, e.g. WHERE x=3 would give rise to {} → x,
67 which could extend (a) to (xa), (ax) or (x).
68
69 Neumann et al solve this by modelling which ordering we're following as a
70 state in a non-deterministic finite state machine (NFSM). By repeatedly
71 applying FDs (which become edges in the NFSM), we can build up all possible
72 orderings from a base (which can be either the empty ordering, ordering from
73 scanning along an index, or one produced by an explicit sort) and then
74 checking whether we are in a state matching the ordering we are interested
75 in. (There can be quite a bit of states, so we need a fair amount of pruning
76 to keep the count manageable, or else performance will suffer.) Of course,
77 since NFSMs are nondeterministic, a base ordering and a set of FDs can
78 necessarily put us in a number of states, so we need to convert the NFSM
79 to a DFSM (using the standard powerset construction for NFAs; see
80 ConvertNFSMToDFSM()). This means that the ordering state for an access path
81 is only a single integer, the DFSM state number. When we activate more FDs,
82 for instance because we apply joins, we will move throughout the DFSM into
83 more attractive states. By checking simple precomputed lookup tables,
84 we can quickly find whether a given DFSM state follows a given ordering.
85
86 The other kind of edges we follow are from the artificial starting state;
87 they represent setting a specific ordering (e.g. because we sort by that
88 ordering). This is set up in the NFSM and preserved in the DFSM.
89
90 The actual collection of FDs and interesting orders happen outside this
91 class, in the caller.
92
93 A weakness in the approach is that transitive FDs are not always followed
94 correctly. E.g., if we have an ordering (a), and FDs {a} → b and {b} → c,
95 we will create (ab) and (abc), but _not_ (ac). This is not a problem for
96 equivalences, though, and most of the FDs we collect are equivalences.
97 We do have some heuristics to produce a new FD {a} → c where it is relevant,
98 but they are not always effective.
99
100 Neumann and Moerkotte distinguish between “tested-for” (O_T) and
101 “producing” (O_P) orderings, where all orders are interesting but only
102 some can be produced by explicit operators, such as sorts. Our implementation
103 is exactly opposite; we allow every ordering to be produced (by means of
104 sort-ahead), but there are orders that can be produced (e.g. when scanning
105 an index) that are not interesting in themselves. Such orders can be
106 pruned away early if we can show they do not produce anything interesting.
107
108
109 The operations related to interesting orders, in particular the concept
110 of functional dependencies, are related to the ones we are doing when
111 checking ONLY_FULL_GROUP_BY legality in sql/aggregate_check.h. However,
112 there are some key differences as well:
113
114 - Orderings are lexical, while groupings are just a bag of attributes.
115 This increases the state space for orderings significantly; groupings
116 can add elements at will and just grow the set freely, while orderings
117 need more care. In particular, this means that groupings only need
118 FDs on the form S → x (where S is a set), while orderings also benefit
119 from those of the type x = y, which replace an element instead of
120 adding a new one.
121
122 - ONLY_FULL_GROUP_BY is for correctness of rejecting or accepting the
123 query, while interesting orders is just an optimization, so not
124 recognizing rare cases is more acceptable.
125
126 - ONLY_FULL_GROUP_BY testing only cares about the set of FDs that hold
127 at one specific point (GROUP BY testing, naturally), while interesting
128 orders must be tracked throughout the entire operator tree. In particular,
129 if using interesting orders for merge join, the status at nearly every
130 join is relevant. Also, performance matters much more.
131
132 Together, these mean that the code ends up being fairly different,
133 and some cases are recognized by ONLY_FULL_GROUP_BY but not by interesting
134 orders. (The actual FD collection happens in BuildInterestingOrders in
135 join_optimizer.cc; see the comment there for FD differences.)
136
137 A note about nomenclature: Like Neumann et al, we use the term “ordering”
138 (and “grouping”) instead of “order”, with the special exception of the
139 already-established term “interesting order”.
140 */
141
142#include "my_table_map.h"
144#include "sql/key_spec.h"
145#include "sql/mem_root_array.h"
146#include "sql/sql_array.h"
147
148#include <bitset>
149#include <string>
150
151class LogicalOrderings;
152class Window;
153
154/**
155 Represents a (potentially interesting) ordering, rollup or (non-rollup)
156 grouping.
157*/
158class Ordering final {
159 friend bool operator==(const Ordering &a, const Ordering &b);
160
161 public:
162 /// This type hold the individual elements of the ordering.
164
165 /// The kind of ordering that an Ordering instance may represent.
166 enum class Kind : char {
167 /// An ordering with no elements. Such an ordering is not useful in itself,
168 /// but may appear as an intermediate result.
169 kEmpty,
170
171 /// Specific sequence of m_elements, and specific direction of each element.
172 /// Needed for e.g. ORDER BY.
173 kOrder,
174
175 /// Specific sequence of m_elements, but each element may be ordered in any
176 /// direction. Needed for ROLLUP:
177 kRollup,
178
179 /// Elements may appear in any sequence and may be ordered in any direction.
180 /// Needed for GROUP BY (with out ROLLUP), DISCTINCT, semi-join etc.
181 kGroup
182 };
183
184 Ordering() : m_kind{Kind::kEmpty} {}
185
186 Ordering(Elements elements, Kind kind) : m_elements{elements}, m_kind{kind} {
187 assert(Valid());
188 }
189
190 /// Copy constructor. Only defined explicitly to check Valid().
191 Ordering(const Ordering &other)
192 : m_elements{other.m_elements}, m_kind{other.m_kind} {
193 assert(Valid());
194 }
195
196 /// Assignment operator. Only defined explicitly to check Valid().
197 Ordering &operator=(const Ordering &other) {
198 assert(Valid());
199 m_kind = other.m_kind;
200 m_elements = other.m_elements;
201 return *this;
202 }
203
204 /// Make a copy of *this. Allocate new memory for m_elements from mem_root.
206 assert(Valid());
208 }
209
210 Kind GetKind() const {
211 assert(Valid());
212 return m_kind;
213 }
214
215 const Elements &GetElements() const {
216 assert(Valid());
217 return m_elements;
218 }
219
221 assert(Valid());
222 return m_elements;
223 }
224
225 size_t size() const { return m_elements.size(); }
226
227 /**
228 Remove duplicate entries, in-place.
229 */
230 void Deduplicate();
231
232 private:
233 /// The ordering terms.
235
236 /// The kind of this ordering.
238
239 /// @returns true iff *this passes a consistency check.
240 bool Valid() const;
241};
242
243/// Check if 'a' and 'b' has the same kind and contains the same elements.
244inline bool operator==(const Ordering &a, const Ordering &b) {
245 assert(a.Valid());
246 assert(b.Valid());
247 return a.m_kind == b.m_kind &&
250}
251
252inline bool operator!=(const Ordering &a, const Ordering &b) {
253 return !(a == b);
254}
255
257 enum {
258 // A special “empty” kind of edge in the FSM that signifies
259 // adding no functional dependency, ie., a state we can reach
260 // with no further effort. This can happen in two ways:
261 //
262 // 1. An ordering can drop its last element, ie.,
263 // if a tuple stream is ordered on (a,b,c), it is also
264 // ordered on (a,b).
265 // 2. An ordering can be converted to a grouping, i.e,
266 // if a tuple stream is ordered on (a,b,c), it is also
267 // grouped on {a,b,c}.
268 //
269 // head must be empty, tail must be 0. Often called ϵ.
270 // Must be the first in the edge list.
272
273 // A standard functional dependency {a} → b; if a row tuple
274 // is ordered on all elements of a and this FD is applied,
275 // it is also ordered on b. A typical example is if {a}
276 // is an unique key in a table, and b is a column of the
277 // same table. head can be empty.
279
280 // An equivalence a = b; implies a → b and b → a, but is
281 // stronger (e.g. if ordered on (a,c), there is also an
282 // ordering on (b,c), which wouldn't be true just from
283 // applying FDs individually). head must be a single element.
286
289
290 // Whether this functional dependency can always be applied, ie.,
291 // there is never a point during query processing where it does not hold.
292 //
293 // Examples of not-always-active FDs include join conditions;
294 // e.g. for t1.x = t2.x, it is not true before the join has actually
295 // happened (and t1.x won't be the same order as t2.x before that,
296 // and thus cannot be used in e.g. a merge join).
297 //
298 // However, FDs that stem from unique indexes are always true; e.g. if
299 // t1.x is a primary key, {t1.x} → t1.y will always be true, and we can
300 // always reduce t1.y from an order if t1.x is present earlier.
301 // Similarly, WHERE conditions that are applied on the base table
302 // (ie., it is not delayed due to outer joins) will always be true,
303 // if t1.x = 3, we can safely assume {} → t1.x holds even before
304 // joining in t1, so a sort on (t1.x, t2.y) can be satisfied just by
305 // sorting t2 on y.
306 //
307 // Always-active FDs are baked into the DFSM, so that we need to follow
308 // fewer arcs during query processing. They can also be used for reducing
309 // the final order (to get more efficient sorting), but we don't do it yet.
310 bool always_active = false;
311};
312
315
316 public:
317 explicit LogicalOrderings(THD *thd);
318
319 // Maps the Item to an opaque integer handle. Deduplicates items as we go,
320 // inserting new ones if needed.
322
323 Item *item(ItemHandle item) const { return m_items[item].item; }
324 int num_items() const { return m_items.size(); }
325
326 // These are only available before Build() has been called.
327
328 // Mark an interesting ordering (or grouping) as interesting,
329 // returning an index that can be given to SetOrder() later.
330 // Will deduplicate against previous entries; if not deduplicated
331 // away, a copy will be taken.
332 //
333 // Uninteresting orderings are those that can be produced by some
334 // operator (for instance, index scan) but are not interesting to
335 // test for. Orderings may be merged, pruned (if uninteresting)
336 // and moved around after Build(); see RemapOrderingIndex().
337 //
338 // If used_at_end is true, the ordering is assumed to be used only
339 // after all joins have happened, so all FDs are assumed to be
340 // active. This enables reducing the ordering more (which can in
341 // some cases help with better sortahead or the likes), but is not
342 // correct if the ordering wants to be used earlier on, e.g.
343 // in merge join or for semijoin duplicate removal. If it is false,
344 // then it is also only attempted homogenized onto the given set
345 // of tables (otherwise, it is ignored, and homogenization is over
346 // all tables).
347 //
348 // The empty ordering/grouping is always index 0.
349 int AddOrdering(THD *thd, Ordering order, bool interesting, bool used_at_end,
350 table_map homogenize_tables) {
351 return AddOrderingInternal(thd, order,
354 used_at_end, homogenize_tables);
355 }
356
357 // NOTE: Will include the empty ordering.
358 int num_orderings() const { return m_orderings.size(); }
359
360 const Ordering &ordering(int ordering_idx) const {
361 return m_orderings[ordering_idx].ordering;
362 }
363
364 bool ordering_is_relevant_for_sortahead(int ordering_idx) const {
365 return !m_orderings[ordering_idx].ordering.GetElements().empty() &&
366 m_orderings[ordering_idx].type != OrderingWithInfo::UNINTERESTING;
367 }
368
369 // Add a functional dependency that may be applied at some point
370 // during the query planning. Same guarantees as AddOrdering().
371 // The special “decay” FD is always index 0.
373
374 // NOTE: Will include the decay (epsilon) FD.
375 int num_fds() const { return m_fds.size(); }
376
377 // Set the list of GROUP BY expressions, if any. This is used as the
378 // head of the functional dependencies for all aggregate functions
379 // (which by definition are functionally dependent on the GROUP BY
380 // expressions, unless ROLLUP is active -- see below), and must be
381 // valid (ie., not freed or modified) until Build() has run.
382 //
383 // If none is set, and there are aggregates present in orderings,
384 // implicit aggregation is assumed (ie., all aggregate functions
385 // are constant).
387 m_aggregate_head = head;
388 }
389
390 // Set whether ROLLUP is active; if so, we can no longer assume that
391 // aggregate functions are functionally dependent on (nullable)
392 // GROUP BY expressions, as two NULLs may be for different reasons.
393 void SetRollup(bool rollup) { m_rollup = rollup; }
394
395 // Builds the actual FSMs; all information about orderings and FDs is locked,
396 // optimized and then the state machine is built. After this, you can no
397 // longer add new orderings or FDs, ie., you are moving into the actual
398 // planning phase.
399 //
400 // Build() may prune away orderings and FDs, and it may also add homogenized
401 // orderings, ie., orderings derived from given interesting orders but
402 // modified so that they only require a single table (but will become an
403 // actual interesting order later, after the FDs have been applied). These are
404 // usually at the end, but may also be deduplicated against uninteresting
405 // orders, which will then be marked as interesting.
406 //
407 // trace can be nullptr; if not, it get human-readable optimizer trace
408 // appended to it.
409 void Build(THD *thd, std::string *trace);
410
411 // These are only available after Build() has been called.
412 // They are stateless and used in the actual planning phase.
413
414 // Converts an index returned by AddOrdering() to one that can be given
415 // to SetOrder() or DoesFollowOrder(). They don't convert themselves
416 // since it's completely legitimate to iterate over all orderings using
417 // num_orderings() and orderings(), and those indexes should _not_ be
418 // remapped.
419 //
420 // If an ordering has been pruned away, will return zero (the empty ordering),
421 // which is a valid input to SetOrder().
422 int RemapOrderingIndex(int ordering_idx) const {
423 assert(m_built);
424 return m_optimized_ordering_mapping[ordering_idx];
425 }
426
427 using StateIndex = int;
428
429 StateIndex SetOrder(int ordering_idx) const {
430 assert(m_built);
431 return m_orderings[ordering_idx].state_idx;
432 }
433
434 // Get a bitmap representing the given functional dependency. The bitmap
435 // can be all-zero if the given FD is optimized away, or outside the range
436 // of the representable bits. The bitmaps may be ORed together, but are
437 // otherwise to be treated as opaque to the client.
440 int new_fd_idx = m_optimized_fd_mapping[fd_idx];
441 if (new_fd_idx >= 1 && new_fd_idx <= kMaxSupportedFDs) {
442 fd_set.set(new_fd_idx - 1);
443 }
444 return fd_set;
445 }
446
447 // For a given state, see what other (better) state we can move to given a
448 // set of active functional dependencies, e.g. if we are in state ((),a) and
449 // the FD a=b becomes active, we can set its bit (see GetFDSet()) in the FD
450 // mask and use that to move to the state ((),a,b,ab,ba). Note that “fds”
451 // should contain the entire set of active FDs, not just newly-applied ones.
452 // This is because “old” FDs can suddenly become relevant when new logical
453 // orderings are possible, and the DFSM is not always able to bake this in.
455
456 bool DoesFollowOrder(StateIndex state_idx, int ordering_idx) const {
457 assert(m_built);
458 if (ordering_idx == 0) {
459 return true;
460 }
461 if (ordering_idx >= kMaxSupportedOrderings) {
462 return false;
463 }
464 return m_dfsm_states[state_idx].follows_interesting_order.test(
465 ordering_idx);
466 }
467
468 // Whether "a" follows any interesting orders than "b" does not, or could
469 // do so in the future. If !MoreOrderedThan(a, b) && !MoreOrderedThan(b, a)
470 // the states are equal (they follow the same interesting orders, and could
471 // lead to the same interesting orders given the same FDs -- see below).
472 // It is possible to have MoreOrderedThan(a, b) && MoreOrderedThan(b, a), e.g.
473 // if they simply follow disjunct orders.
474 //
475 // This is used in the planner, when pruning access paths -- an AP A can be
476 // kept even if it has higher cost than AP B, if it follows orders that B does
477 // not. Why is it enough to check interesting orders -- must we also not check
478 // uninteresting orders, since they could lead to new interesting orders
479 // later? This is because in the planner, two states will only ever be
480 // compared if the same functional dependencies have been applied to both
481 // sides:
482 //
483 // The set of logical orders, and thus the state, is uniquely determined
484 // by the initial ordering and applied FDs. Thus, if A has _uninteresting_
485 // orders that B does not, the initial ordering must have differed -- but the
486 // initial states only contain (and thus differ in) interesting orders.
487 // Thus, the additional uninteresting orders must have been caused by
488 // additional interesting orders (that do not go away), so testing the
489 // interesting ones really suffices in planner context.
490 //
491 // Note that this also means that in planner context, !MoreOrderedThan(a, b)
492 // && !MoreOrderedThan(b, a) implies that a == b.
494 StateIndex a_idx, StateIndex b_idx,
495 std::bitset<kMaxSupportedOrderings> ignored_orderings) const {
496 assert(m_built);
497 std::bitset<kMaxSupportedOrderings> a =
498 m_dfsm_states[a_idx].follows_interesting_order & ~ignored_orderings;
499 std::bitset<kMaxSupportedOrderings> b =
500 m_dfsm_states[b_idx].follows_interesting_order & ~ignored_orderings;
501 std::bitset<kMaxSupportedOrderings> future_a =
502 m_dfsm_states[a_idx].can_reach_interesting_order & ~ignored_orderings;
503 std::bitset<kMaxSupportedOrderings> future_b =
504 m_dfsm_states[b_idx].can_reach_interesting_order & ~ignored_orderings;
505 return (a & b) != a || (future_a & future_b) != future_a;
506 }
507
508 // See comment in .cc file.
510 Ordering::Elements tmp) const;
511
512 private:
513 struct NFSMState;
514 class OrderWithElementInserted;
515
516 bool m_built = false;
517
518 struct ItemInfo {
519 // Used to translate Item * to ItemHandle and back.
521
522 // Points to the head of this item's equivalence class. (If the item
523 // is not equivalent to anything, points to itself.) The equivalence class
524 // is defined by EQUIVALENCE FDs, transitively, and the head is the one with
525 // the lowest index. So if we have FDs a = b and b = c, all three {a,b,c}
526 // will point to a here. This is useful for pruning and homogenization;
527 // if two elements have the same equivalence class (ie., the same canonical
528 // item), they could become equivalent after applying FDs. See also
529 // m_can_be_added_by_fd, which deals with non-EQUIVALENCE FDs.
530 //
531 // Set by BuildEquivalenceClasses().
533
534 // Whether the given item (after canonicalization by means of
535 // m_canonical_item[]) shows up as the tail of any functional dependency.
536 //
537 // Set by FindElementsThatCanBeAddedByFDs();
538 bool can_be_added_by_fd = false;
539
540 // Whether the given item ever shows up in orderings as ASC or DESC,
541 // respectively. Used to see whether adding the item in that direction
542 // is worthwhile or not. Note that this is propagated through equivalences,
543 // so if a = b and any ordering contains b DESC and a is the head of that
544 // equivalence class, then a is also marked as used_desc = true.
545 bool used_asc = false;
546 bool used_desc = false;
547 bool used_in_grouping = false;
548 };
549 // All items we have seen in use (in orderings or FDs), deduplicated
550 // and indexed by ItemHandle.
552
553 // Head for all FDs generated for aggregate functions.
554 // See SetHeadForAggregates().
556
557 // Whether rollup is active; if so, we need to take care not to create
558 // FDs for aggregates in some cases. See SetHeadForAggregates() and
559 // SetRollup().
560 bool m_rollup = false;
561
562 struct NFSMEdge {
563 // Which FD is required to follow this edge. Index into m_fd, with one
564 // exception; from the initial state (0), we have constructor edges for
565 // setting a specific order without following an FD. Such edges have
566 // required_fd_idx = INT_MIN + order_idx, ie., they are negative.
568
569 // Destination state (index into m_states).
571
573 const LogicalOrderings *orderings) const {
574 return &orderings->m_fds[required_fd_idx];
575 }
576 const NFSMState *state(const LogicalOrderings *orderings) const {
577 return &orderings->m_states[state_idx];
578 }
579 };
580
581 friend bool operator==(const NFSMEdge &a, const NFSMEdge &b);
582 friend bool operator!=(const NFSMEdge &a, const NFSMEdge &b);
583
584 struct NFSMState {
588 int satisfied_ordering_idx; // Only for type == INTERESTING.
589
590 // Indexed by ordering.
591 std::bitset<kMaxSupportedOrderings> can_reach_interesting_order{0};
592
593 // Used during traversal, to keep track of which states we have
594 // already seen (for fast deduplication). We use the standard trick
595 // of using a generational counter instead of a bool, so that we don't
596 // have to clear it every time; we can just increase the generation
597 // and treat everything with lower/different “seen” as unseen.
598 int seen = 0;
599 };
600 struct DFSMState {
601 Mem_root_array<int> outgoing_edges; // Index into dfsm_edges.
602 Mem_root_array<int> nfsm_states; // Index into states.
603
604 // Structures derived from the above, but in forms for faster access.
605
606 // Indexed by FD.
608
609 // Indexed by ordering.
610 std::bitset<kMaxSupportedOrderings> follows_interesting_order{0};
611
612 // Interesting orders that this state can eventually reach,
613 // given that all FDs are applied (a superset of follows_interesting_order).
614 // We track this instead of the producing orders (e.g. which homogenized
615 // order are we following), because it allows for more access paths to
616 // compare equal. See also OrderingWithInfo::Type::HOMOGENIZED.
617 std::bitset<kMaxSupportedOrderings> can_reach_interesting_order{0};
618
619 // Whether applying the given functional dependency will take us to a
620 // different state from this one. Used to quickly intersect with the
621 // available FDs to find out what we can apply.
623 };
624
625 struct DFSMEdge {
628
630 const LogicalOrderings *orderings) const {
631 return &orderings->m_fds[required_fd_idx];
632 }
633 const DFSMState *state(const LogicalOrderings *orderings) const {
634 return &orderings->m_dfsm_states[state_idx];
635 }
636 };
637
640
641 // Status of the ordering. Note that types with higher indexes dominate
642 // lower, ie., two orderings can be collapsed into the one with the higher
643 // type index if they are otherwise equal.
644 enum Type {
645 // An ordering that is interesting in its own right,
646 // e.g. because it is given to ORDER BY.
648
649 // An ordering that is derived from an interesting order, but refers to
650 // one table only (or conceptually, a subset of tables -- but we don't
651 // support that in this implementation). Guaranteed to reach some
652 // interesting order at some point, but we don't track it as interesting
653 // in the FSM states. This means that these orderings don't get state bits
654 // in follows_interesting_order for themselves, but they will always have
655 // one or more interesting orders in can_reach_interesting_order.
656 // This helps us collapse access paths more efficiently; if we have
657 // an interesting order t3.x and create homogenized orderings t1.x
658 // and t2.x (due to some equality with t3.x), an access path following one
659 // isn't better than an access path following the other. They will lead
660 // to the same given the same FDs anyway (see MoreOrderedThan()), and
661 // thus are equally good.
663
664 // An ordering that is just added because it is easy to produce;
665 // e.g. because it is produced by scanning along an index. Such orderings
666 // can be shortened or pruned away entirely (in
667 // PruneUninterestingOrders())
668 // unless we find that they may lead to an interesting order.
669 UNINTERESTING = 0
671
673
674 // Only used if used_at_end = false (see AddOrdering()).
676
677 // Which initial state to use for this ordering (in SetOrder()).
679 };
680
682
683 // The longest ordering in m_orderings.
685
687
688 // NFSM. 0 is the initial state, all others are found by following edges.
690
691 // DFSM. 0 is the initial state, all others are found by following edges.
694
695 // After PruneUninterestingOrders has run, maps from the old indexes to the
696 // new indexes.
698
699 // After PruneFDs() has run, maps from the old indexes to the new indexes.
701
702 /// We may potentially use a lot of Ordering::Elements objects, with short and
703 /// non-overlapping life times. Therefore we have a pool
704 /// to allow reuse and avoid allocating from MEM_ROOT each time.
706
707 // The canonical order for two items in a grouping
708 // (after BuildEquivalenceClasses() has run; enforced by
709 // RecanonicalizeGroupings()). The reason why we sort by
710 // canonical_item first is so that switching out one element
711 // with an equivalent one (ie., applying an EQUIVALENCE
712 // functional dependency) does not change the order of the
713 // elements in the grouing, which could give false negatives
714 // in CouldBecomeInterestingOrdering().
716 if (m_items[a].canonical_item != m_items[b].canonical_item)
717 return m_items[a].canonical_item < m_items[b].canonical_item;
718 return a < b;
719 }
720
721 inline bool ItemBeforeInGroup(const OrderElement &a,
722 const OrderElement &b) const {
723 return ItemHandleBeforeInGroup(a.item, b.item);
724 }
725
726 // Helper for AddOrdering().
728 bool used_at_end, table_map homogenize_tables);
729
730 // See comment in .cc file.
731 void PruneUninterestingOrders(THD *thd);
732
733 // See comment in .cc file.
734 void PruneFDs(THD *thd);
735
736 // See comment in .cc file.
738 bool all_fds) const;
739
740 // Populates ItemInfo::canonical_item.
742
743 // See comment in .cc file.
745
746 // See comment in .cc file.
747 void AddFDsFromComputedItems(THD *thd);
748
749 // See comment in .cc file.
750 void AddFDsFromAggregateItems(THD *thd);
751
752 // See comment in .cc file.
754 THD *thd, ItemHandle argument_item, Window *window);
755
756 // See comment in .cc file.
757 void AddFDsFromConstItems(THD *thd);
758
759 // Populates ItemInfo::can_be_added_by_fd.
761
762 void PreReduceOrderings(THD *thd);
767 THD *thd, const Ordering &reduced_ordering, bool used_at_end,
768 int table_idx,
769 Bounds_checked_array<std::pair<ItemHandle, ItemHandle>>
770 reverse_canonical);
771
772 /// Sort the elements so that a will appear before b if
773 /// ItemBeforeInGroup(a,b)==true.
774 void SortElements(Ordering::Elements elements) const;
775
776 // See comment in .cc file.
778
779 void BuildNFSM(THD *thd);
780 void AddRollupFromOrder(THD *thd, int state_idx, const Ordering &ordering);
781 void AddGroupingFromOrder(THD *thd, int state_idx, const Ordering &ordering);
782 void AddGroupingFromRollup(THD *thd, int state_idx, const Ordering &ordering);
783 void TryAddingOrderWithElementInserted(THD *thd, int state_idx, int fd_idx,
784 Ordering old_ordering,
785 size_t start_point,
786 ItemHandle item_to_add,
787 enum_order direction);
788 void PruneNFSM(THD *thd);
789 bool AlwaysActiveFD(int fd_idx);
790 void FinalizeDFSMState(THD *thd, int state_idx);
792 int *generation, int extra_allowed_fd_idx);
793 void ConvertNFSMToDFSM(THD *thd);
794
795 // Populates state_idx for every ordering in m_ordering.
797
798 // If a state with the given ordering already exists (artificial or not),
799 // returns its index. Otherwise, adds an artificial state with the given
800 // order and returns its index.
801 int AddArtificialState(THD *thd, const Ordering &ordering);
802
803 // Add an edge from state_idx to an state with the given ordering; if there is
804 // no such state, adds an artificial state with it (taking a copy, so does not
805 // need to take ownership).
806 void AddEdge(THD *thd, int state_idx, int required_fd_idx,
807 const Ordering &ordering);
808
809 // Returns true if the given (non-DECAY) functional dependency applies to the
810 // given ordering, and the index of the element from which the FD is active
811 // (ie., the last element that was part of the head). One can start inserting
812 // the tail element at any point _after_ this index; if it is an EQUIVALENCE
813 // FD, one can instead choose to replace the element at start_point entirely.
815 const Ordering &ordering,
816 int *start_point) const;
817
818 /**
819 Fetch an Ordering::Elements object with size()==m_longest_ordering.
820 Get it from m_elements_pool if that is non-empty, otherwise allocate
821 from mem_root.
822 */
824 if (m_elements_pool.empty()) {
826 } else {
828 m_elements_pool.pop_back();
830 }
831 }
832
833 /**
834 Return an Ordering::Elements object with size()==m_longest_ordering
835 to m_elements_pool.
836 */
838 // Overwrite the array with garbage, so that we have a better chance
839 // of detecting it if we by mistake access it afterwards.
840 TRASH(elements.data(), m_longest_ordering * sizeof(elements.data()[0]));
841 m_elements_pool.push_back(elements.data());
842 }
843 // Used for optimizer trace.
844
845 std::string PrintOrdering(const Ordering &ordering) const;
847 bool html) const;
848 void PrintFunctionalDependencies(std::string *trace);
849 void PrintInterestingOrders(std::string *trace);
850 void PrintNFSMDottyGraph(std::string *trace) const;
851 void PrintDFSMDottyGraph(std::string *trace) const;
852};
853
856 return a.required_fd_idx == b.required_fd_idx && a.state_idx == b.state_idx;
857}
858
861 return !(a == b);
862}
863
864#endif // SQL_JOIN_OPTIMIZER_INTERESTING_ORDERS_H
Element_type * data()
Definition: sql_array.h:115
static Bounds_checked_array Alloc(MEM_ROOT *mem_root, size_t size)
Definition: sql_array.h:69
const_iterator cbegin() const
Returns a pointer to the first element in the array.
Definition: sql_array.h:143
size_t size() const
Definition: sql_array.h:153
Bounds_checked_array Clone(MEM_ROOT *mem_root) const
Make a copy of '*this'. Allocate memory for m_array on 'mem_root'.
Definition: sql_array.h:74
const_iterator cend() const
Returns a pointer to the past-the-end element in the array.
Definition: sql_array.h:145
Base class that is used to represent any kind of expression in a relational query.
Definition: item.h:933
Definition: interesting_orders.h:313
void SetHeadForAggregates(Bounds_checked_array< ItemHandle > head)
Definition: interesting_orders.h:386
void AddRollupFromOrder(THD *thd, int state_idx, const Ordering &ordering)
Definition: interesting_orders.cc:1702
void AddEdge(THD *thd, int state_idx, int required_fd_idx, const Ordering &ordering)
Definition: interesting_orders.cc:1298
bool ItemHandleBeforeInGroup(ItemHandle a, ItemHandle b) const
Definition: interesting_orders.h:715
void CreateOrderingsFromGroupings(THD *thd)
We don't currently have any operators that only group and do not sort (e.g.
Definition: interesting_orders.cc:931
void AddFDsFromComputedItems(THD *thd)
Try to add new FDs from items that are not base items; e.g., if we have an item (a + 1),...
Definition: interesting_orders.cc:621
Mem_root_array< OrderingWithInfo > m_orderings
Definition: interesting_orders.h:681
bool ImpliedByEarlierElements(ItemHandle item, Ordering::Elements prefix, bool all_fds) const
Checks whether the given item is redundant given previous elements in the ordering; ie....
Definition: interesting_orders.cc:824
Bounds_checked_array< int > m_optimized_fd_mapping
Definition: interesting_orders.h:700
StateIndex SetOrder(int ordering_idx) const
Definition: interesting_orders.h:429
int num_orderings() const
Definition: interesting_orders.h:358
bool DoesFollowOrder(StateIndex state_idx, int ordering_idx) const
Definition: interesting_orders.h:456
bool m_rollup
Definition: interesting_orders.h:560
int m_longest_ordering
Definition: interesting_orders.h:684
Mem_root_array< DFSMState > m_dfsm_states
Definition: interesting_orders.h:692
bool m_built
Definition: interesting_orders.h:516
void AddHomogenizedOrderingIfPossible(THD *thd, const Ordering &reduced_ordering, bool used_at_end, int table_idx, Bounds_checked_array< std::pair< ItemHandle, ItemHandle > > reverse_canonical)
Helper function for CreateHomogenizedOrderings().
Definition: interesting_orders.cc:1109
friend bool operator==(const NFSMEdge &a, const NFSMEdge &b)
Definition: interesting_orders.h:854
std::string PrintOrdering(const Ordering &ordering) const
Definition: interesting_orders.cc:2123
Ordering ReduceOrdering(Ordering ordering, bool all_fds, Ordering::Elements tmp) const
Remove redundant elements using the functional dependencies that we have, to give a more canonical fo...
Definition: interesting_orders.cc:1092
int RemapOrderingIndex(int ordering_idx) const
Definition: interesting_orders.h:422
void FinalizeDFSMState(THD *thd, int state_idx)
Definition: interesting_orders.cc:1898
void AddGroupingFromRollup(THD *thd, int state_idx, const Ordering &ordering)
Definition: interesting_orders.cc:1682
friend bool operator!=(const NFSMEdge &a, const NFSMEdge &b)
Definition: interesting_orders.h:859
Mem_root_array< OrderElement * > m_elements_pool
We may potentially use a lot of Ordering::Elements objects, with short and non-overlapping life times...
Definition: interesting_orders.h:705
int num_items() const
Definition: interesting_orders.h:324
void PrintDFSMDottyGraph(std::string *trace) const
Definition: interesting_orders.cc:2274
bool ordering_is_relevant_for_sortahead(int ordering_idx) const
Definition: interesting_orders.h:364
void PreReduceOrderings(THD *thd)
Do safe reduction on all orderings (some of them may get merged by PruneUninterestingOrders() later),...
Definition: interesting_orders.cc:891
void SetRollup(bool rollup)
Definition: interesting_orders.h:393
void PrintFunctionalDependencies(std::string *trace)
Definition: interesting_orders.cc:2176
void CreateHomogenizedOrderings(THD *thd)
For each interesting ordering, see if we can homogenize it onto each table.
Definition: interesting_orders.cc:1019
Mem_root_array< NFSMState > m_states
Definition: interesting_orders.h:689
void SortElements(Ordering::Elements elements) const
Sort the elements so that a will appear before b if ItemBeforeInGroup(a,b)==true.
Definition: interesting_orders.cc:1175
int AddOrderingInternal(THD *thd, Ordering order, OrderingWithInfo::Type type, bool used_at_end, table_map homogenize_tables)
Definition: interesting_orders.cc:211
void AddFDsFromConstItems(THD *thd)
Try to add FDs from items that are constant by themselves, e.g.
Definition: interesting_orders.cc:735
int AddFunctionalDependency(THD *thd, FunctionalDependency fd)
Definition: interesting_orders.cc:266
void PrintNFSMDottyGraph(std::string *trace) const
Definition: interesting_orders.cc:2240
bool FunctionalDependencyApplies(const FunctionalDependency &fd, const Ordering &ordering, int *start_point) const
Definition: interesting_orders.cc:1316
Mem_root_array< DFSMEdge > m_dfsm_edges
Definition: interesting_orders.h:693
void ReturnElements(Ordering::Elements elements)
Return an Ordering::Elements object with size()==m_longest_ordering to m_elements_pool.
Definition: interesting_orders.h:837
bool MoreOrderedThan(StateIndex a_idx, StateIndex b_idx, std::bitset< kMaxSupportedOrderings > ignored_orderings) const
Definition: interesting_orders.h:493
bool AlwaysActiveFD(int fd_idx)
Definition: interesting_orders.cc:1893
void BuildNFSM(THD *thd)
Definition: interesting_orders.cc:1475
void BuildEquivalenceClasses()
Definition: interesting_orders.cc:518
int AddArtificialState(THD *thd, const Ordering &ordering)
Definition: interesting_orders.cc:1282
int StateIndex
Definition: interesting_orders.h:427
void PruneNFSM(THD *thd)
Try to prune away irrelevant nodes from the NFSM; it is worth spending some time on this,...
Definition: interesting_orders.cc:1762
Item * item(ItemHandle item) const
Definition: interesting_orders.h:323
const Ordering & ordering(int ordering_idx) const
Definition: interesting_orders.h:360
void FindElementsThatCanBeAddedByFDs()
Definition: interesting_orders.cc:796
void PrintInterestingOrders(std::string *trace)
Definition: interesting_orders.cc:2193
StateIndex ApplyFDs(StateIndex state_idx, FunctionalDependencySet fds) const
Definition: interesting_orders.cc:359
Bounds_checked_array< int > m_optimized_ordering_mapping
Definition: interesting_orders.h:697
int AddOrdering(THD *thd, Ordering order, bool interesting, bool used_at_end, table_map homogenize_tables)
Definition: interesting_orders.h:349
Ordering::Elements RetrieveElements(MEM_ROOT *mem_root)
Fetch an Ordering::Elements object with size()==m_longest_ordering.
Definition: interesting_orders.h:823
void ExpandThroughAlwaysActiveFDs(Mem_root_array< int > *nfsm_states, int *generation, int extra_allowed_fd_idx)
Definition: interesting_orders.cc:1915
void PruneFDs(THD *thd)
Definition: interesting_orders.cc:449
void PruneUninterestingOrders(THD *thd)
Try to get rid of uninteresting orders, possibly by discarding irrelevant suffixes and merging them w...
Definition: interesting_orders.cc:391
void AddFDsFromAggregateItems(THD *thd)
Definition: interesting_orders.cc:758
Bounds_checked_array< ItemHandle > CollectHeadForStaticWindowFunction(THD *thd, ItemHandle argument_item, Window *window)
Definition: interesting_orders.cc:580
Mem_root_array< ItemInfo > m_items
Definition: interesting_orders.h:551
ItemHandle GetHandle(Item *item)
Definition: interesting_orders.cc:1186
void FindInitialStatesForOrdering()
Definition: interesting_orders.cc:2111
void AddGroupingFromOrder(THD *thd, int state_idx, const Ordering &ordering)
Definition: interesting_orders.cc:1659
LogicalOrderings(THD *thd)
Definition: interesting_orders.cc:189
Mem_root_array< FunctionalDependency > m_fds
Definition: interesting_orders.h:686
void ConvertNFSMToDFSM(THD *thd)
From the NFSM, convert an equivalent DFSM.
Definition: interesting_orders.cc:1959
Bounds_checked_array< ItemHandle > m_aggregate_head
Definition: interesting_orders.h:555
void TryAddingOrderWithElementInserted(THD *thd, int state_idx, int fd_idx, Ordering old_ordering, size_t start_point, ItemHandle item_to_add, enum_order direction)
FunctionalDependencySet GetFDSet(int fd_idx) const
Definition: interesting_orders.h:438
void RecanonicalizeGroupings()
Definition: interesting_orders.cc:568
std::string PrintFunctionalDependency(const FunctionalDependency &fd, bool html) const
Definition: interesting_orders.cc:2143
void CreateOrderingsFromRollups(THD *thd)
bool ItemBeforeInGroup(const OrderElement &a, const OrderElement &b) const
Definition: interesting_orders.h:721
bool CouldBecomeInterestingOrdering(const Ordering &ordering) const
For a given ordering, check whether it ever has the hope of becoming an interesting ordering.
Definition: interesting_orders.cc:1224
int num_fds() const
Definition: interesting_orders.h:375
void Build(THD *thd, std::string *trace)
Definition: interesting_orders.cc:297
A typesafe replacement for DYNAMIC_ARRAY.
Definition: mem_root_array.h:425
A scope-guard class for allocating an Ordering::Elements instance which is automatically returned to ...
Definition: interesting_orders.cc:65
Represents a (potentially interesting) ordering, rollup or (non-rollup) grouping.
Definition: interesting_orders.h:158
Elements & GetElements()
Definition: interesting_orders.h:220
size_t size() const
Definition: interesting_orders.h:225
Kind
The kind of ordering that an Ordering instance may represent.
Definition: interesting_orders.h:166
@ kOrder
Specific sequence of m_elements, and specific direction of each element.
@ kGroup
Elements may appear in any sequence and may be ordered in any direction.
@ kRollup
Specific sequence of m_elements, but each element may be ordered in any direction.
@ kEmpty
An ordering with no elements.
Ordering & operator=(const Ordering &other)
Assignment operator. Only defined explicitly to check Valid().
Definition: interesting_orders.h:197
Kind GetKind() const
Definition: interesting_orders.h:210
Ordering()
Definition: interesting_orders.h:184
Ordering(Elements elements, Kind kind)
Definition: interesting_orders.h:186
void Deduplicate()
Remove duplicate entries, in-place.
Definition: interesting_orders.cc:153
Bounds_checked_array< OrderElement > Elements
This type hold the individual elements of the ordering.
Definition: interesting_orders.h:163
Kind m_kind
The kind of this ordering.
Definition: interesting_orders.h:237
Ordering Clone(MEM_ROOT *mem_root) const
Make a copy of *this. Allocate new memory for m_elements from mem_root.
Definition: interesting_orders.h:205
Elements m_elements
The ordering terms.
Definition: interesting_orders.h:234
friend bool operator==(const Ordering &a, const Ordering &b)
Check if 'a' and 'b' has the same kind and contains the same elements.
Definition: interesting_orders.h:244
Ordering(const Ordering &other)
Copy constructor. Only defined explicitly to check Valid().
Definition: interesting_orders.h:191
bool Valid() const
Definition: interesting_orders.cc:164
const Elements & GetElements() const
Definition: interesting_orders.h:215
For each client connection we create a separate thread with THD serving as a thread/connection descri...
Definition: sql_lexer_thd.h:35
Represents the (explicit) window of a SQL 2003 section 7.11 <window clause>, or the implicit (inlined...
Definition: window.h:104
static MEM_ROOT mem_root
Definition: client_plugin.cc:113
static bool equal(const Item *i1, const Item *i2, const Field *f2)
Definition: sql_select.cc:3835
bool operator!=(const Ordering &a, const Ordering &b)
Definition: interesting_orders.h:252
bool operator==(const Ordering &a, const Ordering &b)
Check if 'a' and 'b' has the same kind and contains the same elements.
Definition: interesting_orders.h:244
std::bitset< kMaxSupportedFDs > FunctionalDependencySet
Definition: interesting_orders_defs.h:62
static constexpr int kMaxSupportedFDs
Definition: interesting_orders_defs.h:61
static constexpr int kMaxSupportedOrderings
Definition: interesting_orders_defs.h:64
int ItemHandle
Definition: interesting_orders_defs.h:38
enum_order
Definition: key_spec.h:64
void TRASH(void *ptr, size_t length)
Put bad content in memory to be sure it will segfault if dereferenced.
Definition: memory_debugging.h:70
uint64_t table_map
Definition: my_table_map.h:29
mutable_buffer buffer(void *p, size_t n) noexcept
Definition: buffer.h:417
required string type
Definition: replication_group_member_actions.proto:33
Definition: interesting_orders.h:256
ItemHandle tail
Definition: interesting_orders.h:288
enum FunctionalDependency::@106 type
@ FD
Definition: interesting_orders.h:278
@ DECAY
Definition: interesting_orders.h:271
@ EQUIVALENCE
Definition: interesting_orders.h:284
bool always_active
Definition: interesting_orders.h:310
Bounds_checked_array< ItemHandle > head
Definition: interesting_orders.h:287
Definition: interesting_orders.h:625
int state_idx
Definition: interesting_orders.h:627
int required_fd_idx
Definition: interesting_orders.h:626
const DFSMState * state(const LogicalOrderings *orderings) const
Definition: interesting_orders.h:633
const FunctionalDependency * required_fd(const LogicalOrderings *orderings) const
Definition: interesting_orders.h:629
Definition: interesting_orders.h:600
FunctionalDependencySet can_use_fd
Definition: interesting_orders.h:622
Mem_root_array< int > nfsm_states
Definition: interesting_orders.h:602
std::bitset< kMaxSupportedOrderings > follows_interesting_order
Definition: interesting_orders.h:610
Mem_root_array< int > outgoing_edges
Definition: interesting_orders.h:601
std::bitset< kMaxSupportedOrderings > can_reach_interesting_order
Definition: interesting_orders.h:617
Bounds_checked_array< int > next_state
Definition: interesting_orders.h:607
Definition: interesting_orders.h:518
bool used_desc
Definition: interesting_orders.h:546
bool used_in_grouping
Definition: interesting_orders.h:547
ItemHandle canonical_item
Definition: interesting_orders.h:532
bool can_be_added_by_fd
Definition: interesting_orders.h:538
bool used_asc
Definition: interesting_orders.h:545
Item * item
Definition: interesting_orders.h:520
Definition: interesting_orders.h:562
const NFSMState * state(const LogicalOrderings *orderings) const
Definition: interesting_orders.h:576
int required_fd_idx
Definition: interesting_orders.h:567
const FunctionalDependency * required_fd(const LogicalOrderings *orderings) const
Definition: interesting_orders.h:572
int state_idx
Definition: interesting_orders.h:570
Definition: interesting_orders.h:584
Mem_root_array< NFSMEdge > outgoing_edges
Definition: interesting_orders.h:586
std::bitset< kMaxSupportedOrderings > can_reach_interesting_order
Definition: interesting_orders.h:591
int satisfied_ordering_idx
Definition: interesting_orders.h:588
int seen
Definition: interesting_orders.h:598
@ ARTIFICIAL
Definition: interesting_orders.h:585
@ DELETED
Definition: interesting_orders.h:585
@ INTERESTING
Definition: interesting_orders.h:585
Ordering satisfied_ordering
Definition: interesting_orders.h:587
enum LogicalOrderings::NFSMState::@107 type
Definition: interesting_orders.h:638
StateIndex state_idx
Definition: interesting_orders.h:678
Ordering ordering
Definition: interesting_orders.h:639
bool used_at_end
Definition: interesting_orders.h:672
Type
Definition: interesting_orders.h:644
@ UNINTERESTING
Definition: interesting_orders.h:669
@ INTERESTING
Definition: interesting_orders.h:647
@ HOMOGENIZED
Definition: interesting_orders.h:662
enum LogicalOrderings::OrderingWithInfo::Type type
table_map homogenize_tables
Definition: interesting_orders.h:675
The MEM_ROOT is a simple arena, where allocations are carved out of larger blocks.
Definition: my_alloc.h:82
Definition: interesting_orders_defs.h:43
ItemHandle item
Definition: interesting_orders_defs.h:44