MySQL 8.0.32
Source Code Documentation

Tracks which tuple streams follow which orders, and in particular whether they follow interesting orders. More...
#include "my_table_map.h"
#include "sql/join_optimizer/interesting_orders_defs.h"
#include "sql/mem_root_array.h"
#include "sql/sql_array.h"
#include <bitset>
#include <string>
Go to the source code of this file.
Classes  
struct  FunctionalDependency 
class  LogicalOrderings 
struct  LogicalOrderings::ItemInfo 
struct  LogicalOrderings::NFSMState 
struct  LogicalOrderings::DFSMState 
struct  LogicalOrderings::NFSMEdge 
struct  LogicalOrderings::DFSMEdge 
struct  LogicalOrderings::OrderingWithInfo 
Typedefs  
using  Ordering = Bounds_checked_array< OrderElement > 
Tracks which tuple streams follow which orders, and in particular whether they follow interesting orders.
An interesting order (and/or grouping) is one that we might need to sort by at some point during query execution (e.g. to satisfy an ORDER BY predicate); if the rows already are produced in that order, for instance because we scanned along the right index, we can skip the sort and get a lower cost.
We generally follow these papers:
[Neu04] Neumann and Moerkotte: “An efficient framework for order optimization” [Neu04b] Neumann and Moerkotte: “A Combined Framework for Grouping and Order Optimization”
[Neu04b] is an updated version of [Neu04] that also deals with interesting groupings but omits some details to make more space, so both are needed. A combined and updated version of the same material is available in Moerkotte's “Query compilers” PDF.
Some further details, like order homogenization, come from
[Sim96] Simmen et al: “Fundamental Techniques for Order Optimization”
All three papers deal with the issue of logical orderings, where any row stream may follow more than one order simultaneously, as inferred through functional dependencies (FDs). For instance, if we have an ordering (ab) but also an active FD {a} → c (c is uniquely determined by a, for instance because a is a primary key in the same table as c), this means we also implicitly follow the orders (acb) and (abc). In addition, we trivially follow the orders (a), (ac) and (ab). However, note that we do not necessarily follow the order (cab).
Similarly, equivalences, such as WHERE conditions and joins, give rise to a stronger form of FDs. If we have an ordering (ab) and the FD b = c, we can be said to follow (ac), (acb) or (abc). The former would not be inferable from {b} → c and {c} → b alone. Equivalences with constants are perhaps even stronger, e.g. WHERE x=3 would give rise to {} → x, which could extend (a) to (xa), (ax) or (x).
Neumann et al solve this by modelling which ordering we're following as a state in a nondeterministic finite state machine (NFSM). By repeatedly applying FDs (which become edges in the NFSM), we can build up all possible orderings from a base (which can be either the empty ordering, ordering from scanning along an index, or one produced by an explicit sort) and then checking whether we are in a state matching the ordering we are interested in. (There can be quite a bit of states, so we need a fair amount of pruning to keep the count manageable, or else performance will suffer.) Of course, since NFSMs are nondeterministic, a base ordering and a set of FDs can necessarily put us in a number of states, so we need to convert the NFSM to a DFSM (using the standard powerset construction for NFAs; see ConvertNFSMToDFSM()). This means that the ordering state for an access path is only a single integer, the DFSM state number. When we activate more FDs, for instance because we apply joins, we will move throughout the DFSM into more attractive states. By checking simple precomputed lookup tables, we can quickly find whether a given DFSM state follows a given ordering.
The other kind of edges we follow are from the artificial starting state; they represent setting a specific ordering (e.g. because we sort by that ordering). This is set up in the NFSM and preserved in the DFSM.
The actual collection of FDs and interesting orders happen outside this class, in the caller.
A weakness in the approach is that transitive FDs are not always followed correctly. E.g., if we have an ordering (a), and FDs {a} → b and {b} → c, we will create (ab) and (abc), but not (ac). This is not a problem for equivalences, though, and most of the FDs we collect are equivalences. We do have some heuristics to produce a new FD {a} → c where it is relevant, but they are not always effective.
Neumann and Moerkotte distinguish between “testedfor” (O_T) and “producing” (O_P) orderings, where all orders are interesting but only some can be produced by explicit operators, such as sorts. Our implementation is exactly opposite; we allow every ordering to be produced (by means of sortahead), but there are orders that can be produced (e.g. when scanning an index) that are not interesting in themselves. Such orders can be pruned away early if we can show they do not produce anything interesting.
The operations related to interesting orders, in particular the concept of functional dependencies, are related to the ones we are doing when checking ONLY_FULL_GROUP_BY legality in sql/aggregate_check.h. However, there are some key differences as well:
Together, these mean that the code ends up being fairly different, and some cases are recognized by ONLY_FULL_GROUP_BY but not by interesting orders. (The actual FD collection happens in BuildInterestingOrders in join_optimizer.cc; see the comment there for FD differences.)
A note about nomenclature: Like Neumann et al, we use the term “ordering” (and “grouping”) instead of “order”, with the special exception of the alreadyestablished term “interesting order”.
using Ordering = Bounds_checked_array<OrderElement> 