MySQL 8.2.0
Source Code Documentation
Go to the documentation of this file.
1/* Copyright (c) 2020, 2023, Oracle and/or its affiliates.
3 This program is free software; you can redistribute it and/or modify
4 it under the terms of the GNU General Public License, version 2.0,
5 as published by the Free Software Foundation.
7 This program is also distributed with certain software (including
8 but not limited to OpenSSL) that is licensed under separate terms,
9 as designated in a particular file or component or in included license
10 documentation. The authors of MySQL hereby grant you an additional
11 permission to link the program and your derivative works with the
12 separately licensed software that they have included with MySQL.
14 This program is distributed in the hope that it will be useful,
15 but WITHOUT ANY WARRANTY; without even the implied warranty of
17 GNU General Public License, version 2.0, for more details.
19 You should have received a copy of the GNU General Public License
20 along with this program; if not, write to the Free Software
21 Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */
27 @file
29 The hypergraph join optimizer takes a query block and decides how to
30 execute it as fast as possible (within a given cost model), based on
31 the idea of expressing the join relations as edges in a hypergraph.
32 (See subgraph_enumeration.h for more details on the core algorithm,
33 or FindBestQueryPlan() for more information on overall execution.)
35 It is intended to eventually take over completely from the older join
36 optimizer based on prefix search ( and related code),
37 and is nearly feature complete, but is currently in the early stages
38 with a very simplistic cost model and certain limitations.
39 The most notable ones are that we do not support:
41 - Hints (except STRAIGHT_JOIN).
42 - TRADITIONAL and JSON formats for EXPLAIN (use FORMAT=tree).
43 - UPDATE.
45 There are also have many optimization features it does not yet support;
46 among them:
48 - Aggregation through a temporary table.
49 - Some range optimizer features (notably MIN/MAX optimization).
50 - Materialization of arbitrary access paths (note that nested loop
51 joins against these can enable a limited form of hash join
52 that preserves ordering on the left side).
53 */
55#include <string>
57class Query_block;
58class THD;
59struct AccessPath;
60struct JoinHypergraph;
63 The main entry point for the hypergraph join optimizer; takes in a query
64 block and returns an access path to execute it (or nullptr, for error).
65 It works as follows:
67 1. Convert the query block from MySQL's Table_ref structures into
68 a hypergraph (see make_join_hypergraph.h).
69 2. Find all legal subplans in the hypergraph, calculate costs for
70 them and create access paths -- if there are multiple ways to make a
71 given subplan (e.g. multiple join types, or joining {t1,t2,t3} can be
72 made through either {t1}-{t2,t3} or {t1,t2}-{t3}), keep only the cheapest
73 one. Filter predicates (from WHERE and pushed-down join conditions)
74 are added as soon down as it is legal, which is usually (but not
75 universally) optimal. The algorithm works so that we always see smaller
76 subplans first and then end at the complete join plan containing all the
77 tables in the query block.
78 3. Add an access path for non-pushable filter predicates.
79 4. Add extra access paths for operations done after the joining,
80 such as ORDER BY, GROUP BY, LIMIT, etc..
81 5. Make access paths for the filters in nodes made by #2
82 (see ExpandFilterAccessPaths()).
84 Materializing subqueries need some extra care. (These are typically IN
85 subqueries that for whatever reason could not be rewritten to semijoin,
86 e.g. because they have GROUP BY.) The decision on whether to materialize
87 or not needs to be done cost-based, and depends both on the inner and outer
88 query block, so it needs to be done cost-based. (Materializiation gives
89 a high up-front cost, but each execution is cheaper, so it will depend on
90 how many times we expect to execute the subquery and now expensive it is
91 to run unmaterialized.) Following the flow through the different steps:
93 First of all, these go through a stage known as in2exists, rewriting them
94 from e.g.
96 WHERE t1_outer.x IN ( SELECT t2.y FROM t2 GROUP BY ... )
98 to
100 WHERE EXISTS ( SELECT 1 FROM t2 GROUP BY ... HAVING t2.y = t1_outer.x )
102 This happens before the join optimizer, and the idea is that the HAVING
103 condition (known as a “created_by_in2exists condition”, possibly in WHERE
104 instead of HAVING) can be attempted pushed down into an index or similar,
105 giving more efficient execution. However, if we want to materialize the
106 subquery, these extra conditions need to be removed before materialization;
107 not only do they give the wrong result, but they can also need to wrong
108 costs and a suboptimal join order.
110 Thus, whenever we plan such a subquery, we plan it twice; once as usual,
111 and then a second time with all in2exists conditions removed. This gives
112 EstimateFilterCost() precise cost information for both cases, or at least
113 as precise as the cost model itself is. In the outer query block, we can
114 then weigh the two alternatives against each other when we add a filter
115 with such a subquery; we can choose to materialize it or not, and propose
116 both alternatives as with any other subplan. When we've decided on the
117 final plan, we go through all access paths and actually materialize the
118 subqueries it says to materialize.
120 There are lots of places these conditions can show up; to reduce complexity,
121 we only consider materialization in the most common places (filters on
122 base tables, filters after joins, filters from HAVING) -- in particular,
123 we don't bother checking on join conditions. It is never wrong to not
124 materialize a subquery, though it may be suboptimal.
127 Note that the access path returned by FindBestQueryPlan() is not ready
128 for immediate conversion to iterators; see FinalizePlanForQueryBlock().
129 You may call FindBestQueryPlan() any number of times for a query block,
130 but FinalizePlanForQueryBlock() only once, as finalization generates
131 temporary tables and may rewrite expressions in ways that are incompatible
132 with future planning. The difference is most striking with the planning
133 done twice by in2exists (see above).
135 @param thd Thread handle.
136 @param query_block The query block to find a plan for.
137 @param trace If not nullptr, will be filled with human-readable optimizer
138 trace showing some of the inner workings of the code.
139 */
141 std::string *trace);
143// See comment in .cc file.
146// Exposed for unit testing only.
147void FindSargablePredicates(THD *thd, std::string *trace,
This class represents a query block, aka a query specification, which is a query consisting of a SELE...
Definition: sql_lex.h:1161
For each client connection we create a separate thread with THD serving as a thread/connection descri...
Definition: sql_lexer_thd.h:35
bool FinalizePlanForQueryBlock(THD *thd, Query_block *query_block)
void FindSargablePredicates(THD *thd, std::string *trace, JoinHypergraph *graph)
void EstimateAggregateCost(AccessPath *path)
void EstimateMaterializeCost(THD *thd, AccessPath *path)
AccessPath * FindBestQueryPlan(THD *thd, Query_block *query_block, std::string *trace)
The main entry point for the hypergraph join optimizer; takes in a query block and returns an access ...
static char * path
Access paths are a query planning structure that correspond 1:1 to iterators, in that an access path ...
Definition: access_path.h:193
A struct containing a join hypergraph of a single query block, encapsulating the constraints given by...
Definition: make_join_hypergraph.h:77
hypergraph::Hypergraph graph
Definition: make_join_hypergraph.h:86
const Query_block * query_block() const
Returns a pointer to the query block that is being planned.
Definition: make_join_hypergraph.h:149