MySQL  8.0.27
Source Code Documentation
join_optimizer.h
Go to the documentation of this file.
1 /* Copyright (c) 2020, 2021, Oracle and/or its affiliates.
2 
3  This program is free software; you can redistribute it and/or modify
4  it under the terms of the GNU General Public License, version 2.0,
5  as published by the Free Software Foundation.
6 
7  This program is also distributed with certain software (including
8  but not limited to OpenSSL) that is licensed under separate terms,
9  as designated in a particular file or component or in included license
10  documentation. The authors of MySQL hereby grant you an additional
11  permission to link the program and your derivative works with the
12  separately licensed software that they have included with MySQL.
13 
14  This program is distributed in the hope that it will be useful,
15  but WITHOUT ANY WARRANTY; without even the implied warranty of
16  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
17  GNU General Public License, version 2.0, for more details.
18 
19  You should have received a copy of the GNU General Public License
20  along with this program; if not, write to the Free Software
21  Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */
22 
23 #ifndef SQL_JOIN_OPTIMIZER_JOIN_OPTIMIZER_H
24 #define SQL_JOIN_OPTIMIZER_JOIN_OPTIMIZER_H
25 
26 /**
27  @file
28 
29  The hypergraph join optimizer takes a query block and decides how to
30  execute it as fast as possible (within a given cost model), based on
31  the idea of expressing the join relations as edges in a hypergraph.
32  (See subgraph_enumeration.h for more details on the core algorithm,
33  or FindBestQueryPlan() for more information on overall execution.)
34 
35  It is intended to eventually take over completely from the older join
36  optimizer based on prefix search (sql_planner.cc and related code),
37  and is nearly feature complete, but is currently in the early stages
38  with a very simplistic cost model and certain limitations.
39  The most notable ones are that we do not support:
40 
41  - Hints (except STRAIGHT_JOIN).
42  - TRADITIONAL and JSON formats for EXPLAIN (use FORMAT=tree).
43  - Too large queries (too many possible subgraphs).
44 
45  For unsupported queries, we will return an error; every valid SQL
46  query should either give such an error a correct result set.
47 
48  There are also have many optimization features it does not yet support;
49  among them:
50 
51  - Aggregation through a temporary table.
52  - Queries with a very large amount of possible orderings, e.g. 30-way
53  star joins. (Less extreme queries, such as 30-way chain joins,
54  will be fine.) They will receive a similar error message as with
55  unsupported SQL features, instead of timing out.
56  */
57 
58 #include <string>
59 
60 class Query_block;
61 class THD;
62 struct AccessPath;
63 struct JoinHypergraph;
64 struct TABLE;
65 
66 /**
67  The main entry point for the hypergraph join optimizer; takes in a query
68  block and returns an access path to execute it (or nullptr, for error).
69  It works as follows:
70 
71  1. Convert the query block from MySQL's TABLE_LIST structures into
72  a hypergraph (see make_join_hypergraph.h).
73  2. Find all legal subplans in the hypergraph, calculate costs for
74  them and create access paths -- if there are multiple ways to make a
75  given subplan (e.g. multiple join types, or joining {t1,t2,t3} can be
76  made through either {t1}-{t2,t3} or {t1,t2}-{t3}), keep only the cheapest
77  one. Filter predicates (from WHERE and pushed-down join conditions)
78  are added as soon down as it is legal, which is usually (but not
79  universally) optimal. The algorithm works so that we always see smaller
80  subplans first and then end at the complete join plan containing all the
81  tables in the query block.
82  3. Add an access path for non-pushable filter predicates.
83  4. Add extra access paths for operations done after the joining,
84  such as ORDER BY, GROUP BY, LIMIT, etc..
85  5. Make access paths for the filters in nodes made by #2
86  (see ExpandFilterAccessPaths()).
87 
88  Materializing subqueries need some extra care. (These are typically IN
89  subqueries that for whatever reason could not be rewritten to semijoin,
90  e.g. because they have GROUP BY.) The decision on whether to materialize
91  or not needs to be done cost-based, and depends both on the inner and outer
92  query block, so it needs to be done cost-based. (Materializiation gives
93  a high up-front cost, but each execution is cheaper, so it will depend on
94  how many times we expect to execute the subquery and now expensive it is
95  to run unmaterialized.) Following the flow through the different steps:
96 
97  First of all, these go through a stage known as in2exists, rewriting them
98  from e.g.
99 
100  WHERE t1_outer.x IN ( SELECT t2.y FROM t2 GROUP BY ... )
101 
102  to
103 
104  WHERE EXISTS ( SELECT 1 FROM t2 GROUP BY ... HAVING t2.y = t1_outer.x )
105 
106  This happens before the join optimizer, and the idea is that the HAVING
107  condition (known as a “created_by_in2exists condition”, possibly in WHERE
108  instead of HAVING) can be attempted pushed down into an index or similar,
109  giving more efficient execution. However, if we want to materialize the
110  subquery, these extra conditions need to be removed before materialization;
111  not only do they give the wrong result, but they can also need to wrong
112  costs and a suboptimal join order.
113 
114  Thus, whenever we plan such a subquery, we plan it twice; once as usual,
115  and then a second time with all in2exists conditions removed. This gives
116  EstimateFilterCost() precise cost information for both cases, or at least
117  as precise as the cost model itself is. In the outer query block, we can
118  then weigh the two alternatives against each other when we add a filter
119  with such a subquery; we can choose to materialize it or not, and propose
120  both alternatives as with any other subplan. When we've decided on the
121  final plan, we go through all access paths and actually materialize the
122  subqueries it says to materialize.
123 
124  There are lots of places these conditions can show up; to reduce complexity,
125  we only consider materialization in the most common places (filters on
126  base tables, filters after joins, filters from HAVING) -- in particular,
127  we don't bother checking on join conditions. It is never wrong to not
128  materialize a subquery, though it may be suboptimal.
129 
130 
131  Note that the access path returned by FindBestQueryPlan() is not ready
132  for immediate conversion to iterators; see FinalizePlanForQueryBlock().
133  You may call FindBestQueryPlan() any number of times for a query block,
134  but FinalizePlanForQueryBlock() only once, as finalization generates
135  temporary tables and may rewrite expressions in ways that are incompatible
136  with future planning. The difference is most striking with the planning
137  done twice by in2exists (see above).
138 
139  @param thd Thread handle.
140  @param query_block The query block to find a plan for.
141  @param trace If not nullptr, will be filled with human-readable optimizer
142  trace showing some of the inner workings of the code.
143  */
145  std::string *trace);
146 
147 // See comment in .cc file.
148 bool FinalizePlanForQueryBlock(THD *thd, Query_block *query_block,
149  AccessPath *root_path);
150 
151 // Exposed for unit testing only.
152 void FindSargablePredicates(THD *thd, std::string *trace,
153  JoinHypergraph *graph);
154 
157 
158 #endif // SQL_JOIN_OPTIMIZER_JOIN_OPTIMIZER_H
This class represents a query block, aka a query specification, which is a query consisting of a SELE...
Definition: sql_lex.h:1123
For each client connection we create a separate thread with THD serving as a thread/connection descri...
Definition: sql_class.h:821
void FindSargablePredicates(THD *thd, std::string *trace, JoinHypergraph *graph)
AccessPath * FindBestQueryPlan(THD *thd, Query_block *query_block, std::string *trace)
The main entry point for the hypergraph join optimizer; takes in a query block and returns an access ...
void EstimateAggregateCost(AccessPath *path)
Definition: cost_model.cc:231
bool FinalizePlanForQueryBlock(THD *thd, Query_block *query_block, AccessPath *root_path)
Definition: finalize_plan.cc:525
void EstimateMaterializeCost(THD *thd, AccessPath *path)
Definition: cost_model.cc:179
static char * path
Definition: mysqldump.cc:130
Access paths are a query planning structure that correspond 1:1 to iterators, in that an access path ...
Definition: access_path.h:176
A struct containing a join hypergraph of a single query block, encapsulating the constraints given by...
Definition: make_join_hypergraph.h:70
Definition: table.h:1394