MySQL  8.0.18
Source Code Documentation
row_iterator.h
#ifndef SQL_ROW_ITERATOR_H_
#define SQL_ROW_ITERATOR_H_

/* Copyright (c) 2018, 2019, Oracle and/or its affiliates. All rights reserved.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License, version 2.0,
   as published by the Free Software Foundation.

   This program is also distributed with certain software (including
   but not limited to OpenSSL) that is licensed under separate terms,
   as designated in a particular file or component or in included license
   documentation. The authors of MySQL hereby grant you an additional
   permission to link the program and your derivative works with the
   separately licensed software that they have included with MySQL.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   GNU General Public License, version 2.0, for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */

#include <string>
#include <vector>

#include "my_dbug.h"

class Item;
class JOIN;
class THD;
struct TABLE;

/**
  A context for reading through a single table using a chosen access method:
  index read, scan, use of cache, etc. It is mostly meant as an interface,
  but also contains some private member functions that are useful for many
  implementations, such as error handling.

  A RowIterator is a simple iterator; you initialize it, and then read one
  record at a time until Read() returns -1 (end of records). A RowIterator
  can also read from other iterators; e.g., SortingIterator takes in records
  from another RowIterator and sorts them.

  The abstraction is not completely tight. In particular, it still leaves some
  specifics to TABLE, such as which columns to read (the read_set). This means
  it would probably be hard as-is to e.g. sort a join of two tables.

  Usage:
@code
  unique_ptr<RowIterator> iterator(new ...);
  if (iterator->Init())
    return true;
  while (iterator->Read() == 0) {
    ...
  }
@endcode
 */
class RowIterator {
 public:
  // NOTE: Iterators should typically be instantiated using NewIterator,
  // in sql/timing_iterator.h.
  RowIterator(THD *thd) : m_thd(thd) {}
  virtual ~RowIterator() {}

  /**
    Initialize or reinitialize the iterator. You must always call Init()
    before trying a Read() (but Init() does not imply Read()).

    You can call Init() multiple times; subsequent calls will rewind the
    iterator (or reposition it, depending on whether the iterator takes in
    e.g. a TABLE_REF) and allow you to read the records anew.
   */
  virtual bool Init() = 0;

  /**
    Read a single row. The row data is not actually returned from the function;
    it is put in the table's (or tables', in case of a join) record buffer,
    i.e., table->record[0].

    @retval
      0   OK
    @retval
      -1  End of records
    @retval
      1   Error
   */
  virtual int Read() = 0;

  /**
    Mark the current row buffer as containing a NULL row or not, so that if you
    read from it and the flag is true, you'll get only NULLs no matter what is
    actually in the buffer (typically some old leftover row). This is used
    for outer joins, when an iterator hasn't produced any rows and we need to
    produce a NULL-complemented row. Init() or Read() won't necessarily
    reset this flag, so if you ever set it to true, make sure to also set it
    to false when needed.

    Note that this can be called without Init() having been called first.
    For example, NestedLoopIterator can hit EOF immediately on the outer
    iterator, which means the inner iterator doesn't get an Init() call,
    but NestedLoopIterator will still forward SetNullRowFlag() to both the
    inner and outer iterators.

    TODO: We shouldn't need this. See the comments on AggregateIterator for
    a bit more discussion on abstracting out a row interface.
   */
  virtual void SetNullRowFlag(bool is_null_row) = 0;

  // In certain queries, such as SELECT FOR UPDATE, UPDATE or DELETE queries,
  // reading rows will automatically take locks on them. (This means that the
  // set of locks taken will depend on whether e.g. the optimizer chose a table
  // scan or used an index, due to InnoDB's row locking scheme with “gap locks”
  // for B-trees instead of full predicate locks.)
  //
  // However, under some transaction isolation levels (READ COMMITTED or
  // less strict), it is possible to release such locks if and only if the row
  // failed a WHERE predicate, as only the returned rows are protected,
  // not _which_ rows are returned. Thus, if Read() returned a row that you did
  // not actually use, you should call UnlockRow() afterwards, which allows the
  // storage engine to release the row lock in such situations.
  //
  // TableRowIterator has a default implementation of this; other iterators
  // should usually either forward the call to their source iterator (if any)
  // or just ignore it. The right behavior depends on the iterator.
  virtual void UnlockRow() = 0;

  struct Child {
    RowIterator *iterator;

    // Normally blank. If not blank, a heading for this iterator
    // saying what kind of role it has to the parent if it is not
    // obvious. E.g., FilterIterator can print iterators that are
    // children because they come out of subselect conditions.
    std::string description;
  };

  /// List of zero or more iterators which are direct children of this one.
  /// By convention, if there are multiple ones (i.e., we're doing a join),
  /// the outer iterator is listed first. So for a LEFT JOIN b, we'd list
  /// a before b.
  virtual std::vector<Child> children() const { return std::vector<Child>(); }

  /// Returns a short string (used for EXPLAIN FORMAT=tree) with user-readable
  /// information for this iterator. When implementing these, try to avoid
  /// internal jargon (e.g. “eq_ref”); prefer things that read like normal,
  /// technical English (e.g. “single-row index lookup”).
  ///
  /// For certain complex operations, such as MaterializeIterator, there can be
  /// multiple strings. If so, they are interpreted as nested operations,
  /// with the outermost, last-done operation first and the other ones indented
  /// as if they were child iterators.
  ///
  /// Callers should use FullDebugString() below, which adds costs
  /// (see set_estimated_cost() etc.) if present.
  virtual std::vector<std::string> DebugString() const = 0;

  virtual std::string TimingString() const {
    // Valid for TimingIterator only.
    DBUG_ASSERT(false);
    return "";
  }

  // If this is the root iterator of a join, points back to the join object.
  // This has one single purpose: EXPLAIN uses it to be able to get the SELECT
  // list and print out any subselects in it; they are not children of
  // the iterator per se, but need to be printed with it.
  //
  // We could have stored the list of these extra subselect iterators directly
  // on the iterator (it breaks the abstraction a bit to refer to JOIN here),
  // but setting a single pointer is cheaper, especially considering that most
  // queries are not EXPLAIN queries and we don't want the overhead for them.
  JOIN *join_for_explain() const { return m_join_for_explain; }

  // Should be called by JOIN::create_iterators() only.
  void set_join_for_explain(JOIN *join) { m_join_for_explain = join; }

  /**
    Start performance schema batch mode, if supported (otherwise ignored).

    PFS batch mode is a mitigation to reduce the overhead of performance schema,
    typically applied at the innermost table of the entire join. If you start
    it before scanning the table and then end it afterwards, the entire set
    of handler calls will be timed only once, as a group, and the costs will
    be distributed evenly. This reduces timer overhead.

    If you start PFS batch mode, you must also take care to end it at the
    end of the scan, one way or the other. Do note that this is true even
    if the query ends abruptly (LIMIT is reached, or an error happens).
    The easiest workaround for this is to simply call EndPSIBatchModeIfStarted()
    on the root iterator at the end of the scan. See the PFSBatchMode class for
    a useful helper.

    The rules for starting and ending batch mode are:

    1. If you are an iterator with exactly one child (FilterIterator etc.),
       forward any StartPSIBatchMode() calls to it.
    2. If you drive an iterator (read rows from it using a for loop
       or similar), use PFSBatchMode as described above.
    3. If you have multiple children, ignore the call and do your own
       handling of batch mode as appropriate. For materialization,
       #2 would typically apply. For joins, it depends on the join type
       (e.g., NestedLoopIterator applies batch mode only when scanning
       the innermost table).

    The upshot of this is that when scanning a single table, batch mode
    will typically be activated for that table (since we call
    StartPSIBatchMode() on the root iterator, and it will trickle all the way
    down to the table iterator), but for a join, the call will be ignored
    and the join iterator will activate batch mode by itself as needed.
   */
  virtual void StartPSIBatchMode() {}

  /**
    Ends performance schema batch mode, if started. It's always safe to
    call this.

    Iterators that have children (composite iterators) must forward the
    EndPSIBatchModeIfStarted() call to every iterator they could conceivably
    have called StartPSIBatchMode() on. This ensures that after such a call
    on the root iterator, all handlers are out of batch mode.
   */
  virtual void EndPSIBatchModeIfStarted() {}

  // The information below is used for EXPLAIN only. We store it on the
  // iterators, because it corresponds naturally 1:1 to them.
  // However, RowIterator is an execution structure, and as such, estimated
  // costs don't really belong here. When we go to an optimizer that plans
  // natively using iterators, we should have a class setup where
  // each execution iterator has a corresponding planning structure
  // (e.g. TableScanIterator vs. PlannedTableScan), and the costs should move
  // to the planning structures.

  void set_estimated_cost(double estimated_cost) {
    m_estimated_cost = estimated_cost;
  }
  double estimated_cost() const { return m_estimated_cost; }

  void set_expected_rows(double expected_rows) {
    m_expected_rows = expected_rows;
  }
  double expected_rows() const { return m_expected_rows; }

  /**
    If this iterator is wrapping a different iterator (e.g. TimingIterator<T>)
    and you need to down_cast<> to a specific iterator type, this allows getting
    at the wrapped iterator.
   */
  virtual RowIterator *real_iterator() { return this; }
  virtual const RowIterator *real_iterator() const { return this; }

 protected:
  THD *thd() const { return m_thd; }

 private:
  THD *const m_thd;
  JOIN *m_join_for_explain = nullptr;
  double m_estimated_cost = -1.0;
  double m_expected_rows = -1.0;
};
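
The Init()/Read() contract documented in the class can be illustrated with a small self-contained sketch. It does not use the real MySQL classes: MiniRowIterator, VectorIterator and SumAllRows are hypothetical stand-ins, and a plain int plays the role of the record buffer. It assumes, as the usage example in the class comment suggests, that Init() returns true on failure.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for RowIterator; only Init() and Read() are modeled.
class MiniRowIterator {
 public:
  virtual ~MiniRowIterator() {}
  virtual bool Init() = 0;  // assumed: true means error
  virtual int Read() = 0;   // 0 = OK, -1 = end of records, 1 = error
};

// Reads values from a vector into a caller-owned "record buffer".
class VectorIterator : public MiniRowIterator {
 public:
  VectorIterator(const std::vector<int> &rows, int *record)
      : m_rows(rows), m_record(record) {}

  bool Init() override {
    m_pos = 0;  // calling Init() again rewinds to the first row
    return false;
  }

  int Read() override {
    if (m_pos >= m_rows.size()) return -1;
    *m_record = m_rows[m_pos++];  // the row lands in the buffer
    return 0;
  }

 private:
  std::vector<int> m_rows;
  int *m_record;
  std::size_t m_pos = 0;
};

// Drains the iterator and sums the buffer contents. Returns true on error.
bool SumAllRows(MiniRowIterator *iterator, const int *record, long *sum) {
  if (iterator->Init()) return true;
  *sum = 0;
  for (;;) {
    int err = iterator->Read();
    if (err == -1) return false;  // normal end of records
    if (err != 0) return true;    // any other nonzero value is an error
    *sum += *record;
  }
}
```

Calling SumAllRows() twice on the same iterator yields the same sum, because the second Init() rewinds it. Note that the loop separates -1 from other nonzero results; the shorter `while (Read() == 0)` form treats both as "stop".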

class TableRowIterator : public RowIterator {
 public:
  TableRowIterator(THD *thd, TABLE *table) : RowIterator(thd), m_table(table) {}

  void UnlockRow() override;
  void SetNullRowFlag(bool is_null_row) override;
  void StartPSIBatchMode() override;
  void EndPSIBatchModeIfStarted() override;

 protected:
  int HandleError(int error);
  void PrintError(int error);
  TABLE *table() const { return m_table; }

 private:
  TABLE *const m_table;

  friend class AlternativeIterator;
};
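
TableRowIterator supplies the default UnlockRow(); the comment on RowIterator::UnlockRow() says other iterators usually forward or ignore the call. A rough sketch of the forwarding case, for a filter with one source, follows. All names (MiniRowIterator, LockingSource, MiniFilter) are hypothetical, the lock counters only simulate what a storage engine would do, and this is not the actual FilterIterator code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical reduced interface: only the calls discussed here.
class MiniRowIterator {
 public:
  virtual ~MiniRowIterator() {}
  virtual bool Init() = 0;  // assumed: true means error
  virtual int Read() = 0;   // 0 = OK, -1 = end of records, 1 = error
  virtual void UnlockRow() = 0;
};

// Stand-in for a table iterator. It pretends to lock every row it reads
// and counts how many of those locks are released again.
class LockingSource : public MiniRowIterator {
 public:
  LockingSource(std::vector<int> rows, int *record)
      : m_rows(std::move(rows)), m_record(record) {}

  bool Init() override {
    m_pos = 0;
    return false;
  }
  int Read() override {
    if (m_pos >= m_rows.size()) return -1;
    *m_record = m_rows[m_pos++];
    ++locks_taken;
    return 0;
  }
  void UnlockRow() override { ++locks_released; }

  int locks_taken = 0;
  int locks_released = 0;

 private:
  std::vector<int> m_rows;
  int *m_record;
  std::size_t m_pos = 0;
};

// Filter with exactly one source.
class MiniFilter : public MiniRowIterator {
 public:
  MiniFilter(MiniRowIterator *source, std::function<bool()> predicate)
      : m_source(source), m_predicate(std::move(predicate)) {}

  bool Init() override { return m_source->Init(); }
  int Read() override {
    for (;;) {
      int err = m_source->Read();
      if (err != 0) return err;  // pass through end of records and errors
      if (m_predicate()) return 0;
      m_source->UnlockRow();  // the row was read but will not be used
    }
  }
  // The caller rejected the row we returned; let the source release it.
  void UnlockRow() override { m_source->UnlockRow(); }

 private:
  MiniRowIterator *m_source;
  std::function<bool()> m_predicate;
};
```

With rows 1 to 4 and a predicate that keeps even values, the source takes four simulated locks and the filter releases the two that belong to rejected rows.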

// Return iterator.DebugString(), but with cost and timing information appended
// in textual form, if available.
std::vector<std::string> FullDebugString(const THD *thd,
                                         const RowIterator &iterator);
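
To show how children() and DebugString() could combine into an indented tree, here is a hedged sketch that uses a hypothetical MiniNode struct rather than real iterators. The "-> " prefix and the four-space indent are illustrative choices only, and the cost and timing text that FullDebugString() appends is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node mirroring only children() and DebugString().
struct MiniNode {
  struct Child {
    const MiniNode *node;
    std::string description;  // normally blank
  };
  std::vector<std::string> debug_strings;  // outermost operation first
  std::vector<Child> children;
};

// Builds the prefix for one output line at the given nesting level.
std::string Indent(int level) {
  return std::string(static_cast<std::size_t>(level) * 4, ' ') + "-> ";
}

// Emits one line per debug string, nesting each further string one level
// deeper, then emits the children below the innermost string.
void PrintTree(const MiniNode &node, int level, std::vector<std::string> *out) {
  for (const std::string &s : node.debug_strings) {
    out->push_back(Indent(level) + s);
    ++level;
  }
  for (const MiniNode::Child &child : node.children) {
    int child_level = level;
    if (!child.description.empty()) {
      // A non-blank description becomes a heading above the child.
      out->push_back(Indent(level) + child.description);
      ++child_level;
    }
    PrintTree(*child.node, child_level, out);
  }
}
```

For a join node with two table-scan children, this yields the join line first and then both children one level deeper, with the outer table listed before the inner one as the children() comment prescribes.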

#endif  // SQL_ROW_ITERATOR_H_
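
The three batch-mode rules in the StartPSIBatchMode() comment can be sketched with stand-in classes. Everything here is hypothetical: MiniTable, MiniSingleChild, MiniJoin and BatchModeGuard are invented for illustration, and BatchModeGuard only approximates the role the comment assigns to PFSBatchMode, whose definition is not part of this file.

```cpp
#include <cassert>

// Hypothetical reduced interface: only the two batch-mode calls.
class MiniRowIterator {
 public:
  virtual ~MiniRowIterator() {}
  virtual void StartPSIBatchMode() {}
  virtual void EndPSIBatchModeIfStarted() {}
};

// Stand-in for a table iterator; tracks whether batch mode is active.
class MiniTable : public MiniRowIterator {
 public:
  void StartPSIBatchMode() override { in_batch_mode = true; }
  void EndPSIBatchModeIfStarted() override { in_batch_mode = false; }
  bool in_batch_mode = false;
};

// Rule 1: exactly one child, so both calls are forwarded.
class MiniSingleChild : public MiniRowIterator {
 public:
  explicit MiniSingleChild(MiniRowIterator *source) : m_source(source) {}
  void StartPSIBatchMode() override { m_source->StartPSIBatchMode(); }
  void EndPSIBatchModeIfStarted() override {
    m_source->EndPSIBatchModeIfStarted();
  }

 private:
  MiniRowIterator *m_source;
};

// Rule 3: several children, so StartPSIBatchMode() from above is ignored
// (the base-class no-op is kept), but the end call still reaches every
// child that could have been started.
class MiniJoin : public MiniRowIterator {
 public:
  MiniJoin(MiniRowIterator *outer, MiniRowIterator *inner)
      : m_outer(outer), m_inner(inner) {}
  void EndPSIBatchModeIfStarted() override {
    m_outer->EndPSIBatchModeIfStarted();
    m_inner->EndPSIBatchModeIfStarted();
  }
  // Stand-in for the join's own decision to batch the innermost table.
  void StartInnerScan() { m_inner->StartPSIBatchMode(); }

 private:
  MiniRowIterator *m_outer;
  MiniRowIterator *m_inner;
};

// Rule 2: a scope guard for code that drives an iterator, so that batch
// mode ends even when the scan is left early.
class BatchModeGuard {
 public:
  explicit BatchModeGuard(MiniRowIterator *root) : m_root(root) {
    m_root->StartPSIBatchMode();
  }
  ~BatchModeGuard() { m_root->EndPSIBatchModeIfStarted(); }
  BatchModeGuard(const BatchModeGuard &) = delete;
  BatchModeGuard &operator=(const BatchModeGuard &) = delete;

 private:
  MiniRowIterator *m_root;
};
```

The guard ties the end call to scope exit, which covers the early-exit cases the comment warns about (LIMIT reached, or an error) without requiring an explicit call on every return path.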